<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>EKS on nanta - Data Engineering</title><link>https://nanta-data.dev/en/tags/eks/</link><description>Recent content in EKS on nanta - Data Engineering</description><generator>Hugo -- gohugo.io</generator><language>en</language><copyright>© 2026 nanta</copyright><lastBuildDate>Thu, 05 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://nanta-data.dev/en/tags/eks/index.xml" rel="self" type="application/rss+xml"/><item><title>EKS AI Serving Node Cost Reduction: Instance Diversification, Consolidation, and Scheduled Scaling</title><link>https://nanta-data.dev/en/posts/eks-ai-platform-cost-saving/</link><pubDate>Thu, 05 Mar 2026 00:00:00 +0000</pubDate><guid>https://nanta-data.dev/en/posts/eks-ai-platform-cost-saving/</guid><description>An AI platform team&amp;rsquo;s serving API was running 500 fixed pods on only two on-demand instance types (c6i.2xlarge, m6i.2xlarge). We applied instance type diversification and Karpenter consolidation as phase 1, then KEDA cron-trigger scheduled scaling as phase 2. Key finding: consolidation alone has limited effect when pod count is fixed — it needs scale-in to actually reduce node count.</description></item><item><title>Adding Access Control to EMR-on-EKS Spark Jobs: LakeFormation PoC Through 10 Issues</title><link>https://nanta-data.dev/en/posts/emr-on-eks-lakeformation-poc/</link><pubDate>Tue, 03 Mar 2026 00:00:00 +0000</pubDate><guid>https://nanta-data.dev/en/posts/emr-on-eks-lakeformation-poc/</guid><description>We needed to add job-level access control to EMR-on-EKS Spark jobs. Ranger was ruled out due to EMR-on-EKS&amp;rsquo;s structural limitations — no master node, no plugin installation path. We chose LakeFormation, and hit 10 issues during PoC: service label selector mismatches, FGAC blocking RDD operations/UDFs/synthetic types, cross-account Glue restrictions, and more. 
Here&amp;rsquo;s how we identified each cause and found workarounds.</description></item><item><title>Flink on EKS In-place Scaling: Scaling TaskManagers Without Restarting the Job</title><link>https://nanta-data.dev/en/posts/flink-in-place-scaling/</link><pubDate>Tue, 03 Mar 2026 00:00:00 +0000</pubDate><guid>https://nanta-data.dev/en/posts/flink-in-place-scaling/</guid><description>Our recommendation system&amp;rsquo;s Flink application required sub-1-minute latency, which prevented us from using autoscaling or spot instances. Autoscaling or spot reclamation triggered full Flink restarts that took 2-3 minutes. Using Flink 1.18&amp;rsquo;s adaptive scheduler and K8s Operator 1.8, we enabled in-place scaling — reducing consumer lag peaks to 1/5 and lag duration from 5-7 minutes to 2-3 minutes during scale events.</description></item><item><title>EKS Topology Aware Hints: Why They Had No Effect on Our Cluster</title><link>https://nanta-data.dev/en/posts/eks-topology-aware-hints/</link><pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate><guid>https://nanta-data.dev/en/posts/eks-topology-aware-hints/</guid><description>We evaluated Kubernetes Topology Aware Hints to reduce cross-AZ network costs on EKS. Hints were correctly applied to EndpointSlices, but had no actual effect. AWS Load Balancer Controller&amp;rsquo;s IP target mode bypasses kube-proxy entirely, and our primary internal workloads — Spark, Trino, Airflow — are all single-zone or stateful, meaning the traffic paths where hints get referenced simply don&amp;rsquo;t exist in our environment.</description></item><item><title>EMR on EKS VPA Review: When an Official AWS Feature Doesn't Work</title><link>https://nanta-data.dev/en/posts/emr-on-eks-vpa-review/</link><pubDate>Fri, 27 Feb 2026 00:00:00 +0000</pubDate><guid>https://nanta-data.dev/en/posts/emr-on-eks-vpa-review/</guid><description>We tried using AWS&amp;rsquo;s built-in VPA integration for EMR on EKS to auto-optimize Spark executor resources. 
After about a month of intensive PoC work, multiple AWS support cases, and a custom manifest bundle rebuild, the operator still didn&amp;rsquo;t work. We abandoned it.</description></item></channel></rss>