Autoscaling on nanta - Data Engineering

https://nanta-data.dev/en/tags/autoscaling/
Recent content in Autoscaling on nanta - Data Engineering
© 2026 nanta
Thu, 05 Mar 2026 00:00:00 +0000

EKS AI Serving Node Cost Reduction: Instance Diversification, Consolidation, and Scheduled Scaling
https://nanta-data.dev/en/posts/eks-ai-platform-cost-saving/
Thu, 05 Mar 2026 00:00:00 +0000
An AI platform team’s serving API was running 500 fixed pods on only two on-demand instance types (c6i.2xlarge, m6i.2xlarge). We applied instance type diversification and Karpenter consolidation as phase 1, then KEDA cron-trigger scheduled scaling as phase 2. Key finding: consolidation alone has limited effect when pod count is fixed; it needs scale-in to actually reduce node count.

Flink on EKS In-place Scaling: Scaling TaskManagers Without Restarting the Job
https://nanta-data.dev/en/posts/flink-in-place-scaling/
Tue, 03 Mar 2026 00:00:00 +0000
Our recommendation system’s Flink application required sub-1-minute latency, which prevented us from using autoscaling or spot instances. Autoscaling or spot reclamation triggered full Flink restarts that took 2-3 minutes. Using Flink 1.18’s adaptive scheduler and K8s Operator 1.8, we enabled in-place scaling, reducing consumer lag peaks to 1/5 and lag duration from 5-7 minutes to 2-3 minutes during scale events.

EMR on EKS VPA Review: When an Official AWS Feature Doesn't Work
https://nanta-data.dev/en/posts/emr-on-eks-vpa-review/
Fri, 27 Feb 2026 00:00:00 +0000
We tried using AWS’s built-in VPA integration for EMR on EKS to auto-optimize Spark executor resources. After about a month of intensive PoC work, multiple AWS support cases, and a custom manifest bundle rebuild, the operator still didn’t work. We abandoned it.