Kubernetes Cost Mistakes: Why Your K8s Bill Is 3x What It Should Be
Kubernetes was supposed to optimize infrastructure costs through efficient resource scheduling. Instead, most enterprises discover their K8s spend runs 2-3x what the VMs they replaced used to cost, because containerization doesn't automatically equal optimization.
The Scale of the Problem
Research from the FinOps Foundation consistently shows that Kubernetes clusters run at 20-35% average utilization. That means 65-80% of the compute capacity you're paying for is sitting idle. For a mid-size enterprise running 50 nodes at $500/month each, that's $16,000-$20,000/month in waste.
Mistake #1: Over-Provisioning Requests and Limits
The most expensive mistake: developers set CPU and memory requests based on fear, not data. They request 2 CPU cores and 4GB RAM for a service that actually uses 0.3 cores and 512MB. The scheduler reserves those resources, even though they'll never be used.
# What developers write (fear-based):
resources:
  requests:
    cpu: "2000m"      # 2 full cores reserved
    memory: "4Gi"     # 4GB reserved
  limits:
    cpu: "4000m"
    memory: "8Gi"

# What the service actually uses:
#   CPU: avg 280m, p99 620m
#   Memory: avg 480Mi, peak 720Mi

# The fix (data-driven):
resources:
  requests:
    cpu: "350m"       # ~1.25x average usage
    memory: "600Mi"   # ~1.25x average usage
  limits:
    cpu: "1000m"      # ~1.5x p99
    memory: "1Gi"     # ~1.4x peak
Rule of thumb: set requests to ~1.25× average usage and limits to ~1.5× p99 usage. Measure for at least 7 days, including peak traffic, before committing to values, and use the Vertical Pod Autoscaler (VPA) in recommendation mode to get data-driven suggestions, as in the sketch below.
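As a minimal sketch: a VPA object with updateMode "Off" only records recommendations, assuming the VPA components are installed in the cluster; payments-api is a hypothetical Deployment name.

# Recommendation-only VPA: writes suggested requests into its status, never evicts pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: payments-api
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: payments-api
  updatePolicy:
    updateMode: "Off"   # recommend only; read results with: kubectl describe vpa payments-api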
Mistake #2: Non-Production Clusters Running 24/7
Your production cluster needs to run 24/7. Your dev, staging, and QA clusters do not. Engineers work ~10 hours/day, 5 days/week. Those dev clusters run 168 hours/week but are actively used for ~50 hours.
- Schedule non-prod clusters to scale down at 7pm and scale up at 7am on weekdays (see the CronJob sketch after this list)
- Ephemeral preview environments — spin up per pull request, tear down on merge
- Namespace quotas — prevent developers from running 20 replicas in dev
- Savings: typically 50-60% of non-production compute costs
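One way to implement the scheduled scale-down, sketched under assumptions: a CronJob zeroes out replicas in a dev namespace each weekday evening. The dev namespace and the "scaler" ServiceAccount (which needs RBAC permission to scale Deployments) are placeholders, and a matching morning job, or a purpose-built downscaler tool that remembers original replica counts, is needed to scale back up.

# Hypothetical scale-down job; a companion 7am job (or a dedicated downscaler) restores replicas.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-down-dev
  namespace: dev
spec:
  schedule: "0 19 * * 1-5"            # 19:00 Monday-Friday; UTC unless spec.timeZone is set
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: scaler   # assumed to have patch rights on deployments/scale
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest   # any image that ships kubectl works
              command: ["kubectl", "scale", "deployment", "--all", "--replicas=0", "-n", "dev"]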
Mistake #3: Wrong Node Sizing
The choice of node instance type dramatically affects packing efficiency. Too large, and pods rarely fill the node, leaving stranded capacity (and a single node failure takes a bigger bite out of the fleet). Too small, and per-node overhead from the kubelet, kube-proxy, and the OS eats a disproportionate share of what you pay for.
| Node Size | Total Resources | System Reserved | Available for Pods | Overhead % |
|---|---|---|---|---|
| t3.medium (2 vCPU, 4GB) | 2 CPU, 4GB | 0.4 CPU, 0.5GB | 1.6 CPU, 3.5GB | 20% |
| m5.xlarge (4 vCPU, 16GB) | 4 CPU, 16GB | 0.5 CPU, 0.7GB | 3.5 CPU, 15.3GB | 7% |
| m5.2xlarge (8 vCPU, 32GB) | 8 CPU, 32GB | 0.6 CPU, 0.9GB | 7.4 CPU, 31.1GB | 4% |
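The "System Reserved" column comes from kubelet reservations: allocatable capacity is node capacity minus these reservations and the eviction threshold. The snippet below is illustrative only; the values are examples, and managed services such as EKS and GKE apply their own defaults.

# Illustrative KubeletConfiguration: capacity minus these reservations (and the eviction
# threshold) is what the scheduler sees as allocatable for pods.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:            # OS daemons (systemd, sshd, ...)
  cpu: "100m"
  memory: "256Mi"
kubeReserved:              # Kubernetes daemons (kubelet, container runtime)
  cpu: "100m"
  memory: "512Mi"
evictionHard:
  memory.available: "100Mi"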
Mistake #4: Ignoring Cross-AZ Data Transfer
In AWS, inter-AZ data transfer costs $0.01/GB in each direction, so effectively $0.02/GB for traffic between services in different AZs. If your microservices are spread across 3 AZs and chatty, this adds up fast: a pair of services exchanging 10GB/hour across AZs pays about $4.80/day, roughly $144/month, in data transfer alone.
The Fix
- Use topology-aware routing to prefer same-AZ communication, keyed on the topology.kubernetes.io/zone label (see the sketch after this list)
- Co-locate chatty services in the same AZ using pod affinity rules
- Compress gRPC/HTTP payloads between services
- Monitor cross-AZ traffic with VPC Flow Logs
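A sketch under assumed service names (cart and checkout are placeholders): the Service annotation turns on topology-aware routing (the annotation key changed across Kubernetes versions), and the preferred podAffinity rule nudges the scheduler to co-locate the two chatty workloads in the same zone.

apiVersion: v1
kind: Service
metadata:
  name: cart
  annotations:
    service.kubernetes.io/topology-mode: "Auto"   # "topology-aware-hints: auto" on pre-1.27 clusters
spec:
  selector:
    app: cart
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
spec:
  replicas: 3
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                topologyKey: topology.kubernetes.io/zone   # same zone as cart pods, when possible
                labelSelector:
                  matchLabels:
                    app: cart
      containers:
        - name: checkout
          image: example/checkout:1.0   # placeholder image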
Mistake #5: Persistent Volume Sprawl
Persistent Volumes (PVs) are provisioned and never cleaned up. Developers create 100GB volumes for a 2GB database "just in case," and when the pod is deleted, the PV remains — still being billed.
- Audit PVs monthly — delete unattached volumes
- Use reclaimPolicy: Delete for non-critical workloads (see the StorageClass sketch below)
- Right-size volumes and use volume expansion instead of over-provisioning
- Use cheaper storage classes (gp3 instead of io2) where IOPS aren't critical
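A StorageClass like the following combines gp3, deletion on release, and online volume expansion. It assumes the AWS EBS CSI driver is installed, and the class name is illustrative.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-standard
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
reclaimPolicy: Delete            # PV (and underlying EBS volume) removed when the PVC is deleted
allowVolumeExpansion: true       # grow volumes later instead of over-provisioning upfront
volumeBindingMode: WaitForFirstConsumer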
Mistake #6: Not Using Spot/Preemptible Instances
Spot instances are 60-90% cheaper than on-demand. For stateless, fault-tolerant workloads (web servers, batch processing, CI/CD), there's no reason not to use them. Yet most K8s clusters run 100% on-demand.
| Workload Type | Spot-Suitable? | Strategy |
|---|---|---|
| Stateless web services | ✅ Yes | PodDisruptionBudget + multi-AZ spread |
| Batch / data processing | ✅ Yes | Checkpointing + retry logic |
| CI/CD runners | ✅ Yes | Job queues with retry |
| Databases (primary) | ❌ No | Reserved instances instead |
| Stateful services | ⚠️ Maybe | Only with proper PDB and data replication |
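A sketch of the first row's strategy: a PodDisruptionBudget plus soft node affinity toward spot capacity and a zone-spread constraint. The app name web, the image, and the eks.amazonaws.com/capacityType label are assumptions; the spot label differs by platform (GKE uses cloud.google.com/gke-spot, for example).

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2                 # keep at least 2 replicas up during spot reclaims
  selector:
    matchLabels:
      app: web
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: eks.amazonaws.com/capacityType   # platform-specific spot label (assumed)
                    operator: In
                    values: ["SPOT"]
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone        # spread replicas across AZs
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: example/web:1.0                          # placeholder image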
Mistake #7: Observability Costs Exceeding Compute
In some organizations, Datadog, New Relic, or Splunk bills exceed the actual infrastructure costs. Logging every request at full fidelity, retaining metrics for 13 months, and tracing 100% of requests is expensive and usually unnecessary.
- Sample traces — trace 10% of requests, 100% of errors (see the collector sketch after this list)
- Reduce log verbosity — INFO in production, DEBUG only in dev
- Shorten retention — 30 days for logs, 90 days for metrics, 7 days for traces
- Switch to open-source — an OpenTelemetry + Grafana + Loki stack can cut observability spend by 60-80% versus commercial APM
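A sketch of the sampling point using an OpenTelemetry Collector; it assumes the contrib distribution (which ships the tail_sampling processor), and the exporter endpoint is a placeholder. Error traces are always kept, and everything else is sampled at roughly 10%.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-all-errors            # any trace containing an error span is kept
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-10-percent          # the rest is sampled probabilistically
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
exporters:
  otlphttp:
    endpoint: https://traces.example.internal:4318   # placeholder backend
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlphttp]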
The K8s Cost Optimization Playbook
A phased approach to reducing Kubernetes spend by 40-60%:
Week 1-2: Quick Wins (20-30% savings)
- Install VPA in recommendation mode on all workloads
- Schedule non-prod clusters to shut down nights/weekends
- Delete unattached PersistentVolumes and orphaned LoadBalancers
- Switch to gp3 storage class where applicable
Month 1: Right-Sizing (15-25% additional savings)
- Apply VPA recommendations to resource requests/limits
- Optimize node instance types for workload profiles
- Implement Cluster Autoscaler or Karpenter for dynamic scaling
- Enable topology-aware routing
Month 2-3: Advanced Optimization (10-15% additional savings)
- Migrate stateless workloads to spot instances (50-70% of fleet)
- Implement HPA with custom metrics (CPU/memory + request-level signals); see the sketch after this list
- Right-size observability stack and sampling rates
- Purchase reserved capacity for baseline workloads (1-year or 3-year savings plans)
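A sketch of a mixed-signal autoscaler, assuming a custom metrics adapter (such as prometheus-adapter) exposes a per-pod request-rate metric; the metric name, target values, and the Deployment name web are placeholders standing in for whatever request-level signal you track.

# autoscaling/v2 HPA: scales on CPU utilization plus a per-pod custom metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70          # scale out when average CPU passes 70% of requests
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second  # served by the custom metrics adapter (assumed)
        target:
          type: AverageValue
          averageValue: "100"             # target ~100 req/s per pod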
Free K8s Cost Assessment
Send us your cluster metrics and we'll identify your top 5 cost optimization opportunities — with estimated dollar savings for each.
Request K8s Cost Review →