Note di Matteo


#kubernetes

Duolingo's Kubernetes Leap. I haven't watched it, but here Duolingo explains its migration from AWS ECS to EKS.

To give you an idea of the scale that we're looking at, Duolingo currently has over 128 million monthly active users, and we have over 250 courses that you can learn on the app. [...] On the engineering side, we have over 400 engineers and more than 500 backend services, just to give you an idea of the scale of this migration that we're going to be looking at today.

Previously, the 500-plus backend services that I mentioned at the beginning are running on AWS ECS. We have some workloads that are running on different infrastructure, but for the vast majority, they're on ECS. That's what I'm going to be focusing on. We're going to be moving from ECS to EKS. ECS is AWS's container orchestration solution. It's a managed solution. It's very simple and straightforward to use. We've been actually super happy with ECS over the past year, and it served our needs very well. Now as we've grown to a much bigger scale, as I mentioned, Kubernetes just offers a much more feature-rich ecosystem and open-source platform, as well as specific features that ECS does not give us.

#454 /
18 April 2026
/
10:25
/ #aws#cloud#kubernetes

An interesting Kubernetes behavior with volumes that Cloudflare ran into (A one-line Kubernetes fix that saved 600 hours a year):

Remember how I said at the beginning we'd just run out of inodes? In other words, we have a lot of files on this PV. When the PV is mounted, kubelet is running chgrp -R to recursively change the group on every file and folder across this filesystem. No wonder it was taking so long — that's a ton of entries to traverse even on fast flash storage!

The pod's spec.securityContext included fsGroup: 1, which ensures that processes running under GID 1 can access files on the volume. Atlantis runs as a non-root user, so without this setting it wouldn’t have permission to read or write to the PV. The way Kubernetes enforces this is by recursively updating ownership on the entire PV every time it's mounted.

The fix is to reduce the cases in which permissions have to be updated, by setting fsGroupChangePolicy: OnRootMismatch.
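
For reference, a minimal sketch of where this setting lives in a pod spec. Only fsGroup: 1 and fsGroupChangePolicy: OnRootMismatch come from the quoted post. The pod name, image, mount path and claim name are hypothetical placeholders, not Cloudflare's actual manifest.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: atlantis-example            # hypothetical name
spec:
  securityContext:
    fsGroup: 1                      # value quoted in the post
    # The default policy is "Always": ownership is changed recursively on every mount.
    # "OnRootMismatch" skips the recursive change when the volume's root directory
    # already has the expected ownership and permissions.
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
    - name: atlantis
      image: example.com/atlantis:latest   # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data                 # hypothetical mount path
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: atlantis-data           # hypothetical PVC name
```

The first mount of a new or mismatched volume should still pay the full recursive cost. The saving applies to later mounts, where the root directory already matches.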

#409 /
28 March 2026
/
17:10
/ #kubernetes#cloud