#database

Scaling PostgreSQL to power 800 million ChatGPT users

OpenAI has published an article that explains its PostgreSQL scaling strategy in more detail: one read-write primary server and 50 read-only replicas to support 800 million users. In practice, though, a migration toward more scalable databases such as Azure Cosmos DB is under way:

To mitigate these limitations and reduce write pressure, we’ve migrated, and continue to migrate, shardable (i.e. workloads that can be horizontally partitioned), write-heavy workloads to sharded systems such as Azure Cosmos DB, optimizing application logic to minimize unnecessary writes. We also no longer allow adding new tables to the current PostgreSQL deployment. New workloads default to the sharded systems.

Azure Cosmos DB is a managed, globally distributed database with essentially unlimited scalability. It is the equivalent of AWS's DynamoDB but, as far as I understand, it also supports the relational model and not only NoSQL/document workloads (Azure Cosmos DB for PostgreSQL). Its consistency guarantees can differ from those of a classic relational database with single-node writes, depending on the sharding configuration of the Citus extension.
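To make "workloads that can be horizontally partitioned" concrete, here is a minimal sketch of hash-based shard routing. It is not how OpenAI, Cosmos DB or Citus actually route data; the names `shard_for` and `partition`, the `user_id` field and the shard count are all hypothetical and only illustrate the idea that each row lands on exactly one shard chosen from its key.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a partition key to a shard index with a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and would route differently on each run.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(rows, key_field, num_shards):
    """Group rows by shard index (hypothetical in-memory stand-in for real shards)."""
    shards = {i: [] for i in range(num_shards)}
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards


if __name__ == "__main__":
    # Hypothetical write-heavy rows keyed by user_id.
    rows = [{"user_id": n, "event": "message"} for n in range(10)]
    for index, bucket in partition(rows, "user_id", 4).items():
        print(index, [r["user_id"] for r in bucket])
```

A workload is "shardable" in this sense when every write and most reads carry the partition key, so a single shard can serve them. Queries that need rows from many shards are where the consistency guarantees start to differ from a single-node database.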

#309 /

23 January 2026

21:05

/ #database #openai

Nixiesearch

After Quickwit, I discovered Nixiesearch, essentially an Elasticsearch backed by object storage (S3, etc.).

#256 /

22 December 2025

20:49

/ #database #storage #cloud

SlateDB. slatedb is an OSS embedded key-value database built on object storage.

#240 /

20 December 2025

14:29

/ #database #cloud #storage

pg_repack. pg_repack is a PostgreSQL extension which lets you remove bloat from tables and indexes, and optionally restore the physical order of clustered indexes. Unlike CLUSTER and VACUUM FULL it works online, without holding an exclusive lock on the processed tables during processing. pg_repack is efficient to boot, with performance comparable to using CLUSTER directly.

#229 /

14 December 2025

11:00

/ #database

Railway's postmortem: creating a PostgreSQL index took everything down:

A routine change to this Postgres database introduced a new column with an index to a table containing approximately 1 billion records. This table is critical in our backend API’s infrastructure, used by nearly all API operations.

The index creation did not use Postgres’ CONCURRENTLY option, causing an exclusive lock on the entire table. During the lock period, all queries against the database were queued behind the index operation. [...] Manual intervention attempts to terminate the index creation failed.

The remediation measures:

We’re going to introduce several changes to prevent errors of this class from happening again:

In CI, we will enforce CONCURRENTLY usage for all index creation operations, blocking non-compliant pull requests before merge.

PgBouncer connection pool limits will be adjusted to prevent overwhelming the underlying Postgres instance's capacity.

Database user connection limits will be configured to guarantee administrative access during incidents, ensuring maintenance operations remain possible under all conditions.
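The first measure, enforcing CONCURRENTLY in CI, can be sketched as a small migration linter. This is not Railway's actual check: the function name `find_blocking_index_statements` is hypothetical, and the regex is deliberately naive (it splits on `;` and does not understand SQL comments or string literals), so take it as an illustration of the idea rather than a production tool.

```python
import re

# Matches CREATE [UNIQUE] INDEX that is NOT immediately followed by CONCURRENTLY.
CREATE_INDEX = re.compile(
    r"\bCREATE\s+(?:UNIQUE\s+)?INDEX\b(?!\s+CONCURRENTLY\b)",
    re.IGNORECASE,
)


def find_blocking_index_statements(sql: str) -> list[str]:
    """Return the statements that create an index without CONCURRENTLY.

    Naive on purpose: statements are split on ';', so semicolons inside
    comments or string literals would confuse it.
    """
    offenders = []
    for statement in sql.split(";"):
        stripped = statement.strip()
        if CREATE_INDEX.search(stripped):
            offenders.append(stripped)
    return offenders


if __name__ == "__main__":
    migration = """
    ALTER TABLE deployments ADD COLUMN region text;
    CREATE INDEX idx_deployments_region ON deployments (region);
    """  # hypothetical migration, table and column names are made up
    offenders = find_blocking_index_statements(migration)
    for statement in offenders:
        print("index created without CONCURRENTLY:", statement)
    # A CI job would exit non-zero here to block the pull request.
    raise SystemExit(1 if offenders else 0)
```

One thing to keep in mind when adopting a rule like this: PostgreSQL does not allow `CREATE INDEX CONCURRENTLY` inside a transaction block, so migration tools that wrap each file in a transaction need that behaviour disabled for these migrations.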

#228 /

14 December 2025

10:58

/ #database #dev #cloud

Quickwit

Numbers from Mezmo's migration from Elasticsearch to Quickwit.

With Elasticsearch:

2 PB of storage
275 EC2 instances
35 TB of RAM
7770 cores

(800 MB to 2 GB of ingestion per second)

With Quickwit (which is amazing!):

-80% storage
-40% EC2 instances
-98% RAM
-93% CPU
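To get a feel for what those percentages mean in absolute terms, here is a quick back-of-the-envelope calculation. The resulting figures are my own derivation, not numbers published by Mezmo or Quickwit, and they assume each percentage applies directly to the Elasticsearch baseline listed above (with "-93% CPU" read as a reduction in core count).

```python
# Elasticsearch baseline and the reported reduction, in percent.
# Derived figures are estimates, not published numbers.
BASELINE = {
    "storage (PB)": (2, 80),
    "EC2 instances": (275, 40),
    "RAM (TB)": (35, 98),
    "cores": (7770, 93),
}


def after_reduction(value, reduction_pct):
    """Apply a percentage reduction using integer math before the final division."""
    return value * (100 - reduction_pct) / 100


def implied_footprint(baseline):
    """Return the implied post-migration value for each resource."""
    return {name: after_reduction(value, pct) for name, (value, pct) in baseline.items()}


if __name__ == "__main__":
    for name, value in implied_footprint(BASELINE).items():
        print(f"{name}: {value:g}")
```

Under those assumptions the implied Quickwit footprint is roughly 0.4 PB of storage, 165 EC2 instances, 0.7 TB of RAM and about 544 cores.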

#159 /

17 November 2025