January 22, 2026

Scaling PostgreSQL to power 800 million ChatGPT users

By Bohan Zhang, Member of the Technical Staff

For years, PostgreSQL has been one of the most critical internal data systems powering core products like ChatGPT and OpenAI’s API. As our user base has grown rapidly, the demands on our databases have grown exponentially too. Over the past year, our PostgreSQL load has grown by more than 10x, and it continues to rise quickly.

Our efforts to advance our production infrastructure to sustain this growth revealed a new insight: PostgreSQL can be scaled to reliably support much larger read-heavy workloads than many previously thought possible. The system (initially created by a team of scientists at the University of California, Berkeley) has enabled us to support massive global traffic with a single primary Azure PostgreSQL flexible server instance and nearly 50 read replicas spread across multiple regions globally. This is the story of how we scaled PostgreSQL at OpenAI to support millions of queries per second for 800 million users through rigorous optimizations and solid engineering. We also share the key lessons we learned along the way.

Cracks in our initial design

After the launch of ChatGPT, traffic grew at an unprecedented rate. To support it, we rapidly implemented extensive optimizations at both the application and PostgreSQL database layers, scaled up by increasing the instance size, and scaled out by adding more read replicas. This architecture has served us well for a long time. With ongoing improvements, it still provides ample runway for future growth.

It may sound surprising that a single-primary architecture can meet the demands of OpenAI’s scale. Making this work in practice, however, is not simple. We have seen several SEVs caused by Postgres overload, and they often follow the same pattern: an upstream issue causes a sudden spike in database load, such as widespread cache misses from a caching-layer failure, a surge of expensive multi-way joins saturating CPU, or a write storm from a new feature launch. As resource utilization climbs, query latency rises and requests begin to time out. Retries then amplify the load further, triggering a vicious cycle that can degrade the entire ChatGPT and API services.

Although PostgreSQL scales well for our read-heavy workloads, we still encounter challenges during periods of high write traffic. This is largely due to PostgreSQL’s multiversion concurrency control (MVCC) implementation, which makes it less efficient for write-heavy workloads. For example, when a query updates a tuple, or even a single field, the entire row is copied to create a new version. Under heavy write loads, this results in significant write amplification. It also increases read amplification, since queries must scan through multiple tuple versions (dead tuples) to retrieve the latest one. MVCC introduces additional challenges too, such as table and index bloat, increased index maintenance overhead, and complex autovacuum tuning. (You can find a deep dive on these issues in The Part of PostgreSQL We Hate the Most, a blog post I wrote with Prof. Andy Pavlo at Carnegie Mellon University, which is cited on the PostgreSQL Wikipedia page.)

Scaling PostgreSQL to millions of QPS

To mitigate these limitations and reduce write pressure, we have migrated, and continue to migrate, shardable (that is, horizontally partitionable) write-heavy workloads to sharded systems such as Azure Cosmos DB, and we have optimized application logic to minimize unnecessary writes. We also no longer allow new tables to be added to the current PostgreSQL deployment. New workloads default to the sharded systems.

Even as our infrastructure has evolved, PostgreSQL has remained unsharded, with a single primary instance serving all writes. The primary rationale is that sharding existing application workloads would be highly complex and time-consuming, requiring changes to hundreds of application endpoints and potentially taking months or even years. Because our workloads are primarily read-heavy and we have implemented extensive optimizations, the current architecture still provides ample headroom to support continued traffic growth. We are not ruling out sharding PostgreSQL in the future, but it is not a near-term priority given the runway we have for current and near-future growth.

In the following sections, we dive into the challenges we faced and the extensive optimizations we implemented to address them and prevent future outages, pushing PostgreSQL to its limits and scaling it to millions of queries per second (QPS).

Reducing load on the primary

Challenge: With only one writer, a single-primary setup can’t scale writes. Heavy write spikes can quickly overload the primary and impact services like ChatGPT and our API.

Solution: We minimize load on the primary as much as possible, both reads and writes, to ensure it has sufficient capacity to handle write spikes. Read traffic is offloaded to replicas wherever possible. However, some read queries must remain on the primary because they are part of write transactions. For those, we focus on ensuring they are efficient and avoid slow queries. For write traffic, we have migrated shardable, write-heavy workloads to sharded systems such as Azure Cosmos DB. Workloads that are harder to shard but still generate high write volume take longer to migrate, and that process is still ongoing. We also aggressively optimized our applications to reduce write load; for example, we fixed application bugs that caused redundant writes and introduced lazy writes, where appropriate, to smooth traffic spikes. In addition, when backfilling table fields, we enforce strict rate limits to prevent excessive write pressure.
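As a rough illustration of a rate-limited backfill, the sketch below writes rows in fixed-size batches and pauses between batches so the overall row rate stays under a cap. The function names, batch size, and rate are hypothetical and chosen for illustration; the post does not describe the actual backfill tooling.

```python
import time
from typing import Callable, Iterable, Sequence


def batched(ids: Sequence[int], size: int) -> Iterable[Sequence[int]]:
    """Yield consecutive slices of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def rate_limited_backfill(
    ids: Sequence[int],
    write_batch: Callable[[Sequence[int]], None],
    batch_size: int = 500,
    max_rows_per_second: float = 1000.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Apply `write_batch` to `ids` in batches while capping the row rate.

    `write_batch` is a hypothetical callback that would issue the UPDATE
    for one batch. Returns the number of batches written.
    """
    # Minimum wall-clock time one full batch is allowed to take.
    min_interval = batch_size / max_rows_per_second
    batches = 0
    for batch in batched(ids, batch_size):
        started = clock()
        write_batch(batch)
        batches += 1
        elapsed = clock() - started
        # Pause for whatever is left of the interval before the next batch.
        if elapsed < min_interval:
            sleep(min_interval - elapsed)
    return batches
```

The `sleep` and `clock` parameters are injected only so the pacing can be exercised in tests without real delays. A slow cap like this trades total duration for a flat write rate, which matches the trade-off described above.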

Query optimization

Challenge: We identified several expensive queries in PostgreSQL. In the past, sudden volume spikes in these queries would consume large amounts of CPU, slowing both ChatGPT and API requests.

Solution: A few expensive queries, especially those joining many tables, can significantly degrade or even bring down the entire service. We need to continuously optimize PostgreSQL queries to ensure they are efficient and avoid common Online Transaction Processing (OLTP) anti-patterns. For example, we once identified an extremely costly query that joined 12 tables, where spikes in this query were responsible for past high-severity SEVs. Complex multi-table joins should be avoided whenever possible. If joins are necessary, consider breaking down the query and moving the complex join logic to the application layer. Many of these problematic queries are generated by Object-Relational Mapping frameworks (ORMs), so it is important to carefully review the SQL they produce and confirm it behaves as expected. It is also common to find long-running idle queries in PostgreSQL. Configuring timeouts such as idle_in_transaction_session_timeout is essential to prevent them from blocking autovacuum.
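To make the idea of moving join logic to the application layer concrete, here is a minimal sketch that replaces one two-table join with two simple keyed lookups and stitches the rows together in memory. It uses Python's built-in sqlite3 purely as a self-contained stand-in for PostgreSQL, and the `users` and `conversations` tables are hypothetical.

```python
import sqlite3
from typing import Dict, List, Tuple


def fetch_conversations_with_owner(
    conn: sqlite3.Connection, user_ids: List[int]
) -> List[Tuple[int, str, str]]:
    """Return (conversation_id, title, owner_name) rows without a SQL join."""
    if not user_ids:
        return []
    marks = ",".join("?" for _ in user_ids)
    # Query 1: primary-key lookup on the hypothetical users table.
    users: Dict[int, str] = dict(
        conn.execute(f"SELECT id, name FROM users WHERE id IN ({marks})", user_ids)
    )
    # Query 2: simple filtered lookup on the hypothetical conversations table.
    rows = conn.execute(
        f"SELECT id, title, user_id FROM conversations "
        f"WHERE user_id IN ({marks}) ORDER BY id",
        user_ids,
    ).fetchall()
    # The join itself happens in application memory.
    return [(cid, title, users[uid]) for cid, title, uid in rows if uid in users]


def demo() -> List[Tuple[int, str, str]]:
    """Build a tiny in-memory dataset and run the application-side join."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE conversations (id INTEGER PRIMARY KEY, title TEXT, user_id INTEGER);
        INSERT INTO users VALUES (1, 'ada'), (2, 'lin');
        INSERT INTO conversations VALUES (10, 'hello', 1), (11, 'plans', 2), (12, 'notes', 1);
        """
    )
    return fetch_conversations_with_owner(conn, [1, 2])
```

Each query here is cheap and predictable for the database to plan, at the cost of an extra round trip and some memory in the application. Whether that trade is worthwhile depends on the query; this sketch only shows the shape of the rewrite.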

Single point of failure mitigation

Challenge: If a read replica goes down, traffic can be routed to other replicas. However, relying on a single writer creates a single point of failure: if it goes down, the entire service is affected.

Solution: Most critical requests only involve read queries. To mitigate the single point of failure in the primary, we offloaded those reads from the writer to replicas, so those requests can continue to be served even if the primary goes down. Write operations would still fail, but the impact is reduced. Because reads remain available, it is no longer a SEV0.

To mitigate primary failures, we run the primary in High-Availability (HA) mode with a hot standby, a continuously synchronized replica that is always ready to take over serving traffic. If the primary goes down or needs to be taken offline for maintenance, we can quickly promote the standby to minimize downtime. The Azure PostgreSQL team has done significant work to ensure these failovers remain safe and reliable even under very high load. To handle read replica failures, we deploy multiple replicas in each region with sufficient capacity headroom, so that a single replica failure doesn’t lead to a regional outage.

Workload isolation

Challenge: We often encounter situations where certain requests consume a disproportionate amount of resources on PostgreSQL instances. This can degrade the performance of other workloads running on the same instances. For example, a new feature launch can introduce inefficient queries that heavily consume PostgreSQL CPU, slowing down requests for other critical features.

Solution: To mitigate the “noisy neighbor” problem, we isolate workloads onto dedicated instances to ensure that sudden spikes in resource-intensive requests don’t impact other traffic. Specifically, we split requests into low-priority and high-priority tiers and route them to separate instances. This way, even if a low-priority workload becomes resource-intensive, it won’t degrade the performance of high-priority requests. We apply the same strategy across different products and services as well, so that activity from one product does not affect the performance or reliability of another.
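A minimal sketch of tier-based routing, assuming each priority tier owns a disjoint pool of instances and requests are spread round-robin within their tier. The class name, tier labels, and host names are hypothetical; the post does not describe how routing is actually implemented.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class TierRouter:
    """Route each request to an instance pool reserved for its priority tier."""

    def __init__(self, pools: Dict[str, List[str]]) -> None:
        seen: Dict[str, str] = {}
        for tier, hosts in pools.items():
            if not hosts:
                raise ValueError(f"tier {tier!r} has no instances")
            for host in hosts:
                # Isolation only holds if no instance serves two tiers.
                if host in seen:
                    raise ValueError(
                        f"{host!r} is listed under {seen[host]!r} and {tier!r}"
                    )
                seen[host] = tier
        self._cycles: Dict[str, Iterator[str]] = {
            tier: cycle(list(hosts)) for tier, hosts in pools.items()
        }

    def pick(self, tier: str) -> str:
        """Return the next instance for `tier` in round-robin order."""
        return next(self._cycles[tier])
```

For example, `TierRouter({"high": ["pg-high-1", "pg-high-2"], "low": ["pg-low-1"]})` keeps a burst of `"low"` requests on `pg-low-1` only, so the `"high"` pool is unaffected by it.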

Connection pooling

Challenge: Each instance has a maximum connection limit (5,000 in Azure PostgreSQL). It’s easy to run out of connections or accumulate too many idle ones. We’ve previously had incidents caused by connection storms that exhausted all available connections.

Solution: We deployed PgBouncer as a proxy layer to pool database connections. Running it in statement or transaction pooling mode allows us to efficiently reuse connections, greatly reducing the number of active client connections. This also cuts connection setup latency: in our benchmarks, the average connection time dropped from 50 milliseconds (ms) to 5 ms. Inter-region connections and requests can be expensive, so we co-locate the proxy, clients, and replicas in the same region to minimize network overhead and connection use time. Moreover, PgBouncer must be configured carefully. Settings like idle timeouts are critical to prevent connection exhaustion.

PostgreSQL proxy diagram: Each read replica has its own Kubernetes deployment running multiple PgBouncer pods. We run multiple Kubernetes deployments behind the same Kubernetes Service, which load-balances traffic across pods.
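The effect of transaction-level pooling can be sketched with a toy pool in which a server connection is borrowed only for the duration of one transaction and then returned. This is an illustrative model, not PgBouncer's implementation; the class, the connection labels, and the timeout are hypothetical.

```python
import queue
from contextlib import contextmanager
from typing import Iterator, List


class TransactionPool:
    """Toy model of transaction pooling over a fixed set of server connections."""

    def __init__(self, server_connections: List[str], acquire_timeout: float = 1.0) -> None:
        self._idle: "queue.Queue[str]" = queue.Queue()
        for conn in server_connections:
            self._idle.put(conn)
        self._timeout = acquire_timeout

    @contextmanager
    def transaction(self) -> Iterator[str]:
        """Borrow a server connection for one transaction, then release it."""
        # Raises queue.Empty if no connection frees up within the timeout.
        conn = self._idle.get(timeout=self._timeout)
        try:
            yield conn
        finally:
            # The connection returns to the pool as soon as the transaction ends,
            # so an idle client holds no server connection at all.
            self._idle.put(conn)
```

With two server connections, any number of clients can run short transactions one after another, while a third concurrent transaction has to wait or time out. That bounded behavior is what protects the database from connection storms in this model.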

Caching

Challenge: A sudden spike in cache misses can trigger a surge of reads on the PostgreSQL database, saturating CPU and slowing user requests.

Solution: To reduce read pressure on PostgreSQL, we use a caching layer to serve most of the read traffic. However, when cache hit rates drop unexpectedly, the burst of cache misses can push a large volume of requests directly to PostgreSQL. This sudden increase in database reads consumes significant resources, slowing down the service. To prevent overload during cache-miss storms, we implement a cache locking (and leasing) mechanism so that only a single reader that misses on a particular key fetches the data from PostgreSQL. When multiple requests miss on the same cache key, only one request acquires the lock and proceeds to retrieve the data and repopulate the cache. All other requests wait for the cache to be updated rather than all hitting PostgreSQL at once. This significantly reduces redundant database reads and protects the system from cascading load spikes.
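The locking behavior described above can be sketched in-process with threads: the first caller to miss on a key becomes the loader, and everyone else waits for it to repopulate the cache. This is a simplified, single-process sketch under that assumption; a production cache would typically hold the lock in the shared caching layer and give it a lease (expiry), which is omitted here. The class name and the `load` callback are hypothetical.

```python
import threading
from typing import Callable, Dict


class SingleFlightCache:
    """Cache where only one caller per missing key loads from the database."""

    def __init__(self, load: Callable[[str], object]) -> None:
        self._load = load  # hypothetical function that reads from PostgreSQL
        self._values: Dict[str, object] = {}
        self._inflight: Dict[str, threading.Event] = {}
        self._mutex = threading.Lock()

    def get(self, key: str) -> object:
        while True:
            with self._mutex:
                if key in self._values:
                    return self._values[key]
                event = self._inflight.get(key)
                leader = event is None
                if leader:
                    # This caller acquires the per-key lock and will do the load.
                    event = threading.Event()
                    self._inflight[key] = event
            if leader:
                try:
                    value = self._load(key)
                    with self._mutex:
                        self._values[key] = value
                    return value
                finally:
                    # Release the per-key lock and wake all waiters, even on error.
                    with self._mutex:
                        del self._inflight[key]
                    event.set()
            # Followers wait for the loader, then loop to re-check the cache.
            event.wait()
```

If the loader fails, waiters wake up, find the key still missing, and one of them becomes the next loader, so a single failed read does not strand the others.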

Scaling read replicas

Challenge: The primary streams Write Ahead Log (WAL) data to every read replica. As the number of replicas increases, the primary must ship WAL to more instances, increasing pressure on both network bandwidth and CPU. This causes higher and more unstable replica lag, which makes the system harder to scale reliably.

Solution: We operate nearly 50 read replicas across multiple geographic regions to minimize latency. However, with the current architecture, the primary must stream WAL to every replica. Although it currently scales well with very large instance types and high network bandwidth, we can’t keep adding replicas indefinitely without eventually overloading the primary. To address this, we’re collaborating with the Azure PostgreSQL team on cascading replication, where intermediate replicas relay WAL to downstream replicas. This approach allows us to scale to potentially over a hundred replicas without overwhelming the primary. However, it also introduces additional operational complexity, particularly around failover management. The feature is still in testing; we’ll ensure it’s robust and can fail over safely before rolling it out to production.
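To see why cascading helps, the arithmetic below counts how many WAL streams the primary ships directly in a flat topology versus a two-tier one in which intermediate replicas relay WAL downstream. The two-tier shape and the fan-out values are assumptions chosen for illustration; the post does not specify the planned topology, and the hot standby is ignored.

```python
import math
from typing import Tuple


def two_tier_layout(total_replicas: int, fan_out: int) -> Tuple[int, int]:
    """Split replicas into (relays, leaves) for a two-tier cascade.

    Each relay is itself a read replica and feeds at most `fan_out` leaves,
    so one relay accounts for up to `fan_out + 1` replicas in total.
    """
    if total_replicas < 0 or fan_out < 1:
        raise ValueError("need total_replicas >= 0 and fan_out >= 1")
    relays = math.ceil(total_replicas / (fan_out + 1))
    leaves = total_replicas - relays
    return relays, leaves


def primary_wal_streams(total_replicas: int, fan_out: int = 0) -> int:
    """WAL streams the primary ships directly; fan_out=0 means a flat topology."""
    if fan_out == 0:
        return total_replicas
    return two_tier_layout(total_replicas, fan_out)[0]
```

Under these assumptions, 100 replicas with a fan-out of 9 need 10 relays, so the primary ships 10 streams instead of 100. The cost is an extra replication hop for the leaves and more failover cases to handle when a relay goes down.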

Rate limiting

Challenge: A sudden traffic spike on specific endpoints, a surge of expensive queries, or a retry storm can quickly exhaust critical resources such as CPU, I/O, and connections, which causes widespread service degradation.

Solution: We implemented rate limiting across multiple layers (application, connection pooler, proxy, and query) to prevent sudden traffic spikes from overwhelming database instances and triggering cascading failures. It’s also crucial to avoid overly short retry intervals, which can trigger retry storms. We also enhanced the ORM layer to support rate limiting and, when necessary, to fully block specific query digests. This targeted form of load shedding enables rapid recovery from sudden surges of expensive queries.
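Two of the ideas above can be sketched briefly: spacing out retries with jittered exponential backoff, and limiting or blocking queries by digest. Everything here is hypothetical and illustrative, including the digest normalization, the fixed-window limiter, and all names and defaults; the post does not describe how the ORM-level controls are built.

```python
import hashlib
import random
import re
import time
from typing import Callable, Dict, List, Optional, Set, Tuple


def query_digest(sql: str) -> str:
    """Replace literals with '?', normalize whitespace and case, then hash."""
    normalized = re.sub(r"'[^']*'|\b\d+\b", "?", sql)
    normalized = re.sub(r"\s+", " ", normalized).strip().lower()
    return hashlib.sha256(normalized.encode()).hexdigest()[:16]


def backoff_delays(
    attempts: int,
    base: float = 0.1,
    cap: float = 5.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Full-jitter backoff: delay i is uniform in [0, min(cap, base * 2**i)]."""
    rng = rng or random.Random()
    return [rng.uniform(0.0, min(cap, base * (2 ** i))) for i in range(attempts)]


class DigestGate:
    """Per-digest fixed-window rate limit with an explicit blocklist."""

    def __init__(
        self,
        max_per_window: int,
        window_seconds: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_per_window
        self._window = window_seconds
        self._clock = clock
        self._blocked: Set[str] = set()
        self._windows: Dict[str, Tuple[float, int]] = {}

    def block(self, sql: str) -> None:
        """Shed every query that shares this query's digest."""
        self._blocked.add(query_digest(sql))

    def allow(self, sql: str) -> bool:
        """Return True if the query may run now, False if it should be shed."""
        digest = query_digest(sql)
        if digest in self._blocked:
            return False
        now = self._clock()
        start, count = self._windows.get(digest, (now, 0))
        if now - start >= self._window:
            start, count = now, 0  # a new window begins
        if count >= self._max:
            self._windows[digest] = (start, count)
            return False
        self._windows[digest] = (start, count + 1)
        return True
```

Because the digest ignores literal values, one `block` call sheds an entire family of queries that differ only in their parameters, which is the targeted kind of load shedding the paragraph describes.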

Schema management

Challenge: Even a small schema change, such as altering a column type, can trigger a full table rewrite. We therefore apply schema changes cautiously, limiting them to lightweight operations and avoiding any that rewrite entire tables.

Solution: Only lightweight schema changes are permitted, such as adding or removing certain columns that do not trigger a full table rewrite. We enforce a strict 5-second timeout on schema changes. Creating and dropping indexes concurrently is allowed. Schema changes are restricted to existing tables. If a new feature requires additional tables, they must be in alternative sharded systems such as Azure Cosmos DB rather than PostgreSQL. When backfilling a table field, we apply strict rate limits to prevent write spikes. Although this process can sometimes take over a week, it ensures stability and avoids any production impact.

Results and the road ahead

This effort demonstrates that with the right design and optimizations, Azure PostgreSQL can be scaled to handle the largest production workloads. PostgreSQL handles millions of QPS for read-heavy workloads, powering OpenAI’s most critical products like ChatGPT and the API platform. We added nearly 50 read replicas, while keeping replication lag near zero, maintained low-latency reads across geo-distributed regions, and built sufficient capacity headroom to support future growth.

This scaling works while still minimizing latency and improving reliability. We consistently deliver low double-digit millisecond p99 client-side latency and five-nines availability in production. And over the past 12 months, we’ve had only one SEV-0 PostgreSQL incident: it occurred during the viral launch of ChatGPT ImageGen, when write traffic suddenly surged by more than 10x as over 100 million new users signed up within a week.

While we’re happy with how far PostgreSQL has taken us, we continue to push its limits to ensure we have sufficient runway for future growth. We’ve already migrated the shardable write-heavy workloads to our sharded systems like Azure Cosmos DB. The remaining write-heavy workloads are more challenging to shard; we’re actively migrating those as well to further offload writes from the PostgreSQL primary. We’re also working with Azure to enable cascading replication so we can safely scale to significantly more read replicas.

Looking ahead, we’ll continue to explore additional approaches to further scale, including sharded PostgreSQL or alternative distributed systems, as our infrastructure demands continue to grow.

Author

Bohan Zhang

Acknowledgements

Special thanks to Jon Lee, Sicheng Liu, Chaomin Yu, and Chenglong Hao, who contributed to this post, and to the entire team that helped scale PostgreSQL. We’d also like to thank the Azure PostgreSQL team for their strong partnership.
