Database · Scaling · PostgreSQL

When Your Database Chokes: Lessons from Black Friday

Jerry Odhiambo

Backend Engineer

Nov 28, 2025 · 6 min read

The Black Friday Meltdown

It was 9:00 AM on Black Friday. We had prepared for this for months. We had scaled our Kubernetes cluster to 50 nodes. We had upgraded our load balancers.

But at 9:05 AM, our monitoring dashboard lit up like a Christmas tree.

- CPU usage: 100% on the Primary DB.
- Active connections: maxed out (5,000/5,000).
- Latency: spiked from 50ms to 5,000ms.

We were down. The app was returning 504 Gateway Timeouts. We were losing $1,000 every minute.

The Culprit: The Silent N+1 Killer

We panicked. Changing instance sizes didn't help. Adding read replicas didn't help.

We looked at the slow query log. It wasn't a complex join. It wasn't a DDoS attack.

It was a simple `SELECT *` inside a loop in our order processing code.

```javascript
// The code looked innocent:
const users = await db.getAllUsers();

users.forEach(async (user) => {
  const orders = await db.getOrdersForUser(user.id); // <--- THE KILLER
  // ... process orders
});
```

In local development with 10 test users, this runs instantly. You make 1 query for users and 10 queries for orders: 11 queries total.

In production with 100,000 active users, this triggered 100,001 SQL queries immediately. The database was being DDoSed by our own code.

The Fix (War Room Tactics)

We had to fix it live while customers were screaming on Twitter.

Step 1: Stop the Bleeding (Read Replicas)

We immediately diverted all *read* traffic (dashboards, analytics) to a read replica. This unblocked the primary writer instance so it could keep processing transactions (writes). *Impact: CPU dropped to 80%. Site still slow, but not dead.*
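Read/write splitting can be as simple as keeping two connection pools and pointing read-only queries at the replica. Here's a minimal sketch using node-postgres; the connection strings, table names, and the `getDashboardStats` query are illustrative, not our actual code.

```javascript
// Sketch: route reads to the replica, writes to the primary (node-postgres).
// Connection strings and queries below are placeholders, not real infrastructure.
const { Pool } = require('pg');

const writer = new Pool({ connectionString: process.env.PRIMARY_DATABASE_URL });
const reader = new Pool({ connectionString: process.env.REPLICA_DATABASE_URL });

// Writes (orders, payments) stay on the primary.
async function createOrder(userId, total) {
  return writer.query(
    'INSERT INTO orders (user_id, total) VALUES ($1, $2) RETURNING id',
    [userId, total]
  );
}

// Reads (dashboards, analytics) go to the replica.
async function getDashboardStats() {
  const { rows } = await reader.query(
    `SELECT COUNT(*) AS orders_today, COALESCE(SUM(total), 0) AS revenue
       FROM orders
      WHERE created_at::date = CURRENT_DATE`
  );
  return rows[0];
}
```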

Step 2: The Redis Shield

We realized 90% of the load was people refreshing the "Live Sales Dashboard". We implemented a Redis cache layer for that query with a Time-To-Live (TTL) of 60 seconds. Instead of hitting Postgres 10,000 times a second, we hit Redis. Postgres only saw 1 query per minute for that dashboard. *Impact: CPU dropped to 40%. Site usable.*
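The pattern here is plain cache-aside: check Redis first, fall back to Postgres on a miss, and write the result back with a 60-second expiry. A minimal sketch using ioredis, reusing the hypothetical `getDashboardStats` query from the sketch above; the key name and client setup are illustrative.

```javascript
// Sketch: cache-aside with a 60-second TTL (ioredis). Key name is illustrative.
const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

async function getDashboardStatsCached() {
  const cached = await redis.get('dashboard:live-sales');
  if (cached) return JSON.parse(cached); // cache hit: no Postgres query at all

  const stats = await getDashboardStats(); // cache miss: one Postgres query
  await redis.set('dashboard:live-sales', JSON.stringify(stats), 'EX', 60);
  return stats;
}
```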

Step 3: Rewrite the Code (Batch Fetching)

That night, we rewrote the N+1 loop to use a single `IN` query.

```sql
SELECT * FROM orders
WHERE user_id IN (1, 2, 3, ... 1000);
```

We fetched data in batches of 1000. *Impact: Total queries dropped from 100,000 to 100.*
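In application code, the same idea looks roughly like this: fetch the users once, chunk their IDs, and run one query per chunk. This is a sketch, not our production code; it uses node-postgres's `ANY($1)` with an array parameter, which is equivalent to the `IN` list above, and the `chunk` helper and table names are hypothetical.

```javascript
// Sketch: replace the N+1 loop with batched fetches (node-postgres).
// chunk() and the table/column names mirror the article's example, not real code.
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

function chunk(array, size) {
  const batches = [];
  for (let i = 0; i < array.length; i += size) {
    batches.push(array.slice(i, i + size));
  }
  return batches;
}

async function processAllOrders() {
  const { rows: users } = await pool.query('SELECT id FROM users'); // 1 query
  const userIds = users.map((u) => u.id);

  for (const batch of chunk(userIds, 1000)) {
    // One query per 1,000 users instead of one per user.
    const { rows: orders } = await pool.query(
      'SELECT * FROM orders WHERE user_id = ANY($1)',
      [batch]
    );
    // ... process orders
  }
}
```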

Lesson Learned: ORMs are Liars

Your ORM (Hibernate, Prisma, TypeORM) is lying to you. It creates an abstraction layer that hides the cost of data access. It makes `user.orders` look like a free property access, when it is actually a network call.

To scale, you must unlearn the ORM.

- Don't just trust the abstraction.
- Run `EXPLAIN ANALYZE` on every query.
- Understand the physical execution plan.
- Monitor query counts per request.
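Running `EXPLAIN ANALYZE` doesn't require leaving your application stack; you can pass it through the same client and read the plan back as rows. A quick sketch, again using node-postgres and the article's `orders` table; the specific query and user ID are just examples.

```javascript
// Sketch: inspect the physical plan for the per-user orders query (node-postgres).
// Postgres returns the plan as rows with a single "QUERY PLAN" column.
const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function explainOrdersQuery() {
  const { rows } = await pool.query(
    'EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 42' // 42 is illustrative
  );
  rows.forEach((row) => console.log(row['QUERY PLAN']));
}
```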

If you don't understand SQL, you are not a Senior Backend Engineer. You are just a framework user.
