One Million Tables in MySQL 8.0

Background questions

Q: Why million tables in MySQL? Is it even realistic? How does this happen?

Usually, millions of tables in MySQL is a result of “a schema per customer” Software as a Service (SaaS) approach. For the purposes of customer data isolation (security) and logical data partitioning (performance), each “customer” has a dedicated schema. You can think of a WordPress hosting service (or any CMS based hosting) where each customer has their own dedicated schema. With 10K customers per MySQL server, we could end up with millions of tables.

Q: Should you design an application with >1 million tables? Continue reading One Million Tables in MySQL 8.0

10 Cool SQL Optimisations That do not Depend on the Cost Model

Cost Based Optimisation is the de-facto standard way to optimise SQL queries in most modern databases. It is the reason why it is really really hard to implement a complex, hand-written algorithm in a 3GL (third generation programming language) such as Java that outperforms a dynamically calculated database execution plan, that has been generated from a modern optimiser. I’ve recently delivered a talk about that topic:

Continue reading 10 Cool SQL Optimisations That do not Depend on the Cost Model

Scaling the GitLab database

For a long time GitLab.com used a single PostgreSQL database server and a single replica for disaster recovery purposes. This worked reasonably well for the first few years of GitLab.com’s existence, but over time we began seeing more and more problems with this setup. In this article we’ll take a look at what we did to help solve these problems for both GitLab.com and self-hosted GitLab instances.

For example, the database was under constant pressure, with CPU utilization hovering around 70 percent almost all the time. Not because we used all available resources in the best way possible, but because we were bombarding the server with too many (badly optimized) queries. We realized we needed a better setup that would allow us to balance the load and make GitLab.com more resilient to any problems that may occur on the primary database server.

When tackling these problems using PostgreSQL there are essentially four techniques you can apply: Continue reading Scaling the GitLab database

How to work with dates and time with Python

When trying to make things work with the datetime module, most Python users have faced a point when we resort to guess-and-check until the errors go away. datetime is one of those APIs that seems easy to use, but requires the developer to have a deep understanding of what a few things actually mean. Otherwise, given the complexity of date- and time-related issues, introducing unexpected bugs is easy. Continue reading How to work with dates and time with Python

Understanding Laravel Pipelines

Basically, using laravel pipelines you can pass an object between several classes in a fluid way to perform any type of task and finally return the resulting value once all the “tasks” have been executed.

Here you can learn more about the Laravel pipelines

The most clear example about how pipelines works resides in one of the most used components of the framework itself. I’m talking about middlewares.

Middleware provide a convenient mechanism for filtering HTTP requests entering your application…

Continue reading Understanding Laravel Pipelines

Deployment using Docker

This post covers how we use Docker to build and ship applications from development to a production environment and how it can be used in the various stages of the pipeline. We used the strategy outlined in this post to deploy a Node.js application serving a RESTful API, but is equally applicable to any web-based app. Continue reading Deployment using Docker

Operating OpenStack at Scale

A successful private cloud presents a consistent and reliable facade over the complexities of hyperscale infrastructure. It must simultaneously handle constant organic traffic growth, unanticipated spikes, a multitude of hardware vendors, and discordant customer demands. The depth of this complexity only increases with the age of the business, leaving a private cloud operator saddled with legacy hardware, old network infrastructure, customers dependent on legacy operating systems, and the list goes on. These are the foundations of the horror stories told by grizzled operators around the campfire. Continue reading Operating OpenStack at Scale

Principles of Sharding for Relational Databases

When your database is small (10s of GB), it’s easy to throw more hardware at the problem and scale up. As these tables grows however, you need to think about other ways to scale your database.

In one way, sharding is the best way to scale. Sharding enables you to linearly scale your database’s cpu, memory, and disk resources by separating your database into smaller parts. In other ways, sharding is a controversial topic. The internet is full of advice on sharding, from “essential to scaling your database infrastructure” to “why you never want to shard”. So the question is, whose advice should you take? Continue reading Principles of Sharding for Relational Databases

9 Reasons Why There Are No Foreign Keys in Your Database (Referential Integrity Checks)

I got into an argument with a few DBAs and architects recently, and they were shocked that some databases don’t have foreign keys and claimed that this is a design flaw and shouldn’t take place. And if it does, should be immediately fixed. I would like to argue with that. My experience tells me that many of the databases (the majority of those I have worked with) do not contain foreign keys and that it’s not always a bad thing. In this article, I want to focus on the reasons for why it is so. Continue reading 9 Reasons Why There Are No Foreign Keys in Your Database (Referential Integrity Checks)