Long story short, there is a demand for a multi-master HA database solution that automatically provides provisioning and upgrades with zero downtime.
Long story short, there is a demand for a multi-master HA database solution that automatically provides provisioning and upgrades with zero downtime. The solution must offer easy deployment and smooth maintenance in order to have tens or hundreds of clusters behind all microservices in multiple environments for testing staging and production. It is important for service interruptions to be minimal during hardware crashes or regular maintenance.
Traditional postgresql setup has a PRIMARY database machine, a SECONDARY hot-swappable machine and a BACKUP host or solution.
The Cockroach standard solution would have a 3-zone multimaster setup and backup storage like S3.
We test the solution by bombing the API-s with Jmeter test scenarios which are set up based on real-life statistical usage. This involves creating new accounts and implementing logins and token refreshes.
Jmeter test setup description
Postgres setup with a 2 vCpu database host produced the following results that we can use as a baseline:
When comparing the same tests against our new setup, our results came out better because the load is balanced between 3 masters.
The averages look pretty nice and latency is not too bad.
Since all the data in a cockroach cluster is at least duplicated, we can safely kill one of the database masters without data loss or significant service disruption.
Let’s see what this produces in our error rates on the API endpoint.
In red, I’ve pointed out the minute that we terminated one of the master instances. This was done without any advance notice for the java application.
This is how it appears on the cockroach dashboards:
And we are back at full capacity with all three nodes when kubernetes automatically fixes the missing instance:
As we are using kubernetes now, there is an option to have automatic upscaling in place. We can easily scale java pods horizontally to have, for example, tens of java containers which can respond to jmeter bombing at a rate of thousands of times per second.
Additionally, we can quickly upscale the cockroach cluster within kubernetes. My tests revealed that by simply using different tooling, we can easily have 5 to 10 times more requests processed during peak situations, for example, a re-login burst after a general service outage.
Jmeter results when we upscale our setup:
The following graphs show how scaling up allows for more requests to be handled.
Cockroach dashboard showing doubled traffic:
Cockroach is a multi-master database, this means every committed transaction inside the database must be “agreed” between all hosts. This will make write transactions significantly slower, meaning the java service in front must be designed to allow this.
Microseconds in postgresql can become milliseconds in cockroachdb.
This, of course, depends on how you distribute your masters between availability zones, or however your network is set up.
Comparable results from a real-life postgresql database, where most of the queries are way below 1 millisecond:
To get started quickly with cockroach in kubernetes, you can use a standard helm chart:
Cockroachdb is a one-binary-only executable that needs to be started with proper parameters. It will always continue from where it left off, crashed, stopped or restarted.
For this reason, you can easily automate new clusters or re-provision old ones.
You can stop one host machine and reprovision a new, bigger machine and use the same data disk to catch up with the rest of the cluster.
Reprovisioning of the pods of the cockroach cluster in kubernetes will be done automatically once you have applied new manifests.
Migrating from postgres to cockroachdb is fairly simple. Cockroach uses ANSI SQL standard and postgresql wire protocol. If the database structure is straightforward and uses best practices, there should be no problems during migration. There could be issues if some postgres-specific hacks are used, or, for example, you have a hardcore monolith with years of partitioning.
But we are talking about microservices here, right?
One of the issues that needed actual changes in schema creation sql-s was the problem of creating indexes right after creating tables. Workaround for this is discussed here: https://github.com/cockroachdb/cockroach/issues/24626
The second thing was postgres-specific MATCH SIMPLE at the end of constraint creation:
ALTER TABLE ONLY token ADD CONSTRAINT token_sso_user_fk FOREIGN KEY (sso_id) REFERENCES sso_user(id) MATCH SIMPLE;
Cockroach does not understand “MATCH SIMPLE” so we had to remove it.
Cockroach is pretty usable once you split your monolith and try to scale by standardising and automating everything. Cockroachlabs will, of course, sell you their global scale and regional sharding (which is a topic of its own to look into). The easy-to-start usability, postgresql compatibility and out-of-the-box admin interface are a major bonus for people familiar with database tooling.