
Database Solutions: Evaluating Multi-master CockroachDB for Microservices

May 06, 2019 · 5 min read

Tanel Anderson

Infra Guy


Long story short, there is a demand for a multi-master HA database solution that provides provisioning and upgrades with zero downtime. The solution must offer easy deployment and smooth maintenance, so that we can run tens or hundreds of clusters behind all our microservices in multiple environments for testing, staging, and production. Service interruptions must stay minimal during hardware crashes or regular maintenance.

In this article, I will explore how this is achievable with the help of Kubernetes and CockroachDB.

The stack we currently use is Java, Flyway and PostgreSQL. The Java service used in this example is a Single Sign-On (SSO) web API that provides the following capabilities:

  • User creation
  • Login
  • Token check
  • Token update

A traditional PostgreSQL setup has a primary database machine, a secondary hot-swappable machine, and a backup host or solution.


Traditional stack setup

The standard CockroachDB solution would have a three-zone multi-master setup and backup storage such as S3.


Multi-master stack setup

We test the solution by hammering the APIs with JMeter test scenarios set up based on real-life statistical usage. This involves creating new accounts, performing logins, and refreshing tokens.

JMeter test setup description


JMeter test setup

The PostgreSQL setup with a 2 vCPU database host produced the following results, which we can use as a baseline:


Baseline stats with PostgreSQL setup

When running the same tests against our new setup, the results came out better because the load is balanced between three masters.


CockroachDB stats

The averages look pretty nice and latency is not too bad.

Killing one of the masters

Since all the data in a CockroachDB cluster is replicated across at least two nodes, we can safely kill one of the database masters without data loss or significant service disruption.
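As a sketch of how we simulate this in Kubernetes (the StatefulSet and label names below come from the standard CockroachDB chart and are placeholders that may differ in your deployment):

```shell
# Simulate a master crash by deleting one CockroachDB pod without warning.
# The StatefulSet controller recreates it with the same identity, and the
# node rejoins the cluster from the state on its persistent volume.
kubectl delete pod cockroachdb-1

# Watch the replacement pod come back up.
kubectl get pods -l app.kubernetes.io/name=cockroachdb -w
```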

Let’s see what this produces in our error rates on the API endpoint.


Killing one of the masters (moment pointed out in red)

In red, I've pointed out the minute when we terminated one of the master instances. This was done without any advance notice to the Java application.

This is how it appears on the CockroachDB dashboards:


Killing one of the masters on CockroachDB stats

And we are back at full capacity with all three nodes once Kubernetes automatically replaces the missing instance:


CockroachDB stats after recovery

Scaling the cluster

As we are using Kubernetes now, there is an option to have automatic upscaling in place. We can easily scale Java pods horizontally to have, for example, tens of Java containers that can respond to the JMeter load at a rate of thousands of requests per second.

Additionally, we can quickly upscale the CockroachDB cluster within Kubernetes. My tests revealed that simply by adjusting the tooling, we can easily process 5 to 10 times more requests during peak situations, for example a re-login burst after a general service outage.
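A minimal sketch of both scaling operations, assuming a Java Deployment named `sso-service` and the CockroachDB StatefulSet from the standard chart (both names are placeholders):

```shell
# Scale the Java service horizontally to twenty pods.
kubectl scale deployment sso-service --replicas=20

# Grow the CockroachDB cluster from three to five nodes; new nodes
# join the cluster and ranges are rebalanced onto them automatically.
kubectl scale statefulset cockroachdb --replicas=5
```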

JMeter results when we upscale our setup:


Calls per sec after scaling up

The following graphs show how scaling up allows for more requests to be handled.

Cockroach dashboard showing doubled traffic:


CockroachDB doubled traffic

Response times are not good

CockroachDB is a multi-master database, which means every committed transaction must be "agreed" upon by a quorum of the hosts holding the affected replicas. This makes write transactions significantly slower, so the Java service in front must be designed to tolerate it.

Microseconds in PostgreSQL can become milliseconds in CockroachDB.

This, of course, depends on how you distribute your masters between availability zones and how your network is set up.


CockroachDB response times

Comparable results from a real-life PostgreSQL database, where most of the queries take well below 1 millisecond:


PostgreSQL response times

Multi-cluster setup and provisioning automation

To get started quickly with CockroachDB on Kubernetes, you can use the standard Helm chart:
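A minimal sketch using Helm 3 syntax; `my-crdb` is a placeholder release name:

```shell
# Add the official CockroachDB chart repository.
helm repo add cockroachdb https://charts.cockroachdb.com/
helm repo update

# Install a three-node cluster; statefulset.replicas controls
# the number of database nodes.
helm install my-crdb cockroachdb/cockroachdb --set statefulset.replicas=3
```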

CockroachDB is a single-binary executable that needs to be started with the proper parameters. It will always continue from where it left off, whether it crashed, was stopped, or was restarted.

For this reason, you can easily automate new clusters or re-provision old ones.
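The one-binary model above can be sketched as follows (hypothetical three-node setup; host names, port, and data path are placeholders):

```shell
# Start one node of a three-node cluster. --store points at the data
# directory, so a restarted or reprovisioned node simply continues
# from the state on its disk and catches up with the cluster.
cockroach start \
  --insecure \
  --store=/mnt/data/cockroach \
  --listen-addr=0.0.0.0:26257 \
  --advertise-addr=node1:26257 \
  --join=node1:26257,node2:26257,node3:26257
```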

  • In an old-school VM-based setup:

You can stop one host machine, provision a new, bigger machine, and reuse the same data disk to catch up with the rest of the cluster.

  • In Kubernetes:

Reprovisioning of the CockroachDB cluster's pods in Kubernetes happens automatically once you have applied new manifests.

Migrating from PostgreSQL

Migrating from PostgreSQL to CockroachDB is fairly simple. CockroachDB uses the ANSI SQL standard and the PostgreSQL wire protocol. If the database structure is straightforward and follows best practices, there should be no problems during migration. There could be issues if PostgreSQL-specific hacks are used or if, for example, you have a hardcore monolith with years of partitioning.

But we are talking about microservices here, right?

  • Using our SSO service as an example during migration:

One of the issues that required actual changes in the schema-creation SQL scripts was creating indexes right after creating tables.
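As an illustration (table and index names are hypothetical), the workaround was to declare the index inline in the CREATE TABLE statement, which CockroachDB supports, instead of issuing a separate CREATE INDEX immediately afterwards:

```sql
-- Problematic on the CockroachDB version we tested: a separate
-- CREATE INDEX issued right after CREATE TABLE could fail while
-- the table's schema change was still in flight.
CREATE TABLE tokens (
    id      UUID PRIMARY KEY,
    user_id UUID NOT NULL
);
CREATE INDEX tokens_user_id_idx ON tokens (user_id);

-- Workaround: declare the index inline in CREATE TABLE.
CREATE TABLE tokens (
    id      UUID PRIMARY KEY,
    user_id UUID NOT NULL,
    INDEX tokens_user_id_idx (user_id)
);
```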

The second thing was the PostgreSQL-specific MATCH SIMPLE clause at the end of constraint creation:


CockroachDB does not understand "MATCH SIMPLE", so we had to remove it.
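A before/after sketch with hypothetical table and constraint names:

```sql
-- PostgreSQL accepts the explicit MATCH SIMPLE clause:
ALTER TABLE sessions
    ADD CONSTRAINT fk_sessions_user
    FOREIGN KEY (user_id) REFERENCES users (id) MATCH SIMPLE;

-- For CockroachDB we dropped the clause; MATCH SIMPLE is the
-- default foreign-key matching behaviour in PostgreSQL anyway:
ALTER TABLE sessions
    ADD CONSTRAINT fk_sessions_user
    FOREIGN KEY (user_id) REFERENCES users (id);
```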


CockroachDB is pretty usable once you split your monolith and try to scale by standardising and automating everything. Cockroach Labs will, of course, sell you their global scale and regional sharding (which is a topic of its own). The easy-to-start usability, PostgreSQL compatibility, and out-of-the-box admin interface are a major bonus for people familiar with database tooling.

See original post HERE
