The Journey of a DevOps person from a Monolith to Microservices

The author of this post, Rahul Sinha, is a former member of the Synup Engineering team.
“This is the story of my journey at Synup, a fun little startup (that’s getting really big, now!). This is where I got my hands dirty in almost everything and learnt tremendously.”

The goal of this blog post is to walk you through what we did, why we did it, and how we moved from a monolithic architecture to microservices, from an operations perspective.

Why Microservices?

It all boils down to scale, doesn’t it? By scale, I mean two things:

  1. Scaling the product to serve customers without any hassle.
  2. Accommodating the increasing team size working on the product, so that more features can be added at the pace we want to progress.

One way to accomplish this effectively for both the organisation and the product is to break the big app (the monolith) into individually scalable services (microservices), so that maintaining the product or adding a feature to it becomes much easier.

Mono-repo or multi-repo?

The team would be creating many individual services, each with a clear goal in mind. That naturally means many codebases, and we had to decide how to organise them. The usual conundrum is - "Do we go with a mono-repo, or individual repos?"

We went with a mono-repo for easy collaboration between the different codebases. A pull request would contain all the changes required across services to ship a feature. This keeps the changeset for a task in one place, making dependent deployments easy.

This decision also led to our team writing some spectacular tooling that helped us enforce conventions with ease. In just a bit, I'll show you the tooling we built that simplified our lives.

Picking a tool to host the microservices

I had to figure out how to host our microservices, and create a consistent environment for the developers (all this while not getting lost myself).

During a migration like this, many old tools either get thrown out of the window or change a lot. I realised that we would need some kind of magical system that gives the infrastructure setup a well-defined structure and keeps it running. I chose Kubernetes as the "orchestrator": it takes a bunch of YAML describing the desired state of your services as input, and then keeps them up and running.
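
For a flavour of what that desired-state YAML looks like, here is a minimal Deployment for a hypothetical service called foo - an illustrative sketch, not one of our actual manifests:

# Desired state for a hypothetical "foo" service: which image to run and how many replicas to keep alive
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: foo
  template:
    metadata:
      labels:
        app: foo
    spec:
      containers:
        - name: foo
          image: registry.example.com/foo:1.0.0   # illustrative image reference
          ports:
            - containerPort: 8080

You hand manifests like this to the cluster (for example with kubectl apply), and Kubernetes continuously works to make the actual state match them.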

App configuration

A big part of any app configuration system is making it easy to manage different environments. Following The Twelve-Factor App methodology, I was trying to build one unified system for all services, and without a consistent configuration standard across services that goal is very hard to achieve. Simply put, once you build a tarball or container image of an application, you should be able to run it in any environment (like staging or production) just by providing different configuration values, without having to build a different image for each environment.

The development and production environments are relatively simple, since we want to run an app with essentially a single configuration in each of them.

In development, each service runs locally on a developer's machine, and they have complete authority over its configuration. Dependent services - like Postgres, Redis, etc. - also run locally. Production is also simple, since all services run under one global configuration and you almost never want to run multiple configurations there. But the staging environment is a different beast.

On staging, the team wants to run multiple environments of the services, each potentially with a different configuration and running out of different branches (versions of the code). When working on a feature that impacts one or a few microservices, we would want:

  1. The service with the new code changes running from a specific git branch, with any new config.
  2. The other services running out of the master branch, with the default config.

Translating this into a workflow requirement: trying out changes for service1 on staging should not require deploying all the other services.

To handle situations like these, we need a good configuration system and a tool that makes deployments easy. As you can see, this has to be solved both by the people working on the applications and by those maintaining the infra.

We decided to use Consul as the key-value store for configuration management - partly because, besides being a key-value store, Consul can also be used for service discovery.

Structuring the configurations

Our configurations are a huge mountain of variables, and in most cases only a few values get added or changed in a particular staging environment. We provided defaults for all environments, and wrote simple wrappers on top of the Consul API for each language.

We used a directory structure for solving the problem of namespacing and implemented these standards in the Consul wrapper library so that services did not have to worry about internal details. Given a particular service foo, for the staging environment, using the master branch, the configuration’s home would look like the following:

/foo/staging/master

For environments that are spawned on demand, a random string can be used to name the configuration instead:

/foo/staging/83y7fh39c
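
Individual values then live under those prefixes. Purely as an illustration (the keys and values here are made up), they could be written with the standard Consul CLI like this:

# Default value for the master-branch staging environment of foo
consul kv put foo/staging/master/DATABASE_URL postgres://staging-db/foo

# Override just this one value for an on-demand environment
consul kv put foo/staging/83y7fh39c/DATABASE_URL postgres://staging-db/foo_83y7fh39c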

For more details, take a look at our Python wrapper, which implements these conventions and also supports overriding values with environment variables.
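
The wrapper itself is not reproduced in this post, but the core idea fits in a few lines of Python. The sketch below is illustrative only - the function name is made up and it talks to Consul's HTTP KV API directly, which may differ from how our library does it:

# config.py - a minimal sketch of a Consul-backed config reader (not the real wrapper)
import os
import requests

CONSUL_ADDR = os.environ.get("CONSUL_HTTP_ADDR", "http://127.0.0.1:8500")

def get_config(service, environment, namespace, key, default=None):
    # 1. A plain environment variable always wins (handy in development).
    if key in os.environ:
        return os.environ[key]

    # 2. Otherwise read the namespaced key from Consul's KV store,
    #    e.g. foo/staging/master/DATABASE_URL
    consul_key = f"{service}/{environment}/{namespace}/{key}"
    resp = requests.get(f"{CONSUL_ADDR}/v1/kv/{consul_key}", params={"raw": "true"})
    if resp.status_code == 200:
        return resp.text

    # 3. Fall back to a default when the key is not set anywhere.
    return default

# DATABASE_URL for the foo service, staging environment, master branch
db_url = get_config("foo", "staging", "master", "DATABASE_URL",
                    default="postgres://localhost/foo")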

One-command deployments

Deployments of microservices can be vastly simplified and standardised using Kubernetes. That said, Kubernetes can be daunting, and it is not a PaaS solution out of the box.

Quite a lot needs to be set up before you can start hosting production apps on it - ingress, a deployment strategy for applications, application logging, monitoring of the application and the cluster, and so on. Here, we will focus on actually making and running deployments.

We adopted Helm charts for our deployment config - but why? As soon as I started defining a project in plain YAML, I realised we needed a templating engine that could do variable assignment, conditionals, looping, etc. Helm also gives the deployment package a standard structure and takes care of how it should be organised. We created a charts subfolder in every service and put its deployment config there; this gave us version-controlled deployment scripts that we could bundle with the app (awesome, right?!).
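
To give a sense of what that templating buys us, here is an illustrative sketch of what a chart under charts/foo could look like - hypothetical values and template, not our actual chart:

# charts/foo/values.yaml - defaults that a deployment can override
replicaCount: 1
image:
  repository: registry.example.com/foo
  tag: latest
env: {}

# charts/foo/templates/deployment.yaml - variable substitution, conditionals and loops
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Chart.Name }}
  template:
    metadata:
      labels:
        app: {{ .Chart.Name }}
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          {{- if .Values.env }}
          env:
            {{- range $key, $value := .Values.env }}
            - name: {{ $key }}
              value: {{ $value | quote }}
            {{- end }}
          {{- end }}

Running helm upgrade --install with a different values file (or --set flags) is what lets the same chart serve development, staging and production.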

To deploy, we still had to build an image of the app, push it to a container-image registry, and then trigger a helm upgrade with this configuration. We added Skaffold to our toolset for this. Skaffold builds a Docker image of the app and then, to deploy it, runs the equivalent of:

kubectl apply / helm upgrade

To deploy an app, all we had to do was run this:

skaffold run -f skaffold-production.yml
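
A skaffold config along these lines ties the image build and the Helm release together. The sketch below uses field names from skaffold's public v1beta schema and made-up values; it is not our actual file:

# skaffold-production.yml - illustrative sketch
apiVersion: skaffold/v1beta13
kind: Config
build:
  artifacts:
    - image: registry.example.com/foo      # built from the service's Dockerfile
deploy:
  helm:
    releases:
      - name: foo
        chartPath: charts/foo              # the chart bundled with the service
        values:
          image: registry.example.com/foo  # wire the freshly built image into the chart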

Better interface for the team

What we had still did not provide a seamless experience, since we had to run completely different sets of commands for checking logs, restarting an application, reloading its configuration, and so on.

I did not see this problem coming, but my manager pointed it out: this setup required every team member working on any part of the app (even just the frontend) to know various commands in order to successfully deploy an app to staging. He suggested we write a wrapper script that gives us simple commands for the common tasks. We used sub to build a tool called ji. The mono-repo helped us achieve this without much hassle, since the code for all services lives in the same repo, and so ji knows the relative path of each codebase. The commands have the following structure:

ji <action> <environment> <services> <prefix_string>

To deploy the foo app, we would run the following:

ji up staging foo TD-1366

If we have to deploy another instance of the same service, we would only have to change the prefix (and hence the namespace) in the command.

Internally, when ji up is run to deploy an application, it wraps the skaffold and helm steps described above for the chosen environment and namespace, so nobody has to remember those commands.
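
To give a flavour of how a sub-based command looks, a stripped-down ji up could be a script along these lines. This is an illustrative sketch only - the directory layout and root variable are assumptions, and the real tool (linked below) also wires the prefix into the Kubernetes namespace and the Consul configuration path:

#!/usr/bin/env bash
# libexec/ji-up - sketch of a sub-style command, not the real implementation
# Usage: ji up <environment> <service> <prefix>
# Summary: Build and deploy a service to the given environment
set -euo pipefail

environment="$1"
service="$2"
prefix="$3"   # used by the real tool to pick the namespace and config path

# The mono-repo means every service sits at a predictable relative path
# ($_JI_ROOT is assumed to be the project-root variable the sub scaffold exports).
cd "$_JI_ROOT/services/$service"

# Build the image and deploy the bundled Helm chart for this environment.
skaffold run -f "skaffold-$environment.yml"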

Some more utility commands we built

# Attach to container’s terminal for TD-1366-foo
ji attach staging foo TD-1366

# Tail logs of all pods for foo app in TD-1366 namespace
ji logs staging foo TD-1366

# Get new pods for TD-1366-foo
ji restart staging foo TD-1366

# Print the current status
ji info staging foo TD-1366

# Kill the foo app under the namespace TD-1366
ji down staging foo TD-1366

Source code for ji can be found on GitHub at synup/ji.