Become a Fullstack JavaScript Developer, Part 6: The Monorepo

A monorepo has many benefits, and many companies are adopting it, but deciding whether to use one is tricky. This post takes a deep look at the monorepo movement and whether you should follow it.



A monorepo is a source code management strategy in which you put all kinds of projects (services, apps, libraries) into a single repository. The degree of adoption varies, from every project in a company down to only the projects related to a single product.

A polyrepo is the traditional way of organizing code, in which each project has its own repository. We tend to treat it as the standard because everyone uses it, and it has real advantages: good support from current public tools, fine-grained access control, loose coupling, clear code ownership, better integration with third-party services, and so on.

A monorepo is a monolithic repository, but don’t confuse it with the monolithic application architecture and its bad reputation. A monorepo works well with a microservices deployment strategy.

Motivation

Many tech giants use a monorepo: Google, Facebook, Microsoft, and Twitter, to name a few. What they have in common is that they had to invest a lot of resources in patching their source control systems and build tools. Read more in Scaling Mercurial at Facebook, Microsoft adds Virtual File System for Git, or Why Google Stores Billions of Lines of Code in a Single Repository.

In the JavaScript world, many open source projects (Jest, Babel, Storybook, create-react-app, etc.) use a monorepo with the help of Lerna and Yarn workspaces.

I’ve been using a monorepo for 2 years and I love it. Before that, I worked at companies of all sizes (5+, 100+, 1000+ people) that used polyrepos, and the experience was mixed. Things went well when I worked on isolated projects but quickly got ugly on highly collaborative ones. Many teams tended to keep their code to themselves, and asking for access permissions was a pain in the neck!

Convinced yet? Let’s find out what makes monorepos great and what holds them back.

Benefits

The following benefits come with a huge assumption: that the size of your monorepo is manageable by the tooling available to you (version control system, build tools, discovery tools, etc.), so that it still provides a reasonable developer experience in terms of time and space.

Single source of truth: I found this to be the most important benefit, given a strong CI/CD pipeline. Every release is built from a single code state, goes through all kinds of tests, and is then released to the public. At polyrepo companies, I found DevOps engineers keeping complex lists of compatible build versions to deploy, and the integration tests soon became untrustworthy.

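Easy mass refactoring: This holds for monorepos of all sizes. Because a library and all of its callers live in the same repository, a single atomic commit can change an API and update every usage. As a solo developer, you are free to refactor everything. At a big company, repository maintainers are responsible for keeping the code healthy.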

Better code sharing: This holds for companies with an open, collaborative culture and very few restricted private projects, where everyone has access to almost everything, no questions asked. It is easy to reuse shared libraries, copy similar solutions, and learn from others’ coding styles. The result is less code duplication and faster bootstrapping of new projects.

Drawbacks

High tooling investment: Git and Mercurial have their limitations on large-scale repositories; commands take a really long time on a huge number of files. Tech giants that adopted a monorepo had to put a lot of resources into enhancing their tools to support very large repositories (millions of files, billions of lines of code, thousands of commits per day, thousands of collaborating users, hundreds of GBs or even TBs in size, etc.):

Google built its own source control system (Piper), build tool (Bazel), cloud-based client workspaces (CitC, Clients in the Cloud), and many other tools to support code search, discovery, merging, testing, etc. Facebook patched Mercurial and created the build tool Buck. Microsoft added the Virtual File System for Git.

High code complexity: Working in a company’s huge repository sounds intimidating at first. You need to set up many guidelines and policies to keep everyone on the same page.

High cost to maintain code health: A monorepo is nothing without a strong CI/CD pipeline, and all kinds of tests take a long time to finish before a merge is approved, a build is created, or artifacts are released.

Adoption

A monorepo is not for everyone. It definitely provides many benefits with the right setup, but it can also bring unpredictable complexity to your team.

Indie maker: This is an easy decision. Adopting a monorepo per product, or one for all your products, will boost your productivity to the next level. The cost of reversing the decision is quite low, so you don’t have to think hard here. I found Lerna awesome for a fullstack JavaScript developer.

Small team: Also easy to adopt as long as Git can comfortably handle your code size. You need to invest a bit in your CI/CD pipeline and contribution policy to keep everything running smoothly.

Medium & large team: This is a very tough and brave decision, and there is no going back once you have invested a lot of resources in your tooling. Think hard about your company culture, the percentage of private projects, whether you collaborate with outside clients, whether Git can handle your predicted repository size or you will need a patched, scalable version control system, whether you have the resources to manage code complexity and health, etc.

Key Takeaways

Think really hard if you’re the one making the decision. Moving to a monorepo, or switching back to a polyrepo, is very costly: it can hurt your team’s morale, and you also need to establish a detailed contribution policy and an onboarding process for new members.

What if your team already uses a monorepo? If you already know monorepos and embrace them, that’s awesome. If you are new to this, please keep calm and learn about it a little before getting mad and dismissing it as a foolish departure from the standard polyrepo.

Monorepos have a bright future alongside microservices. I have found it a solid working experience to maintain a monorepo, run it through a decent CI/CD pipeline, and deploy tested artifacts to a microservices system.