How we automate testing at Front

Laurent Perrin

CTO & Co-founder at Front

9 December 2021

CTO Laurent Perrin shares some of the solutions the Front engineering team uses to automate testing while still allowing our team to ship updates quickly.

This is the second post in a series where we share how we build Front every day. Read the first post: How we ship code at Front.

I’ve written before about how Front requires frequent experimentation, even at our scale, because the problem space is so vast. Like many software companies, we rely on automated tests to ensure we launch features smoothly. And like many software companies, our automated testing is unique and complex. In this article, I’ll dive into the weeds to show you some of the solutions we use to automate testing while still allowing our team to ship updates quickly.

A few words on Front’s architecture

Front uses a distributed architecture of about 150 services connected by message queues. While services do not communicate directly, they can post asynchronous messages to each other. Each service consumes from a single queue. This means that each interaction results in a flurry of chatter between our services. 

For example, below is what happens when a user posts a comment that mentions a teammate.

This looks more complicated than it should be: surely, such a simple interaction shouldn’t result in so much work? Unfortunately, it has to be this way for several reasons:

  • Many external APIs: Front interacts with many services (you can connect Front with dozens of communication platforms). These services can fail or misbehave and need to be isolated from each other.

  • Many storage engines: Data can be browsed in many ways in Front. This forces us to use different storage engines that need to be kept in sync with each other. This synchronization also happens across independent services.

  • Many moving parts: Finally, Front has many features that interact with each other. This means that simple actions often have unintuitive side effects.

Here are some of the side effects that happen in the above example:

  • “Comment added” is a trigger in our rule engine so we need to evaluate rules.

  • Our analytics app depends on comments so it needs to be updated.

  • Our search engine needs to index the new comment.

  • Admins have access to an audit trail which keeps track of low-level events.

  • The conversation will be reopened in the inboxes of teammates who subscribed to it, which affects unread indicators and counter badges.

  • The mobile app can receive push notifications.

  • Finally, Front is a real-time app, and we need to send websocket updates to clients who can see the new content.

There would be a lot more to discuss here, but no matter what, Front is a complex, distributed app. We need to have a good grasp of how services interact with each other. This is why, in addition to robust unit tests, we had to invest a lot in integration tests.

This often involves dozens of components spanning our entire stack. For example, we need a test like: “If Alice mentions Bob, does Bob’s client receive a single websocket event?” In production, this happens across several physical servers.

Relying so heavily on integration tests can be a problem: they can be slow, unreliable, and hard to troubleshoot. It wouldn’t be honest to say this hasn’t been an issue in our case, but we have found a balance that works well for us.

Our developer experience

Before we get to our test infrastructure, let’s talk about how our engineers build Front every day. We built a framework to simplify our developer experience and then leveraged it to also serve as the backbone of our test suite.

The idea is simple: even though Front uses many external services in production (databases, but also services from AWS), you should be able to run our complete stack on a laptop by cloning our repo and running npm install, without any external dependencies, while being offline.

Reproducing our production environment accurately is not possible: we use many AWS services, and even in a minimalistic setup, this would require hundreds of processes.

In-memory connectors

We group all the code that interacts with the outside world in a connector library. We provide two implementations for each connector:

  • A production version that uses a real storage engine

  • A development one that uses an in-memory implementation

For example, in production, we use SQS as our main message queue. Locally, we use a thin wrapper around async.queue, which stores queued messages in a simple JavaScript array.
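As a rough illustration of the idea, a dev queue connector can be little more than an array and a pump loop. This sketch uses hypothetical names and avoids the async library entirely; it is not Front’s actual code:

```javascript
// Minimal sketch (hypothetical names) of an in-memory queue connector that
// mirrors a production SQS wrapper's interface, backed by a plain array.
function createInMemoryQueue(handler) {
  const pending = []; // queued messages
  const waiters = []; // callers waiting for the queue to settle
  let active = 0;     // messages currently being processed

  const pump = () => {
    while (pending.length > 0) {
      const message = pending.shift();
      active += 1;
      Promise.resolve(handler(message)).finally(() => {
        active -= 1;
        if (active === 0 && pending.length === 0) {
          waiters.splice(0).forEach((resolve) => resolve());
        }
        pump();
      });
    }
  };

  return {
    // Same call shape as the production connector's send operation.
    send(message) {
      pending.push(message);
      pump();
    },
    // Resolves once every queued message has been processed.
    drain() {
      if (active === 0 && pending.length === 0) return Promise.resolve();
      return new Promise((resolve) => waiters.push(resolve));
    },
  };
}
```

Because both implementations expose the same interface, services never know which one they are talking to, and tests can use something like `drain` to wait until all asynchronous work has settled.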

We use similar solutions for all other external services like MySQL, memcached, Elasticsearch, redis, etc. On top of that, all our services live in the same monorepo. This allows us to run all of Front inside a single process, without any external dependencies or containers: a single process runs multiple internal APIs and hundreds of services, all connected by thin wrappers.

While these connectors are in-memory implementations of their real-world counterparts, they are not mocks: they support every operation that Front needs to run correctly. Rather than exposing the original API directly, we support an abstraction of it. This makes it easier to maintain the in-memory version, and it allows us to control the subset that is safe to use at Front. For example, this is what a typical MySQL query looks like in our code:

// Reconstructed call: the findAsync name and comment.id value are illustrative.
await comment_reaction.findAsync(
  ['teammate_id', 'reaction'],
  {resource_id: comment.id, resource_type: 'comment'},
  {sortBy: ['-id']}
);

This is just meant to be a light abstraction that gets the job done. In production, this will emit a MySQL query. In our dev and test environments, we will run the query against a JavaScript array.
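To make that concrete, the in-memory side of such an abstraction boils down to a filter/sort/project pass over an array. This is a hypothetical helper, not Front’s implementation:

```javascript
// Minimal sketch (hypothetical): evaluate the query abstraction against a
// plain JavaScript array instead of emitting SQL.
function runQuery(rows, fields, where, {sortBy = []} = {}) {
  // WHERE: keep rows whose columns match every key/value pair.
  let result = rows.filter((row) =>
    Object.entries(where).every(([key, value]) => row[key] === value)
  );
  // ORDER BY: a leading '-' means descending, like ['-id'] above.
  for (const spec of [...sortBy].reverse()) {
    const desc = spec.startsWith('-');
    const key = desc ? spec.slice(1) : spec;
    result.sort((a, b) => {
      const cmp = a[key] < b[key] ? -1 : a[key] > b[key] ? 1 : 0;
      return desc ? -cmp : cmp;
    });
  }
  // SELECT: project only the requested columns.
  return result.map((row) =>
    Object.fromEntries(fields.map((field) => [field, row[field]]))
  );
}
```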

Related reading: 5 principles that guide our development at Front

Integration tests

The next step was to harness everything in our test suite. We built a system that can quickly start a desired set of services—and later destroy everything:

before(async () => {
  // Start all async services and our private API.
  // Then sign up a company, add 4 teammates and import several messages.
  await reset.restartWith(['processors', 'api', 'test_company', 'test_team']);
});

after(async () => await reset.wait());

We built a simple HTTP client that can interact with our internal APIs. An added bonus is that this system largely eliminates the need for mocks: we can simply create real objects and expect them to behave exactly like they would in production. The test data is created by having a company sign up and then invite a handful of test teammates, using the same code as our users. Then, writing a test case becomes quite simple:

it('should mention a teammate', async () => {
  client.setIdentify('alice@example.com'); // test teammate (address illustrative)
  // Create a comment draft (endpoint paths are illustrative).
  await client.putAsync('/comments/draft', {
    text: '@bob are you getting this?',
  });
  // Send it and expect a 201.
  await client.postAsync('/comments/send', {
    type: 'comment',
    comment: {uid: '123'},
  });
});

Testing asynchronous services

What’s interesting here is that all of the services were running: any side effect that was supposed to happen did happen. However, there’s a catch: because these services run asynchronously, there’s no guarantee that they’ll have finished running if you test your assertions immediately after the API completes. This is by design: it means that race conditions that happen in production will also happen in this test environment.

Because everything runs in a controlled environment, we were able to instrument our connectors. We can wait until everything has settled (until there are no more active tasks). We built a utility called traceWorker that does exactly this. Because we run in a controlled environment, we can even capture all the interactions between services. This is often how we’ll check that the scenario is correct:

it('should mention a teammate', async () => {
  // Wait until all tasks settle, and capture a subset of all internal chatter.
  const trace = await cmd._traceWorker(
    ['activity_created', 'app_notification_received'],
    async () => {
      // Run the HTTP requests shown above.
    }
  );
  // Expect to create a low-level event, and notify the mentioned teammate.
  // (The assertion wrapper is reconstructed in mocha/chai style.)
  expect(trace).to.deep.equal([
    {name: 'activity_created', meta: {emitter: 'api'}, data: {id: 10, updated: {type: 'mention'}}},
    {name: 'app_notification_received', meta: {emitter: 'notifications'}, data: {recipient_id: 4}}
  ]);
});

Because services typically have complex interactions, we will only capture a subset of interactions. In this case, we want to ensure that an activity (a low-level event) is created and that a notification is relayed to the intended teammate.

Speeding up with snapshots

Because we leverage the same framework in two environments, we’ve been able to think of improvements that benefit both.

A problem we had in our dev environment is that our backend would always start from an empty state. We solved it by introducing snapshots: each connector can dump its state in a JSON document. The backend can then use it as a starting point. We eventually built a suite of tools to browse the internal state of in-memory connectors.
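The mechanism itself is simple in spirit: every in-memory connector knows how to serialize its own state and start again from a previously captured one. A hypothetical sketch, not Front’s actual code:

```javascript
// Minimal sketch (hypothetical names): an in-memory connector that can dump
// its state to a JSON document and be rebuilt from that snapshot.
class InMemoryStore {
  constructor() {
    this.rows = [];
  }
  insert(row) {
    this.rows.push(row);
  }
  // Serialize the connector's entire state.
  dump() {
    return JSON.stringify({rows: this.rows});
  }
  // Start from a previously captured state instead of an empty one.
  static fromSnapshot(snapshot) {
    const store = new InMemoryStore();
    store.rows = JSON.parse(snapshot).rows;
    return store;
  }
}
```

A snapshot of the whole backend is then just the collection of each connector’s dump, which both the dev environment and the test suite can reload instantly.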

We also use these snapshots in our test suite. Each test requires an initial state. Once this state has been constructed, we snapshot it so we can quickly reuse it if another test requests the same one. This has helped make our test suite efficient: we are able to run 13k integration tests in about 12 minutes, running in parallel across 4 containers.

Evolving our testing to build a better Front

Because our system relies on complex interactions, we’ve always invested a lot in our integration tests. This has made our test suite a bit unusual. It is by no means perfect—but this framework has allowed us to confidently scale Front and execute quickly to build a better product for our customers.

Interested in building Front with us? Check out our open roles

Written by Laurent Perrin

Originally Published: 9 December 2021
