Most people don’t think about their garbage all that much. It goes in a bag or a bin, the bin goes out on the street, and the garbage gets taken away. Out of sight, out of mind! But me, I think about garbage a lot. Not the old banana peels or spent coffee grounds you throw away, but digital garbage. My name’s Gavy Aggarwal, and I’m a software engineer on the Client Platform team at Front.
Because of Front’s shared nature, users can have access to thousands of inboxes, each of which can contain hundreds of thousands of conversations, adding up to terabytes of data. To ensure a great user experience and let our users browse quickly, we keep a cache of some of this data locally. This poses a new challenge: how do we make sure that this local cache remains internally consistent without growing out of control?
High Memory Footprint
Before we discuss this project, I think it’s important to answer the question: why do we even need a garbage collector? Garbage collection is a mostly-solved problem that is usually abstracted away and handled by the programming language we are using. However, our needs at Front are different, and to understand why, we need to look at our front-end architecture. Powering our React web and React Native mobile apps is a Redux data layer, and as we use Front, we end up fetching conversations, messages, and comments, as well as other metadata for inboxes, channels, contacts, and tags. Whenever we fetch this data, we write it to our Redux data layer, which causes our memory footprint to grow without bound over the course of the user session.
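To make the growth concrete, here is a minimal sketch (with hypothetical names, not Front's actual code) of a normalized Redux-style resource cache: every fetch merges resources into the state, and since no action ever deletes an entry, the cache only grows for the lifetime of the session.

```typescript
// Hypothetical normalized resource cache: fetches add or overwrite entries,
// and nothing ever removes them, so memory use grows for the whole session.
type Resource = { id: string; type: string; payload: unknown };
type ResourceState = Record<string, Resource>;
type ResourceAction = { type: string; resources?: Resource[] };

function resourcesReducer(
  state: ResourceState = {},
  action: ResourceAction
): ResourceState {
  if (action.type === "RESOURCES_FETCHED" && action.resources) {
    const next = { ...state };
    for (const r of action.resources) next[r.id] = r; // add or overwrite
    return next; // note: no code path here ever deletes a key
  }
  return state;
}
```

Without something actively evicting unused entries, a reducer shaped like this accumulates every resource the user ever loads.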
Obviously, this is not good and makes our customers unhappy. Some of them have environments that are fairly restricted in memory, and Front would take resources away from the other apps they need to use. Even those who have ample memory will eventually experience app crashes if we keep hogging memory. This is not an experience we want to provide at Front.
Our Slow Garbage Collector 🐢 🐌
Great, so we built a garbage collector that scans our Redux store and purges the resources the app no longer references. Problem solved? Yes, at first. While this garbage collector worked great initially, we have grown significantly since we implemented it, and our largest customers now have thousands of teammates, inboxes, and channels. This means our garbage collector has to scan through tens of thousands of resources, which can take several seconds. Because it runs in a single task on the main thread, the app is completely frozen for that entire time. This is also not an experience we want to provide at Front, and it would only get worse as our customers grow.
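The shape of the problem looks something like the sketch below (an illustration, not Front's actual implementation): a single-pass mark-and-sweep over the resource reference graph. With tens of thousands of resources, the whole loop runs in one main-thread task, and the UI is frozen until it finishes.

```typescript
// Illustrative single-task mark-and-sweep over a resource reference graph.
type Graph = Map<string, string[]>; // resource id -> ids it references

function collectSync(graph: Graph, roots: string[]): Set<string> {
  // Mark phase: walk outward from the roots the app still uses.
  const reachable = new Set<string>();
  const stack = [...roots];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (reachable.has(id)) continue;
    reachable.add(id);
    for (const ref of graph.get(id) ?? []) stack.push(ref);
  }
  // Sweep phase: everything unreachable is garbage.
  const garbage = new Set<string>();
  for (const id of graph.keys()) {
    if (!reachable.has(id)) garbage.add(id);
  }
  return garbage;
}
```

Both phases scale with the total number of resources, and because nothing yields back to the event loop, user input queues up behind the scan.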
We made some optimizations, such as using mutable data structures, which cut the runtime down considerably. But this still wasn’t enough to solve the problem. That’s why we did a complete rewrite of our garbage collector, and that rewrite is what I’ll discuss today. 🙂
An Asynchronous Garbage Collector
The key difference in our new garbage collector is that it doesn’t need to run in one task. Instead, it saves its progress and resumes scanning for unused resources later. This means that we can split up the computation into small steps that take tens of milliseconds each and schedule them to run when the browser is idle. Now, the garbage collector chips away between user interactions and the user does not experience any slowness.
However, this introduces a new edge case: what if resources change while the garbage collector is running and now need to be retained? We handle this by detecting resources that are added or changed while the garbage collector is running and rescanning those changes, making sure we don’t purge anything our app depends on.
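A sketch of that rescan step might look like the following (a hypothetical API, under the assumption that writes are recorded in a "dirty" set): before sweeping, everything written during the scan, plus whatever it now references, is conservatively marked as reachable.

```typescript
// Conservative rescan of resources written while the collector was mid-scan.
type Graph = Map<string, string[]>; // resource id -> ids it references

class DirtyTracker {
  private dirty = new Set<string>();

  // Called whenever a resource is added or changed during a collection cycle.
  recordWrite(id: string): void {
    this.dirty.add(id);
  }

  // Re-mark from every dirty resource so nothing the app now depends on is
  // purged; anything kept too eagerly just survives until the next cycle.
  rescan(graph: Graph, reachable: Set<string>): void {
    const stack = [...this.dirty];
    const seen = new Set<string>();
    while (stack.length > 0) {
      const id = stack.pop()!;
      if (seen.has(id)) continue;
      seen.add(id);
      reachable.add(id);
      for (const ref of graph.get(id) ?? []) stack.push(ref);
    }
    this.dirty.clear();
  }
}
```

The bias is deliberately toward retention: it is safe to keep a resource one cycle too long, but never safe to purge one the app just started using.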
Congrats! We have a shiny new garbage collector. How can we be sure that it works correctly and won’t introduce any regressions in the app? To find out, we silently ran the new garbage collector alongside the old one for a subset of our users, compared the two outputs, and addressed any bugs and inconsistencies whenever they differed. When we were confident the new garbage collector was working properly, we migrated over to it smoothly, with no issues.
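The comparison at the heart of that shadow test can be sketched like this (an illustration of the idea, with hypothetical names): run both collectors on the same store, diff their purge sets, and report any divergence instead of purging.

```typescript
// Diff the purge sets produced by the old and new collectors; any entry in
// either list indicates a divergence worth investigating before migrating.
function diffPurgeSets(
  oldSet: Set<string>,
  newSet: Set<string>
): { onlyOld: string[]; onlyNew: string[] } {
  return {
    onlyOld: [...oldSet].filter((id) => !newSet.has(id)), // old would purge, new keeps
    onlyNew: [...newSet].filter((id) => !oldSet.has(id)), // new would purge, old keeps
  };
}
```

An entry in `onlyNew` is the dangerous kind of divergence: the new collector would delete something the trusted one keeps.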
Performance Wins 🏎 🐎
To evaluate the success of our garbage collector, we measured the P95 blocking time: the 95th percentile of the maximum time a user interaction was blocked due to garbage collection on our web and desktop apps.
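For readers unfamiliar with the metric, here is how a P95 might be computed over collected blocking-time samples; this is an illustrative helper, not Front's actual telemetry pipeline.

```typescript
// 95th percentile of a list of blocking-time samples, in milliseconds:
// sort the samples and take the value below which 95% of them fall.
function p95(samplesMs: number[]): number {
  if (samplesMs.length === 0) return 0;
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const idx = Math.ceil(0.95 * sorted.length) - 1;
  return sorted[idx];
}
```

A P95 is a good fit for this kind of metric because it reflects the bad sessions that averages hide, while still ignoring the most extreme outliers.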
The rewrite led to a big reduction in blocking duration, and while we aim to keep user-blocking tasks under 100ms, this is a big step forward.
This project solved the biggest bottleneck in garbage collection and improves our user experience, especially for some of our biggest customers. That said, 581ms is still far from 100ms, and the bulk of this time is spent in the purge phase, when we remove resources and re-render our app, recomputing our reselect selectors along the way. Improving this is next. 😉
While this project rolled out the new garbage collector only on our desktop and web apps, we plan to port it to mobile as well. Since mobile is even more resource-constrained than desktop, we hope to see at least as big a win there. 📱
Join our team
This project wouldn’t be what it is without the contributions of the whole team. Our EPD team is constantly working on challenging and exciting projects, and we’re always on the lookout for talented engineers! Take a look at our job board to see our open positions.
Written by Gavy Aggarwal
Originally Published: 21 December 2021