In a long-lived and complex application, there often comes a time when early decisions become constraints that are difficult or impossible to overcome. At some point in the lifespan of the software, you may be faced with a choice: do the Big Refactor, or find ways to work around the problems. More often than not, the latter route is taken. When you have many immediate priorities, it’s difficult to justify spending a lot of time on a big and potentially risky project.
Sometimes however, the benefits of doing the big refactor greatly outweigh the drawbacks. It takes a cost-benefit analysis specific to your app to determine if that is the case. And recently at GameChanger, the big refactor was worth it, so we did it!
This blog post is about how we at GameChanger successfully executed a huge overhaul of our data model. We removed a foundational database entity, `person`, upon which almost every part of our system relied in some way, and replaced it with new entities that enabled the user experience we knew our customers expected.
We’ll take a look at what user-facing problem we were addressing and the high level adjustment to our data model. We’ll also talk about how we approached such a large project tactically, some of the interesting technical problems we solved along the way, and the benefits we’re reaping a couple months out from shipping.
The Problem 🧩
For a long time, one of our most common complaints from users of Team Manager was the complexity around who could edit players and what information they could edit. The root of the complexity was this: a player could be created without an email address, but once a player was attached to an email address, a user could no longer edit that player’s information, like first name or last name, on one team independently of the player’s other teams. A player’s name and their relationships to other users carried over from team to team. To avoid situations where a coach on one team could make edits to a player that would be disruptive to another team, we disabled editing of player information almost entirely once the player was attached to an email address.
From the launch of Team Manager in 2017 to the summer of 2020, when we first started seriously considering this project, we received thousands of CX cases related to this issue. In addition to these CX cases, we had evidence from user tests that there was confusion around how editing players worked. Clearly, this user experience was unbecoming of a best-in-class team management app!
The technical reason for this user experience limitation was relatively simple: a single record in a `person` table was the source of truth for player information across all teams. In essence, players were in the “global” scope. So the problem at hand was to adjust our data model so that player information could be scoped to individual teams, allowing coaches to freely edit players without worrying about affecting that player’s information on another team.
The original motivation for building the data model this way was to support career stats for players. The idea was that a player’s info could carry over from team to team, allowing us to easily create a view of a player’s youth sports journey. However, we found that the old model wasn’t actually supporting this use case cleanly; notably, we had issues with duplicate players. So we were paying the cost of this model’s complexity without getting the benefits we wanted, at least not without extra work on top of the previous data model.
One of our values at GameChanger is to put the customer first, and in this context, we thought putting the customer first meant fixing this UX problem now, despite career stats potentially being trickier to support in the future.
So, the big technical lift was to fix our data model. One major challenge was that the concept of a `person` was deeply embedded into all three of our clients (iOS, Android, and web) as well as the API powering them.
Thousands upon thousands of lines of code dealt with `person`. Not to mention, a busy baseball season was fast approaching. How would we update our client apps to the new data model and cleanly deprecate app versions on the old model? Would there have to be downtime? How would we safely and accurately migrate `person` data to any new tables we created? Was it even worth doing this project so close to a busy season? The answer to that last question, we determined, was yes – we wanted to get as many users as possible onto the new data model as soon as possible. Why? We expected that Spring 2021 would bring an influx of new Team Manager users, as well as users migrating from the legacy GameChanger app – and the fewer users who had to have their data migrated from the old model to the new one, the better.
Whatever solution we came up with, it was clear the level of effort required to implement it would be immense.
But this was no deterrent for us on the Athlete Experience team at GameChanger. One of the great things about working here is that teams are empowered to solve problems that are important to our mission – and we understood that fixing this user experience problem would offer long-term benefits to our users and our product. And so, we got to work!
Research 🔬
We’d identified at a high level the problem, the ideal solution, and some alternative solutions. The next thing to do was to enumerate the work in detail, write a proposal, and disseminate it for feedback.
Our proposal had to cover a lot of ground. Here are some of the things our proposal discussed:
- What is the dependency graph between the `person` table and the rest of our data model? What high level concepts in our API and in our clients are involved in making our `person` model work?
- What new tables will be created? Which existing tables will require migrations? What new high level concepts will we have, and how will the old concepts map to them?
- Exactly how will iOS, Android, and web apps be migrated to use the new data model?
- We use Realm on our mobile apps – so which Realm tables will need migrations?
- How will each high level feature that we support be affected by this data model change? How will it be migrated to the new data model?
- How will we migrate data from the `person` table to other tables? (We’ll explain this in a bit more detail later.)
- What are the alternative solutions?
Here’s a brief summary of the data model that we had, and the new one we proposed:
Previously:
- A person stored someone’s name, phone number, and linked them to their user account if they had one.
- A player linked a person and a team.
- Associations between a user and a player were modeled as links between person IDs.
This should illustrate the “global” player issue described earlier. If a player is on multiple teams and has a user account, that player’s name and relationship information comes from the record in the `person` table. There was no straightforward way in our previous model to scope that to a single team.
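To make this concrete, here is a rough sketch of the old shape as hypothetical TypeScript interfaces. The field names are illustrative assumptions; the real tables had more columns and different names.

```typescript
// Hypothetical, simplified shapes for the old data model.

// The global source of truth for a player's identity.
interface Person {
  id: string;
  firstName: string;
  lastName: string;
  phoneNumber?: string;
  userId?: string; // set once the person is attached to a user account
}

// A player was essentially a join between a person and a team.
interface Player {
  id: string;
  personId: string;
  teamId: string;
}

// Associations (e.g. between a player and a parent or guardian) were
// links between person IDs, so they were global rather than team-scoped.
interface PersonAssociation {
  personId: string;
  relatedPersonId: string;
  relationship: string;
}
```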
Currently:
- A player stores a link to a team, and optionally a user. It also stores a name and a phone number.
- Associations between users and players are stored as links between user IDs and player IDs.
This change allows players to have different relationships and different names per team. We completely remove the concept of a `person`.
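Here is a matching sketch of the new shape, again with hypothetical, simplified interfaces rather than our exact schema.

```typescript
// Hypothetical, simplified shapes for the new data model.

// A player is owned by a single team, so its name and contact info can
// be edited on that team without affecting any other team.
interface Player {
  id: string;
  teamId: string;
  userId?: string; // optional link to a user account
  firstName: string;
  lastName: string;
  phoneNumber?: string;
}

// Associations now link a user directly to a team-scoped player.
interface PlayerAssociation {
  userId: string;
  playerId: string;
  relationship: string;
}
```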
We should note that the above are the high level changes showing how we adjusted our data model to accommodate the user experience we desired. Omitted are many auxiliary changes and other important details that were needed to transition our data model away from `person`.
A Detour – Data Mirroring 👯‍♂️
One interesting problem outlined in our proposal was how to populate our new tables with data. For existing data, we could run a script that would migrate all of it to the new data model. However, for data currently being written into the system, we needed a way to migrate that data immediately. Blocking writes in order to migrate data was definitely not an option – we wanted the system to keep humming along and have data from the old models migrated to the new models in as close to real time as possible.
We referred to the solution that we came up with as “data mirroring”, and it would help ensure that we lost no data while migrating over to the new data model. Its function was essentially to update our new tables every time the old tables changed. Architecturally, this was implemented as a service in our Node.js API.
We proposed two ways in which data mirroring could occur:
- Inline mirroring: update the new tables within the same transaction in which the old tables are updated.
- Queued mirroring: update the new tables after the transaction where the old tables are updated. We put the work of updating the new tables on a queue.
You may be wondering when each of these would be used, and why. The purpose of having two ways of mirroring data was to ensure data consistency.
Imagine we need to update a person’s name. In the old data model, this is an update to the `person` table. In the new data model, this might be an update to a player, or an update to a user. If we only used inline data mirroring, one possible scenario would be a person’s name being changed at the same time that the person is added to a team as a player. If those transactions run in parallel, the new player might be created with the old name, since each transaction works on a snapshot of the database taken before the other transaction commits. We could use Postgres’ SERIALIZABLE isolation level, but that is a non-starter for us for various unrelated reasons (though let us know if you want to hear more about our experience working with Postgres). We address this scenario by queuing the task to update the person’s name and queuing the task to add that person as a player. The order in which these tasks execute does not matter: each runs after the relevant transactions have committed, so both see the database with those changes applied, and the data stays consistent.
Most data mirroring operations would occur inline as part of a transaction; queued mirroring primarily served to fix issues that could arise from transactions happening in parallel.
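To make the two paths concrete, here is a minimal sketch of what inline and queued mirroring could look like in a Node.js API using node-postgres. The table names, the jobs-table queue, and the function names are illustrative assumptions, not our actual service code.

```typescript
import { Pool } from "pg";

const pool = new Pool();

// Inline mirroring: the new-model write happens in the same transaction
// as the old-model write, so both commit or roll back together.
async function addPlayerWithInlineMirror(personId: string, teamId: string): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    // Old-model write: link the person to the team.
    await client.query(
      "INSERT INTO player_old (person_id, team_id) VALUES ($1, $2)",
      [personId, teamId]
    );
    // Mirror: read the person's name within the same transaction and
    // create the new team-scoped player row from it.
    const { rows } = await client.query(
      "SELECT first_name, last_name FROM person WHERE id = $1",
      [personId]
    );
    await client.query(
      "INSERT INTO player_new (team_id, first_name, last_name) VALUES ($1, $2, $3)",
      [teamId, rows[0].first_name, rows[0].last_name]
    );
    await client.query("COMMIT");
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}

// Queued mirroring: after the old-model transaction commits, enqueue a task
// that re-reads current state and updates the new tables. Because the task
// runs later, it sees the effects of any transaction that ran in parallel
// (for example, a name change racing with the insert above).
async function enqueueMirrorTask(taskType: string, payload: object): Promise<void> {
  // Stand-in queue: a jobs table polled by a worker.
  await pool.query(
    "INSERT INTO mirror_queue (task_type, payload) VALUES ($1, $2)",
    [taskType, JSON.stringify(payload)]
  );
}
```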
Execution 🔧
After we identified what had to be done, we broke up our work into roughly 7 phases, with a point of no return that we referred to as “cutover”.
Phase 1: Create new tables, write new API endpoints, implement data mirroring.
Phase 2: Update our sync system to work with our new data model. Sync is our mechanism for keeping client devices up to date with the server - read more about our implementation here.
Phase 3: Build new UI on iOS and Android to accommodate the new data model.
Phase 4: Migrate various features on the server and mobile clients to work with our new data model.
Phase 5: Update the web client to work with the new data model.
Phase 6: Data & Analytics changes.
— Cutover —
Phase 7: Cleanup – take on various work we had punted on.
Cutover ⚠️
Cutover was the point at which we would stop writing to the old models and begin reading from and writing to the new models. This was a point of no return because data mirroring was essentially a one-way operation – once the old and new models diverged, it would be tricky, though not impossible, to get them back to a one-to-one state. For all intents and purposes, it was a point of no return, so we treated the execution of cutover as a particularly sensitive step in the process of migrating to our new data model.
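Purely as an illustration, a read path gated by a cutover flag might look like the sketch below; the flag check and the data-access helpers are hypothetical stand-ins rather than our real code.

```typescript
// Hypothetical sketch of a flag-gated read path around cutover.

async function isCutOver(): Promise<boolean> {
  // Stand-in: the real check would consult a feature flag or config service.
  return process.env.PERSON_CUTOVER === "true";
}

async function getPlayerName(playerId: string): Promise<string> {
  if (await isCutOver()) {
    // After cutover: read from the new team-scoped player model.
    return readNameFromPlayerTable(playerId);
  }
  // Before cutover: resolve the name via the legacy person record.
  return readNameFromPersonTable(playerId);
}

// Stand-in data access helpers so the sketch is self-contained.
async function readNameFromPlayerTable(playerId: string): Promise<string> {
  return `name for player ${playerId} (new model)`;
}

async function readNameFromPersonTable(playerId: string): Promise<string> {
  return `name for player ${playerId} (old model)`;
}
```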
We did a number of things to make sure the execution of cutover would be successful:
- We created a runbook for the day of cutover.
- We did it at a low traffic time.
- We assigned roles for each person involved in executing cutover.
  - A scribe – someone to capture what was going on during the video call and write it into Slack.
  - A monitor – someone to monitor system behavior to make sure the apps and the API were performing properly. We had two people in the monitor role: one for the API and one for the mobile apps.
  - A console – someone to perform various development tasks. We had two consoles: one to run scripts and one to flip feature tags.
- We did cutover on staging a week prior to doing it on production as a practice run.
All the planning and preparation paid off, as we were able to execute cutover with no major issues and no data loss.
Looking Back 🌇
At the time of writing, we’re a few months out from cutting over our API and our clients to use the new data model exclusively. The project has been a resounding success – our app now works the way our users want it to, which was the aim of this project.
In addition to eliminating the CX cases in our queue that dealt with the complexity around editing players, we’ve also managed to reduce the number of requests our sync system has to send when players are updated, since we can send syncs only to the team affected by a change to a player, instead of to all teams where the player has an association.
This work would not have been possible were it not for the amazing people we have at GameChanger. Every team at the company played some role in making this project a success, but a very special shout-out goes to Eliran Ben-Ezra, Peter Gelsomino, Abhinav Suraiya, Ami Kumar, Adam Tecle, Israel Flores, Dane Menten-Weil, Janell Eichelberger, Wai-Jee Ho, and Leah Giacalone.