At Boomtrain, we’re always trying out new things. New ideas and research can be fun, but there needs to be a way to determine which of them have value, and which are just fun. This is where experimentation comes in.
Experimentation gives us objective answers to questions we face on a daily basis, such as:
- Does this new model produce higher engagement than the current one?
- What value of a parameter x results in better recommendations?
- If we increase a parameter x, does that increase an email’s click-to-open rate (CTOR)?
Because we run many experiments for many different customers, our two most important requirements are the ability to run randomized field experiments in production, and the ability to run them at scale.
For randomized allocation to be scalable it needs the following properties: 1) it should be distributable, i.e. there shouldn’t be a dependency on a single process to hand out bucket allocations; and 2) it should be consistent, i.e. the same user should get the same allocation no matter where and when we ask for it. The simple answer to this is to use hash-based allocation. For a given number of buckets N, the bucket number is given by:

bucket = hash(salt + user_id) mod N
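A minimal sketch of this allocation in Python (the function name, salt format, and choice of SHA-1 are illustrative, not our actual implementation):

```python
import hashlib

def assign_bucket(user_id: str, salt: str, num_buckets: int) -> int:
    """Deterministically map a user to a bucket.

    The same (salt, user_id) pair always yields the same bucket, on any
    machine, with no shared state -- which is what makes hash-based
    allocation both consistent and distributable.
    """
    digest = hashlib.sha1(f"{salt}.{user_id}".encode()).hexdigest()
    return int(digest, 16) % num_buckets

# The same user always lands in the same bucket for a given experiment:
assert assign_bucket("user-42", "exp-1", 10) == assign_bucket("user-42", "exp-1", 10)
```

Because the assignment depends only on the inputs, any replica of the service can answer any request, and answers never drift over time.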
We had a number of other requirements for an experimentation system to serve our needs:
- Advanced experiment types, including the ability to perform multi-factor and fractional experiments
- Centralized configuration, so that all experiments for a customer can be updated simultaneously with ease, and all experiments are updated and deployed in the same manner.
- Analysis decoupled from the mechanics of allocation
- Minimal impact on production code
With these in mind, we built the current Boomtrain experimentation system: a highly-available, distributed experiment service, and a flexible analysis processor. These are tied together using our Kafka-based event pipeline. This system allows us to perform many randomized field experiments, simultaneously, for many different customers.
The way this system works is that, for each user request coming in, we do the following:

1. The application asks the experiment service for a set of parameters, passing along a user identifier.
2. The experiment service hashes the user into a bucket and returns the parameter values for that allocation.
3. The allocation and subsequent user events (opens, clicks, and so on) flow through our Kafka-based event pipeline.
4. The analysis processor consumes these events and computes the experiment’s results.
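The per-request flow can be sketched as follows. The experiment names, parameters, and config shape here are hypothetical; they stand in for the centrally-managed configuration that gets pushed to every experiment service machine:

```python
import hashlib

# Hypothetical central config: each experiment defines a salt, a bucket
# count, and the parameter values to return for each bucket.
EXPERIMENT_CONFIG = {
    "subject_line_test": {
        "salt": "subject-line-v1",
        "num_buckets": 2,
        "params": [{"subject_style": "plain"}, {"subject_style": "emoji"}],
    },
}

def get_parameters(user_id: str) -> dict:
    """Return every experiment parameter that applies to this user.

    The service is stateless: the answer depends only on the config and
    the user id, so any replica produces the same parameters. The caller
    never sees experiment-specific logic, just a dict of parameters.
    """
    params = {}
    for exp in EXPERIMENT_CONFIG.values():
        digest = hashlib.sha1(f"{exp['salt']}.{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % exp["num_buckets"]
        params.update(exp["params"][bucket])
        # In production, an exposure event would also be emitted to Kafka here.
    return params
```

Updating an experiment is then just a config change: no application code needs to be touched or redeployed.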
This architecture has several advantages: experiments can easily be updated centrally and pushed to the experiment service machines; the experiment service itself, thanks to the hash-based allocation, can be scaled horizontally to meet increased demand; and, because applications simply request a generic set of parameters from the experiment service, there is no experiment-specific code in our production systems.
In addition, the architecture decouples analysis from how the experiment is conducted, allowing us to develop analysis techniques independently of experimental design and infrastructure. The experiment service is built around Planout4J, which enables sophisticated experiment definitions, and we use Bayesian methods for our analyses, which are better suited to the kinds of incremental and sequential data we get from our event streams.
Looking to the future, it’s not hard to imagine how this setup could be extended to remove the human from the analysis step and update the configuration directly, using a Bayesian multi-armed bandit approach or another optimization strategy. These enhancements can be built into the experiment service with very little impact on the production systems.
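One way to close that loop is Thompson sampling, a Bayesian bandit strategy in which allocation follows the posterior over each variant’s click rate rather than a fixed split. This is a hedged sketch of the idea, not a description of anything currently in our service; the class and variant names are hypothetical:

```python
import random

class ThompsonSampler:
    """Allocate traffic by sampling each variant's Beta posterior."""

    def __init__(self, variants):
        # One Beta(1, 1) prior per variant, stored as (alpha, beta) =
        # (1 + clicks, 1 + non-clicks).
        self.stats = {v: [1, 1] for v in variants}

    def choose(self) -> str:
        """Pick the variant whose sampled click rate is highest."""
        draws = {v: random.betavariate(a, b) for v, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def record(self, variant: str, clicked: bool) -> None:
        """Update the chosen variant's posterior with an observed outcome."""
        self.stats[variant][0 if clicked else 1] += 1
```

Over time, traffic concentrates on the better-performing variant automatically, which is exactly the kind of configuration update that could be pushed out through the existing centralized config path.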
Online experiments are only one part of building a system that supports cheap and rapid experimentation, and we are continuing to expand and improve towards that goal. They do, however, provide an objective, scientifically grounded way to assess our ideas, one that leaves us free to try out new and fun things, which is what makes life at Boomtrain so interesting.