WADE, Our New Best Friend

March 11th, 2016 by Wes

We’re pleased to announce WADE, a little database project of ours. You can get all of the interesting detail from the git repo, but I’ll go into a bit here first, because what’s an engineering blog if there’s no engineering?

At Chartbeat, we’ve found ourselves often times wrestling with database scale problems when something we wanted to do with high throughput didn’t quite match the database’s data model. Usually in these cases, we fallback on a read-write-update cycle, which can kill performance and suffer from race conditions.

One annoying as hell example has to do with us maintaining a map of URLs to what we call canonical paths. You’ll often times see in your browser’s URL bar a path that has a bunch of query parameters that have no relation to the content of the page, such as utm tags or other tracking codes. When we send our data pings, we include all of those parameters, but on the server side we have to make some attempt to strip out the parts of the URL that aren’t relevant, otherwise we run the risk of not counting up all the metrics for a page.

A super simple example might be these two URLs:

We can’t just take the URLs at their face value and assign them each one page view. Rather, we want to strip off the “?referrer=twitter” and “?referrer=facebook” part and count two page views for /squirrels-are-barking-mad. The problem is compounded by the lack of a standard set of tracking codes used by publishers, so we can’t assume we’ll always know how to sanitize the data.

One solution is for our client to customize the ping so that it includes a canonical path of /squirrels-are-barking-mad, or sets an og:url meta tag within the page. That’s all well and good, but it assumes that the client actually implements this correctly, that the browser doesn’t mess something up in transit and that nobody is trying to spoof our client’s data. The internet is a messy place, so we regularly see multiple conflicting canonical paths for the same source URL.

What we do then is keep a tally of what mappings we’ve seen so far, and our system in essence votes for what the canonical path should be for any given URL.

Simple enough, but how do we do this at scale? We handle tens of thousands of such pings per second, and need to be able to vote with very low latency and high throughput. A simple way to achieve this at reasonable cost would be to use memcache or Redis as the backing store, and simply set the key to the source URL, and the value to a map of canonical path to counts, where counts are the number of times a particular source URL mapped to that canonical path. So one entry might be something like:

Meaning, we’ve seen /squirrels-are-barking-mad?referrer=twitter map to /squirrels-are-barking-mad 12,382 times, and to /squirrels-are-cute once.

With memcache, adding a count follows a read-write-update cycle:

  1. Read the opaque value at /squirrels-are-barking-mad?referrer=twitter.
  2. Deserialize it to a dict, call it “counts”.
  3. Add 1 to the value at counts[‘/squirrels-are-barking-mad’].
  4. Serialize and write the values back to /squirrels-are-barking-mad?referrer=twitter.

To do this, the client has to pull down data from the database, do the deserialization mojo, do some serialization mojo, then write it back, so there’s a full roundtrip. In addition, there’s a race between steps #1 and #4 if two clients are simultaneously trying to update the same URL on a backing store that doesn’t support transactions.

While we’re applying a continuous stream of counts, we may get asked what the canonical path is for some source URL. To figure this out, the client has to pull down and deserialize the data for a source URL, then look at the tallies and vote on which canonical path has the most weight. In this example, the a voting query would return /squirrels-are-barking-mad, since it’s clearly the correct one.

How does WADE improve on this? A database is essentially a chunk of data that transitions from state to state, where the transitions are insert/update commands. WADE is a replicated state machine which has no opinions on the structure of its state. This is entirely programmer defined, as are the state transition functions, which we call mutating operations. A WADE cluster keeps track of object state and pushes mutating operations to all of the replicas in a consistent way using the chain replication algorithm. Because operations are fully programmable, we’re able to eliminate read-write-update cycles by defining application specific transitions.

Let’s first set aside all the nice replication benefits WADE gives us for free and look just at the update operation and how the programmer defines mutating operations. We’d call the /squirrels-are-barking-mad?referrer=twitter entry an object, and define state transitions on this object. One such state transition would be an update command that adds one to a candidate canonical path. I’ve simplified the code for the purposes of this post, but it’s only marginally more complex than:

This function would get deployed to the WADE nodes, much like custom written views are deployed with a Django (or any web framework) app to a web server. Then a client connecting to the WADE cluster would issue a single update command:

We don’t have to do a full round trip, and we only send the data necessary to increment the counter.

The vote would simply be another customized WADE operation, but this time a non-mutating operation, ie a query.

The vote function would ship with WADE to the nodes in the same way as the mutating operations and run on the nodes themselves. The client would issue a vote operation, and the cluster would return only the top path, eliminating the need to transfer the entire serialized object and run the logic in the client.

There are consistency benefits as well. WADE serializes update operations and has strong consistency guarantees (at the risk of partial unavailability), so race conditions don’t exist.

The URL tracking example is one simple application that motivated the design of the system, but another is a probabilistic set cardinality (HyperLogLog) service that powers our Engaged Headline Testing product. At the moment our HLL service is built on top of Riak but suffers from high network overhead and chatter between nodes. Each HLL structure is around 10k in size, and even a small bit flipping operation requires downloading the entire serialized data. When you throw in the possibility of siblings, what we’re left with is a system that scales beautifully horizontally, but is not as resource efficient as it could be. We have yet to move our HLL service to WADE, but initial performance tests show significant improvement over our current Riak cluster. We’ll be open sourcing a HLL service on WADE in the near future.

Another nice property of WADE: the core code is simple. While I can’t really argue that thinking about distributed systems is easy, we’ve reduced the surface area of misunderstanding by enforcing a rule that the core code will always be fewer than 1,000 lines. The effect of this is that WADE doesn’t optimally handle a lot of edge cases, though (bugs notwithstanding) it maintains correctness. Any node failure might cause the cluster to be unavailable for longer than an industrial strength database might, but it’s still within acceptable limits for production use.

Also, we leave a lot of details up to the programmer. While WADE will handle replication happily, it needs to be told what nodes to replicate to. There are nuanced reasons why we might want to leave this up to the programmer that I may go into in another blog post.

There is some missing functionality (fast syncing, as described in the repo’s README, as well as a useful generic overlord), but we’ll fill those in over time as we improve our understanding of operational issues.

In the meantime, please take a look at WADE and play with the the in-memory kv store that ships with it. This is still alpha quality software, but we’re optimistic it’ll see varied use within our systems in important and high scale areas. We would love to hear feedback, and of course would be happy to accept pull requests. Just don’t bust our 1,000 LOC quota.