Dr. Parselmouth or: How I Learned to Stop Worrying and Love Integrating with External API Services

November 11th, 2015 by Justin

This is the story of how Chartbeat struggled for months integrating with a complex and sometimes incomprehensible external API service and eventually brought sanity to that notoriously difficult service. Then, we open sourced the whole thing.

Chartbeat helps leading media companies around the world understand, measure, and monetize the time that audiences actively spend with their sites’ online content. Since we are really good at tracking reader engagement on web pages, we had the bright idea to direct our tech at measuring ads as well. To do this we needed to understand the ad service of choice for most of our clients, Google DoubleClick for Publishers (DFP). Luckily for us, Google provides an extensive API suite to interface with their DoubleClick service. Unluckily for us, integrating with this service in a production environment is complicated and difficult to maintain.

We started by constructing a thin wrapper over Google’s Python package googleads. This was great when we needed to pull basic information from DFP, such as names or IDs of ad campaigns. However, as we wanted to integrate with some of DFP’s more complex features, our continuous iteration on this initial wrapper resulted in numerous hacks and tons of unmaintainable code. Soon after, Google announced that the DFP API version we had adapted all of our code to was about to be deprecated. Translation: all of the production systems that depended on DFP were going to break.

Pressured by time, we scrambled to integrate with the newer API version and ended up inadvertently making our integration even shakier than it was before. Our DFP integration survived the version upgrade, but we had gotten ourselves into a state where further upgrades and feature requests would be hard to manage and unreliable to service.

Up until this point, we were only reading information like campaign names, creative images, and delivery percentages from our clients’ DFP accounts. So, even though our DFP integration was unstable, there was very little risk of anything catastrophic happening. The worst-case scenario was failing to read an up-to-date number.

Therefore, when we made the decision to actually start writing to and editing our clients’ DFP accounts, we had to rethink our strategy. DFP serves as the main revenue source for most of our clients.

Now, the worst-case scenario for a failure in our DFP integration had escalated from a mild inconvenience to putting our clients out of business.

Clearly, we needed something more stable and supportable if we wanted to support writing to our clients’ DFP accounts. To move forward, we needed to start over and build something that could reliably translate the evil intricacies of ad tech into something comprehensible: a Parselmouth. For those of you who don’t follow the lore of the Harry Potter universe, a Parselmouth is a person who can communicate with snakes.

Parselmouth was built by Paul Kiernan and myself as a universal translator for external API services from different ad service providers like DFP, DFA, and OpenX with the goal of abstracting away all the ugly insides of these systems.

When designing the project, there were a few primary challenges we wanted to overcome. First of all, Parselmouth should be as easy to upgrade and maintain as possible. We did not want to find ourselves in a situation where we had to scramble to upgrade out of a deprecated version again. Next, we wanted to make sure that the usage of Parselmouth was consistent with the rest of our codebase. Previously, we worked with DFP responses as-is, and these were often inconsistent with our general coding style. Finally, Parselmouth had to deal with the arbitrarily complex functionality of ad servers, which involves tons of use cases, edge cases, and custom data integrations. In other words, we needed our system to manage the fact that ad tech is… well, it’s not great.


Maintainability and Consistency

The first and most obvious step toward maintainability was to ensure that there was very good unit test coverage in the project. When integrating with an external service, however, coverage isn’t really enough because it is important to validate that the responses you receive from this service are consistent with what you expect. For this reason, we made a suite of integration tests. For each function available in Parselmouth, we have a corresponding integration test that ensures that the response from the API service is what we expect. Therefore, when one of these functions breaks, it allows us to quickly identify exactly what feature has changed within the ad server API.

For example, in our original integration all responses were returned as dictionaries. Then, during a version upgrade, Google updated their Python API client to return so-called SUDS objects. Naturally, this broke all of our integration tests, and quickly indicated that we needed to update our methods for deserializing the API responses.

We have also found that Google likes to update the names and nested structures of the responses in their APIs. For example, the name of the impression goal structure within a campaign object changed from “goal” to “primaryGoal” in one API version upgrade. Our get_line_item() integration test catches a change like this right away.
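To make the idea concrete, here is a minimal sketch of the kind of shape check such an integration test might perform. It is an illustration only: the helper names and the expected field list are assumptions, and in a real integration test the response would come from the live API rather than a hand-written dictionary.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical list of fields we expect on a line item response.
# Only "goal" / "primaryGoal" come from the post; the rest are assumed.
EXPECTED_LINE_ITEM_FIELDS = ("id", "name", "primaryGoal")


def missing_fields(response: Dict[str, Any], expected: Iterable[str]) -> List[str]:
    """Return the expected field names that are absent from the response."""
    return [field for field in expected if field not in response]


def check_line_item_shape(response: Dict[str, Any]) -> None:
    """Fail loudly, naming exactly which fields the API stopped returning."""
    missing = missing_fields(response, EXPECTED_LINE_ITEM_FIELDS)
    if missing:
        raise AssertionError(f"line item response is missing fields: {missing}")


# Stand-ins for API responses before and after the rename described above.
old_style = {"id": 1, "name": "Spring campaign", "goal": {"units": 1000}}
new_style = {"id": 1, "name": "Spring campaign", "primaryGoal": {"units": 1000}}

check_line_item_shape(new_style)  # passes silently
print(missing_fields(old_style, EXPECTED_LINE_ITEM_FIELDS))
```

Because the failure message names the missing field, a renamed key points straight at the part of the deserialization code that needs updating.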

To give you an idea of the kind of thing you get from DFP, here is what a raw line item looks like straight from their API:

Yeah… that’s a ton of information. This is why we wanted to carefully ingest all of this data into a Python class called LineItem with naturally accessible fields.

This means that we translate the varied responses from the ad service into a consistent, Pythonic class that users can work with without needing to know the intricacies of DFP.
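As a rough illustration of this kind of translation, consider the sketch below. The class here is a deliberately tiny stand-in: its fields and the raw response keys (other than “goal” and “primaryGoal”) are assumptions made for the example, not Parselmouth’s actual LineItem schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class LineItem:
    """Hypothetical, simplified stand-in for a Parselmouth-style line item."""
    id: str
    name: str
    goal_units: Optional[int] = None


def line_item_from_raw(raw: Dict[str, Any]) -> LineItem:
    """Translate a raw, nested ad-server response into a flat object."""
    # Accept either goal key so callers never see the API's naming churn.
    goal = raw.get("primaryGoal") or raw.get("goal") or {}
    units = goal.get("units")
    return LineItem(
        id=str(raw["id"]),
        name=raw["name"],
        goal_units=int(units) if units is not None else None,
    )


# Assumed raw shape, far smaller than a real response.
raw = {"id": 12345, "name": "Spring campaign", "primaryGoal": {"units": "100000"}}
item = line_item_from_raw(raw)
print(item.name, item.goal_units)
```

The caller only ever touches `item.name` or `item.goal_units`; the knowledge of which key the server happened to use stays inside the translation function.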


The Beast of Ad Tech

All of this doesn’t get you anywhere in dealing with the fact that ad tech is a complicated beast. In particular, one of the more painful aspects of ad tech comes from the desire to specify how an ad campaign should be targeted. For example, a brand might decide they want to target women in their twenties in Europe on mobile devices. And to represent this kind of arbitrary open-ended targeting, the ad server has to support arbitrary logical structure. This example could be represented as something like this:

But you also have to support more complicated and varied structures, like targeting women in Europe or men in North America, unless they are in Canada on a mobile device.

To tackle this particular problem, we designed a class into Parselmouth called TargetingCriterion which supports arbitrary logical structures like the ones illustrated above.
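One common way to model this kind of open-ended logic is a small expression tree of AND, OR, and NOT nodes. The sketch below takes that approach; it is not Parselmouth’s TargetingCriterion implementation, and every class name, key, and value in it is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Target:
    """Leaf node: a single key/value requirement, e.g. region == Europe."""
    key: str
    value: str

    def matches(self, user: Dict[str, str]) -> bool:
        return user.get(self.key) == self.value


@dataclass(frozen=True)
class And:
    parts: Tuple["Node", ...]

    def matches(self, user: Dict[str, str]) -> bool:
        return all(part.matches(user) for part in self.parts)


@dataclass(frozen=True)
class Or:
    parts: Tuple["Node", ...]

    def matches(self, user: Dict[str, str]) -> bool:
        return any(part.matches(user) for part in self.parts)


@dataclass(frozen=True)
class Not:
    part: "Node"

    def matches(self, user: Dict[str, str]) -> bool:
        return not self.part.matches(user)


Node = Union[Target, And, Or, Not]

# Women in Europe OR men in North America, UNLESS in Canada on mobile.
criterion = And((
    Or((
        And((Target("gender", "female"), Target("region", "Europe"))),
        And((Target("gender", "male"), Target("region", "North America"))),
    )),
    Not(And((Target("country", "Canada"), Target("device", "mobile")))),
))

canadian_on_mobile = {"gender": "male", "region": "North America",
                      "country": "Canada", "device": "mobile"}
canadian_on_desktop = dict(canadian_on_mobile, device="desktop")
print(criterion.matches(canadian_on_mobile), criterion.matches(canadian_on_desktop))
```

Because the nodes nest freely, the simpler “women in their twenties in Europe on mobile” case is just a single `And` over four `Target` leaves.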

Many of our other challenges revolved around dealing with the details of the DFP API. For example, if you want to get historical data about line items within DFP, you have to download a report as a gzipped CSV. This meant we had to code a function which waits for the report to generate, downloads the gzipped file, unzips it, parses the resulting text, and then formats the data into a list of Parselmouth LineItem objects.
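The wait-then-parse steps can be sketched with nothing but the Python standard library. This is a simplified illustration, not the Parselmouth code: the status strings, the `get_status` callable, and the CSV column names are all assumptions, and the download itself is replaced by an in-memory payload.

```python
import csv
import gzip
import io
import time
from typing import Callable, Dict, List


def wait_for_report(get_status: Callable[[], str],
                    poll_seconds: float = 1.0,
                    max_polls: int = 60) -> None:
    """Poll a (hypothetical) status callable until the report is ready."""
    for _ in range(max_polls):
        status = get_status()
        if status == "COMPLETED":  # assumed status value
            return
        if status == "FAILED":  # assumed status value
            raise RuntimeError("report generation failed")
        time.sleep(poll_seconds)
    raise TimeoutError("report did not finish in time")


def parse_gzipped_csv(payload: bytes) -> List[Dict[str, str]]:
    """Decompress a gzipped CSV and return one dict per data row."""
    text = gzip.decompress(payload).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Stand-in for a report job that finishes on the second poll.
statuses = iter(["IN_PROGRESS", "COMPLETED"])
wait_for_report(lambda: next(statuses), poll_seconds=0)

# Stand-in for the downloaded file; the column names are made up.
payload = gzip.compress(b"line_item_id,impressions\n1,100\n2,250\n")
rows = parse_gzipped_csv(payload)
print(rows)
```

Each row dictionary would then be handed to whatever code builds the LineItem objects.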

In other cases, the difficulties of integrating with DFP came from the unpredictability of the responses. For example, consider the case of technology targeting within DFP, which allows you to target specific devices, operating systems, or browsers.

An example technology target which targets a specific browser language and device might look something like the following:

There are two key things to focus on with this response. First, notice that a pluralized version of the targeting type is included before the list of items to be targeted. Second, notice that the pluralization follows proper English: for words that end in “y”, the “y” becomes “ies”, and otherwise an “s” is added to the end. While this reads nicely in English, it is a nightmare for a programmer. There is also inconsistency in how the responses are structured. For deviceCategory, the word “targeted” is prepended to the plural name, while for browserLanguage it is not. These kinds of inconsistencies create an additional challenge when working with the DFP APIs. Things would have been much easier if the format had been more predictable like this:

For this example we had to laboriously create a map to parse out the meaning of each of these strings.
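A lookup table of this sort might look like the sketch below. It covers only the two targeting types discussed above, the exact key spellings are inferred from that description rather than copied from the API, and the function name is hypothetical.

```python
from typing import Any, Dict, List

# Maps the irregular plural key the API returns to the singular targeting
# type. Spellings are inferred from the prose above and may not be exact.
PLURAL_KEY_TO_TYPE = {
    "targetedDeviceCategories": "deviceCategory",  # "targeted" + y -> ies
    "browserLanguages": "browserLanguage",         # no prefix, plain "s"
}


def normalize_technology_targeting(raw: Dict[str, List[Any]]) -> Dict[str, List[Any]]:
    """Re-key a technology targeting response by singular targeting type."""
    normalized: Dict[str, List[Any]] = {}
    for key, items in raw.items():
        kind = PLURAL_KEY_TO_TYPE.get(key)
        if kind is None:
            # Surface unknown keys instead of silently dropping targeting.
            raise KeyError(f"unrecognized technology targeting key: {key!r}")
        normalized[kind] = list(items)
    return normalized


raw = {
    "targetedDeviceCategories": [{"id": 1, "name": "Smartphone"}],
    "browserLanguages": [{"id": 2, "name": "English"}],
}
print(normalize_technology_targeting(raw))
```

Raising on an unknown key is a deliberate choice here: when the API adds or renames a targeting type, the failure is immediate rather than a silently incomplete target.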

Despite these frequent snags in developing our DFP integration, an effective strategy emerged for dealing with these issues within Parselmouth. A strategy which we affectionately refer to as “focused rage.”

“Focused rage” involves jamming all of the horrible bits and pieces of the API integration into a compact independent part of the code.

We designed adapters that follow the “bridge” design pattern to hold all of the nasty bits of translation code. So when you initialize a connection with Parselmouth, you are connected via the appropriate adapter, and need never know about the dark secrets that live under the hood.
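In outline, the arrangement looks something like the following sketch. All of the class and method names are hypothetical, and the adapters read from in-memory dictionaries rather than calling a real ad server; the point is only to show where the provider-specific mess is confined.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class AdServerAdapter(ABC):
    """Interface the rest of the codebase programs against."""

    @abstractmethod
    def get_line_item_name(self, line_item_id: str) -> str:
        ...


class FakeDFPAdapter(AdServerAdapter):
    """Stand-in for a DFP adapter; its raw quirks never leave this class."""

    def __init__(self, raw_store: Dict[str, Dict[str, Any]]) -> None:
        self._raw_store = raw_store

    def get_line_item_name(self, line_item_id: str) -> str:
        return self._raw_store[line_item_id]["name"]


class FakeOtherAdapter(AdServerAdapter):
    """Stand-in for a second provider with a differently shaped response."""

    def __init__(self, raw_store: Dict[str, Dict[str, Any]]) -> None:
        self._raw_store = raw_store

    def get_line_item_name(self, line_item_id: str) -> str:
        return self._raw_store[line_item_id]["campaign"]["title"]


class AdClient:
    """Thin front end: callers never see which adapter is underneath."""

    def __init__(self, adapter: AdServerAdapter) -> None:
        self._adapter = adapter

    def line_item_name(self, line_item_id: str) -> str:
        return self._adapter.get_line_item_name(line_item_id)


dfp = AdClient(FakeDFPAdapter({"42": {"name": "Spring campaign"}}))
other = AdClient(FakeOtherAdapter({"42": {"campaign": {"title": "Spring campaign"}}}))
print(dfp.line_item_name("42"), other.line_item_name("42"))
```

Swapping providers, or upgrading one provider’s API version, then means touching a single adapter class while every caller of `AdClient` stays unchanged.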


Open Sourcing

Since building Parselmouth, we have added new features and upgraded the underlying DFP API version relatively seamlessly. With this internal success and the knowledge that others also struggle to integrate with DFP and other ad providers, we decided to make our work available to the public. If you are a developer who has had some of the same struggles as us, please let us know, or better yet, open up a pull request against Parselmouth, and add your own features! Check out Parselmouth here and here.