Archive for November, 2015

Hackweek 31

November 25th, 2015 by Nathan Potter

What’s Hackweek Again?
Here at Chartbeat, we have a Hackweek every 7 weeks. Hackweek is a time to learn, explore, and just try something new. At the end of October, we had Hackweek 31 (that's 31 weeks, or about 7 months, of hacking spread over 4 years). Lately we've been doing team hacks, which are 3 or more people doing a project together. This is a short writeup of just 4 of the projects that got done during our last Hackweek.


Team Gibson (Paul, Matt O, Anastasis)

Here's a little-known fact about AWS Redshift: it was previously known as the "Gibson" project at the Ellingson Mineral Company before that whole Hack the Planet / DaVinci virus scandal drove the company into the ground in 1995. Amazon acquired rights to the project in subsequent years and turned it into the data warehouse we know and love. To the disappointment of programmers everywhere, the product managers at Amazon decided somewhere along the way that it would cut costs and ease adoption to remove the original graphical interface to the database and replace it with the Postgres-compatible one we have today.

Okay, not really. The Gibson was the supercomputer featured in the 1995 teen sci-fi thriller, and now cult classic, Hackers. The movie energetically explored and sensationalized the early hacker subculture and effectively inspired a whole generation of kids to become computer programmers. I (Paul Kiernan) and two other initiates of the hackerverse (Matt Owen and Anastasis Germanidis) came together over the past hackweek at Chartbeat to make the original Gibson interface a reality, and perhaps even bring it to the Oculus Rift.


We began by investigating the Oculus Rift SDK and frameworks we could use with an existing C++ project that emulated the Gibson interface. Unfortunately, it quickly became clear we would be unlikely to produce anything in a reasonable amount of time if we had to struggle with the SDK's lack of support on OS X, learn openFrameworks, and refactor the existing project to include Redshift adapters and work with VR in C++. So we researched alternative platforms and settled on the widely accessible combination of WebVR and three.js.

WebVR is an experimental JavaScript API that provides access to virtual reality devices like the Oculus Rift. Combined with three.js, a wonderful JavaScript framework around WebGL, we were able to create a Gibson-like interface to one of our Redshift clusters that runs entirely in the browser. The final product queries our cluster for a list of tables and their sizes and paints them as buildings on a 3D landscape. Buildings are arranged randomly on a plane, and their heights are a function of the number of rows in the table they represent.
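The data side of that pipeline is straightforward, since Redshift speaks the Postgres protocol. Here is a minimal sketch of the idea in Python, with an illustrative log-scale height mapping; the query and scaling are assumptions for illustration, not the actual project code:

```python
import math

# Redshift is Postgres-compatible, so a query like this (run via psycopg2
# or any Postgres driver) could fetch table names and row counts:
TABLE_SIZES_SQL = """
    SELECT "table", tbl_rows
    FROM svv_table_info
    ORDER BY tbl_rows DESC;
"""

def building_height(rows, max_height=100.0, scale=10.0):
    """Map a table's row count to a building height; a log scale keeps
    huge tables from dwarfing everything else."""
    return min(max_height, scale * math.log10(rows + 1))

# Illustrative table sizes (obfuscated, like in the videos).
tables = {"tbl_a": 1000000, "tbl_b": 10000, "tbl_c": 42}
heights = {name: building_height(rows) for name, rows in tables.items()}
```

The heights then just become the Y-dimensions of three.js box geometries scattered across the plane.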

Here you can see two videos of the Gibson interface hooked up to one of our production Redshift databases (with table names obfuscated for, you know, security reasons).

2D Gibson:

3D Gibson:

The Gibson project can be hooked up to any Redshift cluster so let us know if you’d be interested in playing around with it!

Hack the planet!

real hacks
-paulynomial (Paul Kiernan)


Team Faceblocker (Dvd, Nate, Immanuel, Jess)

With all the recent uproar about ad blocking, the Faceblocker team went to work figuring out (a) how to make an ad blocking extension of our own and (b) how to replace ads with something even better! We wanted to put a little soul back into the ads… to help make advertising personal and relevant. So we decided to jack into the user’s webcam.


Faceblocker is a Chrome extension that replaces ads with YOUR FACE. Unfortunately, after using it for a couple days, we started getting a little tired of staring at ourselves in 300x250s, so we cooked up a couple different options, including Darude's Sandstorm and this inspirational music video. Finally, we experimented with a chat-roulette-style video streaming server that replaces each ad with a random feed of another user's webcam who is also using the extension. A little creepy, but endlessly entertaining when you get enough folks using it.


– Dvd (David)


Team Mobimon (Alex, Tom, Jeremy, Mike)

Team Mobimon crafted the game Mobimon. We focused on the peer-to-peer aspect of the game's original concept, which led to a play on limited-time, turn-based games. The stack behind Mobimon was React with Redux, plus Firebase. React was chosen for view management, Redux for state management, and Firebase was the crutch we relied on for PvP communication. The team, being all front-end developers and a designer, chose Firebase for its ease of use. We didn't really use Firebase to persist data, but more for its websockets: the host player would start the game and push state to Firebase, which would then sync the other players with the current state and commands.


The stack was new to all of us, but through pair programming we were all eventually able to go from crawling to running and contributing. We learned a ton, especially about Redux's strengths and its annoyances; let us not talk about the boilerplate around making a simple HTTP request. Overall, the game turned out awesome: in a week we not only learned a whole new stack and a new implementation pattern, but produced something tangible with it.
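That boilerplate complaint is easier to appreciate with a concrete sketch. Redux itself is JavaScript, but the pattern translates; sketched in Python for illustration, even one HTTP request typically means three action types plus a pure reducer:

```python
# Redux-style pattern, sketched in Python: a single HTTP request needs
# three action types and a reducer clause for each of them.
FETCH_REQUEST = "FETCH_REQUEST"
FETCH_SUCCESS = "FETCH_SUCCESS"
FETCH_FAILURE = "FETCH_FAILURE"

INITIAL_STATE = {"loading": False, "data": None, "error": None}

def reducer(state, action):
    """Pure function: (state, action) -> new state, never mutating."""
    if action["type"] == FETCH_REQUEST:
        return {**state, "loading": True, "error": None}
    if action["type"] == FETCH_SUCCESS:
        return {**state, "loading": False, "data": action["payload"]}
    if action["type"] == FETCH_FAILURE:
        return {**state, "loading": False, "error": action["payload"]}
    return state  # unknown actions leave state untouched

state = reducer(INITIAL_STATE, {"type": FETCH_REQUEST})
state = reducer(state, {"type": FETCH_SUCCESS, "payload": {"hp": 100}})
```

The upside of all that ceremony is that every state transition is explicit and replayable, which is exactly what makes syncing game state over websockets tractable.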




Team Confidence Interval UI (Ashley, Brian, Jenn)

Lately I’ve been interested in ‘explorable explanations’. Working with Brian Tice and Jenn Creighton from the frontend team, and Dan Valente and Josh Schwartz from the data science team, we decided to demonstrate the effect of sample size on the type of information you could reliably take from a normal distribution of data.

A good portion of the week was dedicated to first understanding the topic ourselves. The goals for the design were to use common English as much as possible, to interactively visualize the data being graphed, and to present visualizations inline with their descriptions when possible.

A somewhat unexpected hurdle was using common English, as many words we use to talk about data vary in meaning from their technical definitions. Another challenge was finding an analogy that allowed us to cover the points we thought were necessary to understanding the concept.

Our solution used a narrative set in space (a standard science fiction strategy, and personally amusing), wherein the user imagines themselves as a tailor tasked with outfitting an unknown population. Given a certain population size, the user wants to take the minimum reliable number of samples while matching the sizes currently in production. They adjust two ranges: one setting the breadth of sizes available (translating to the range of the distribution and the target mean), and one setting the sample size (creating a distribution that, over time, will match the target distribution). Both ranges have small graphic representations next to them, the complete graph is displayed below, and all three adjust in real time.
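The statistical idea the explorable demonstrates fits in a few lines of Python: with more samples from a normal distribution, the confidence interval around the estimated mean shrinks. The numbers below are illustrative, not from the actual hack:

```python
import math
import random

def ci_half_width(samples, z=1.96):
    """Approximate 95% confidence interval half-width for the sample mean."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return z * math.sqrt(var / n)

def draw_size():
    # Pretend jacket sizes are normally distributed around 40.
    return random.gauss(40.0, 4.0)

random.seed(42)
small = [draw_size() for _ in range(10)]
large = [draw_size() for _ in range(1000)]

# More samples -> narrower interval -> more reliable sizing decisions.
```

The half-width shrinks roughly with the square root of the sample size, which is why the graph in the explorable tightens quickly at first and then flattens out.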




Why Hackweek?
Some of the other projects that got done that week included “Team 404 – Death of the Shark” (a new interactive 404 page), “Team Clojure API’s” (rewrite APIs in Clojure, just for fun), “Apple TV Big Board” (a Chartbeat Big Board for Apple TV) and others. As you can see, we take our Hacking seriously. It’s an important part of our culture of learning, growth, and making great products. Come hack with us!

This is the story of how Chartbeat struggled for months integrating with a complex and sometimes incomprehensible external API service and eventually brought sanity to that notoriously difficult service. Then, we open sourced the whole thing.

Chartbeat helps leading media companies around the world understand, measure, and monetize the time that audiences actively spend with their sites’ online content. Since we are really good at tracking reader engagement on web pages, we had the bright idea to direct our tech at measuring ads as well. To do this we needed to understand the ad service of choice for most of our clients, Google DoubleClick for Publishers (DFP). Luckily for us, Google provides an extensive API suite to interface with their DoubleClick service. Unluckily for us, integrating with this service in a production environment is complicated and difficult to maintain.

We started by constructing a thin wrapper over Google's Python package googleads. This was great when we needed to pull basic information from DFP, such as names or IDs of ad campaigns. However, as we wanted to integrate with some of DFP's more complex features, our continuous iteration on this initial wrapper resulted in numerous hacks and tons of unmaintainable code. Soon after, Google announced that the DFP API version we had adapted all of our code to was about to be deprecated. Translation: all of the production systems that depended on DFP were going to break.

Pressured by time, we scrambled to integrate with the newer API version and ended up inadvertently making our integration even shakier than it was before. Our DFP integration survived the version upgrade, but we had gotten ourselves into a state where further upgrades and feature requests would be hard to manage and unreliable to service.

Up until this point, we were only reading information like campaign names, creative images, and delivery percentages from our clients' DFP accounts. So, even though our DFP integration was unstable, there was very little risk of anything catastrophic happening. The worst case scenario was failing to read an up-to-date number.

Therefore, when we made the decision to actually start writing to and editing our clients' DFP accounts, we had to rethink our strategy. DFP serves as the main revenue source for most of our clients.

Now, the worst-case scenario for a failure in our DFP integration escalated from mild inconvenience to putting our clients out of business.

Clearly, we needed something more stable and maintainable if we wanted to support writing to our clients' DFP accounts. To move forward, we needed to start over and build something that could reliably translate the evil intricacies of ad tech into something comprehensible: a Parselmouth. For those of you that don't follow the lore of the Harry Potter universe, a Parselmouth is a person who can communicate with snakes.

Parselmouth was built by Paul Kiernan and myself as a universal translator for external API services from different ad service providers like DFP, DFA, and OpenX with the goal of abstracting away all the ugly insides of these systems.

When designing the project, there were a few primary challenges we wanted to overcome. First of all, Parselmouth should be as easy to upgrade and maintain as possible. We did not want to find ourselves in a situation where we had to scramble to upgrade out of a deprecated version again. Next, we wanted to make sure that the usage of Parselmouth was consistent with the rest of our codebase. Previously, we worked with DFP responses as-is, and these were often inconsistent with our general coding style. Finally, Parselmouth had to deal with the arbitrarily complex functionality of ad servers, which involves tons of use cases, edge cases, and custom data integrations. In other words, we needed our system to manage the fact that ad tech is… well, it's not great.

Maintainability and Consistency

The first and most obvious step toward maintainability was to ensure very good unit test coverage in the project. When integrating with an external service, however, coverage alone isn't enough: you also need to validate that the responses you receive from the service match what you expect. For this reason, we built a suite of integration tests. For each function available in Parselmouth, a corresponding integration test checks that the response from the API service is what we expect. When one of these tests breaks, it lets us quickly identify exactly what has changed within the ad server API.

For example, in our original integration all responses were returned as dictionaries. Then, during a version upgrade, Google updated their Python API client to return so-called SUDS objects. Naturally, this broke all of our integration tests and quickly indicated that we needed to update our methods for deserializing the API responses.

We have also found that Google likes to update the names and nested structures of the responses in their APIs. For example, the name of the impression goal structure within a campaign object changed from "goal" to "primaryGoal" in one API version upgrade; our get_line_item() integration test quickly caught that change.

To give you an idea of the kind of thing you get from DFP, here is what a raw line item looks like straight from their API:

Yeah… that's a ton of information. This is why we wanted to carefully ingest all of this data into a Python class called LineItem with naturally accessible fields.

This means we translate the varied responses from the ad service into a consistent, Pythonic class that the user can work with without knowing about the intricacies of DFP.
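As a rough illustration of that translation step, here is a hypothetical, heavily simplified LineItem; the field names mirror the DFP response keys mentioned above, but the real Parselmouth class covers far more of the response:

```python
class LineItem:
    """Hypothetical, simplified sketch of wrapping a raw DFP line item
    dict in a class with naturally accessible fields."""

    def __init__(self, raw):
        self.id = raw["id"]
        self.name = raw["name"]
        # Deeply nested DFP fields get flattened into plain attributes,
        # e.g. the impression goal buried under "primaryGoal".
        self.goal_units = raw.get("primaryGoal", {}).get("units")
        self.status = raw.get("status", "UNKNOWN").lower()

# An illustrative (not real) slice of a raw DFP response:
raw_response = {
    "id": 12345,
    "name": "example_campaign",
    "primaryGoal": {"goalType": "LIFETIME", "units": 500000},
    "status": "DELIVERING",
}
line_item = LineItem(raw_response)
```

Callers then read `line_item.goal_units` instead of spelunking through nested dictionaries, and the DFP-specific key names live in exactly one place.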

The Beast of Ad Tech

All of this doesn't get you anywhere in dealing with the fact that ad tech is a complicated beast. In particular, one of the more painful aspects of ad tech is specifying how an ad campaign should be targeted. For example, a brand might decide they want to target women in their twenties in Europe on mobile devices. To represent this kind of arbitrary, open-ended targeting, the ad server has to support arbitrary logical structure; this example boils down to something like (female AND twenties AND Europe AND mobile).

But you also have to support more complicated and varied structures, like targeting women in Europe or men in North America, unless they are in Canada on a mobile device.

To tackle this particular problem, we designed a class into Parselmouth called TargetingCriterion which supports arbitrary logical structures like the ones illustrated above.
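A minimal sketch of such a class (the real TargetingCriterion API differs; this is just the shape of the idea) composes criteria into a boolean tree:

```python
class TargetingCriterion:
    """Sketch of a boolean-composable targeting tree, not the real API."""

    def __init__(self, op, children):
        self.op = op          # "AND", "OR", or "LEAF"
        self.children = children

    @classmethod
    def leaf(cls, key, value):
        return cls("LEAF", (key, value))

    def __and__(self, other):
        return TargetingCriterion("AND", [self, other])

    def __or__(self, other):
        return TargetingCriterion("OR", [self, other])

    def matches(self, user):
        """Evaluate the tree against a dict describing one user."""
        if self.op == "LEAF":
            key, value = self.children
            return user.get(key) == value
        results = [child.matches(user) for child in self.children]
        return all(results) if self.op == "AND" else any(results)

# "Women in their twenties in Europe on mobile devices":
women_20s_eu_mobile = (
    TargetingCriterion.leaf("gender", "female")
    & TargetingCriterion.leaf("age", "20s")
    & TargetingCriterion.leaf("geo", "Europe")
    & TargetingCriterion.leaf("device", "mobile")
)
```

Because AND and OR nodes nest arbitrarily, the same structure handles the messier "women in Europe or men in North America, unless…" style of targeting as well.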

Many of our other challenges revolved around the details of the DFP API. For example, if you want historical data about line items within DFP, you have to download a report as a gzipped CSV. That meant writing a function which waits for the report to generate, downloads the gzipped file, unzips and parses the text, then formats the data into a list of Parselmouth LineItem objects.
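The unzip-and-parse part of that plumbing is plain standard-library work. A self-contained sketch, with an in-memory report standing in for the real download (the column names are illustrative):

```python
import csv
import gzip
import io

def parse_report(gzipped_bytes):
    """Unzip a DFP-style gzipped CSV report and parse rows into dicts."""
    text = gzip.decompress(gzipped_bytes).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

# Stand-in for the file DFP would let us download once the report runs.
fake_report = gzip.compress(
    b"line_item_id,impressions\n12345,1000\n67890,2500\n"
)
rows = parse_report(fake_report)
```

In the real flow, each parsed row would then be joined back onto its LineItem object; the waiting-for-the-report-to-generate part is just a polling loop around the DFP report service.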

In other cases, the difficulties of integrating with DFP came from the unpredictability of the responses. For example, consider the case of technology targeting within DFP, which allows you to target specific devices, operating systems, or browsers.

An example technology target which targets a specific browser language and device might look something like the following:

There are two key things to notice in this response. First, a pluralized version of the targeting type is included before the list of items to be targeted. Second, the pluralization is done in proper English: for words ending in "y", the variable name changes the "y" to "ies", and otherwise an "s" is appended. While this reads nicely in English, it is a nightmare for a programmer. There is also inconsistency in how the responses are structured: for deviceCategory, the word "targeted" is prepended to the plural name, while for browserLanguage it is not. These kinds of inconsistencies create an additional challenge when working with the DFP APIs. Things would have been much easier if the format had been predictable.

For this example we had to laboriously create a map to parse out the meaning of each of these strings.
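Undoing that English-style pluralization comes down to maintaining an explicit map from DFP's response keys back to targeting types. A fragment of what that might look like; the entries here are examples, not the full map:

```python
# Map from DFP's pluralized (and inconsistently prefixed) response keys
# back to the singular targeting type they describe. Example entries only.
PLURAL_KEY_TO_TYPE = {
    "browserLanguages": "browserLanguage",
    "targetedDeviceCategories": "deviceCategory",
    "operatingSystems": "operatingSystem",
}

def targeting_type(response_key):
    """Resolve a raw response key to its targeting type, if known."""
    return PLURAL_KEY_TO_TYPE.get(response_key)
```

It is not elegant, but an explicit table beats trying to write a depluralizer clever enough to handle "ies" endings and the stray "targeted" prefixes.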

Despite these frequent snags in developing our DFP integration, an effective strategy emerged for dealing with these issues within Parselmouth, a strategy we affectionately refer to as "focused rage."

“Focused rage” involves jamming all of the horrible bits and pieces of the API integration into a compact independent part of the code.

We designed adapters that fall into the “bridge” programming pattern to hold all of the nasty bits of translation code. So when you initialize a connection with Parselmouth, you are connected via the appropriate adapter, and need never know about the dark secrets that live beneath the hood.
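In code, that bridge looks something like the following sketch (the class and method names are illustrative, not Parselmouth's actual interface): a provider-neutral interface in front, with one adapter per ad server holding the translation mess.

```python
from abc import ABC, abstractmethod

class AdServerAdapter(ABC):
    """Provider-neutral interface that Parselmouth-style code talks to."""

    @abstractmethod
    def get_campaign_names(self):
        ...

class DFPAdapter(AdServerAdapter):
    """All the DFP-specific nastiness stays behind this wall."""

    def get_campaign_names(self):
        raw = self._fetch_raw()  # imagine the SOAP call happening here
        # Translate DFP's response shape into plain strings.
        return [item["name"] for item in raw["results"]]

    def _fetch_raw(self):
        # Stubbed response standing in for a real DFP API call.
        return {"results": [{"name": "campaign_a"}, {"name": "campaign_b"}]}

adapter = DFPAdapter()
names = adapter.get_campaign_names()
```

Adding another provider (DFA, OpenX) then means writing one more adapter subclass, while everything above the interface stays untouched.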

Open Sourcing

Since building Parselmouth, we have added new features and upgraded the underlying DFP API version relatively seamlessly. With this internal success, and knowing that others also struggle to integrate with DFP and other ad providers, we decided to make our work available to the public. If you are a developer who has hit some of the same struggles we have, please let us know, or better yet, open up a pull request against Parselmouth and add your own features! Check out Parselmouth here and here.