Archive for the ‘Development’ Category

WADE, Our New Best Friend

March 11th, 2016 by Wes

We’re pleased to announce WADE, a little database project of ours. You can get all of the interesting detail from the git repo, but I’ll go into a bit here first, because what’s an engineering blog if there’s no engineering?

At Chartbeat, we’ve often found ourselves wrestling with database scale problems when something we wanted to do with high throughput didn’t quite match the database’s data model. Usually in these cases, we fall back on a read-write-update cycle, which can kill performance and suffer from race conditions.

One annoying as hell example has to do with us maintaining a map of URLs to what we call canonical paths. You’ll often see in your browser’s URL bar a path that has a bunch of query parameters that have no relation to the content of the page, such as utm tags or other tracking codes. When we send our data pings, we include all of those parameters, but on the server side we have to make some attempt to strip out the parts of the URL that aren’t relevant, otherwise we run the risk of not counting up all the metrics for a page.

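A super simple example might be two URLs that share the path /squirrels-are-barking-mad and differ only in their query strings: one ends in ?referrer=twitter and the other in ?referrer=facebook.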

We can’t just take the URLs at their face value and assign them each one page view. Rather, we want to strip off the “?referrer=twitter” and “?referrer=facebook” part and count two page views for /squirrels-are-barking-mad. The problem is compounded by the lack of a standard set of tracking codes used by publishers, so we can’t assume we’ll always know how to sanitize the data.
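As a rough illustration of the naive approach, the sketch below drops the entire query string with Python’s standard `urllib.parse`, so both example URLs collapse to one path. The host `example.com` and the function name are assumptions for illustration only. Dropping every parameter is too blunt for real sites, where some parameters do select content, which is part of why a voting scheme is needed.

```python
from collections import Counter
from urllib.parse import urlsplit


def naive_canonical_path(url):
    """Return only the path of a URL, discarding query string and fragment."""
    return urlsplit(url).path


# Hypothetical host; only the paths and query strings come from the example.
urls = [
    "http://example.com/squirrels-are-barking-mad?referrer=twitter",
    "http://example.com/squirrels-are-barking-mad?referrer=facebook",
]

# Both URLs are tallied under the same path.
page_views = Counter(naive_canonical_path(u) for u in urls)
```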

One solution is for our client to customize the ping so that it includes a canonical path of /squirrels-are-barking-mad, or sets an og:url meta tag within the page. That’s all well and good, but it assumes that the client actually implements this correctly, that the browser doesn’t mess something up in transit and that nobody is trying to spoof our client’s data. The internet is a messy place, so we regularly see multiple conflicting canonical paths for the same source URL.

What we do then is keep a tally of what mappings we’ve seen so far, and our system in essence votes for what the canonical path should be for any given URL.

Simple enough, but how do we do this at scale? We handle tens of thousands of such pings per second, and need to be able to vote with very low latency and high throughput. A simple way to achieve this at reasonable cost would be to use memcache or Redis as the backing store, and simply set the key to the source URL, and the value to a map of canonical path to counts, where counts are the number of times a particular source URL mapped to that canonical path. So one entry might be something like:
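Sketched as a Python dict literal (the in-memory shape is an assumption; in memcache or Redis the value would be stored in some serialized form):

```python
# Key: the source URL path as seen in the ping.
# Value: candidate canonical path -> number of times it was reported.
entry = {
    "/squirrels-are-barking-mad?referrer=twitter": {
        "/squirrels-are-barking-mad": 12382,
        "/squirrels-are-cute": 1,
    }
}
```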

Meaning, we’ve seen /squirrels-are-barking-mad?referrer=twitter map to /squirrels-are-barking-mad 12,382 times, and to /squirrels-are-cute once.

With memcache, adding a count follows a read-write-update cycle:

  1. Read the opaque value at /squirrels-are-barking-mad?referrer=twitter.
  2. Deserialize it to a dict, call it “counts”.
  3. Add 1 to the value at counts[‘/squirrels-are-barking-mad’].
  4. Serialize and write the values back to /squirrels-are-barking-mad?referrer=twitter.

To do this, the client has to pull down data from the database, do the deserialization mojo, do some serialization mojo, then write it back, so there’s a full round trip. In addition, there’s a race between steps #1 and #4 if two clients are simultaneously trying to update the same URL on a backing store that doesn’t support transactions.
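The cycle and its race can be sketched in a few lines. This is a minimal illustration, assuming a plain dict as a stand-in for memcache and JSON as the serialization format; neither is confirmed as what we run in production. The second half interleaves two clients by hand to show a lost update.

```python
import json

store = {}  # stand-in for memcache: opaque string values keyed by source URL


def read_counts(key):
    """Steps 1-2: read the opaque value and deserialize it to a dict."""
    raw = store.get(key)
    return json.loads(raw) if raw is not None else {}


def write_counts(key, counts):
    """Step 4: serialize the dict and write it back."""
    store[key] = json.dumps(counts)


def add_count(key, canonical_path):
    counts = read_counts(key)
    counts[canonical_path] = counts.get(canonical_path, 0) + 1  # step 3
    write_counts(key, counts)


key = "/squirrels-are-barking-mad?referrer=twitter"
add_count(key, "/squirrels-are-barking-mad")

# Two clients interleave: both read before either writes.
a = read_counts(key)
b = read_counts(key)
a["/squirrels-are-barking-mad"] += 1
b["/squirrels-are-barking-mad"] += 1
write_counts(key, a)
write_counts(key, b)  # overwrites client A's update

final = read_counts(key)  # three pings were processed, only two are counted
```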

While we’re applying a continuous stream of counts, we may get asked what the canonical path is for some source URL. To figure this out, the client has to pull down and deserialize the data for a source URL, then look at the tallies and vote on which canonical path has the most weight. In this example, a voting query would return /squirrels-are-barking-mad, since it’s clearly the correct one.
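On the client side, the vote itself amounts to picking the path with the largest tally. A minimal sketch, with the function name and the tie-breaking behavior (first path encountered wins) as assumptions:

```python
def vote(counts):
    """Return the canonical path with the highest tally, or None if empty."""
    if not counts:
        return None
    return max(counts, key=counts.get)


counts = {"/squirrels-are-barking-mad": 12382, "/squirrels-are-cute": 1}
winner = vote(counts)
```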

How does WADE improve on this? A database is essentially a chunk of data that transitions from state to state, where the transitions are insert/update commands. WADE is a replicated state machine which has no opinions on the structure of its state. This is entirely programmer defined, as are the state transition functions, which we call mutating operations. A WADE cluster keeps track of object state and pushes mutating operations to all of the replicas in a consistent way using the chain replication algorithm. Because operations are fully programmable, we’re able to eliminate read-write-update cycles by defining application specific transitions.
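To make the chain replication idea concrete, here is a toy, in-process sketch: a mutating operation enters at the head of the chain and is forwarded node by node to the tail, so every replica applies the same operations in the same order. It is not WADE’s code; the class and function names are assumptions, and it ignores networking, acknowledgements, and failure handling entirely.

```python
class Node:
    """One replica holding its own copy of the state."""

    def __init__(self):
        self.state = {}
        self.successor = None

    def apply(self, op, key, *args):
        # Apply the operation locally, then pass it down the chain.
        op(self.state, key, *args)
        if self.successor is not None:
            self.successor.apply(op, key, *args)


def build_chain(n):
    nodes = [Node() for _ in range(n)]
    for node, nxt in zip(nodes, nodes[1:]):
        node.successor = nxt
    return nodes


def increment(state, key, amount):
    """A programmer-defined mutating operation."""
    state[key] = state.get(key, 0) + amount


chain = build_chain(3)
head, tail = chain[0], chain[-1]
head.apply(increment, "hits", 1)  # writes always enter at the head
head.apply(increment, "hits", 2)
```

In chain replication, reads are typically answered by the tail, since an operation that has reached the tail has been applied by every node before it.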

Let’s first set aside all the nice replication benefits WADE gives us for free and look just at the update operation and how the programmer defines mutating operations. We’d call the /squirrels-are-barking-mad?referrer=twitter entry an object, and define state transitions on this object. One such state transition would be an update command that adds one to a candidate canonical path. I’ve simplified the code for the purposes of this post, but it’s only marginally more complex than:
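What follows is a hypothetical sketch, not WADE’s actual code or API: the function name `update_counts` and the convention that an operation receives the current object state and returns the new one are assumptions made for illustration.

```python
def update_counts(obj, canonical_path):
    """Mutating operation: add one vote for canonical_path.

    obj is the current state of the object (a dict of path -> count),
    or None if the object does not exist yet. The new state is returned.
    """
    counts = dict(obj or {})
    counts[canonical_path] = counts.get(canonical_path, 0) + 1
    return counts


state = None
state = update_counts(state, "/squirrels-are-barking-mad")
state = update_counts(state, "/squirrels-are-barking-mad")
```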

This function would get deployed to the WADE nodes, much like custom written views are deployed with a Django (or any web framework) app to a web server. Then a client connecting to the WADE cluster would issue a single update command:

We don’t have to do a full round trip, and we only send the data necessary to increment the counter.

The vote would simply be another customized WADE operation, but this time a non-mutating operation, i.e., a query.

The vote function would ship with WADE to the nodes in the same way as the mutating operations and run on the nodes themselves. The client would issue a vote operation, and the cluster would return only the top path, eliminating the need to transfer the entire serialized object and run the logic in the client.
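The division of labor can be sketched with a toy, in-process stand-in for a node. Everything here (`ToyNode`, the operation names, the registration dicts) is hypothetical and is not the WADE client or server API. The point is only that the client sends an operation name plus small arguments, and the node runs the logic against the object it holds.

```python
class ToyNode:
    """In-process stand-in for a node holding objects and named operations."""

    def __init__(self):
        self.objects = {}
        self.mutating = {}
        self.queries = {}

    def update(self, op_name, key, *args):
        # Only the operation name and its arguments cross the "wire".
        self.objects[key] = self.mutating[op_name](self.objects.get(key), *args)

    def query(self, op_name, key, *args):
        # Queries read the object but never store a new state.
        return self.queries[op_name](self.objects.get(key), *args)


def add_vote(counts, canonical_path):
    counts = dict(counts or {})
    counts[canonical_path] = counts.get(canonical_path, 0) + 1
    return counts


def top_path(counts):
    return max(counts, key=counts.get) if counts else None


node = ToyNode()
node.mutating["add_vote"] = add_vote
node.queries["top_path"] = top_path

url = "/squirrels-are-barking-mad?referrer=twitter"
node.update("add_vote", url, "/squirrels-are-barking-mad")
node.update("add_vote", url, "/squirrels-are-barking-mad")
node.update("add_vote", url, "/squirrels-are-cute")
best = node.query("top_path", url)  # only the winning path comes back
```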

There are consistency benefits as well. WADE serializes update operations and has strong consistency guarantees (at the risk of partial unavailability), so race conditions don’t exist.

The URL tracking example is one simple application that motivated the design of the system, but another is a probabilistic set cardinality (HyperLogLog) service that powers our Engaged Headline Testing product. At the moment our HLL service is built on top of Riak but suffers from high network overhead and chatter between nodes. Each HLL structure is around 10k in size, and even a small bit-flipping operation requires downloading the entire serialized data. When you throw in the possibility of siblings, what we’re left with is a system that scales beautifully horizontally, but is not as resource efficient as it could be. We have yet to move our HLL service to WADE, but initial performance tests show significant improvement over our current Riak cluster. We’ll be open sourcing an HLL service on WADE in the near future.

Another nice property of WADE: the core code is simple. While I can’t really argue that thinking about distributed systems is easy, we’ve reduced the surface area of misunderstanding by enforcing a rule that the core code will always be fewer than 1,000 lines. The effect of this is that WADE doesn’t optimally handle a lot of edge cases, though (bugs notwithstanding) it maintains correctness. Any node failure might cause the cluster to be unavailable for longer than an industrial strength database might, but it’s still within acceptable limits for production use.

Also, we leave a lot of details up to the programmer. While WADE will handle replication happily, it needs to be told what nodes to replicate to. There are nuanced reasons why we might want to leave this up to the programmer that I may go into in another blog post.

There is some missing functionality (fast syncing, as described in the repo’s README, as well as a useful generic overlord), but we’ll fill those in over time as we improve our understanding of operational issues.

In the meantime, please take a look at WADE and play with the in-memory kv store that ships with it. This is still alpha quality software, but we’re optimistic it’ll see varied use within our systems in important and high scale areas. We would love to hear feedback, and of course would be happy to accept pull requests. Just don’t bust our 1,000 LOC quota.

Hackweek 31

November 25th, 2015 by Nathan Potter

What’s Hackweek Again?
Here at Chartbeat, we have a Hackweek every 7 weeks. Hackweek is a time to learn, explore, and just try something new. At the end of October, we had Hackweek 31 (that’s 31 weeks, or about 7 months, of hacking over roughly 4 years). Lately we’ve been doing team hacks, which are 3 or more people doing a project together. This is a short writeup of just 4 of the projects that got done during our last Hackweek.


Team Gibson (Paul, Matt O, Anastasis)

Here’s a little known fact about AWS Redshift: it was previously known as the “Gibson” project at the Ellingson Mineral Company before that whole Hack the Planet / DaVinci virus scandal drove the company into the ground in 1995. Amazon acquired rights to the project in subsequent years and turned it into the data warehouse we know and love. To the disappointment of programmers everywhere, the product managers at Amazon decided somewhere along the way it would cut costs and ease adoption to remove the original graphical interface to the database and replace it with the Postgres-compatible one we have today.

Okay, not really. The Gibson was the supercomputer featured in the 1995 teen sci-fi thriller, and now cult-classic, Hackers. The movie energetically explored and sensationalized the early Hacker subculture and effectively inspired a whole generation of kids to become computer programmers. Myself (Paul Kiernan) and two other initiates of the hackerverse (Matt Owen and Anastasis Germanidis) came together over the past hackweek at Chartbeat to make the original Gibson interface a reality and perhaps even bring it to the Oculus Rift.


We began by investigating the Oculus Rift SDK and frameworks we could use to include an existing C++ project that emulated the Gibson interface. Unfortunately, it quickly became clear we would be unlikely to produce anything in a reasonable amount of time if we had to struggle with the lack of support for the SDK on OS X, learn how to use openFrameworks, and refactor the existing project to include Redshift adapters and work generally with VR in C++. So we researched alternative platforms and settled on the widely accessible combination of WebVR and three.js.

WebVR is an experimental JavaScript API that provides access to Virtual Reality devices like the Oculus Rift. Combining it with three.js, a wonderful JavaScript framework around WebGL, we were able to create a Gibson-like interface to one of our Redshift clusters that runs entirely in the browser. The final product queries our cluster to get a list of tables and their sizes and paints them as buildings on a 3D landscape. Buildings are arranged randomly on a plane and their heights are a function of the number of rows in the table they represent.

Here you can see two videos of the Gibson interface hooked up to one of our production Redshift databases (with table names obfuscated for, you know, security reasons).

2D Gibson:

3D Gibson:

The Gibson project can be hooked up to any Redshift cluster so let us know if you’d be interested in playing around with it!

Hack the planet!

real hacks
-paulynomial (Paul Kiernan)


Team Faceblocker (Dvd, Nate, Immanuel, Jess)

With all the recent uproar about ad blocking, the Faceblocker team went to work figuring out (a) how to make an ad blocking extension of our own and (b) how to replace ads with something even better! We wanted to put a little soul back into the ads… to help make advertising personal and relevant. So we decided to jack into the user’s webcam.


Faceblocker is a Chrome extension that replaces ads with YOUR FACE. Unfortunately, after using it for a couple days, we started getting a little tired of staring at ourselves in 300x250s, so we cooked up a couple different options, including: Darude’s Sandstorm and this inspirational music video. Finally, we experimented with a chat-roulette style video streaming server that replaces each ad with a random feed of another user’s webcam who is also using the extension. A little creepy, but endlessly entertaining when you get enough folks using it.


– Dvd (David)


Team Mobimon (Alex, Tom, Jeremy, Mike)

Team Mobimon crafted the game Mobimon. We focused on the peer-to-peer aspect of the game’s original intention, which led to a play on limited-time, turn-based games. The stack behind Mobimon was React with Redux, and Firebase. React was chosen for view management, Redux for state management, and Firebase was the crutch we relied on for PvP communication. The team, being all front end developers and a designer, decided to use Firebase because of its ease of use. We didn’t really use Firebase to persist data, but more for its websockets. We had the host player start the game and push its state via websockets to Firebase, which would then sync any other players with the current state and commands.


The stack was something new to all of us but through pair programming, we were all eventually able to go from crawling to running and contributing. We learned a ton, especially about Redux’s strengths and its annoyances; let us not talk about the boilerplate around setting up a simple HTTP request. Overall, the game turned out awesome; in a week we not only learned a whole new stack and a new implementation pattern, but also produced something tangible with them.




Team Confidence Interval UI (Ashley, Brian, Jenn)

Lately I’ve been interested in ‘explorable explanations’. Working with Brian Tice and Jenn Creighton from the frontend team, and Dan Valente and Josh Schwartz from the data science team, we decided to demonstrate the effect of sample size on the type of information you could reliably take from a normal distribution of data.

A good portion of the week was dedicated to first understanding the topic ourselves. The goals for the design were to use common English as much as possible, to interactively visualize the data being graphed, and present visualizations inline with their descriptions when possible.

A somewhat unexpected hurdle was using common English, as many words we use to talk about data vary in meaning from their technical definitions. Another challenge was finding an analogy that allowed us to cover the points we thought were necessary to understanding the concept.

Our solution used a narrative set in space (a standard science fiction strategy, as well as personally amusing), wherein the user imagines themselves a tailor tasked with outfitting an unknown population. Given a certain population size, interested in taking the minimum reliable number of samples, and aiming to match the current sizes in production, the user adjusts ranges setting the breadth of sizes available (translating to the range of the distribution and the target mean), as well as the sample size (creating a distribution that over time will match the target distribution.) Both ranges have small graphic representations next to them, the complete graph is displayed below, and all three adjust in real time.




Why Hackweek?
Some of the other projects that got done that week included “Team 404 – Death of the Shark” (a new interactive 404 page), “Team Clojure API’s” (rewrite APIs in Clojure, just for fun), “Apple TV Big Board” (a Chartbeat Big Board for Apple TV) and others. As you can see, we take our Hacking seriously. It’s an important part of our culture of learning, growth, and making great products. Come hack with us!

This is the story of how Chartbeat struggled for months integrating with a complex and sometimes incomprehensible external API service and eventually brought sanity to that notoriously difficult service. Then, we open sourced the whole thing.

Chartbeat helps leading media companies around the world understand, measure, and monetize the time that audiences actively spend with their sites’ online content. Since we are really good at tracking reader engagement on web pages, we had the bright idea to direct our tech at measuring ads as well. To do this we needed to understand the ad service of choice for most of our clients, Google DoubleClick for Publishers (DFP). Luckily for us, Google provides an extensive API suite to interface with their DoubleClick service. Unluckily for us, integrating with this service in a production environment is complicated and difficult to maintain.

We started by constructing a thin wrapper over Google’s Python package googleads. This was great when we needed to pull basic information from DFP, such as names or IDs of ad campaigns. However, as we wanted to integrate with some of DFP’s more complex features, our continuous iteration on this initial wrapper resulted in numerous hacks and tons of unmaintainable code. Soon after, Google announced that the DFP API version we had adapted all of our code to was about to be deprecated. Translation: all of the production systems that depended on DFP were going to break.

Pressured by time, we scrambled to integrate with the newer API version and ended up inadvertently making our integration even shakier than it was before. Our DFP integration survived the version upgrade, but we had gotten ourselves into a state where further upgrades and feature requests would be hard to manage and unreliable to service.

Up until this point, we were only reading information like campaign names, creative images, and delivery percentages from our clients’ DFP accounts. So, even though our DFP integration was unstable, there was very little risk of anything catastrophic happening. The worst case scenario was failing to read an up-to-date number.

Therefore, when we made the decision to actually start writing to and editing our clients’ DFP accounts, we had to rethink our strategy. DFP serves as the main revenue source for most of our clients.

Now, our worst case scenario for a failure with our DFP integration elevated from mild inconvenience to putting our clients out of business.

Clearly, we needed something more stable and supportable if we wanted to support writing to our clients’ DFP accounts. To move forward, we needed to start over and build something that could reliably translate the evil intricacies of ad tech into something comprehensible: a Parselmouth. For those of you that don’t follow the lore of the Harry Potter universe, a Parselmouth is a person who can communicate with snakes.

Parselmouth was built by Paul Kiernan and myself as a universal translator for external API services from different ad service providers like DFP, DFA, and OpenX with the goal of abstracting away all the ugly insides of these systems.

When designing the project, there were a few primary challenges we wanted to overcome. First of all, Parselmouth should be as easy to upgrade and maintain as possible. We did not want to find ourselves in a situation where we had to scramble to upgrade out of a deprecated version again. Next, we wanted to make sure that the usage of Parselmouth was consistent with the rest of our codebase. Previously, we worked with DFP responses as-is, and these were often inconsistent with our general coding style. Finally, Parselmouth had to deal with the arbitrarily complex functionality of ad servers, which involves tons of use cases, edge cases, and custom data integrations. In other words, we needed our system to manage the fact that ad tech is... well, it’s not great.

Maintainability and Consistency

The first and most obvious step toward maintainability was to ensure that there was very good unit test coverage in the project. When integrating with an external service, however, coverage isn’t really enough because it is important to validate that the responses you receive from this service are consistent with what you expect. For this reason, we made a suite of integration tests. For each function available in Parselmouth, we have a corresponding integration test that ensures that the response from the API service is what we expect. Therefore, when one of these functions breaks, it allows us to quickly identify exactly what feature has changed within the ad server API.

For example, in our original integration all responses were returned as dictionaries. Then, during a version upgrade, Google updated their Python API client to return so-called SUDS objects. Naturally, this broke all of our integration tests, and quickly indicated that we needed to update our methods for deserializing the API responses.

We have also found that Google likes to update the names and nested structures of the responses in their APIs. For example, the name of the impression goal structure within a campaign object changed from “goal” to “primaryGoal” in one API version upgrade, and in this case our get_line_item() integration test quickly caught the change.
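The kind of check such an integration test performs can be sketched as follows. This is an illustration only: the helper name, the `id`, `name`, and `units` fields, and the sample values are assumptions, and in a real integration test the raw response would come from a live API call rather than a hand-written dict.

```python
def check_line_item_shape(raw):
    """Return a list of problems found in a raw line item response."""
    problems = []
    for field in ("id", "name", "primaryGoal"):
        if field not in raw:
            problems.append("missing field: " + field)
    return problems


# Hand-written stand-ins for responses before and after the rename.
old_style = {"id": 1, "name": "Spring campaign", "goal": {"units": 1000}}
new_style = {"id": 1, "name": "Spring campaign", "primaryGoal": {"units": 1000}}
```

A failure message that names the missing field points straight at what changed in the ad server API.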

To give you an idea of the kind of thing you get from DFP, here is what a raw line item looks like straight from their API:

Yeah… that’s a ton of information. This is why we wanted to carefully ingest all of this data into a Python class called LineItem with naturally accessible fields.

This means that we translate the varied responses from the ad service into a consistent Pythonic class, which the user can work with without needing to know about the intricacies of DFP.
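A stripped-down sketch of that translation step is shown below. It is not Parselmouth’s actual LineItem class: the three fields, the `from_raw` constructor, and the `units` key are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    """Hypothetical, minimal stand-in for a line item domain object."""

    id: str
    name: str
    goal_units: int

    @classmethod
    def from_raw(cls, raw):
        # Accept either spelling of the goal structure in the raw response.
        goal = raw.get("primaryGoal") or raw.get("goal") or {}
        return cls(
            id=str(raw["id"]),
            name=raw["name"],
            goal_units=int(goal.get("units", 0)),
        )


item = LineItem.from_raw(
    {"id": 42, "name": "Spring campaign", "primaryGoal": {"units": 1000}}
)
```

Callers then read `item.name` or `item.goal_units` and never touch the raw response.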

The Beast of Ad Tech

All of this doesn’t get you anywhere in dealing with the fact that ad tech is a complicated beast. In particular, one of the more painful aspects of ad tech comes from the desire to specify how an ad campaign should be targeted. For example, a brand might decide they want to target women in their twenties in Europe on mobile devices. And to represent this kind of arbitrary open-ended targeting, the ad server has to support arbitrary logical structure. This example could be represented as something like this:

But you also have to support more complicated and varied structure like
targeting women in Europe or men in North America, unless they are in Canada on a mobile device.

To tackle this particular problem, we designed a class into Parselmouth called TargetingCriterion which supports arbitrary logical structures like the ones illustrated above.
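The two targeting examples above can be expressed as small trees of AND, OR, and NOT nodes. The classes below are a hypothetical illustration and not the TargetingCriterion API; the attribute names (`gender`, `age_group`, and so on) are also assumptions. The second example is read as “(women in Europe OR men in North America) AND NOT (in Canada AND on mobile)”.

```python
class Is:
    """Leaf criterion: the user's attribute equals a value."""

    def __init__(self, field, value):
        self.field = field
        self.value = value

    def matches(self, user):
        return user.get(self.field) == self.value


class And:
    def __init__(self, *parts):
        self.parts = parts

    def matches(self, user):
        return all(p.matches(user) for p in self.parts)


class Or:
    def __init__(self, *parts):
        self.parts = parts

    def matches(self, user):
        return any(p.matches(user) for p in self.parts)


class Not:
    def __init__(self, part):
        self.part = part

    def matches(self, user):
        return not self.part.matches(user)


# Women in their twenties in Europe on mobile devices.
simple = And(
    Is("gender", "female"),
    Is("age_group", "20s"),
    Is("continent", "Europe"),
    Is("device", "mobile"),
)

# Women in Europe or men in North America, unless in Canada on mobile.
compound = And(
    Or(
        And(Is("gender", "female"), Is("continent", "Europe")),
        And(Is("gender", "male"), Is("continent", "North America")),
    ),
    Not(And(Is("country", "Canada"), Is("device", "mobile"))),
)

canadian_on_phone = {
    "gender": "male",
    "continent": "North America",
    "country": "Canada",
    "device": "mobile",
}
american_on_phone = dict(canadian_on_phone, country="United States")
```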

Many of our other challenges revolved around dealing with the details of the DFP API. For example, if you want to get historical data about line items within DFP, you have to download a report as a gzipped CSV. This meant we had to code a function which waits for the report to generate, downloads the gzipped file, unzips it, parses the resulting text file, then formats the data into a list of Parselmouth LineItem objects.
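The unzip-and-parse part of that pipeline can be sketched with the standard library alone. The column names are made up for illustration, the payload is built in memory to stand in for a downloaded report, and waiting for the report and converting rows into LineItem objects are left out.

```python
import csv
import gzip
import io


def parse_gzipped_csv(payload):
    """Decompress a gzipped CSV payload and return its rows as dicts."""
    text = gzip.decompress(payload).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# Build a small payload in memory to stand in for a downloaded report.
sample = gzip.compress(b"line_item_id,impressions\n101,5000\n102,750\n")
rows = parse_gzipped_csv(sample)
```

Every value comes back as a string, so numeric columns still need converting before use.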

In other cases, the difficulties of integrating with DFP came from the unpredictability of the responses. For example, consider the case of technology targeting within DFP, which allows you to target specific devices, operating systems, or browsers.

An example technology target which targets a specific browser language and device might look something like the following:

There are two key things to focus on with this response. First, notice that a pluralized version of the targeting type is included before the list of items to be targeted. Second, notice that the pluralization follows proper English: for words that end in “y”, the variable name changes the “y” to “ies”, and otherwise an “s” is added to the end. While this reads nicely in English, it is a nightmare for a programmer. There is also inconsistency in how the responses are structured. For deviceCategory the word “targeted” is prepended to the plural name, while for browserLanguage it isn’t. These kinds of inconsistencies create an additional challenge when working with DFP APIs. Things would have been much easier if the format had been more predictable like this:

For this example we had to laboriously create a map to parse out the meaning of each of these strings.
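A small sketch shows why a hand-written map is needed. The two key names follow the description above; the helper functions and the sample response block are assumptions made for illustration.

```python
# Targeting type -> key that holds its list of items in the response.
PLURAL_KEYS = {
    "browserLanguage": "browserLanguages",
    "deviceCategory": "targetedDeviceCategories",
}


def naive_plural(name):
    """Apply the English rule: trailing "y" becomes "ies", otherwise add "s"."""
    return name[:-1] + "ies" if name.endswith("y") else name + "s"


def targeted_items(targeting_type, block):
    """Look up the list of targeted items via the explicit map."""
    return block.get(PLURAL_KEYS[targeting_type], [])


# Hypothetical response fragment for a device category target.
block = {"targetedDeviceCategories": [{"name": "Smartphone"}]}
items = targeted_items("deviceCategory", block)
```

The pluralization rule alone yields `deviceCategories`, which misses the `targeted` prefix, so rule-based key guessing breaks on exactly the inconsistent cases.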

Despite these frequent snags in developing our DFP integration, an effective strategy emerged for dealing with these issues within Parselmouth. A strategy which we affectionately refer to as “focused rage.”

“Focused rage” involves jamming all of the horrible bits and pieces of the API integration into a compact independent part of the code.

We designed adapters that follow the “bridge” design pattern to hold all of the nasty bits of translation code. So when you initialize a connection with Parselmouth, you are connected via the appropriate adapter, and need never know about the dark secrets that live beneath the hood.
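In outline, the arrangement looks something like the sketch below: a stable client-facing class delegates to an interchangeable adapter that hides each provider’s quirks. All class and method names here are hypothetical and do not reflect Parselmouth’s real interface, and the fake adapter reads from a dict instead of calling an ad server.

```python
from abc import ABC, abstractmethod


class AdServiceAdapter(ABC):
    """Interface every provider-specific adapter must implement."""

    @abstractmethod
    def get_line_item_name(self, line_item_id):
        ...


class FakeDFPAdapter(AdServiceAdapter):
    """Stand-in adapter: all provider-specific translation lives here."""

    def __init__(self, raw_items):
        self._raw = raw_items

    def get_line_item_name(self, line_item_id):
        return self._raw[line_item_id]["name"]


class AdClient:
    """What callers use; it never touches provider responses directly."""

    def __init__(self, adapter):
        self._adapter = adapter

    def line_item_name(self, line_item_id):
        return self._adapter.get_line_item_name(line_item_id)


client = AdClient(FakeDFPAdapter({"42": {"name": "Spring campaign"}}))
name = client.line_item_name("42")
```

Supporting another provider would mean writing another adapter, with no change to the code that calls `AdClient`.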

Open Sourcing

Since building Parselmouth, we have added new features and upgraded the underlying DFP API version relatively seamlessly. With this internal success and the knowledge that others also struggle to integrate with DFP and other ad providers, we decided to make our work available to the public. If you are a developer who has had some of the same struggles as us, please let us know, or better yet, open up a pull request against Parselmouth, and add your own features! Check out Parselmouth here and here.

Not too long ago, Camille Fournier at Rent the Runway shared the software engineering ladder their team uses for promoting engineers within the development team. I thought I’d take the opportunity to share the ladders we use at Chartbeat and look at how you might structure a ladder for your startup.

First, The Basics.
A ladder is an outline of expectations for people at various levels within the company. Many companies of varying sizes have them; you may vaguely remember seeing one during a new hire orientation, or you may have revisited it regularly if you’re gunning to climb the ladder quickly. They serve both as a means for people to know what’s expected of them to get a promotion, and as a framework for managers to know when to promote someone.

This is Chartbeat’s ladder for Backend Engineers. We have a separate ladder for Frontend Engineers and Designers, and another ladder for Data Scientists. For comparison, here is Rent the Runway’s ladder and Joel Spolsky’s ladder for Fog Creek.

I hadn’t seen Camille’s or Joel’s prior to doing ours, yet they have a lot in common. I guess there are some universal truths in charting growth once you know a little about how engineers and engineering teams work.

Do I really need a ladder? HR is such a drag.
If your company is just starting out, you probably don’t need a ladder. However, it’s reasonable for an engineer to start thinking about promotion eventually, and you’ll need to address that. You may be able to start by giving promotions in a more haphazard manner, but as the team grows you need to do it evenly and with some justification.
If your company is more than 2 years old, and you have employees that have been around since the beginning (and I hope you do) it’s time to start thinking about it. I joined Chartbeat when it was just over 2, but it had had an extended gestation at Betaworks and was just getting an established engineering team. I waited 3 years to put a ladder in place, which was probably a bit longer than I should have.

It’s also worth noting that raises are not tied directly to the ladder here at Chartbeat. A move up the ladder will always come with a significant raise and title change, but smaller raises can happen in the meantime.

Two Paths
The Chartbeat and RtR ladders both include a concept of “Manager” and “Architect” tracks. This is a common distinction in software engineering teams and one that most developers will face at some point in their careers. “Do I want to build bigger and better systems, or do I want to manage bigger and better teams?”

Both jobs require leadership. For an architect, it’s more about thought leadership. Thinking long term and getting a team to share your vision. For a manager, it’s more about hiring, team organization, and helping people up the ladder.

There’s no right answer to this; it’s all about your personality and preferences. They are not typically mutually exclusive either. In my experience, if you’re excellent at one of them you’d probably be excellent at the other as well. Maybe this is why Fog Creek makes no distinction between the two (in their ladder, anyway).

Experience Counts
The Chartbeat ladder does not explicitly include years of experience in the requirements for moving up. However, I’ve told the team to expect it to take 2-3 years to move up a stage. RtR has what looks like years of experience associated with each stage on the left side of their Ladder. Fog Creek disassociates experience from skills entirely, but does tie it to salary.

How experience relates to the value you bring to a team as an engineer is not always obvious. In one regard, it could be argued that a new grad with all the skills of a developer with 25 years of experience should be paid the same, but are they really the same? What metrics do you use for measuring experience and how is it valued?

For starters, the top rungs of the ladder will include some things new grads will not have done at a company of scale. “Successful platform cycle launch”, “Plays a key role in developing multi-year technology strategy”, or “Owns the development for a major project” are not something you will have done without a few years under your belt.

But more to the point, if you are an experienced engineer bringing your hard earned knowledge to bear on your work (sharing experiences, offering insight and guidance), you are delivering far more than quality code to the team.

Climbing The Ladder
When it comes down to it, creating a ladder is the easy part. How you apply it is much trickier. You can use a form, have a meeting, write a letter, but these are only useful if they create honest discussions around performance. An honest discussion, structured around the framework you’ve provided your team, is the best and most transparent way to help someone grow their career. Those discussions can be hard to have, but a ladder will help you focus on tangible actions both good and bad.

As an engineering manager it’s your job to help your team up the ladder. That’s not just because it’s good for your business, it’s your commitment to them when they joined your team. If I didn’t believe that I’d have stuck with writing code instead of managing coders. I could have avoided a lot of meetings, but I like to think I’m helping good engineers along their way.

Back in 2014 I gave a presentation at PyGotham on a neat PostgreSQL feature called Foreign Data Wrappers. Because of a variety of factors, the presentation was a little unhinged, and I declined to make it public in favor of a more detailed written-up post, which I swore would appear just a couple weeks after PyGotham was finished. Well, typical of a software developer, I shipped right on time plus 7 months. You can find the post over at GitHub along with code and Vagrantfile for your enjoyment.

More from Chartbeat, angrier, newbier, and foodier than ever!

Notes from the Mystery Machine Bus – Another Steve Yegge classic, this time literally politically polarizing. Yegge argues that software engineers fall along a conservative/liberal programming language spectrum, and that this is as fundamental to your character as your political leanings.

Framer.js – One of our illustrious designers who is learning to code says this: Building interactive prototypes to test out your ideas is really cool. The problem is that usually involves javascript, which I am patently afraid of. Framer.js is for people like me. Check out these videos from their first meetup.

Edible Geography – Nicola Twilley’s excellent blog about food, but not how to make it! She focuses on everything else: history, technology, obscura, and science. You might not realize it but your life would be complete if Twilley were to come to your home for dinner and tell you about refrigeration in China. For reals.

Last Thursday Chartbeat hosted the Frontend Innovators Meetup group at our office. The focus was Angular.js, and my co-workers and I had the good fortune of being presenters. Our goal was to share some of the experiences and insights we’ve gained over the past year working with Angular.js.

From Closure To Angular

Slides are available online.

I presented first, focusing on how and why Chartbeat began using Angular.js for all of our front-end applications in the first place. I discussed our decision to migrate from Google Closure Library to Angular.js after weighing all the options we had available. I then shared how we actually develop with Angular.js: how we lay out our directories, files, and applications, and why our multiple applications at Chartbeat need a system that allows for growth, flexibility, and sharing between them.

Reusable Components

Slides are available online.

Danny presented next, discussing the merits and pitfalls of developing reusable components. He first discussed the advantages of working with components, which let us rapidly throw together entire new applications from pieces we’ve already built. Danny then dove into the trade-offs that accompany the creation and maintenance of components: if it is more “expensive” to create a component than it is to simply duplicate code, then perhaps it isn’t worth the time and effort to create it as a component. It was a nuanced talk with some great points. Worth watching in full, I think.

Graphing Chartbeat with Angular + SVG + D3

Slides are available online.

Nick then presented on SVG, D3, and Chartbeat’s use of D3 within Angular. He began by giving a solid background on SVG and the pitfalls you can encounter when trying to use Angular’s templating functionality with SVG elements (spoiler alert: there are a few!). Nick then discussed C3, an internal library he developed. He showed the design decisions that shaped C3 and the places where he’s found some shortcomings that he hopes to fix in the future.

Frontend Testing and Build Process

Slides are available online.

Jem rounded out the night with a thorough talk on Chartbeat’s test practices and build process. He talked about our testing stack, consisting of Jasmine, Karma, Protractor, Jenkins, and Selenium, and the best ways to put these systems to work. Jem also shared some good practices to keep in mind when using these tools. He then turned his focus to how we compile our applications for a production environment, discussing our move from Grunt to Gulp and why we’re finding it a better fit: it’s cleaner and clearer to work with, as it’s mostly vanilla JavaScript, which makes reasoning about it easier.


We had a great time presenting and hope everyone learned something from our talks.

Angular.js is still pretty new and we’re all still learning the best way to do things. Over the past year we’ve learned a lot about what works and what doesn’t. If you have too, please chime in. Sharing what we’ve learned is the best way for our whole front-end community to continue building better and better things.

The Weekliest Links

April 25th, 2014 by Wes

Chartbeat engineers are avid readers and we wanted to share some of the gems that we’ve come across recently (or not so recently). This being the engineering blog, these links are engineering-ish related. 

  • The Big Ball of Mud – at Chartbeat we write a lot of APIs and have repeatedly run up against the problem of staying nimble and keeping our codebase organized. This read has helped us step back and recognize the “pattern” in common anti-patterns.
  • Intro to the A* Algorithm – this quick read nicely details the intuition behind one of the best pathfinding algorithms in existence.
  • Why Virtual isn’t Real to your Brain – this is a longer read about why VR is hard. The human visual cortex has evolved over millions of years, and inventing graphics algorithms to spoof it is an elusive endeavor.

Happy Friday!

PS. Here’s a rare pic of Chartbeat designers in their natural habitat. Please don’t feed them, otherwise they will learn to not fear humans.


Hacking Cassandra

January 10th, 2014 by Nathan Potter

Tadas (@tadasv) looked at implementing HyperLogLog in Cassandra during hack week. Check out his blog post on hacking Cassandra.

Chartbeat Tech Talks

November 8th, 2012 by beaufour

We are doing so many interesting things at Chartbeat Engineering, and we try our best to share our experiences at conferences and meetups. Here are a few of the recent ones (all of them can be found here).

I recently gave a talk at New York Times’ TimesOpen event about how to handle scale:

Tom talked about building mobile apps at EmpireJS:

Matt talked about designing data at Asbury Agile:

And Meagan talked about how to become a web design champion at Web Design Day:

I hope to meet you at an event soon. Until then, enjoy the slides!