Showing posts from August, 2016

Apache Beam in action: same code, several execution engines

If the previous article was an introduction to Apache Beam, it’s now time to look at some of its key features. The timing is perfect, as Apache Beam 0.2.0-incubating has just been released. This article shows a first pipeline use case and executes the same pipeline code on different execution engines.

Context: GDELT analyses

For this article, we are going to create a pipeline that analyses GDELT data and counts the number of events per location in the world. The GDELT project gathers events happening all over the world. It creates daily CSV files containing one line per event. For instance, an event looks like:

545037848 20150530 201505 2015 2015.4110 JPN TOKYO JPN 1 046 046 04 1 7.0 15 1 15 -1.06163552535792 0
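To make the goal concrete before any engine is involved, here is a minimal sketch of the counting logic in plain Python, without Beam. It is not the pipeline code of this article. It assumes fields are tab-separated, and the function name `count_events_per_location`, the `location_index` parameter and the shortened sample records are all hypothetical, introduced only for illustration; the real position of the location column has to be taken from the GDELT documentation.

```python
from collections import Counter


def count_events_per_location(lines, location_index, delimiter="\t"):
    """Count events per location code, given one input line per event.

    location_index is a hypothetical parameter: the position of the
    location column must be looked up in the GDELT documentation.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        # Skip lines that are too short or have an empty location field
        # instead of guessing a value for them.
        if location_index < len(fields) and fields[location_index]:
            counts[fields[location_index]] += 1
    return counts


# Made-up, shortened records for illustration only (not real GDELT rows).
sample = [
    "1\t20150530\tJPN",
    "2\t20150530\tFRA",
    "3\t20150530\tJPN",
    "4\t20150530\t",  # no location: ignored by the counter
]

counts = count_events_per_location(sample, location_index=2)
print(counts.most_common())  # [('JPN', 2), ('FRA', 1)]
```

A pipeline expresses the same two steps, extracting a key from each line and then counting per key, in a form that an execution engine can distribute over many files.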