In January 2014, we organized the first edition of the MCE conference in Warsaw, Poland. Even though it was the first time we had organized such an event, it was really successful. Now, while organizing the second edition, we analysed the previous one and found that we could show, in a rather cool way, what was really interesting for the attendees of the conference (450 of them!).
You can see the results of our work here (click to see the interactive, animated version!):
You can see the heatmap of people changing over time, along with the corresponding screenshots of the talks taking place at that time (we’ve published all the talks for free!).
If you want to find out how we did that - read on.
Devices and data gathering
We knew from the very beginning that WiFi is really important for our attendees - and we could not simply outsource it. That’s why we decided to do it ourselves. After the event we gathered a lot of feedback from the attendees, and one of the comments was that the WiFi was really great - especially compared to other events people had attended.
We needed to be able to provision a new device and reconfigure part of the network on the fly. The OpenWrt configuration consists of simple text files that can be easily managed with a few bash scripts over SSH. Our great team at Polidea, with Kamil and Maciek leading the effort, managed to build a complete wireless system with help from Warsaw’s hackerspace.
That turned out to be a really good decision - on top of a really great experience for our attendees, we had a chance to do more than simply deliver the internet. We could have logged quite a lot of data, but we logged only the absolute minimum needed for some useful analysis: MAC addresses (to distinguish between the devices), time of last data transfer and signal strength (which in the end we did not use). We did not log URLs, session times or transferred data - nothing that could be considered private data. Then, for processing, we even aggregated the data and removed the MAC addresses to make sure we were not violating anyone’s privacy.
On every hotspot we put a simple bash script in /www/cgi-bin. This is a special folder which is exposed over the web. When a script there has the executable flag set, it can run commands on the router. In our case the script simply printed all associated stations, with information about how long ago a transmission from each was last seen. On the other side we had a server with a MySQL database and a second script which crawled all the devices every minute. The received data was then inserted into the database for later analysis.
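As a rough illustration of the crawler side (not the script we actually ran), here is a minimal Python sketch that parses a station listing. The sample text is an assumption: it is modelled on the output of `iw dev wlan0 station dump`, and the post does not say which command the hotspot script used. The names `SAMPLE` and `parse_stations` are made up for this example.

```python
import re

# Assumed sample output, modelled on `iw dev wlan0 station dump`.
# The real hotspot script may have printed a different format.
SAMPLE = """\
Station 00:11:22:33:44:55 (on wlan0)
\tinactive time:\t304 ms
\tsignal:\t-61 dBm
Station 66:77:88:99:aa:bb (on wlan0)
\tinactive time:\t12040 ms
\tsignal:\t-74 dBm
"""

STATION_RE = re.compile(r"^Station\s+([0-9a-fA-F:]{17})")
FIELD_RE = re.compile(r"^\s*(inactive time|signal):\s*(-?\d+)")


def parse_stations(text):
    """Return one dict per associated station found in the listing."""
    stations = []
    for line in text.splitlines():
        match = STATION_RE.match(line)
        if match:
            # A "Station" line starts a new record.
            stations.append({"mac": match.group(1).lower()})
            continue
        match = FIELD_RE.match(line)
        if match and stations:
            # Field lines belong to the most recent station.
            key = "inactive_ms" if match.group(1) == "inactive time" else "signal_dbm"
            stations[-1][key] = int(match.group(2))
    return stations
```

Each parsed record could then be inserted into the database together with the hotspot name and the crawl timestamp.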
We later aggregated this data into 5-minute slots, with information about how many devices were connected in each of the cinema rooms. The aggregation was needed to remove some fluctuations, i.e. random reconnections at the edge of wireless coverage.
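The aggregation idea can be sketched in a few lines of Python. This is only one plausible reading, assuming that a device is counted once per room and slot no matter how many times it was seen. The tuple layout and the name `aggregate` are hypothetical, and the real processing was done against the MySQL data.

```python
from collections import defaultdict

SLOT_SECONDS = 5 * 60  # 5-minute slots, as described above


def aggregate(samples, slot_seconds=SLOT_SECONDS):
    """Count distinct devices per (slot start, room).

    samples: iterable of (timestamp_seconds, room, mac) tuples
             (a hypothetical layout for this sketch).
    """
    seen = defaultdict(set)
    for timestamp, room, mac in samples:
        # Round the timestamp down to the start of its slot.
        slot_start = timestamp - timestamp % slot_seconds
        seen[(slot_start, room)].add(mac)
    return {key: len(macs) for key, macs in seen.items()}
```

Because each device is stored in a set, a device that reconnects several times within one slot still counts only once, which is what smooths out the fluctuations.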
That was a DevOps approach from the very beginning.
So what can you do when you run the WiFi hotspots at a conference? It turns out that you can do some quite interesting visualisation and analysis.
It also helped that the Praha Cinema in Warsaw has a lot of insulation between the cinema halls and the main cinema hall. The insulation is mainly there for sound separation of course, but it turned out to be a great WiFi signal barrier as well - thanks to that we could tell at any time during the conference how many connections to the WiFi hotspots were established in each room. So - what can you do with that data? Of course you can visualise it, showing a heatmap of people during the conference! It’s obvious, right?
Since at Polidea we love diversity, polyglot programming and good engineering in general, we chose a mix of tools and languages to help us to get there fast and efficiently.
We have a couple of scripts which allow us to manage all the devices from the command line.
The data from the devices needed to be fetched and put into the database using a script added to crontab.
Later we had to post-process the raw data in the database. We did it with a simple PHP script which produced readable .csv output. The data was in 5-minute intervals and was simply the number of people connected to our WiFi in each room in each interval.
In order to be able to experiment, we defined the "cinema world" in a configuration.py file - it describes the cinema using simple dimensions. Thanks to the flexible configuration we could generate the cinema room layout image, check whether our rooms were defined properly, and very easily change the dimensions in the final version. That was done using the prepare_image.py script and the great PIL library.
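To give a feel for what such a "cinema world" could look like, here is a minimal sketch. It is not the real configuration.py: the room names, the coordinates and the `layout_problems` helper are all invented for illustration, and it only checks the layout instead of drawing it with PIL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Room:
    """A rectangular room, in pixels of the layout image (hypothetical)."""
    name: str
    x: int
    y: int
    width: int
    height: int

    def overlaps(self, other):
        # Two rectangles overlap when they intersect on both axes.
        return (self.x < other.x + other.width
                and other.x < self.x + self.width
                and self.y < other.y + other.height
                and other.y < self.y + self.height)


# Invented example values, not the real cinema dimensions.
CANVAS = (800, 400)
ROOMS = [
    Room("hall_a", 0, 0, 400, 400),
    Room("hall_b", 400, 0, 400, 200),
]


def layout_problems(rooms, canvas):
    """Return human-readable problems found in the room definitions."""
    problems = []
    for room in rooms:
        if (room.x < 0 or room.y < 0
                or room.x + room.width > canvas[0]
                or room.y + room.height > canvas[1]):
            problems.append(f"{room.name} is outside the canvas")
    for index, first in enumerate(rooms):
        for second in rooms[index + 1:]:
            if first.overlaps(second):
                problems.append(f"{first.name} overlaps {second.name}")
    return problems
```

Keeping the rooms as plain data like this is what makes it cheap to tweak a dimension and regenerate the layout image.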
We also defined a movies.json file where we stored data about the presentations: the number of frames and links to the YouTube recordings.
Extracting the talk data was fairly easy, as we already had structured YAML data about the presentations - the same data that we used to generate our http://2014.mceconf.com page (hosted on GitHub Pages of course, and preprocessed using Ruby’s Jekyll). Jekyll, BTW, is also used to generate the page you are looking at - and the whole of Polidea’s website.
We also added a schedule definition (schedule.json), where the movies were allocated to rooms and time slots. You might notice that by clicking the screenshots of talks you can get directly to the YouTube recording of the talk at the exact time of the screenshot. How cool is that?
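The "jump to the exact time" link boils down to a schedule lookup plus a time offset in the URL. Below is a minimal sketch under some assumptions: the schedule fields (`room`, `start`, `end`, `video_id`) and the minute-based times are hypothetical and not the real schedule.json layout. The `t=<seconds>s` query parameter is the usual way to start a YouTube video at a given offset.

```python
from urllib.parse import urlencode


def talk_at(schedule, room, minute):
    """Find the schedule entry running in `room` at `minute` (minutes since midnight)."""
    for entry in schedule:
        if entry["room"] == room and entry["start"] <= minute < entry["end"]:
            return entry
    return None


def youtube_link(video_id, seconds):
    """Build a watch URL that starts playback at the given offset."""
    query = urlencode({"v": video_id, "t": f"{int(seconds)}s"})
    return f"https://www.youtube.com/watch?{query}"


def link_for(schedule, room, minute):
    """Link to the recording of whatever was on in `room` at `minute`."""
    entry = talk_at(schedule, room, minute)
    if entry is None:
        return None  # a break, or nothing scheduled in that room
    return youtube_link(entry["video_id"], (minute - entry["start"]) * 60)
```

For example, with a talk starting at minute 600, a click on the frame for minute 610 would link 600 seconds into the recording.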
We developed several Python scripts that pre-processed the data and movies:
- extract_snapshots.py - extracts snapshots of the recordings from the original mpg files of the talks, using the ffmpeg utility
- prepare_montages.py - combines the snapshots into bigger montage images - useful for performance (more on this later)
- prepare_data.py - preprocesses the raw data from the routers into data that is directly useful for the visualisation. It’s fairly complex on its own, so more information about this later
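As an illustration of the snapshot step, the sketch below builds one ffmpeg command per snapshot. It is not the real extract_snapshots.py: the one-snapshot-per-minute interval, the output file pattern and the function name are assumptions. The ffmpeg options used (`-ss` to seek, `-i` for the input, `-frames:v 1` to write a single frame, `-y` to overwrite) are standard ones.

```python
def snapshot_commands(source, duration_s, interval_s=60,
                      out_pattern="snap_{:04d}.jpg"):
    """Build ffmpeg argument lists, one per snapshot.

    source:      path to the recording (for example an .mpg file)
    duration_s:  length of the recording in seconds
    interval_s:  distance between snapshots (assumed: one per minute)
    """
    commands = []
    for index, offset in enumerate(range(0, duration_s, interval_s)):
        commands.append([
            "ffmpeg", "-y",        # overwrite existing output files
            "-ss", str(offset),    # seek to the snapshot time
            "-i", source,
            "-frames:v", "1",      # write exactly one video frame
            out_pattern.format(index),
        ])
    return commands
```

Each argument list can be passed to `subprocess.run(command, check=True)`. Building the commands separately from running them keeps the timing logic easy to check without needing ffmpeg installed.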
The raw data was pre-processed in order to get a really good-looking visualisation and smooth animation:
- we did not have the actual location of people, we only had data about the hall in which people were connected. So we had to "cheat" a bit - the heatmap "points" representing people were randomly generated rather than known to us
- we interpolated the 5-minute-interval data down to 1-minute intervals - another small "cheat", but one that gives much smoother transitions
- we adjusted the data between the frames to account for exits from individual halls and from the cinema. If the number of people in a hall changed, we assumed that during that frame those people were moving into or out of the hall (and we placed them in the exit area for that hall). If the total number of people in the cinema changed, we assumed that people were moving into or out of the cinema, and we placed them in the cinema exit area
- we needed smooth transitions during the animation. If we generated a separate random distribution for each frame, the heatmap would change significantly after every frame. So we did it incrementally (or decrementally) - each frame is built from the previous one. The points from the previous frame are the base for the next one: if the new frame has more points, only the few new points are randomly added; if the new frame has fewer points, the surplus points are removed from the previous frame’s points
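Two of the steps above, the interpolation and the incremental frames, can be sketched roughly as follows. This is a simplified illustration and not the real prepare_data.py: it assumes linear interpolation, treats a hall as a plain rectangle, leaves out the exit-area handling, and all function names are made up.

```python
import random


def interpolate(counts, steps=5):
    """Linearly interpolate 5-minute counts down to 1-minute counts."""
    if not counts:
        return []
    result = []
    for current, following in zip(counts, counts[1:]):
        for step in range(steps):
            result.append(round(current + (following - current) * step / steps))
    result.append(counts[-1])
    return result


def next_frame(points, target, width, height, rng):
    """Build the next frame's points from the previous frame's points.

    Existing points are kept where possible, so the heatmap only
    changes where people were added or removed.
    """
    points = list(points)
    while len(points) < target:
        # Add only the missing points, at random positions in the hall.
        points.append((rng.uniform(0, width), rng.uniform(0, height)))
    while len(points) > target:
        # Remove surplus points, chosen at random.
        points.pop(rng.randrange(len(points)))
    return points
```

A usage example: `interpolate([10, 20])` gives the six per-minute values from 10 up to 20, and feeding those counts one by one into `next_frame` with a shared `random.Random` instance yields frames that differ only by a couple of points each.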
The Heatmap.js library is great for generating static heatmaps (it uses HTML5’s canvas to draw the heatmap), but we had to perform several optimisations to get smooth transitions:
- we used the old trick of double-buffering - we generate two neighbouring frames and then transition between them using a parallel fade-in / fade-out
- we display the current talk screenshots in parallel. In order to avoid reloading images for each frame, we use the montages prepared by the Python scripts and CSS image sprites to display the appropriate frame from the montage. This way we don’t reload an image at every frame; only when the talk changes do we reload the whole talk montage, and we keep it in memory for the following frames
- we use requestAnimationFrame in order to automatically adjust the processing to the speed of the client’s browser - if it takes longer to process a frame, then some of the frames are automatically skipped (the total animation time then remains constant, independently of the speed of the client’s browser)
- we used the rangeslider library http://andreruffert.github.io/rangeslider.js/ to get a custom slider - we even made a small fix that has since made it into the released version. We had more frames (steps) than most users of the slider, and there was a subtle bug in the rounding of steps that sometimes manifested itself when the slider was clicked
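The arithmetic behind the sprite and frame-skipping tricks is simple enough to show in a few lines. The real code runs in the browser in JavaScript, so this Python sketch only illustrates the calculations. The grid layout of the montage, the tile sizes and both function names are assumptions.

```python
def sprite_offset(frame_index, columns, tile_width, tile_height):
    """CSS background-position for one snapshot inside a montage.

    Assumes the montage is a grid filled row by row, `columns` tiles wide.
    """
    row, col = divmod(frame_index, columns)
    return f"-{col * tile_width}px -{row * tile_height}px"


def frame_for_time(elapsed_ms, frame_duration_ms, total_frames):
    """Pick the frame to show from the time elapsed since playback started.

    Deriving the frame from the clock instead of a counter means a slow
    browser skips frames and does not stretch the total animation time.
    """
    index = int(elapsed_ms // frame_duration_ms)
    return min(index, total_frames - 1)
```

In the browser, the elapsed time would come from the timestamp that requestAnimationFrame passes to its callback, and the offset string would be assigned to the `background-position` style of the screenshot element.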
That was a really nice exercise - developing this small interactive visualisation of the WiFi data that we gathered. The polyglot programming and DevOps approach to your infrastructure pays off - combining several technologies, tools and languages can be an efficient way to get cool results. We cannot wait to see what we can get out of MCE 2015, which is happening soon (hint: NFC, BLE, beacons). If you want to be part of it and you are interested in Mobile or the Internet of Things - buy tickets at http://register.mceconf.com.