How We Think

Exploratory Data Analysis with GitHub

In this series of posts, we work through an exploratory data science analysis from start to finish. Our goal is to show current and future clients the work that goes on behind the scenes, and to provide tutorials for aspiring data scientists who want to learn how to approach an exploratory problem.

GitHub 1 - Who are GitHub Users?

GitHub is an online platform for code collaboration and project management. We explored 2 million public GitHub profiles and discovered some interesting differences in usernames between active and inactive profiles.

GitHub 2 - Parsing GitHub Profiles

This is the second post in a series of data science tutorials using GitHub profile data. We used Ruby to pull from GitHub’s API at the maximum allowed rate and then used Unix tools and R to combine the data into one CSV.

GitHub 3 - Where in the World are GitHub Users?

This is the third post in a series of data science tutorials using GitHub profile data. By querying Wikipedia for the location names listed on GitHub profiles, we created a dataset of longitudes and latitudes for GitHub users worldwide.

GitHub 4 - Mapping GitHub Users with ggplot2

This is the fourth post in a series of data science tutorials using GitHub profile data. Using our dataset of GitHub user longitudes and latitudes from Part 3, we visualize and analyze the distribution of user locations in R with maps and ggplot2.