How We Think

This is the third post in a series of data science tutorials using GitHub profile data. In the previous post, we extracted data from the GitHub API using Ruby and cleaned it with Unix utilities and R. You can read the previous post here: parsing github profiles. Summary: By doing Wikipedia queries on location names Continue Reading →
This is the second post in a series of data science tutorials involving GitHub profile data. Summary: To obtain raw data for 2 million public GitHub profiles, we used Ruby to pull from GitHub’s API at the maximum allowed rate and then used Unix tools and R to combine the data into one CSV. Gathering data Continue Reading →
Summary: We explored 2 million public GitHub profiles and discovered some interesting differences in usernames between active and inactive profiles. We began with an open-ended exploration of GitHub’s public user profiles. GitHub is an online platform for code collaboration and project management. Their service is used widely enough that a presence on GitHub could get Continue Reading →
Over 80 people at Microsoft NERD for the workshop. We had a lot of people come out for the Predictive Analytics Meetup Machine Learning Workshop. David Weisman did a great job teaching about clustering and classification. I hope everyone enjoyed the decision tree and random forest talk. It was great to have both beginners and Continue Reading →
I’m doing a machine learning workshop on 12/2 with David Weisman, sponsored by the Boston Predictive Analytics Meetup Group. It’s sold out, but you can add yourself to the waiting list. We’ll be covering clustering and random forests. On 12/13 I’ll be doing a talk with Aki Balogh as the second of two classes at General Assembly about data Continue Reading →