• Git risky!

    Git is a powerful tool for code versioning. If you follow its best practices and have good ‘commit hygiene’ (easier said than done), it can also be a source of valuable data about your coding practices. A while back I built a little tool that uses the metadata git collects, along with its logging and ‘blaming’ functionality, to score commits on their likelihood of introducing a bug. The code is open source on GitHub; check it out. You could even run the tool against its own git history! I also gave a talk at PyData NYC 2017 about how it works, linked below the fold.

  • An Alethiometer for the Modern Age

    The Golden Compass was one of my favorite books growing up. It has lots of your standard young adult fantasy epic elements – a plucky heroine, talking animals, authoritarian villains – but it also touches on some weighty theological themes. The author described it as a deliberate inversion of Milton’s Paradise Lost (and not for nothing, at the end of the series the protagonists save the world by killing God and re-committing original sin). A central element in the book is the existence of the eponymous “golden compass”, a literal machina qua deus ex which answers questions through divine intervention. The compass presents its answers as a series of ideograms: its face is ringed with symbols, and when posed a question its needle sweeps around the face, selecting the symbols which comprise the answer. I always wanted one of those when I was a kid but, alas, back then powerful artifacts with oracular capabilities were in short supply. Nowadays we have smartphones and Twitter, though, so better late than never! In this post I’m going to describe a Twitter bot I made which answers questions with emoji (hence alethiomoji, the name of the project; the golden compass was also called an alethiometer).

  • Better Bluegrass through Javascript

    I like bluegrass for a lot of reasons, but one of the main ones is its communal character: this kind of music is often made as much for the joy of making it as for the sake of the audience. This attitude is especially apparent in bluegrass ‘jams’ – unrehearsed and improvised performances which are endemic to the genre. Bluegrass jams are governed by a rich set of unwritten rules, but the general idea is that a group of musicians (often strangers to one another) sit in a circle and take turns selecting and singing a song (from a standard repertoire), with the rest of the group providing accompaniment and improvised solos. This mode of performance makes for a uniquely ephemeral music experience, but it also introduces a particular set of challenges.

  • What happened with Legos?

    What happened with Legos? The question implies a kind of grumpy nostalgia that I don't necessarily agree with, but underneath the back-in-my-day bluster there is an interesting question: how have Lego sets changed over the past several decades? There are a couple of obvious differences (the introduction of sets featuring licensed content, models aimed at adult collectors, and non-smiley-face minifigures, to name a few), but these are largely differences in marketing and branding. I'm interested in how Lego as a creative toy has changed over time.

  • 'Hot or Not' in Academic Research

    Success in graduate school is all about learning. This isn’t exactly news, but I think that the types of learning that contribute most to success aren’t what one would expect from the outside. In popular presentation, a grad student’s job is to learn everything there is to know about one extremely narrow topic, and eventually push the boundaries of human knowledge a tiny bit farther out. Being a good graduate student certainly requires that you learn the main research results of your field; however, as I’ve spent more time in grad school, I’ve come to understand that knowing the human context surrounding research results is equally important. Identifying potential collaborators (or competitors) who are working on similar problems can have a huge impact. Likewise, knowing what types of problems are currently in vogue can acutely affect a grad student’s academic prospects. As much as professors like to tout the intellectual freedom of academic research, people who sail with the prevailing winds go furthest.

  • Mario Kart and the Pareto Frontier

    Who is the best character in Mario Kart? This is actually a non-trivial question, because the characters have widely varying stats across a number of attributes. (For the unfamiliar, Mario Kart is a video game where you select characters from the Nintendo universe and race them against each other in cartoonish go-karts.) The question is compounded when you consider the modifications introduced by the various karts and tires players can select from. In general it isn’t possible to optimize across multiple dimensions simultaneously; however, some setups are undeniably worse than others. The question for an aspiring Mario Kart champion is “How can one pick a character / kart / tire combination that is in some sense optimal, even if there isn’t one ‘best’ option?”
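
    As a rough illustration of what “undeniably worse” can mean, here is a minimal sketch of a Pareto frontier calculation. The setup names and their (speed, acceleration) scores are made up for the example and are not real game stats; the function names are also hypothetical. A setup is dropped only if some other setup is at least as good on every attribute and strictly better on at least one.

```python
from typing import Dict, List, Tuple

# Hypothetical setups with invented (speed, acceleration) scores.
# Higher is better on both attributes; these are NOT real game stats.
SETUPS: Dict[str, Tuple[float, float]] = {
    "setup_a": (5.0, 2.0),
    "setup_b": (3.0, 4.0),
    "setup_c": (2.0, 3.0),
    "setup_d": (4.0, 3.0),
}


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Return True if a is at least as good as b on every attribute
    and strictly better on at least one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_frontier(options: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the sorted names of all options that no other option dominates."""
    return sorted(
        name
        for name, stats in options.items()
        if not any(dominates(other, stats) for other in options.values())
    )


if __name__ == "__main__":
    print(pareto_frontier(SETUPS))
```

    In this toy data, setup_c is beaten on both attributes by setup_b, so it falls off the frontier, while the remaining three each represent a different trade-off that no other setup strictly improves on.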

  • Using Chicago's Open Data to Generate Policy Recommendations

    Chicago has a lot of cool things going on, but from a data science perspective one of the most exciting is the awesome Chicago Data Portal. Hundreds of datasets are available to download, covering all aspects of life in the city, from where potholes were patched in the past week to the temperature of the lake. Prompted by a question on the Chicago Ideas Week application, I want to use some of this data to address the following question: how can we use civic data to inform policy decisions at the city level, and in particular, can we use data to intelligently craft policies that benefit the most economically depressed regions of the city?

  • In-game coaching

    As I mentioned before, I’m a big Cornell basketball fan. One particular point of contention among Cornell basketball fans is whether the current coach, Bill Courtney, makes strategically sound decisions during games. The consensus among fans (or rather, the subset of fans that takes to the internet to complain) is that he’s a bad in-game coach, at least compared to the previous coach, Steve Donahue. This certainly isn’t unique to Cornell – lots of fanbases are unhappy with their coach’s in-game coaching abilities (see Crean, Tom) – but Cornell presents a nice case for analysis because we have a (comparatively) decent amount of data from both the B.C. and A.D. eras (Before Courtney, and After Donahue, natch).

  • Automated literature searches with Google Scholar

    Google Scholar is an amazing tool for looking up academic papers; it’s my go-to whether I need to find a particular paper or just explore the existing literature on some topic. One thing I’ve always wanted to be able to do is specify a list of articles I’m interested in and get a tailored set of recommendations based on just those papers. If you make an account on Google Scholar you do get recommendations, but the current setup works more like Netflix before they introduced separate profiles: the recommendations are a mashup of articles related to all the different topics you’ve searched for (or published on, if you link your papers to your account).

  • Home Court Advantage

    Who has the best home court advantage in college basketball? Duke? Kansas? Gonzaga? This can make for a fun bar conversation, mostly because you can make an argument for so many of the top teams. These arguments are typically qualitative – “Duke’s student section is the best in the country” or “Allen Fieldhouse is insanely loud. I mean seriously, insanely loud.” Since I have the box score data from every regular season college basketball game in the past seven years, I’m going to take a more statistical approach.

  • Roster Heatmaps

    I’m a big Cornell basketball fan. A few years ago (ca. 2007-2010) this was a lot of fun. These days, not so much; Cornell hasn’t finished in the top half of its conference since 2010. Part of this is certainly regression to the mean – Cornell going to the Sweet Sixteen (like they did in 2010) was probably a once-in-a-lifetime kind of thing – but the consensus among the fans is that the departure of the old coach and, more to the point, the presence of the new coach are big factors in the continuing mediocrity of Cornell basketball.