Google Scholar is an amazing tool for looking up academic papers. It’s my go-to whether I need to find a particular paper or just explore the existing literature on some topic. One thing I’ve always wanted to be able to do is specify a list of articles I’m interested in and get a tailored set of recommendations based on just those papers. If you make an account on Google Scholar you do get recommendations, but the current setup works more like Netflix before they introduced separate profiles: the recommendations are a mashup of articles related to all the different topics you’ve searched for (or published on, if you link your papers to your account).

bibcheck.py is a home-grown implementation of this feature. It takes as input a BibTeX bibliography (.bib) file and outputs a list of papers that might be relevant to the papers contained in the bibliography.

The recommendations are generated by triangle closing. Briefly, we treat the bibliography and each paper it cites as vertices in a network. Edges connect the bibliography to each paper it cites, and connect each cited paper to the other papers that include it in their bibliographies. To make recommendations, the code searches for other vertices that, if connected to the vertex of the supplied bibliography, would create multiple triangles in the network.

A more intuitive example: if you know Alice, Bob, Carol, and Dan, and Elise knows Bob, Carol, and Dan, then you are likely to know Elise. A hypothetical link between you and Elise would create three new triangles: (you, Bob, Elise), (you, Carol, Elise), and (you, Dan, Elise).
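To make the idea concrete, here is a minimal sketch of triangle closing applied to the friendship example above. It is not taken from bibcheck.py: the function name `triangle_closing_candidates` and the dictionary-of-sets graph representation are assumptions made for illustration only.

```python
from collections import Counter


def triangle_closing_candidates(graph, source):
    """For each vertex not yet linked to `source`, count how many
    triangles a new edge between the two would close."""
    neighbors = graph[source]
    counts = Counter()
    for neighbor in neighbors:
        for candidate in graph[neighbor]:
            # Skip the source itself and anyone it is already linked to.
            if candidate != source and candidate not in neighbors:
                counts[candidate] += 1
    return counts


# Undirected friendship graph from the example, stored as adjacency sets.
graph = {
    "you": {"Alice", "Bob", "Carol", "Dan"},
    "Alice": {"you"},
    "Bob": {"you", "Elise"},
    "Carol": {"you", "Elise"},
    "Dan": {"you", "Elise"},
    "Elise": {"Bob", "Carol", "Dan"},
}

print(triangle_closing_candidates(graph, "you").most_common())
# [('Elise', 3)]
```

In the bibliography setting, "you" plays the role of the .bib file, your friends are the papers it cites, and Elise is a paper that cites several of them.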

There are certainly more sophisticated ways to make recommendations, but even this simple method generates useful results. An example output is shown below:

>>> bibcheck.py -rmax 80 test.bib
Trying to get cluster ID for article 42/42
Getting citations for article 10/10
Citations shared   Title
3                  Facile formation of graphene p–n junctions using self-assembled monolayers
2                  Voltage-Controlled Ferroelastic Switching in Pb (Zr0. 2Ti0. 8) O3 Thin Films
2                  Photochemical doping and tuning of the work function and dirac point in graphene using photoacid and photobase generators
2                  Homo-and Hetero-p–n Junctions Formed on Graphene Steps
2                  Optoelectrical Molybdenum Disulfide (MoS2) Ferroelectric Memories
2                  Orientation-dependent structural phase diagrams and dielectric properties of PbZr 1− x Ti x O 3 polydomain thin films

The script found all 42 papers in the bibliography. Of those 42, 10 had fewer than 80 citations. This limit excludes review articles and other highly cited papers, for two reasons: retrieving all of their citing articles would take a very long time, and shared citations of a review article don’t carry much information. For each of those 10 articles, the script retrieved the title of every citing article, and finally looked for citing articles shared by more than one of them.
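The filter-then-tally step described above might look roughly like the sketch below. This is an illustration rather than the actual bibcheck.py code: the function `recommend`, its parameters (including `min_shared`), and the sample data are all hypothetical, and the titles are placeholders instead of real Google Scholar results.

```python
from collections import Counter


def recommend(citation_counts, citing_titles, rmax=80, min_shared=2):
    """Tally citing articles shared by the bibliography's less-cited papers.

    citation_counts: paper -> number of citations it has received
    citing_titles:   paper -> titles of the articles that cite it
    (both structures are assumptions for this sketch)
    """
    shared = Counter()
    for paper, count in citation_counts.items():
        if count >= rmax:
            continue  # skip reviews and other highly cited papers
        # Use a set so each citing article counts once per cited paper.
        shared.update(set(citing_titles.get(paper, ())))
    return [(title, n) for title, n in shared.most_common() if n >= min_shared]


# Placeholder data: two ordinary papers and one heavily cited review.
citation_counts = {"paper_a": 12, "paper_b": 45, "review_c": 950}
citing_titles = {
    "paper_a": ["Citing article X", "Citing article Y"],
    "paper_b": ["Citing article X", "Citing article Z"],
    "review_c": ["Citing article X", "Citing article Y", "Citing article Z"],
}

print(recommend(citation_counts, citing_titles))
# [('Citing article X', 2)]
```

Because the review is skipped, only the article that cites both ordinary papers reaches the threshold of two shared citations.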

For this example I used the bibliography from a paper titled ‘Single Gate PN Junctions in Graphene-Ferroelectric Devices’. Without knowing anything about the content of the paper we can see that the recommendations are pretty relevant.

bibcheck.py is built on top of a couple of great Python libraries; thanks to ckreibich for building scholar.py and to sciunto for building python-bibtexparser.

If you want to give it a try, head on over to the GitHub repository.