
BibTex Dataset

This is a dataset containing a large number of BibTex files downloaded from the internet. The dataset has been used for large-scale entity resolution (see the publications below).

Statistics

Number of BibTex files:    4,387
Number of Papers:           607,335 (correctly parsed)
Number of Authors:          1,313,517

Download

The archive file is available here: bibtex.tar.gz (87MB, 385MB uncompressed), MD5 Checksum: 7dfea8b8228dc55b2d6173aa4484becd
If you use this data in your papers, please use the following citation: bib
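
To confirm that a download matches the MD5 checksum listed above, something along the following lines should work. This is only a minimal sketch using the Python standard library. The helper names (md5_of_file, archive_is_intact) and the local path in the usage note are illustrative assumptions, not part of the dataset distribution.

```python
import hashlib

# MD5 checksum published on this page for bibtex.tar.gz.
EXPECTED_MD5 = "7dfea8b8228dc55b2d6173aa4484becd"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole archive is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest equals `expected` (case-insensitive)."""
    return md5_of_file(path) == expected.lower()
```

Assuming the archive was saved as bibtex.tar.gz in the current directory, archive_is_intact("bibtex.tar.gz") should return True for an uncorrupted download.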

Code

This dataset can be processed by any BibTex parser. The Factorie library contains the parser we used (cc.factorie.app.bib.BibReader.loadBibTexDir*), along with code to construct the variables and the model for author disambiguation.
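
For a quick look at the data without a full parser, a rough entry count can be sketched with the Python standard library. This is not the Factorie BibReader and not a real BibTex parser: it only matches entry headers such as "@article{" and can miscount when an "@" followed by a brace appears inside a field value. The function names, the "*.bib" file pattern, and the directory layout of the extracted archive are assumptions made for illustration.

```python
import re
from collections import Counter
from pathlib import Path

# Matches an entry header such as "@article{" or "@InProceedings (".
ENTRY_HEADER = re.compile(r"@\s*([A-Za-z]+)\s*[{(]")

# BibTex commands that use the same syntax but are not bibliography entries.
NON_ENTRY_TYPES = {"comment", "string", "preamble"}


def count_entry_types(text):
    """Count entry headers in `text`, keyed by lower-cased entry type."""
    counts = Counter()
    for match in ENTRY_HEADER.finditer(text):
        entry_type = match.group(1).lower()
        if entry_type not in NON_ENTRY_TYPES:
            counts[entry_type] += 1
    return counts


def count_directory(root, pattern="*.bib"):
    """Sum entry counts over all files under `root` that match `pattern`."""
    totals = Counter()
    for path in Path(root).rglob(pattern):
        # Files collected from the web may mix encodings; replace bad bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        totals.update(count_entry_types(text))
    return totals
```

Calling sum(count_directory(root).values()) on the extracted directory gives an approximate total. It will not necessarily match the paper count in the statistics above, which covers only correctly parsed entries.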

Papers

The following papers use this dataset. If you are using this dataset, and would like your paper to be added, let us know.
The BibTex files were downloaded and aggregated by Michael Wick. This website is maintained by Michael Wick and Sameer Singh.