This dataset contains a large number of BibTeX files downloaded from the internet. It has been used for large-scale entity resolution (see the publications below).
The archive file is available here: bibtex.tar.gz (87 MB compressed, 385 MB uncompressed), MD5
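Since the archive ships with an MD5 checksum, it may be worth verifying the download before unpacking it. The following is a minimal sketch using only the Python standard library; the function names and the target directory are illustrative, and the expected checksum has to be taken from the MD5 link above (it is not reproduced here).

```python
import hashlib
import tarfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive, expected_md5, target_dir):
    """Extract a .tar.gz archive into target_dir only if its MD5 matches.

    expected_md5 is supplied by the caller (assumption: copied from the
    published checksum). Raises ValueError on a mismatch.
    """
    actual = md5_of_file(archive)
    if actual.lower() != expected_md5.strip().lower():
        raise ValueError(f"MD5 mismatch: expected {expected_md5}, got {actual}")
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target_dir)
    return actual
```

A call such as `verify_and_extract("bibtex.tar.gz", published_md5, "bibtex_data")` would then unpack the roughly 385 MB of files, where `published_md5` and `bibtex_data` are placeholders chosen for this example.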
If you use this data in your papers, please use the following citation: bib
This dataset can be processed by any BibTeX parser. The Factorie library contains the parser we used (cc.factorie.app.bib.BibReader.loadBibTexDir*), along with code to construct the variables and the model for author disambiguation.
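For a quick look at the data without Factorie, a rough sketch in Python is given here. It is not a BibTeX parser and is unrelated to the Factorie reader: it only scans for entry headers such as `@article{key,` to list entry types and citation keys. The function names are illustrative, and it assumes the extracted files carry a `.bib` extension, which should be checked against the actual archive layout. Because the files were collected from the web, the sketch reads them with undecodable bytes replaced rather than assuming clean UTF-8.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the start of an entry such as "@article{key," or "@InProceedings(key,".
ENTRY_START = re.compile(r"@\s*([A-Za-z]+)\s*[{(]\s*([^,\s{}()]+)\s*,")

# BibTeX directives that are not bibliography entries.
NON_ENTRIES = {"string", "preamble", "comment"}


def scan_bibtex_text(text):
    """Return (entry_type, citation_key) pairs found in one BibTeX source."""
    pairs = []
    for match in ENTRY_START.finditer(text):
        entry_type = match.group(1).lower()
        if entry_type not in NON_ENTRIES:
            pairs.append((entry_type, match.group(2)))
    return pairs


def count_entry_types(root):
    """Count entry types over every *.bib file below root (assumed extension)."""
    counts = Counter()
    for path in Path(root).rglob("*.bib"):
        # Web-crawled files may contain mixed encodings; replace bad bytes.
        text = path.read_text(encoding="utf-8", errors="replace")
        counts.update(entry_type for entry_type, _ in scan_bibtex_text(text))
    return counts
```

For anything beyond a rough inventory, such as extracting author fields for disambiguation, a full BibTeX parser remains the better choice, since this scan ignores field contents, string macros, and nested braces.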
The following papers use this dataset. If you use this dataset and would like your paper to be added, let us know.