This dataset contains a large number of BibTeX files downloaded from the internet. It has been used for large-scale entity resolution (see the publications below).
Number of BibTeX files: 4,387
Number of Papers: 607,335 (correctly parsed)
Number of Authors: 1,313,517
The archive file is available here: bibtex.tar.gz (87 MB compressed, 385 MB uncompressed). MD5 checksum: 7dfea8b8228dc55b2d6173aa4484becd
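To confirm that a download is complete, the archive's MD5 digest can be compared with the checksum listed above. The following is a minimal sketch in Python using only the standard library. The function names and the local path `bibtex.tar.gz` are illustrative assumptions, not part of the dataset distribution.

```python
import hashlib

# Checksum as published on this page.
EXPECTED_MD5 = "7dfea8b8228dc55b2d6173aa4484becd"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so the whole archive never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """True if the file at `path` has the expected MD5 digest."""
    return md5_of_file(path) == expected


# Example usage (assumes the archive was saved under this name):
# print(archive_is_intact("bibtex.tar.gz"))
```

A mismatch usually indicates a truncated or corrupted download, in which case fetching the archive again is the simplest remedy.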
If you use this data in your papers, please use the following citation: bib
This dataset can be processed by any BibTeX parser. The Factorie library contains the parser we used (
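For a quick look at a file before running a full parser, entries and author names can be pulled out with regular expressions. The sketch below is a rough illustration in Python, not the Factorie parser and not a complete BibTeX implementation: it assumes each `author` field sits on a single line, and the function name `summarize_bibtex` is hypothetical.

```python
import re

# Start of an entry, e.g. "@article{key," (type and citation key are captured).
ENTRY_START = re.compile(r"@(\w+)\s*[{(]\s*([^,\s]*)\s*,")

# A single-line author field delimited by braces or double quotes.
AUTHOR_FIELD = re.compile(
    r'\bauthor\s*=\s*[{"](.*?)[}"]\s*,?\s*$',
    re.IGNORECASE | re.MULTILINE,
)

# BibTeX block types that are not bibliography entries.
NON_ENTRIES = {"comment", "string", "preamble"}


def summarize_bibtex(text):
    """Return ([(entry_type, key), ...], [author_name, ...]) for BibTeX text."""
    entries = [
        (match.group(1).lower(), match.group(2))
        for match in ENTRY_START.finditer(text)
        if match.group(1).lower() not in NON_ENTRIES
    ]
    authors = []
    for match in AUTHOR_FIELD.finditer(text):
        # BibTeX separates multiple authors with the word "and".
        authors.extend(re.split(r"\s+and\s+", match.group(1).strip()))
    return entries, authors
```

Files collected from the web tend to contain malformed entries, multi-line fields, and unusual encodings, so a real parser remains the better choice for anything beyond a first inspection.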
The following papers use this dataset. If you use this dataset and would like your paper to be added, let us know.