Information Extraction and Synthesis Laboratory

Information: A collection of facts, relations or events from which conclusions may be drawn. Knowledge that has been gathered or received.

Extraction: Obtaining materials in concentrated, usable form from a diluted, unusable source.

Synthesis: The combining of separate elements or substances to form a coherent whole. Reasoning from the general to the particular; logical deduction.

Laboratory: An organization performing scientific experimentation and research.

IESL aims to dramatically increase our ability to mine actionable knowledge from unstructured text. We are especially interested in information extraction from the Web, understanding the connections between people and between organizations, expert finding, social network analysis, and mining the scientific literature and community. We develop and employ various methods in statistical machine learning, natural language processing and information retrieval. We tend toward probabilistic approaches, graphical models, and Bayesian methods.

IESL Group, 2016 Retreat
  • Structured Prediction Energy Networks [Belanger, McCallum ICML 2016] are an alternative to graphical models, leveraging deep learning to discover rich dependencies among output variables.
  • Our research on universal schema is currently at the top of the Stanford KBP leaderboard! Congratulations to Haw-Shiuan Chang, Pat Verga, Emma Strubell, Nick Monath, and the other IESL students who worked on this.
  • OpenReview.net is hosting reviewing for ICLR 2017, as well as the upcoming UAI 2017.
  • FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
  • Generalized Expectation (GE) is an accurate way to train models by labeling features rather than individual instances.
  • We have publicly launched Rexa, a research paper search engine. It provides search and browsing over multiple “object types”, including not only papers, but also people, grants and topics. In current work we are leveraging our new research in probabilistic databases to create Rexa 2.0.
  • Charles Sutton and I have written a comprehensive introduction to conditional random fields (CRFs), published as a chapter in Lise Getoor and Ben Taskar’s book on statistical relational learning.
  • McCallum has written an introduction to information extraction by machine learning, intended for readers without a background in machine learning.
  • MALLET is a Java toolkit for machine learning applied to natural language. It provides facilities for document classification, information extraction, part-of-speech tagging, noun phrase segmentation, general finite state transducers and classification, and much more—all designed to be extremely efficient for large data and feature sets.
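The CRF tutorial and MALLET's information-extraction tools both revolve around labeling token sequences. As a rough illustration of how a trained linear-chain CRF picks a labeling, here is a minimal Viterbi decoder. It assumes the per-token emission scores and label-to-label transition scores have already been computed in log space; the function name, score layout, and example numbers are hypothetical, and this is not code from MALLET or FACTORIE.

```python
from typing import List, Sequence, Tuple


def viterbi(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Highest-scoring label sequence for a linear-chain CRF (log-space scores).

    emissions[t][y]   -- score of label y at token t (at least one token assumed)
    transitions[p][y] -- score of label p at token t-1 followed by label y at token t
    """
    n_labels = len(transitions)
    best = list(emissions[0])  # best[y]: top score of any prefix ending in label y
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for y in range(n_labels):
            # Pick the previous label that maximizes the score of arriving at y.
            prev = max(range(n_labels), key=lambda p: best[p] + transitions[p][y])
            new_best.append(best[prev] + transitions[prev][y] + emissions[t][y])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Start from the best final label and follow the backpointers to the front.
    last = max(range(n_labels), key=lambda y: best[y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


if __name__ == "__main__":
    # Hypothetical labels: 0 = outside, 1 = part of a person name.
    # Made-up scores for a three-token sentence.
    emissions = [[0.0, 2.0], [0.5, 1.5], [2.0, 0.0]]
    transitions = [[0.5, 0.0], [0.0, 1.0]]
    print(viterbi(emissions, transitions))  # → ([1, 1, 0], 6.5)
```

Dynamic programming keeps decoding at O(T·L²) for T tokens and L labels, instead of scoring all L^T possible label sequences.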
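To make "training models by labeling features" concrete, the sketch below computes a Generalized Expectation penalty for a single labeled feature. It assumes KL divergence as the distance (one common choice) between the label distribution supplied for the feature and the model's average predicted label distribution over unlabeled instances that contain the feature. The function name, inputs, and example feature are hypothetical; this is not MALLET's or FACTORIE's API.

```python
import math
from typing import List, Sequence


def ge_kl_penalty(
    target: Sequence[float],
    posteriors: List[Sequence[float]],
    has_feature: List[bool],
) -> float:
    """KL(target || model expectation) for one labeled feature.

    target      -- reference label distribution supplied for the feature
    posteriors  -- model's predicted p(y | x) for each unlabeled instance x
    has_feature -- whether each instance x contains the feature
    """
    selected = [p for p, on in zip(posteriors, has_feature) if on]
    if not selected:
        return 0.0  # the feature never fires, so it imposes no constraint
    # Model expectation: average predicted distribution over instances with the feature.
    expected = [sum(p[y] for p in selected) / len(selected) for y in range(len(target))]
    penalty = 0.0
    for t, e in zip(target, expected):
        if t == 0.0:
            continue  # 0 * log(0 / e) is taken as 0
        if e == 0.0:
            return math.inf  # target puts mass where the model puts none
        penalty += t * math.log(t / e)
    return penalty


if __name__ == "__main__":
    # Hypothetical feature (e.g. the word "Inc.") labeled as 90% ORG, 10% OTHER.
    target = [0.9, 0.1]
    posteriors = [[0.6, 0.4], [0.8, 0.2], [0.1, 0.9]]
    has_feature = [True, True, False]
    print(round(ge_kl_penalty(target, posteriors, has_feature), 3))  # → 0.116
```

In training, a term like this would be summed over all labeled features, possibly weighted, and minimized with respect to the model parameters; that outer optimization loop is omitted here.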