Information Extraction and Synthesis Laboratory (IESL)

Information -- A collection of facts, relations or events from which conclusions may be drawn. Knowledge that has been gathered or received.
Extraction -- Obtaining materials in concentrated, usable form from a diluted, unusable source.
Synthesis -- The combining of separate elements or substances to form a coherent whole. Reasoning from the general to the particular; logical deduction.
Laboratory -- An organization performing scientific experimentation and research.

IESL aims to dramatically increase our ability to mine actionable knowledge from unstructured text. We are especially interested in information extraction from the Web, understanding the connections between people and between organizations, expert finding, social network analysis, and mining the scientific literature and community. We develop and employ various methods in statistical machine learning, natural language processing and information retrieval. We tend toward probabilistic approaches, graphical models, and Bayesian methods.

Openings

The lab currently has an opening for a postdoc. Research areas include machine learning, natural language processing, lightly-supervised learning, automatic knowledge base construction, approximate inference, parallel and distributed inference, topic models, and scientometrics. Contact Andrew McCallum by email for more information.

News

  • Andrew McCallum will be the General Chair of ICML 2012, with Program Chairs Joelle Pineau and John Langford.
  • FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.
  • Generalized Expectation is an accurate way to train models by labeling features.
  • We have publicly launched Rexa, a research paper search engine. It provides search and browsing over multiple "object types", including not only papers, but also people, grants and topics.  In current work we are leveraging our new research in probabilistic databases to create Rexa 2.0.
  • Charles Sutton and Andrew McCallum have written a comprehensive introduction to conditional random fields, published as a chapter in Lise Getoor and Ben Taskar's book on statistical relational learning.
  • McCallum has written an introduction to information extraction by machine learning, intended for an audience unfamiliar with machine learning: "Information Extraction: Distilling Structured Data from Unstructured Text." Andrew McCallum. ACM Queue, Volume 3, Number 9, November 2005.
  • MALLET is a Java toolkit for machine learning applied to natural language. It provides facilities for document classification, information extraction, part-of-speech tagging, noun phrase segmentation, general finite state transducers and classification, and much more, all designed to be extremely efficient for large data and feature sets.