Bi-affine Relation Attention Network (BRAN) is a model which simultaneously predicts relationships between all mention pairs in a document. We form pairwise predictions over entire paper abstracts using an efficient self-attention encoder. All-pairs mention scores allow us to perform multi-instance learning by aggregating over mentions to form entity pair representations. We further adapt to settings without mention-level annotation by jointly training to predict named entities and adding a corpus of weakly labeled data.
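
The aggregation step can be illustrated with a minimal sketch. It assumes that mention-pair scores are pooled with a log-sum-exp (a smooth maximum), which is one common choice for multi-instance learning; the function names and the toy scores are illustrative and are not taken from the BRAN code base.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def aggregate_entity_pair(mention_pair_scores):
    """Pool per-mention-pair relation scores into one score per relation.

    mention_pair_scores: one list per mention pair of the same entity pair,
    each holding one score per relation type (hypothetical layout).
    """
    num_relations = len(mention_pair_scores[0])
    return [
        logsumexp([scores[r] for scores in mention_pair_scores])
        for r in range(num_relations)
    ]


# Two mention pairs of the same entity pair, two candidate relations.
entity_scores = aggregate_entity_pair([[0.0, 1.0], [0.0, 3.0]])
print(entity_scores)
```

Because log-sum-exp behaves like a soft maximum, a single strongly scored mention pair is enough to push the entity-level score up, while weaker mentions still contribute a little.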


Distributional inclusion vector embedding (DIVE) is an unsupervised method for hypernym discovery. It learns per-word non-negative vector embeddings that preserve the inclusion property of word contexts in a low-dimensional, interpretable space. It can also be viewed as an unsupervised method that compresses sparse bag-of-words representations by grouping words into topics, which makes word co-occurrence statistics much easier to visualize.
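
To make the inclusion property concrete, here is a small sketch of a generic inclusion measure over non-negative vectors: the fraction of the narrower word's mass that is covered by the broader word. This is a standard distributional-inclusion style score used for illustration, not necessarily the scoring function DIVE itself uses, and the three-dimensional vectors are made up.

```python
def inclusion_score(narrow, broad):
    """Fraction of `narrow`'s mass that is also present in `broad`.

    Both arguments are non-negative vectors of equal length. A score near
    1.0 means the contexts of `narrow` are largely included in `broad`.
    """
    total = sum(narrow)
    if total == 0:
        return 0.0
    covered = sum(min(a, b) for a, b in zip(narrow, broad))
    return covered / total


# Hypothetical embeddings: each dimension stands for one "topic".
dog = [2.0, 1.0, 0.0]
animal = [3.0, 2.0, 4.0]

print(inclusion_score(dog, animal))   # contexts of "dog" inside "animal"
print(inclusion_score(animal, dog))   # the reverse direction scores lower
```

The asymmetry of the score is what lets it point from the more specific word to the more general one.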


MINERVA is an RL agent that answers queries in a knowledge graph of entities and relations. Starting from an entity node, MINERVA learns to navigate the graph, conditioned on the input query, until it reaches the answer entity. For example, given the query (Colin Kaepernick, PLAYERHOMESTADIUM, ?), MINERVA takes the path highlighted in the knowledge graph below. Note: only the solid edges are observed in the graph; the dashed edges are unobserved.
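
The navigation setup can be sketched in a few lines. In this toy version the sequence of relations to follow is supplied by hand, standing in for the actions a learned policy would choose; the graph, the relation names PLAYSINTEAM and TEAMHOMESTADIUM, and the `walk` helper are all hypothetical illustration, not part of MINERVA's code.

```python
# Toy knowledge graph: entity -> list of (relation, target entity) edges.
# Only "observed" edges are stored; the direct PLAYERHOMESTADIUM edge that
# would answer the query is deliberately absent.
GRAPH = {
    "Colin Kaepernick": [("PLAYSINTEAM", "49ers")],
    "49ers": [
        ("TEAMHOMESTADIUM", "Levi's Stadium"),
        ("TEAMPLAYSSPORT", "Football"),
    ],
}


def walk(graph, start, relations):
    """Follow one edge per relation in `relations`, starting at `start`.

    Returns the list of visited entities, or None if some relation has no
    matching outgoing edge at the current entity.
    """
    current = start
    path = [start]
    for relation in relations:
        targets = [t for r, t in graph.get(current, []) if r == relation]
        if not targets:
            return None
        current = targets[0]
        path.append(current)
    return path


path = walk(GRAPH, "Colin Kaepernick", ["PLAYSINTEAM", "TEAMHOMESTADIUM"])
print(path)
```

The agent's job is to pick, at each entity, which outgoing edge to take given the query relation; here that choice is hard-coded to keep the example short.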


PERCH is a new non-greedy algorithm for online hierarchical clustering that scales to both a massive number of samples and a massive number of clusters. Please see our introduction video, paper, or talk for more details.

Dilated CNN for Named Entity Recognition

ID-CNN is designed to have better capacity than traditional CNNs for large context and structured prediction, and is a faster alternative to Bi-LSTMs for NER. Please see our paper for more details.
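
The reason dilation helps with large context can be shown with a small pure-Python sketch of a 1-D dilated convolution and the receptive field of a stack of such layers. The kernel width, dilation schedule, and function names are assumptions chosen for illustration; they are not taken from the ID-CNN implementation.

```python
def dilated_conv1d(signal, kernel, dilation):
    """1-D convolution with zero padding and the given dilation.

    The kernel is centered on each position; its taps are spaced
    `dilation` steps apart. The kernel length is assumed to be odd.
    """
    center = len(kernel) // 2
    output = []
    for t in range(len(signal)):
        total = 0.0
        for j, weight in enumerate(kernel):
            index = t + (j - center) * dilation
            if 0 <= index < len(signal):
                total += weight * signal[index]
        output.append(total)
    return output


def receptive_field(kernel_width, dilations):
    """Number of input positions seen by one output of the stacked layers."""
    return 1 + sum((kernel_width - 1) * d for d in dilations)


impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=2))

# Three width-3 layers: undilated versus exponentially growing dilation.
print(receptive_field(3, [1, 1, 1]), receptive_field(3, [1, 2, 4]))
```

With the same number of layers and parameters, the dilated stack covers roughly twice as many tokens in this example, and the gap widens as more layers are added.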


TextKBQA is a TensorFlow implementation of the paper “Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks”.


ChainsofReasoning combines the rich multistep inference of symbolic logical reasoning with the generalization capabilities of neural networks. Please see our paper for more details.

Compositional Universal Schema

Compositional Universal Schema performs relation extraction via matrix factorization and an LSTM, which allows us to predict relations for unseen sentences and improves the coverage of universal schema. Please see our paper for more details.


Structured Prediction Energy Networks (SPENs) are a flexible, expressive approach to structured prediction. A deep architecture is used to define an energy function of candidate labels, and then predictions are produced by using backpropagation to iteratively optimize the energy with respect to the labels. Please see our paper for more details.
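
The prediction procedure can be sketched as gradient descent on an energy over relaxed labels. To stay self-contained, this example swaps the deep energy network for a small hand-written quadratic energy and uses its analytic gradient in place of backpropagation; the energy form, the `coupling` weight, the step size, and the function names are all assumptions made for illustration.

```python
def energy(labels, unary, coupling):
    """Toy energy: stay close to `unary` targets, keep neighbors similar."""
    local = sum((y - u) ** 2 for y, u in zip(labels, unary))
    smooth = sum((a - b) ** 2 for a, b in zip(labels, labels[1:]))
    return local + coupling * smooth


def energy_grad(labels, unary, coupling):
    """Analytic gradient of `energy` with respect to the labels."""
    grad = [2.0 * (y - u) for y, u in zip(labels, unary)]
    for i in range(len(labels) - 1):
        diff = 2.0 * coupling * (labels[i] - labels[i + 1])
        grad[i] += diff
        grad[i + 1] -= diff
    return grad


def predict(unary, coupling=0.5, step=0.1, iters=200):
    """Minimize the energy over labels relaxed to the interval [0, 1]."""
    labels = [0.5] * len(unary)
    for _ in range(iters):
        grad = energy_grad(labels, unary, coupling)
        # Gradient step followed by projection back onto [0, 1].
        labels = [min(1.0, max(0.0, y - step * g)) for y, g in zip(labels, grad)]
    return labels


unary = [0.9, 0.8, 0.1]
labels = predict(unary)
print(labels, energy(labels, unary, 0.5))
```

In an actual SPEN the energy would be a learned deep network and the gradient would come from backpropagation, but the inference loop has the same shape: start from an initial labeling and repeatedly step downhill in energy.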


FACTORIE is a toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating relational factor graphs, estimating parameters and performing inference.


MALLET is a Java toolkit for machine learning applied to natural language. It provides facilities for document classification, information extraction, part-of-speech tagging, noun phrase segmentation, general finite state transducers and classification, and much more – all designed to be extremely efficient for large data and feature sets.