Natural Language Relation Extraction and Implicature through "Universal Schema" using Embeddings

Acknowledgment: This material is based upon work supported by the National Science Foundation under Grant No. 1514053.

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Award number: 1514053

Duration: September 1, 2015 — August 31, 2019 (Estimated)

Awarded Amount to Date: $1,000,000.00

Award title: Constructing Knowledge Bases by Extracting Entity-Relations and Meanings from Natural Language via “Universal Schema”

PI: Andrew McCallum

Student(s): Amirmohammad Rooshenas, Haw-Shiuan Chang, Luke M. Vilnis and Patrick W. Verga

Project Goals:
The major goal of the project is to build a new foundation for knowledge representation and reasoning, based on deep learning.
Automated knowledge base (KB) construction from natural language is of fundamental importance to (a) scientists (for example, there has been long-standing interest in building KBs of genes and proteins), (b) social scientists (for example, building social networks from textual data), and (c) national defense (where network analysis of criminals and terrorists has proven useful). The core of a knowledge base is its objects ("entities", such as proteins, people, organizations and locations) and the connections between these objects ("relations", such as one protein increasing production of another, or a person working for an organization). This project aims to greatly increase the accuracy with which entity-relations can be extracted from text, as well as the fidelity with which many subtle distinctions among types of relations can be represented. The project's technical approach, which we call "universal schema", is a marked departure from traditional methods: it represents all of the input relation expressions as positions in a common multi-dimensional space, with nearby relations having similar meanings.
Broader impacts will include collaboration with industry on applications of economic importance, collaboration with academic non-computer-scientists on a multidisciplinary application, creating and publicly releasing new data sets for benchmark evaluation by ourselves and others (enabling scientific progress through improved performance comparisons), and creating and publicly releasing an open-source implementation of our methods (enabling further scientific research, easy large-scale use, rapid commercialization and third-party enhancements). Education impacts include creating and teaching a new course on knowledge base construction for the sciences, organizing a research workshop on embeddings, extraction and knowledge representation, and training multiple undergraduate and graduate students.
Most previous research in relation extraction falls into one of two categories. In the first, one must define a fixed schema of relation types in advance (such as lives-in, employed-by and a handful of others), which limits expressivity and hides language ambiguities. Training machine learning models here either relies on labeled training data (which is scarce and expensive), or uses lightly-supervised self-training procedures (which are often brittle and wander farther from the truth with additional iterations). In the second category, one extracts into an "open" schema based on language strings themselves (lacking the ability to generalize among them), or attempts to gain generalization with unsupervised clustering of these strings (suffering from clusters that fail to capture reliable synonyms, or even to find the desired semantics at all). This project proposes research in relation extraction with "universal schema," where we learn a generalizing model of the union of all input schemas, including multiple available pre-structured KBs as well as all the observed natural language surface forms. The approach thus embraces the diversity and ambiguity of original language surface forms (not trying to force relations into pre-defined boxes), yet also successfully generalizes by learning non-symmetric implicature among explicit and implicit relations using new extensions to the probabilistic matrix factorization and vector embedding methods that were so successful in the Netflix Prize competition. Universal schema provides for a nearly limitless diversity of relation types (due to surface forms), and supports convenient semi-supervised learning through integration with existing structured data (i.e., the relation types of existing databases). In preliminary experiments, the approach already surpassed by a wide margin the previous state-of-the-art relation extraction methods on a benchmark task.
New proposed research includes new training processes, new representations that include multiple senses for the same surface form as well as embeddings with variances, new methods of incorporating constraints, joint inference between entity- and relation-types, new models of non-binary and higher-order relations, and scalability through parallel distribution.

Current Results:
Universal schema constructs knowledge bases by extracting entities, their types, and their relations from text and embedding them in the same space as existing structured data. We improved methods for modeling input text efficiently over large contexts to reduce ambiguity, modeled type hierarchies explicitly in the embedded space to improve representations, and leveraged the power of universal schema to improve question answering.
Our previous work reasons over arbitrary textual patterns using recurrent neural networks. However, this work, like related work, is restricted to local context: typically a single sentence that contains a single entity pair mention. In many instances, adjacent sentences or even the entire document are required to adequately disambiguate the type of the entity or define the context in which a relation type holds. Additionally, in domains such as biomedicine, many relation types are expressed across sentence boundaries.
Extending universal schema to consider greater context, we proposed Iterated Dilated CNNs: a distinct combination of network structure, parameter sharing and training procedures that enables dramatic 14-20x test-time speedups through improved parallelizability and makes it practical to tag using context from the entire document. In extensive experiments on both CoNLL-2003 and OntoNotes English NER, we show that incorporating contextual information from the entire document, rather than just a single sentence, improves named entity recognition performance.
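To illustrate why dilation lets a shallow convolutional stack cover long contexts, the following minimal sketch computes the receptive field of stacked 1-D convolutions. The filter width of 3 and the dilation schedule 1, 2, 4, 8 are illustrative assumptions, not the configuration used in the project's models.

```python
def receptive_field(kernel_width, dilations):
    """Number of input tokens visible to one output position after stacking
    stride-1 1-D convolutions with the given dilation rates."""
    field = 1
    for dilation in dilations:
        # Each layer widens the visible context by (kernel_width - 1) * dilation.
        field += (kernel_width - 1) * dilation
    return field


# Hypothetical settings, chosen only for illustration.
undilated = receptive_field(3, [1, 1, 1, 1])
dilated = receptive_field(3, [1, 2, 4, 8])
print(undilated, dilated)  # 9 31
```

With the same four layers, exponentially growing dilations widen the visible context from 9 to 31 tokens, and every position can still be computed in parallel.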
We can perform end-to-end relation extraction using a similar feed-forward architecture that jointly extracts both entities and their relations. We use self-attention to encode entire abstracts from biomedical science articles. This architecture can be even more computationally efficient than the convolutional neural network due to its ability to encode the full document context in a shallower feed-forward network. Shared encoded token representations are used to predict both named entities and relations between them.
This model also differs from previous approaches in that it makes relation predictions for all mention pairs in a document simultaneously. In this way the model can predict long distance relations, use far-reaching context for disambiguation, and consider the structure of the local document graph over all mentions and entities. We use this model to achieve state-of-the-art results on the BioCreative V task on chemical-disease interactions. We also construct a large-scale dataset containing genes, chemicals, and diseases, and evaluate our model on it.
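One way to picture predicting relations for all mention pairs at once is a bilinear scoring table over mention vectors, pooled per entity pair. The sketch below is a simplified illustration under stated assumptions: the mention vectors, the identity weight matrix, and the log-sum-exp pooling are all hypothetical choices for the example, not a description of the released model.

```python
import math


def bilinear_scores(heads, tails, weight):
    """Return scores[i][j] = heads[i]^T * weight * tails[j] for every pair."""
    scores = []
    for head in heads:
        # Project the head mention once, then reuse it for every tail mention.
        projected = [sum(head[a] * weight[a][b] for a in range(len(head)))
                     for b in range(len(weight[0]))]
        scores.append([sum(p * t for p, t in zip(projected, tail))
                       for tail in tails])
    return scores


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) used as a smooth maximum."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def entity_pair_score(scores, head_mentions, tail_mentions):
    """Pool the scores of all mention pairs that link two entities."""
    return logsumexp([scores[i][j] for i in head_mentions for j in tail_mentions])


# Hypothetical encoded mentions: entity A appears twice, entity B once.
mentions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
table = bilinear_scores(mentions, mentions, identity)
pair_score = entity_pair_score(table, head_mentions=[0, 1], tail_mentions=[2])
print(pair_score)
```

Because the full table is computed in one pass, distant mention pairs are scored in the same step as adjacent ones.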
Until now, universal schema learned embeddings of entities and relations without an explicit model of hierarchical information. However, this does not take advantage of existing rich structured hierarchical information which can be leveraged for reasoning, such as inferring an entity's properties from its type’s hypernyms. Previous work used Order Embeddings for explicit hierarchical modeling in the embedding space, but trained on structured data only. By using universal schema to jointly embed structured data with unstructured text, we improve the hierarchical embedded space with increased performance on WordNet and ConceptNet.
In addition to explicitly modeling and exploiting known hierarchical information, we developed methods for unsupervised hierarchy discovery. This is particularly important in domains where existing hierarchical information is sparse or absent. Using a large amount of unlabeled data to automatically construct hierarchies, our model performs as well as a semi-supervised method on numerous datasets including common sense and medical data.
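For reference, the order-violation penalty that Order Embeddings use to encode a hierarchy can be written in a few lines. The sketch follows the commonly published formulation, in which more general concepts sit closer to the origin. The two vectors are made-up values for illustration only.

```python
def order_violation(specific, general):
    """Penalty for the claim 'specific is-a general'.

    It is zero exactly when every coordinate of `general` is at most the
    matching coordinate of `specific`, and grows with each violation.
    """
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


# Hypothetical embeddings: the general concept is nearer the origin.
animal = [1.0, 0.5]
dog = [2.0, 1.5]

print(order_violation(dog, animal))  # 0.0, "dog is-a animal" is consistent
print(order_violation(animal, dog))  # 2.0, "animal is-a dog" is penalized
```

The penalty is asymmetric by construction, which is what lets the embedding space represent directed hypernym relations rather than plain similarity.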
Recent work on question answering has used neural networks to learn a mapping function from natural language questions to select answers from a structured knowledge source. However, using only existing curated structured sources limits coverage and does not take advantage of the massive amounts of rich information in raw text. We instead extend universal schema to answer questions using both text and structured knowledge, seeing a significant increase in performance over using structured knowledge alone.
Universal schema predicts the types of entities and relations in a knowledge base (KB) by jointly embedding the union of all available schema types—not only types from multiple structured databases (such as Freebase or Wikipedia infoboxes), but also types expressed as textual patterns from raw text. This prediction is typically modeled as a matrix completion problem, with one type per column, and either one or two entities per row (in the case of entity types or binary relation types, respectively). Factorizing this sparsely observed matrix yields a learned vector embedding for each row and each column.
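The matrix-completion view can be sketched with a small logistic matrix factorization trained by stochastic gradient descent. This is a minimal illustration under stated assumptions: the toy matrix, the embedding size, the learning rate, and the one-negative-per-positive sampling are all hypothetical choices, not the project's training setup.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def factorize(observed, n_rows, n_cols, dim=4, epochs=300, lr=0.1, seed=0):
    """Learn row and column embeddings so that sigmoid(row . col) is high for
    observed cells and low for sampled unobserved cells."""
    rng = random.Random(seed)
    rows = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_rows)]
    cols = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_cols)]
    for _ in range(epochs):
        for r, c in sorted(observed):
            updates = [(c, 1.0)]
            unobserved = [j for j in range(n_cols) if (r, j) not in observed]
            if unobserved:
                # Treat one randomly chosen unobserved cell as a negative.
                updates.append((rng.choice(unobserved), 0.0))
            for j, label in updates:
                # Gradient of the logistic loss with respect to the score.
                err = sigmoid(dot(rows[r], cols[j])) - label
                for k in range(dim):
                    r_k, c_k = rows[r][k], cols[j][k]
                    rows[r][k] -= lr * err * c_k
                    cols[j][k] -= lr * err * r_k
    return rows, cols


def prob(rows, cols, r, c):
    return sigmoid(dot(rows[r], cols[c]))


# Toy matrix. Rows are entity pairs. Column 0 stands for a KB relation type,
# columns 1 and 2 for textual patterns. The set holds the observed cells.
observed = {(0, 0), (0, 1), (1, 1), (2, 2)}
rows, cols = factorize(observed, n_rows=3, n_cols=3)
print(round(prob(rows, cols, 0, 0), 3), round(prob(rows, cols, 0, 2), 3))
```

After training, any cell can be scored from the two learned vectors, including cells that were never observed.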
We extended our universal schema model to learn embeddings of contextual linguistic features beyond the relational surface forms. We then include these embeddings in the row representation that is used to predict relation types (columns). For training data that has no available surface text, such as when using an existing knowledge base, we simply ignore the contextual embeddings. This enables us to maintain the flexibility of universal schema for mixing structured and unstructured data while also employing more expressive models.
Our work introduces significant improvements to the coverage and flexibility of universal schema relation extraction: predictions for entities unseen in training and multilingual transfer learning to domains with no annotation. We evaluated our model through extensive experiments on the English and Spanish TAC KBP benchmark, outperforming the top system from TAC 2013 slot-filling using no handwritten patterns or additional annotation. We also considered a multi-lingual setting in which English training data entities overlap with the seed KB, but Spanish text does not. Despite having no annotation for Spanish data, we trained an accurate predictor, with additional improvements obtained by tying word embeddings across languages. Furthermore, we found that multilingual training improves English relation extraction accuracy. Our approach is thus suited to broad-coverage automated knowledge base construction in a variety of languages and domains.
We explored the problem of making predictions for entities or entity-pairs unseen at training time (and hence without a pre-learned row embedding). We propose an approach having no per-row parameters at all; rather we produce a row vector on the fly using a learned aggregation function of the vectors of the observed columns for that row. We experimented with various aggregation functions, including neural network attention models. Our approach can be understood as a natural language database, in that questions about KB entities are answered by attending to textual or database evidence. In experiments predicting both relations and entity types, we demonstrated that despite having an order of magnitude fewer parameters than traditional universal schema, we can match the accuracy of the traditional model, and more importantly, we can now make predictions about unseen rows with nearly the same accuracy as rows available at training time.
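The idea of building a row vector on the fly can be sketched as follows: the row is an aggregate of the embeddings of its observed columns, optionally weighted by attention to the query column. The two-dimensional column vectors and the pattern names in the comments are hypothetical, and the two aggregation functions are simple examples, not the full set explored in the project.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean_pool(vectors, query=None):
    """Average the observed column vectors; the query is ignored."""
    return [sum(v[k] for v in vectors) / len(vectors)
            for k in range(len(vectors[0]))]


def attention_pool(vectors, query):
    """Weight each observed column by its softmax similarity to the query."""
    logits = [dot(v, query) for v in vectors]
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return [sum(w / total * v[k] for w, v in zip(weights, vectors))
            for k in range(len(query))]


def score(observed_columns, query, aggregate):
    """Score a query column for a row that has no stored parameters."""
    return dot(aggregate(observed_columns, query), query)


# Hypothetical column embeddings for an entity pair unseen in training.
employed_by = [1.0, 0.0]   # textual pattern, e.g. "X is employed by Y"
born_in = [0.0, 1.0]       # textual pattern, e.g. "X was born in Y"
works_for = [1.0, 0.0]     # query: a KB relation type

evidence = [employed_by, born_in]
mean_score = score(evidence, works_for, mean_pool)
attention_score = score(evidence, works_for, attention_pool)
print(mean_score, attention_score)
```

In this toy case attention raises the score relative to mean pooling because it down-weights the unrelated pattern. No per-row parameters are needed, so the same functions apply to rows never seen in training.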

Publications:
Shikhar Murty, Patrick Verga, Luke Vilnis, I Radovanovic, Andrew McCallum. Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking. The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018). Melbourne, Australia. July 15th to 20th, 2018.
Luke Vilnis, Xiang Li, Shikhar Murty, Andrew McCallum. Probabilistic Embedding of Knowledge Graphs with Box Lattice Measures. The 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018). Melbourne, Australia. July 15th to 20th, 2018.
Patrick Verga, Emma Strubell, Andrew McCallum. Simultaneously Self-Attending to All Mentions for Full-Abstract Biological Relation Extraction. The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018). New Orleans, June 1 to June 6, 2018.
Haw-Shiuan Chang, ZiYun Wang, Luke Vilnis, Andrew McCallum. Distributional Inclusion Vector Embedding for Unsupervised Hypernymy Detection. The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018). New Orleans, June 1 to June 6, 2018.
Amirmohammad Rooshenas, Aishwarya Kamath, Andrew McCallum. Training Structured Prediction Energy Networks with Indirect Supervision. The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2018). New Orleans, June 1 to June 6, 2018.
Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum. Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases using Reinforcement Learning. Sixth International Conference on Learning Representations (ICLR 2018). Vancouver, Canada. April 30 to May 3, 2018.
Shikhar Murty, Patrick Verga, Luke Vilnis, Andrew McCallum. Finer Grained Entity Typing with TypeNet. 6th Workshop on Automated Knowledge Base Construction (AKBC) 2017. Long Beach, California. December 8th, 2017.
Trapit Bansal, Arvind Neelakantan, Andrew McCallum. RelNet: End-to-End Modeling of Entities & Relations. 6th Workshop on Automated Knowledge Base Construction (AKBC) 2017. Long Beach, California. December 8th, 2017.
Rajarshi Das, Shehzaad Dhuliawala, Manzil Zaheer, Luke Vilnis, Ishan Durugkar, Akshay Krishnamurthy, Alex Smola, Andrew McCallum. Go for a Walk and Arrive at the Answer: Reasoning Over Paths in Knowledge Bases with Reinforcement Learning. 6th Workshop on Automated Knowledge Base Construction (AKBC) 2017. Long Beach, California. December 8th, 2017.
Haw-Shiuan Chang, Amol Agrawal, Ananya Ganesh, Anirudha Desai, Vinayak Mathur, Alfred Hough, Andrew McCallum. Efficient Graph-based Word Sense Induction by Distributional Inclusion Vector Embeddings. TextGraphs 2018: the Workshop on Graph-based Methods for Natural Language Processing at NAACL. New Orleans, June 6, 2018.
Patrick Verga, Emma Strubell, Ofer Shai, and Andrew McCallum (2017) Attending to All Mention Pairs for Full Abstract Biological Relation Extraction. AKBC 2017
Emma Strubell and Andrew McCallum (2017) Dependency Parsing with Dilated Iterated Graph CNNs. 2nd Workshop on Structured Prediction for NLP, EMNLP 2017
Aaron Traylor, Nicholas Monath, Rajarshi Das, and Andrew McCallum (2017) Learning String Alignments for Entity Aliases. AKBC 2017
Haw-Shiuan Chang, Erik Learned-Miller, and Andrew McCallum (2017) Active Bias: Training More Accurate Neural Networks by Emphasizing High Variance Samples. NIPS 2017
Patrick Verga, Arvind Neelakantan, and Andrew McCallum (2017) Generalizing to Unseen Entities and Entity Pairs with Row-less Universal Schema. EACL 2017
Haw-Shiuan Chang, ZiYun Wang, Luke Vilnis, and Andrew McCallum (2017) Unsupervised Hypernym Detection by Distributional Inclusion Vector Embedding. ArXiv preprint (ArXiv) 2017
Patrick Verga and Andrew McCallum (2016). Row-less Universal Schema. Automated Knowledge Base Construction (AKBC) workshop at NAACL 2016
Rajarshi Das, Arvind Neelakantan, David Belanger and Andrew McCallum (2016) Incorporating Selectional Preferences in Multi-hop Relation Extraction. Automated Knowledge Base Construction (AKBC) 2016
Patrick Verga, David Belanger, Emma Strubell, Benjamin Roth, and Andrew McCallum (2016) Multilingual Relation Extraction using Compositional Universal Schema. NAACL 2016.

Software:
Compositional Universal Schema
BRAN full abstract biological entity and relation extraction
DIVE
MINERVA RL path reasoning

Data:
TypeNet
MedMentions

Evaluation:
Best performance on Spanish relation extraction at TAC 2016
(Up to 11/30/2017) Best performance on English relation extraction in the crowdsourcing-based evaluation from the Stanford NLP group.

Point of Contact: Andrew McCallum

Date of Last Update: 08/01/2018