xlm-core
The xLM framework for non-autoregressive language models, installable with pip as xlm-core.
xlm-core is a modular, research-friendly framework for developing and comparing non-autoregressive (NAR) language models. It is built on PyTorch and PyTorch Lightning, with Hydra for configuration management.
The goal of xlm-core is to make experimenting with NAR architectures straightforward. Each model family is split into four interchangeable components (model, loss, predictor, and collator), so that a variation along any one of these axes can be implemented and ablated independently of the other three.
Available model families
- mlm — Masked Language Model (BERT-style)
- arlm — Autoregressive baseline (left-to-right)
- mdlm — Masked Diffusion LM
- ilm — Insertion Language Model
- flexmdm — Flexible Masked Diffusion Model
Quick start
pip install xlm-core
pip install xlm-models
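A complete workflow for the Insertion Language Model (ilm) on the LM1B dataset, covering data preparation, training, evaluation, and generation. Replace <CKPT> with the path to a checkpoint saved by the training job: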
xlm job_type=prepare_data job_name=lm1b_ilm experiment=lm1b_ilm
xlm job_type=train job_name=lm1b_ilm experiment=lm1b_ilm
xlm job_type=eval job_name=lm1b_ilm experiment=lm1b_ilm +eval.ckpt_path=<CKPT>
xlm job_type=generate job_name=lm1b_ilm experiment=lm1b_ilm +generation.ckpt_path=<CKPT>
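The four commands above can also be chained in a small shell function. This is only a sketch: the function name run_lm1b_ilm, the DRY_RUN switch, and passing the checkpoint path as the first argument are assumptions of this example, not part of xlm-core. The README does not say where training writes its checkpoint, so you have to supply that path yourself.

```shell
# Hypothetical helper (not part of xlm-core): runs the Quick start stages in order.
# The body is a subshell, so `set -eu` stops at the first failing stage without
# changing the options of the shell that sourced this file.
run_lm1b_ilm() (
  set -eu
  # Required argument: path to the checkpoint produced by the training stage.
  ckpt="${1:?usage: run_lm1b_ilm <path-to-checkpoint>}"

  # Hypothetical switch: DRY_RUN=1 prints each command instead of running it.
  runner=command
  if [ "${DRY_RUN:-0}" = 1 ]; then
    runner=echo
  fi

  "$runner" xlm job_type=prepare_data job_name=lm1b_ilm experiment=lm1b_ilm
  "$runner" xlm job_type=train job_name=lm1b_ilm experiment=lm1b_ilm
  "$runner" xlm job_type=eval job_name=lm1b_ilm experiment=lm1b_ilm "+eval.ckpt_path=${ckpt}"
  "$runner" xlm job_type=generate job_name=lm1b_ilm experiment=lm1b_ilm "+generation.ckpt_path=${ckpt}"
)
```

For example, after sourcing the file, `DRY_RUN=1 run_lm1b_ilm path/to/checkpoint` prints the four commands without executing them. Stopping on the first failure matters here because evaluation and generation both depend on the checkpoint that training writes.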
Links
- Source code: github.com/dhruvdcoder/xlm-core
- Paper (EACL 2026 System Demonstrations): xLM: A Python Package for Non-Autoregressive Language Models
Cite
If you find xlm-core useful in your research, please cite:
@inproceedings{patel2026xlm,
title = {{xLM}: A {P}ython Package for Non-Autoregressive Language Models},
author = {Patel, Dhruvesh and Maram, Durga Prasad and Chintha, Sai Sreenivas
and Rozonoyer, Benjamin and McCallum, Andrew},
booktitle = {Proceedings of the 19th Conference of the European Chapter of the
Association for Computational Linguistics (Volume 3: System Demonstrations)},
year = {2026},
pages = {445--456},
publisher = {Association for Computational Linguistics}
}