dIESL

dIESL is a working group at the Information Extraction and Synthesis Lab (IESL), UMass Amherst, studying non-autoregressive (NAR) language models: masked diffusion, insertion, and edit-based generation. The name stands for diffusion at IESL; the engine reference is on us.

Autoregressive models produce text strictly left to right. We are interested in models that can do better: generate tokens in flexible orders, fill in arbitrary-length gaps, revise prior decisions, and exploit pairwise relative-position structure rather than absolute positions. Concretely, we work on:

Masked diffusion — sampling strategies, classifier-free guidance, and inference-time scaling.
Insertion language models — variable-length generation through token injection, derived from continuous-time Markov chains on sequences of variable length.
Edit / substitution models — generation as iterative refinement, with an API that resembles a stream of diff-style operations on the current draft.
Software for NAR research — see xlm-core, our modular framework for training and comparing NAR language models.
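
As a rough illustration of variable-length generation through insertion, the toy sketch below grows a sequence from empty by repeated (slot, token) insertions. It is not taken from xlm-core or from any of our papers: `insertion_generate`, `make_oracle`, and the `propose` callback are hypothetical names, and the oracle simply stands in for a learned model by inserting a random missing token of a known target at its correct slot.

```python
import bisect
import random


def insertion_generate(propose, max_steps=50, seed=0):
    """Grow a sequence from empty by repeated (slot, token) insertions.

    `propose(seq, rng)` is a hypothetical stand-in for a learned model. It
    returns (slot, token), meaning "insert `token` before index `slot`",
    or None to stop generating.
    """
    rng = random.Random(seed)
    seq = []
    for _ in range(max_steps):
        action = propose(seq, rng)
        if action is None:
            break
        slot, token = action
        seq.insert(slot, token)
    return seq


def make_oracle(target):
    """Toy oracle: insert a random missing token of `target` at its correct slot."""
    placed = []  # sorted indices of target tokens that are already in the draft

    def propose(seq, rng):
        missing = [i for i in range(len(target)) if i not in placed]
        if not missing:
            return None  # every token is placed, so generation stops
        i = rng.choice(missing)
        # The slot is the number of already placed tokens that precede index i.
        slot = bisect.bisect_left(placed, i)
        placed.insert(slot, i)
        return slot, target[i]

    return propose


draft = insertion_generate(make_oracle(list("diffusion")), seed=1)
print("".join(draft))
```

The order in which tokens arrive changes with the seed, but the final sequence does not, and the length is never fixed in advance. Those are the two properties that separate insertion from left-to-right decoding and from fixed-length mask filling.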

people

Person	Affiliation
Andrew McCallum	UMass Amherst
Dhruvesh Patel	UMass Amherst
Benjamin Rozonoyer	UMass Amherst
Avinash Amballa	UMass Amherst
Neil Band	Stanford
Joey Bose	Imperial College London, Mila
Sai Sreenivas Chintha	UMass Amherst
Soumitra Das	UMass Amherst
Durga Prasad Maram	UMass Amherst
Jacopo Minniti	University of Toronto
Tahira Naseem	IBM Research
Gaurav Pandey	IBM Research
Tim G. J. Rudner	University of Toronto, Vijil
Aishwarya Sahoo	UMass Amherst
Md Arafat Sultan	IBM Research
Ramón Fernandez Astudillo	IBM Research

news

Mar 01, 2026	xLM is presented at EACL 2026 (System Demonstrations track) in Rabat, Morocco.
Feb 01, 2026	A Continuous Time Markov Chain Framework for Insertion Language Models accepted as a Spotlight at AISTATS 2026.
May 08, 2025	Released the Insertion Language Models preprint — sequence generation with arbitrary-position insertions.

selected publications

SPIGM @ ICML
Learned Relay Representations for Forward-Thinking Discrete Diffusion Models

Benjamin Rozonoyer, Jacopo Minniti, Dhruvesh Patel, Neil Band, Joey Bose, Tim G. J. Rudner, and Andrew McCallum

In Structured Probabilistic Inference & Generative Modeling (SPIGM) and Frontiers in Generative AI (FoGen) Workshops at ICML, Jul 2026

Accepted to SPIGM @ ICML and FoGen @ ICML

When Masked Diffusion Models (MDMs) generate sequences through iterative refinement, the rich internal computation over masked positions is discarded—forcing every subsequent refinement step to recompute the valuable internal information stored as model representations. To avoid a hard reset between denoising rounds, we propose Learned Relay Representations (Relay), a method that allows MDMs to be “forward-thinking” when denoising—explicitly learning how to propagate latent information for the benefit of future denoising steps. Relay introduces a differentiable per-token channel that passes information between forward passes and is trained via truncated backpropagation through time (BPTT). We show that this framework can be scaled to state-of-the-art Diffusion Language Models (DLMs), and is seamlessly compatible with techniques like block diffusion and KV caching. We first provide a thorough justification of the design choices in Relay on a challenging Sudoku-based planning task. We then scale Relay to Fast-dLLM v2, a state-of-the-art DLM, outperforming standard supervised finetuning on coding tasks while reducing the inference latency by up to 32%. Our empirical results demonstrate that state-of-the-art DLMs can be explicitly trained to relay latent information forward across decoding steps, advancing the performance-latency Pareto frontier. We provide code for all our experiments.
@inproceedings{rozonoyer2026relay,
  title = {Learned Relay Representations for Forward-Thinking Discrete Diffusion Models},
  author = {Rozonoyer, Benjamin and Minniti, Jacopo and Patel, Dhruvesh and Band, Neil and Bose, Joey and Rudner, Tim G. J. and McCallum, Andrew},
  booktitle = {Structured Probabilistic Inference \& Generative Modeling (SPIGM) and Frontiers in Generative AI (FoGen) Workshops at ICML},
  year = {2026},
  month = jul,
  note = {Accepted to SPIGM @ ICML and FoGen @ ICML},
  url = {https://arxiv.org/abs/2605.22967},
}
EACL Demo
xLM: A Python Package for Non-Autoregressive Language Models

Dhruvesh Patel, Durga Prasad Maram, Sai Sreenivas Chintha, Benjamin Rozonoyer, and Andrew McCallum

In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations), Mar 2026

In recent years, there has been a resurgence of interest in non-autoregressive text generation in the context of general language modeling. Unlike the well-established autoregressive language modeling paradigm, which has a plethora of standard training and inference libraries, implementations of non-autoregressive language modeling have largely been bespoke, making it difficult to perform systematic comparisons of different methods. Moreover, each non-autoregressive language model typically requires its own data collation, loss, and prediction logic, making it challenging to reuse common components. In this work, we present the xLM Python package, designed to make implementing small non-autoregressive language models faster, with a secondary goal of providing a suite of small pre-trained models (through a companion package) that can be used by the research community.
@inproceedings{patel2026xlm,
  title = {{xLM}: A {P}ython Package for Non-Autoregressive Language Models},
  author = {Patel, Dhruvesh and Maram, Durga Prasad and Chintha, Sai Sreenivas and Rozonoyer, Benjamin and McCallum, Andrew},
  booktitle = {Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 3: System Demonstrations)},
  year = {2026},
  month = mar,
  pages = {445--456},
  address = {Rabat, Morocco},
  publisher = {Association for Computational Linguistics},
  doi = {10.18653/v1/2026.eacl-demo.31},
  url = {https://aclanthology.org/2026.eacl-demo.31/},
}
AISTATS
Spotlight
A Continuous Time Markov Chain Framework for Insertion Language Models

Dhruvesh Patel, Benjamin Rozonoyer, Soumitra Das, Tahira Naseem, Tim G. J. Rudner, and Andrew McCallum

In Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS), 2026

Insertion Language Models (ILMs) offer several advantages over left-to-right generation and mask-based generation. However, existing formulations of insertion-based generation have largely been ad-hoc. In this paper, we derive a diffusion-style denoising objective for ILMs from first principles by formulating the noising process as a continuous-time Markov chain on the space of variable-length sequences. We show that previous formulations of ILMs can be viewed as special cases of this denoising framework. Through empirical evaluation on a synthetic planning task, we show that the proposed approach retains the benefits of insertion-based generation over left-to-right generation and masked diffusion models. In language modeling, our diffusion-based approach is competitive with left-to-right generation and masked diffusion models, while offering additional flexibility in sampling compared to existing insertion language models.
@inproceedings{patel2026ctmc,
  title = {A Continuous Time {M}arkov Chain Framework for Insertion Language Models},
  author = {Patel, Dhruvesh and Rozonoyer, Benjamin and Das, Soumitra and Naseem, Tahira and Rudner, Tim G. J. and McCallum, Andrew},
  booktitle = {Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS)},
  year = {2026},
  note = {Spotlight},
  url = {https://openreview.net/forum?id=nCyV21FmUI},
}
ICML
Insertion Based Sequence Generation with Learnable Order Dynamics

Dhruvesh Patel, Benjamin Rozonoyer, Gaurav Pandey, Tahira Naseem, Ramón Fernandez Astudillo, and Andrew McCallum

In Proceedings of the 43rd International Conference on Machine Learning, Jul 2026

In many domains, generating variable-length sequences through insertions provides greater flexibility over autoregressive models. However, the action space of insertion models is much larger than that of autoregressive models, making learning challenging. To address this, we incorporate trainable order dynamics into the target rates for discrete flow matching, and show that with suitable choices of parameterizations, joint training of the target order dynamics and the generator is tractable without the need for numerical simulation. As the generative insertion model, we use a variable-length masked diffusion model that generates by inserting and filling mask tokens. On graph traversal tasks for which a locally optimal insertion order is known, we explore the choices of parameterization empirically and demonstrate the trade-offs between flexibility, training stability and generation quality. On de novo small molecule generation, we find that the learned order dynamics lead to an increase in the number of valid molecules generated, when compared to uniform order dynamics.
@inproceedings{patel2026learnableorder,
  title = {Insertion Based Sequence Generation with Learnable Order Dynamics},
  author = {Patel, Dhruvesh and Rozonoyer, Benjamin and Pandey, Gaurav and Naseem, Tahira and Fernandez Astudillo, Ram{\'o}n and McCallum, Andrew},
  booktitle = {Proceedings of the 43rd International Conference on Machine Learning},
  year = {2026},
  month = jul,
  url = {https://arxiv.org/abs/2602.18695},
}
SPIGM @ NeurIPS
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions

Dhruvesh Patel, Aishwarya Sahoo, Avinash Amballa, Tahira Naseem, Tim G. J. Rudner, and Andrew McCallum

In Structured Probabilistic Inference & Generative Modeling Workshop at NeurIPS, 2025

Autoregressive models (ARMs) predict subsequent tokens one-by-one “from left to right.” Masked Diffusion Models (MDMs) can generate tokens in arbitrary order, but unmasking multiple tokens simultaneously can introduce incoherence, and MDMs cannot handle arbitrary infilling constraints when the number of tokens to be filled is not known in advance. We introduce Insertion Language Models (ILMs), which learn to insert tokens at arbitrary positions in a sequence—jointly selecting both the position and the vocabulary element to be inserted. By inserting tokens one at a time, ILMs can represent strong dependencies between tokens, and their ability to generate sequences in arbitrary order allows them to accurately model sequences whose token dependencies do not follow a left-to-right sequential structure. To train ILMs, we propose a tailored network parameterization and use a simple denoising objective. Our empirical evaluation demonstrates that ILMs outperform both ARMs and MDMs on common planning tasks. Furthermore, ILMs outperform MDMs and perform on par with ARMs on unconditional text generation while offering greater flexibility than MDMs in arbitrary-length text infilling.
@inproceedings{patel2025ilm,
  title = {Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions},
  author = {Patel, Dhruvesh and Sahoo, Aishwarya and Amballa, Avinash and Naseem, Tahira and Rudner, Tim G. J. and McCallum, Andrew},
  booktitle = {Structured Probabilistic Inference \& Generative Modeling Workshop at NeurIPS},
  year = {2025},
  url = {https://arxiv.org/abs/2505.05755},
}
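
The abstract above describes ILMs as jointly selecting the position and the vocabulary element to insert. One simple way to picture such a joint choice is a single softmax over every (slot, token) pair, sketched below. This is an illustrative assumption and not the network parameterization from the paper: `sample_insertion` and the `logits` table are hypothetical, and the scores are hand-written rather than produced by a model.

```python
import math
import random


def sample_insertion(logits, rng):
    """Sample (slot, token_id) from one softmax over all slot/token pairs.

    logits[s][v] is a hypothetical score for inserting token v into slot s.
    A draft of length n has n + 1 slots (before each token and at the end).
    """
    flat = [(s, v, x) for s, row in enumerate(logits) for v, x in enumerate(row)]
    top = max(x for _, _, x in flat)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(x - top) for _, _, x in flat]
    slot, token_id, _ = rng.choices(flat, weights=weights, k=1)[0]
    return slot, token_id


vocab = ["a", "b", "c"]
draft = ["a", "c"]  # slots: 0 (before "a"), 1 (between), 2 (after "c")
logits = [
    [0.0, 0.1, 0.0],
    [0.0, 4.0, 0.0],  # strongly prefers inserting "b" between "a" and "c"
    [0.0, 0.1, 0.0],
]
slot, token_id = sample_insertion(logits, random.Random(0))
draft.insert(slot, vocab[token_id])
print(draft)
```

Because position and token are drawn from one distribution, the sketch never has to commit to a position first and a token second; a factorized variant would be a different design choice.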
SPIGM @ NeurIPS
Improved Sampling from Masked Diffusion Models with Position Contrastive Guidance

Dhruvesh Patel, Tahira Naseem, Gaurav Pandey, Md Arafat Sultan, Andrew McCallum, and Ramón Fernandez Astudillo

In Structured Probabilistic Inference & Generative Modeling Workshop at NeurIPS, 2025

Masked Diffusion Models (MDMs), which generate multiple tokens at a time, hold the promise of accelerating text generation. However, the performance of MDMs is sensitive to the order in which the tokens are generated. We observe that MDMs are overconfident about the masked positions on the extreme ends of the output sequence. MDMs also express uncertainty by producing similar probability scores for tokens regardless of the query position. Utilizing these insights, we propose Position Contrastive Guidance, which has two components: a soft order bias that favors left-to-right decoding, and a novel classifier-free-guidance that renormalizes the probabilities using position uncertainty to generate more informative tokens earlier in the generation. Our approach can be easily plugged into any existing uncertainty-guided sampling strategy. Experiments on GSM8k, MATH500, and HumanEval show that PCG improves both accuracy and throughput for the base and instruct versions of DREAM-7B and LLaDA-8B models.
@inproceedings{patel2025pcg,
  title = {Improved Sampling from Masked Diffusion Models with Position Contrastive Guidance},
  author = {Patel, Dhruvesh and Naseem, Tahira and Pandey, Gaurav and Sultan, Md Arafat and McCallum, Andrew and Fernandez Astudillo, Ram{\'o}n},
  booktitle = {Structured Probabilistic Inference \& Generative Modeling Workshop at NeurIPS},
  year = {2025},
  url = {https://openreview.net/forum?id=e0WmOrWbtc},
}
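
To make the idea of a soft order bias in the abstract above concrete, the sketch below ranks masked positions by model confidence minus a penalty that grows with position index, then unmasks the top k. The linear penalty, the `bias` parameter, and the function name `pick_positions` are assumptions made for illustration; the paper's actual formulation may differ, and the classifier-free-guidance component is not shown.

```python
def pick_positions(confidence, masked, k, bias=0.0):
    """Choose k masked positions to unmask next.

    confidence[i] is a hypothetical per-position confidence score (for
    example the model's top token probability at position i).
    bias is the strength of a linear left-to-right penalty; bias=0 recovers
    plain confidence ranking.
    """
    n = len(confidence)

    def score(i):
        # Later positions are penalized more, which softly favors the left.
        return confidence[i] - bias * i / max(n - 1, 1)

    best = sorted(masked, key=score, reverse=True)[:k]
    return sorted(best)


confidence = [0.5, 0.6, 0.7, 0.9]
masked = [0, 1, 2, 3]
print(pick_positions(confidence, masked, k=1))            # plain confidence
print(pick_positions(confidence, masked, k=1, bias=1.0))  # with order bias
```

With no bias the most confident position wins wherever it sits; with a strong bias the leftmost masked position wins instead. Intermediate values trade off the two, which is what makes the bias soft rather than a hard left-to-right rule.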