NLP for living texts, in context.

Natural language processing (NLP) is a subfield of AI and computational linguistics dedicated to the analysis and generation of natural texts. For most of its history, NLP has seen text as a static, isolated entity – yet real digital texts change over time, and are written and read in the context of other texts. To enable a new generation of NLP research and applications, the InterText initiative develops novel NLP approaches for modeling text as a living object in context. Our initiative covers three core areas:

Data and Benchmarking

Research-ready datasets and benchmarks for interconnected living texts.

Cross-document NLP

A unified framework for modeling cross-document discourse.

Applications

New generation of NLP applications for collaborative text work.


News

Oct 2024

Are LLMs good classifiers? To find out, we propose a framework to study LLM fine-tuning for classification with generation- and encoding-based approaches. We apply it to the edit intent classification task and create Re3-Sci2.0: a new large-scale dataset of scientific document revisions with over 94k labeled edits. Have a look at the preprint while we prepare the camera-ready version for EMNLP!

Jul 2024

InterText at ACL-2024. Two InterText papers will appear at ACL-2024 in Bangkok! Qian Ruan will present our new dataset and approach for holistic modeling of document revision [1], and Furkan Şahinuç will talk about systematic exploration of creative multi-document NLG tasks in the age of LLMs [2]. While the authors are busy preparing their posters, take a look at the preprints and meet us at the conference!

Jul 2024

Introducing M2QA. Language and domain are two major sources of data variation in NLP, motivating the need for joint language-domain transfer. Yet reliable evaluation remains a challenge. To address this gap, together with colleagues, we created M2QA, a new multi-domain, multilingual QA benchmark that allows testing for domain and/or language transfer across 4 distinct languages and domains. Find out the details in our preprint, or get the data and start experimenting!

May 2024

New white paper on NLP for peer review. Peer review is at the core of modern science. Yet it is hard, time-consuming, and often unfair. What makes peer review challenging, how can NLP help, and where should it stand aside? A new, extensive white paper written in collaboration with over 20 high-profile NLP and ML researchers lays the foundation for machine-assisted scientific quality control in the age of AI. The companion repository aggregates datasets for peer review assistance to help new researchers get started. Have a look and contribute!

Apr 2024

InterText at EACL-2024. Long documents are often structured, making it much easier for humans to navigate large texts. Is document structure encoded in long-document transformers, and how can their structure-awareness be improved? We investigate this with a novel probing suite and structure infusion kit in our new EACL paper.

Nov 2023

InterText at EMNLP-2023. We are excited to announce two upcoming EMNLP presentations from InterText. CiteBench is our new benchmark for citation text generation in collaboration with IBM Research. The Argument Mining workshop will host our PragTag shared task. Have a look at our preprints and meet us at the conference!

Nov 2023

Related work from our colleagues. Peer review is one of the core objects of study in InterText. A closely related new work from our colleagues at UKP Lab and the University of Hamburg explores argumentation in peer reviews and rebuttals. Take a look at their preprint and attend their talk at the upcoming EMNLP!

May 2023

Join the PragTag-2023 Shared Task. We are hosting PragTag-2023: The First Shared Task on Pragmatic Tagging of Peer Reviews at the Argument Mining Workshop at EMNLP-2023. Learn more and register here!

May 2023

Three papers accepted at ACL-2023. Our papers on NLPeer, the Inclusive Notion of Text, and CARE will appear at ACL-2023! 🥳

Feb 2023

CARE tool release. Natural highlights and comments are a key element of modern text work, yet datasets, tasks and applications of NLP to inline commentary are missing. To address this gap, we developed CARE: a Collaborative AI-Assisted Reading Environment. It is an open-source web application that allows users to collaboratively read and annotate documents, while collecting textual and behavioral data, and providing an easy way to integrate NLP models for real-time assistance. Read more in our preprint, check out the extensive documentation and try the demo!

Team

Iryna Gurevych

Principal Investigator

Ilia Kuznetsov

Postdoc

Martin Tutek

Postdoc

Jan Buchmann

PhD Student

Nils Dycke

PhD Student

Max Eichler

PhD Student

Dennis Zyska

PhD Student

Qian Ruan

PhD Student

Toru Sasaki

PhD Student

Funding