NLP for living texts, in context.

Natural language processing (NLP) is a subfield of AI and computational linguistics dedicated to the analysis and generation of natural texts. For most of its history, NLP has seen text as a static, isolated entity – yet real digital texts change over time, and are written and read in the context of other texts. To enable a new generation of NLP research and applications, the InterText initiative develops novel NLP approaches for modeling text as a living object in context. Our initiative covers three core areas:

Data and Benchmarking

Research-ready datasets and benchmarks for interconnected living texts.

Cross-document NLP

A joint, unified framework for modeling cross-document discourse.

Applications

A new generation of NLP applications for collaborative text work.

News

Nov 2022

New preprint. Peer review is the cornerstone of academic publishing, yet NLP for peer review lacks a solid data foundation and is over-focused on a few select research communities. We introduce NLPeer: the first clearly licensed, large-scale, multi-domain resource for the computational study of peer review, including two novel corpora of peer reviews from the NLP community. While we are preparing the data for release, take a look at the preprint!

Nov 2022

New preprint. NLP studies text, but what counts as text and what gets discarded depends on the study. In our new preprint, we discuss the emerging inclusive approach to text in NLP, introduce a taxonomy of extended text, and propose a community-wide reporting schema to facilitate future discussion.

Oct 2022

Paper accepted at EMNLP. Our paper "Yes-Yes-Yes" will appear in the Findings of EMNLP-2022! We propose a new ethics-, copyright- and confidentiality-aware data collection workflow for peer reviews and study how donation-based collection affects the data. While we polish the camera-ready, have a look at our preprint and repo!

Aug 2022

Keynote at EPIA-2022. Iryna Gurevych will give a keynote on new frontiers in cross-document NLP at the 21st EPIA Conference on Artificial Intelligence.

Aug 2022

New CL article. Our article "Revise and Resubmit" has been published in Computational Linguistics. We propose a joint framework for cross-document modeling inspired by theoretical work on intertextuality, and instantiate it in the peer reviewing domain, resulting in a large new corpus. Read the article online to learn more!

Team

Iryna Gurevych

Principal Investigator

Ilia Kuznetsov

Postdoc

Martin Tutek

Postdoc

Jan Buchmann

PhD Student

Nils Dycke

PhD Student

Max Eichler

PhD Student

Dennis Zyska

PhD Student

Qian Ruan

PhD Student

Funding