The InterText Initiative, UKP Lab

Area: Data and Benchmarking

Natural Language Processing critically depends on data. Yet most existing text collections consist of individual, isolated documents. To foster research in cross-document NLP, this area develops novel corpora and unified benchmarks for the study of interconnected, changing texts. We develop an inclusive view of text that, unlike most prior work in NLP, takes both textual and non-textual elements into account to enable efficient cross-document processing.

As NLP finds its way into real-life applications, concerns arise regarding the provenance, quality and legal status of data. Collecting interconnected, living texts poses additional challenges, including multiple authorship, confidentiality and privacy. Contributing to the growing body of research on ethics in NLP, this area places special focus on developing general-purpose methodologies and workflows for ethics-, confidentiality- and copyright-aware data collection.

Publications

Apr 2025

⤴️ PeerQA: A Scientific Question Answering Dataset from Peer Reviews
Tim Baumgärtner, Ted Briscoe, Iryna Gurevych (2025)
NAACL-2025 [paper] [repo]

Apr 2025

⤴️ COVE: COntext and VEracity prediction for out-of-context images
Jonathan Tonglet, Gabriel Thiem, Iryna Gurevych (2025)
NAACL-2025 [paper] [repo]

Apr 2025

⤴️ Grounding Fallacies Misrepresenting Scientific Publications in Evidence
Max Glockner, Yufang Hou, Preslav Nakov, Iryna Gurevych (2025)
NAACL-2025 [paper] [repo]

Dec 2024

Attribute or Abstain: Large Language Models as Long Document Assistants
Jan Buchmann, Xiao Liu, Iryna Gurevych (2024)
EMNLP-2024 [paper] [repo]

Dec 2024

STRICTA: Structured Reasoning in Critical Text Assessment for Peer Review and Beyond
Nils Dycke, Matej Zečević, Ilia Kuznetsov, Beatrix Suess, Kristian Kersting, Iryna Gurevych (2024)
🔥 accepted at ACL-2025 [paper]

Nov 2024

⤴️ “Image, Tell me your story!” Predicting the original meta-context of visual misinformation
Jonathan Tonglet, Marie-Francine Moens, Iryna Gurevych (2024)
EMNLP-2024 [paper] [repo]

Oct 2024

Are Large Language Models Good Classifiers? A Study on Edit Intent Classification in Scientific Document Revisions
Qian Ruan, Ilia Kuznetsov, Iryna Gurevych (2024)
EMNLP-2024 [paper] [repo]

Oct 2024

M2QA: A Multi-domain Multilingual Question Answering Benchmark Dataset
Leon Engländer, Hannah Sterz, Clifton Poth, Jonas Pfeiffer, Ilia Kuznetsov, Iryna Gurevych (2024)
EMNLP-2024 Findings [paper] [repo]

Jul 2024

Re3: A Holistic Framework and Dataset for Modeling Collaborative Document Revision
Qian Ruan, Ilia Kuznetsov, Iryna Gurevych (2024)
ACL-2024 [paper] [repo]

Apr 2024

Document Structure in Long Document Transformers
Jan Buchmann, Max Eichler, Jan-Micha Bodensohn, Ilia Kuznetsov, Iryna Gurevych (2024)
EACL-2024 [paper] [repo]

Dec 2023

Overview of PragTag-2023: Low-Resource Multi-Domain Pragmatic Tagging of Peer Reviews
Nils Dycke, Ilia Kuznetsov, Iryna Gurevych (2023)
Proceedings of the 10th Workshop on Argument Mining [paper] [repo]

Dec 2023

CiteBench: A Benchmark for Scientific Citation Text Generation
Martin Funkquist, Ilia Kuznetsov, Yufang Hou, Iryna Gurevych (2023)
EMNLP-2023 [paper] [repo]

Jul 2023

NLPeer: A Unified Resource for the Computational Study of Peer Review
Nils Dycke, Ilia Kuznetsov, Iryna Gurevych (2023)
ACL-2023 [paper] [repo]

Jul 2023

An Inclusive Notion of Text
Ilia Kuznetsov, Iryna Gurevych (2023)
ACL-2023 [paper]

Oct 2022

Yes-Yes-Yes: Proactive Data Collection for ACL Rolling Review and Beyond
Nils Dycke, Ilia Kuznetsov, Iryna Gurevych (2022)
EMNLP-2022 Findings [paper] [repo]

Nov 2019

Does My Rebuttal Matter? Insights from a Major NLP Conference
Yang Gao, Steffen Eger, Ilia Kuznetsov, Iryna Gurevych, Yusuke Miyao (2019)
NAACL-2019 [paper] [repo]

Datasets and Code

The intertext-graph library
A pre-release of our general-purpose library for cross-document NLP modelling and analysis. The current version of the library provides converters from several document formats into a uniform data model, as well as an API for common graph operations that facilitate cross-document analysis at varying levels of granularity. The library is constantly being extended to cover more document formats and cross-document relation types; star the repo to stay up to date with new releases!

[link]
[paper]

3Y Data Collection Implementation
An open implementation of a peer reviewing data collection workflow for OpenReview.net. The code can be used to set up a licensing workflow for peer review data and paper drafts submitted to OpenReview-based venues. We provide the implementation for creating license tasks for reviewers and authors of selected submissions, as well as the code for retrieving the peer reviewing data in a privacy- and anonymity-aware fashion.

[link]
[paper]

ACL-2018 Review Corpus
A corpus of anonymised structured peer reviews collected during the ACL-2018 reviewing campaign. ACL-2018 employed a rich reviewing schema, with each review containing a wide range of textual, binary, ternary and numerical fields, including Strengths, Weaknesses, Summary, aspect scores, overall score and confidence scores. While openly publishing the textual data is not possible due to ethical concerns, we make the numerical data publicly available to support meta-scientific study of peer reviewing in the NLP community.

[link]
[paper]

Re3 Corpus
The first large-scale manually labeled corpus of document-level edits in the scholarly domain.

[link]
[paper]

NLPeer
An openly licensed, unified, multi-domain resource for the computational study of peer review. It provides papers, reviews and paper revisions in a unified format across a range of research communities, including new data from the ACL and COLING review collection campaigns.

[link]
[paper]

LAB
A new six-task benchmark to study long-document attribution.

[link]
[paper]

M2QA
Our brand-new, large-scale, multilingual AND multi-domain benchmark for SQuAD-style question answering.

[link]
[paper]

CiteBench
Source code and data for CiteBench, the first benchmark for citation text generation.

[link]
[paper]

NLPeer 2
A brand-new, high-coverage, data-rich, clearly licensed dataset of papers, peer reviews, rebuttals and meta-reviews from the ACL community and beyond. Your one-stop shop for the empirical study of peer reviewing, reviewing assistance, edit analysis, and many other exciting problems.

[link]
[paper]