Data and Benchmarking
Research-ready datasets and benchmarks for interconnected living texts.
News
InterText at EACL-2024. Long documents are often structured, making it much easier for humans to navigate large texts. Is document structure encoded in long-document transformers, and how can their structure-awareness be improved? We investigate this with a novel probing suite and structure infusion kit in our new EACL paper.
InterText at EMNLP-2023. We are excited to announce two upcoming EMNLP presentations from InterText. CiteBench, our new benchmark for citation text generation, was developed in collaboration with IBM Research. The Argument Mining workshop will host our PragTag shared task. Have a look at our preprints and meet us at the conference!
Related work from our colleagues. Peer review is one of the core objects of study in InterText. Closely related new work from our colleagues at UKP Lab and the University of Hamburg explores argumentation in peer reviews and rebuttals. Take a look at their preprint and visit their talk at the upcoming EMNLP!
Join the PragTag-2023 Shared Task. We are hosting PragTag-2023: The First Shared Task on Pragmatic Tagging of Peer Reviews at the Argument Mining Workshop at EMNLP-2023. Learn more and register here!
Three papers accepted at ACL-2023. Our papers on NLPeer, the inclusive notion of text, and CARE will appear at ACL-2023! 🥳
CARE tool release. Natural highlights and comments are a key element of modern text work, yet datasets, tasks and applications of NLP to inline commentary are missing. To address this gap, we developed CARE: a Collaborative AI-Assisted Reading Environment. It is an open-source web application that allows users to collaboratively read and annotate documents, while collecting textual and behavioral data, and providing an easy way to integrate NLP models for real-time assistance. Read more in our preprint, check out the extensive documentation and try the demo!
New preprint. Peer review is the cornerstone of academic publishing, yet NLP for peer review lacks a solid data foundation and concentrates on only a few research communities. We introduce NLPeer: the first clearly licensed, large-scale, multi-domain resource for the computational study of peer review, including two novel corpora of peer reviews from the NLP community. While we are preparing the data for release, take a look at the preprint!
New preprint. NLP studies text, but what counts as text, and what gets discarded, depends on the study. In our new preprint, we discuss the emerging inclusive approach to text in NLP, introduce a taxonomy of extended text, and propose a community-wide reporting schema to facilitate future discussion.
Paper accepted at EMNLP. Our paper "Yes-Yes-Yes" will appear in the Findings of EMNLP-2022! We propose a new ethics-, copyright- and confidentiality-aware data collection workflow for peer reviews and study how donation-based collection affects the data. While we polish the camera-ready version, have a look at our preprint and repo!
Keynote at EPIA-2022. Iryna Gurevych will give a keynote on new frontiers in cross-document NLP at the 21st EPIA Conference on Artificial Intelligence.
Team
Iryna Gurevych
Principal Investigator
Ilia Kuznetsov
Postdoc
Martin Tutek
Postdoc
Jan Buchmann
PhD Student
Nils Dycke
PhD Student
Max Eichler
PhD Student
Dennis Zyska
PhD Student
Qian Ruan
PhD Student
Toru Sasaki
PhD Student