Area: Cross-Document NLP
NLP research to date has focused mainly on static, isolated, short texts. However, many real-world tasks require humans to work simultaneously with multiple, connected, potentially long documents that change over time: from collaborative writing to fake news detection, and from peer review to social media management. While isolated, application-specific approaches to cross-document discourse modeling exist, a general NLP methodology for cross-document analysis has yet to be established.
Instead of treating each application scenario separately, the InterText initiative develops a unified framework for cross-document modeling. Inspired by theoretical work in literary and discourse studies, we propose a typology of general cross-document relations that differ in type, granularity, and explicitness, aiming to cover a wide range of cross-document discourse phenomena. We use this typology to model cross-document discourse in diverse application scenarios.
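To make the three dimensions of the typology more concrete, the following is a minimal illustrative sketch of how a single cross-document relation might be represented as a data structure. It is not the InterText framework's actual schema: the class names (Granularity, Anchor, CrossDocRelation), the granularity levels, the "comments_on" label, and the document identifiers are all hypothetical and chosen only for illustration, using the peer review scenario mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    # Hypothetical levels; the text above does not enumerate them.
    DOCUMENT = "document"
    PARAGRAPH = "paragraph"
    SENTENCE = "sentence"


@dataclass(frozen=True)
class Anchor:
    """A span in one document that a relation starts from or points to."""
    doc_id: str
    granularity: Granularity
    start: int  # character offset where the span begins
    end: int    # character offset where the span ends (exclusive)


@dataclass(frozen=True)
class CrossDocRelation:
    """One relation between two documents, described along three dimensions."""
    source: Anchor
    target: Anchor
    relation_type: str  # type: a placeholder label, not an official category
    explicit: bool      # explicitness: True if marked in the text itself


def relations_from(relations, doc_id):
    """Return all relations whose source anchor lies in the given document."""
    return [r for r in relations if r.source.doc_id == doc_id]


# Hypothetical example: a sentence in a review implicitly comments on
# a paragraph of the reviewed paper.
review = Anchor("review-1", Granularity.SENTENCE, 120, 185)
paper = Anchor("paper-1", Granularity.PARAGRAPH, 2040, 2710)
relations = [CrossDocRelation(review, paper, "comments_on", explicit=False)]
```

Keeping granularity on each anchor rather than on the relation is one possible design choice: it allows a sentence in one document to be linked to a whole paragraph or document in another.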