D18-2014@ACL

#1 A Multilingual Information Extraction Pipeline for Investigative Journalism

Authors: Gregor Wiedemann, Seid Muhie Yimam, Chris Biemann

We introduce an advanced information extraction pipeline to automatically process very large collections of unstructured textual data for the purpose of investigative journalism. The pipeline serves as a new input processor for the upcoming major release of our New/s/leak 2.0 software, which we develop in cooperation with a large German news organization. In the target use case, journalists receive a large collection of files, up to several gigabytes in size, whose contents are unknown. Such collections may originate either from official disclosures of documents, e.g., Freedom of Information Act requests, or from unofficial data leaks.