D18-2014@ACL

#1 A Multilingual Information Extraction Pipeline for Investigative Journalism

Authors: Gregor Wiedemann ; Seid Muhie Yimam ; Chris Biemann

We introduce an advanced information extraction pipeline that automatically processes very large collections of unstructured textual data for investigative journalism. The pipeline serves as a new input processor for the upcoming major release of our New/s/leak 2.0 software, which we are developing in cooperation with a large German news organization. In the use case we target, journalists receive a large collection of files, up to several gigabytes in size, whose contents are unknown. Such collections may originate either from official disclosures of documents, e.g., Freedom of Information Act requests, or from unofficial data leaks.