D18-2010@ACL


Juman++: A Morphological Analysis Toolkit for Scriptio Continua

Authors: Arseny Tolmachev ; Daisuke Kawahara ; Sadao Kurohashi

We present a three-part toolkit for developing morphological analyzers for languages without natural word boundaries. The first part is a C++11/14 lattice-based morphological analysis library that combines linear and recurrent neural network language models for analysis. The other two parts are a tool for exposing problems in the trained model and a partial annotation tool. Our morphological analyzer for Japanese achieves new state-of-the-art (SOTA) results on Jumandic-based corpora while being 250 times faster than the previous one. We also report a small experiment, a quantitative analysis, and our experience of using the development tools. All components of the toolkit are open source and available under the permissive Apache 2 License.
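
To illustrate what "lattice-based" analysis means in general, the following is a minimal Python sketch, not the Juman++ API or its scoring model. It builds a lattice of candidate words over an unsegmented string and picks the lowest-cost path with a Viterbi-style pass. The dictionary, the per-word costs, the unknown-character cost, and the function names are all hypothetical; the toolkit described above scores paths with linear and recurrent neural network language models rather than the context-free costs assumed here.

```python
import math

# Hypothetical toy dictionary: surface form -> cost (lower is better).
# These numbers are invented for illustration only.
TOY_COSTS = {
    "東京": 2.0,
    "京都": 2.5,
    "東": 3.0,
    "京": 4.0,
    "都": 3.0,
}
UNKNOWN_COST = 10.0  # assumed cost of a single out-of-dictionary character


def build_lattice(text, costs, max_len=4):
    """Return lattice[i]: a list of (end, surface, cost) edges starting at i."""
    lattice = [[] for _ in range(len(text))]
    for start in range(len(text)):
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            surface = text[start:end]
            if surface in costs:
                lattice[start].append((end, surface, costs[surface]))
        if not lattice[start]:
            # Fallback so every position has an outgoing edge.
            lattice[start].append((start + 1, text[start], UNKNOWN_COST))
    return lattice


def best_segmentation(text, costs=TOY_COSTS):
    """Find the minimum-cost path through the lattice (Viterbi over positions)."""
    n = len(text)
    lattice = build_lattice(text, costs)
    best = [math.inf] * (n + 1)   # best[i]: lowest cost to reach position i
    back = [None] * (n + 1)       # back[i]: (previous position, word) on that path
    best[0] = 0.0
    for start in range(n):
        if best[start] == math.inf:
            continue
        for end, surface, cost in lattice[start]:
            total = best[start] + cost
            if total < best[end]:
                best[end] = total
                back[end] = (start, surface)
    # Follow back-pointers from the end of the string to recover the words.
    tokens = []
    pos = n
    while pos > 0:
        pos, surface = back[pos]
        tokens.append(surface)
    tokens.reverse()
    return tokens, best[n]


if __name__ == "__main__":
    print(best_segmentation("東京都"))
```

With these toy costs, the candidate paths for "東京都" are 東京/都 (cost 5.0), 東/京都 (cost 5.5), and 東/京/都 (cost 10.0), so the sketch selects 東京/都. A context-sensitive model would replace the per-word costs with scores that depend on neighboring words.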