Published online by Cambridge University Press: 10 November 2020
This study addresses widespread issues in constituent parsing for Korean, including quantitative and qualitative error analyses of parsing results. Earlier treebank grammars have been accepted as interpretable across various annotation schemes, whereas recent parsers have turned out to be much harder for humans to interpret. This paper therefore aims to establish a concrete typology of parsing errors, to describe how these parsers handle sentences, and to show the statistical distribution of the errors, using state-of-the-art statistical and neural parsers. To this end, we train and evaluate statistical and neural parsing systems on the phrase-structure Sejong treebank and obtain results of up to an 89.18% F$_1$ score, which outperforms previous constituent parsing results for Korean. We also define best practices for fair comparison with future work by proposing a standard corpus division for the Sejong treebank.
Mija Kim and Jungyeul Park contributed equally.