Benchmarking morphological analyzers for the Hungarian language

  •  Refereed articles
  • 2023-02-01 12:20:00
In this paper we evaluate, compare, and benchmark the four most widely used and most advanced morphological analyzers for the Hungarian language: Hunmorph-Ocamorph, Hunmorph-Foma, Humor, and Hunspell. The main goal of the research is to define objective metrics for comparing these tools. The novelty of the paper is that the analyzers are compared on the basis of their annotation token systems rather than their lemmatization features. The proposed comparison metrics are the following: how much the annotation token systems differ, how many words each analyzer recognizes, and how many words are assigned an equivalent morphological structure under a well-defined mapping among the annotation token systems. For each of these metrics, we define the concepts of similarity and distance. For the evaluation we use a unique Hungarian corpus that we generated automatically from Hungarian free texts, together with a novel algorithm that generates the token mapping automatically. According to our experimental results, Hunmorph-Ocamorph gives the best results. Hunmorph-Foma is very close to it, but sometimes returns an invalid lemma. Humor is the third best analyzer, while Hunspell performs far worse than the other three tools.
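
The abstract does not spell out how similarity and distance are defined, so the following is only a minimal illustrative sketch of one of the listed metrics (how many words the different analyzers recognize). It assumes a Jaccard-style similarity over the sets of recognized words, which may differ from the definition used in the paper. The function names, the analyzer labels, and the toy word sets are all hypothetical and are not measured results.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Share of words recognized by both analyzers among words recognized by either.

    Assumption: this Jaccard-style measure only stands in for the paper's
    own similarity definition, which the abstract does not give.
    """
    if not a and not b:
        return 1.0  # two empty sets are treated as identical
    return len(a & b) / len(a | b)


def jaccard_distance(a: set, b: set) -> float:
    """Distance derived from the similarity above (1 - similarity)."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical toy data: words each analyzer is assumed to recognize.
recognized = {
    "analyzer_a": {"ház", "házak", "kutya", "fut"},
    "analyzer_b": {"ház", "kutya", "fut", "alma"},
}

similarity = jaccard_similarity(recognized["analyzer_a"], recognized["analyzer_b"])
distance = jaccard_distance(recognized["analyzer_a"], recognized["analyzer_b"])
print(f"similarity={similarity:.2f}, distance={distance:.2f}")
```

With real data, each set would be filled by running one analyzer over the same corpus and collecting the words for which it returns at least one analysis.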

The full article can be downloaded here.

Citation

MLA: Szabó, Gábor, and László Kovács. "Benchmarking morphological analyzers for the Hungarian language." Annales Mathematicae et Informaticae. Vol. 49. Eszterházy Károly University Institute of Mathematics and Informatics, 2018.

APA: Szabó, G., & Kovács, L. (2018). Benchmarking morphological analyzers for the Hungarian language. In Annales Mathematicae et Informaticae (Vol. 49, pp. 141-166). Eszterházy Károly University Institute of Mathematics and Informatics.

ISO690: SZABÓ, Gábor; KOVÁCS, László. Benchmarking morphological analyzers for the Hungarian language. In: Annales Mathematicae et Informaticae. Eszterházy Károly University Institute of Mathematics and Informatics, 2018. p. 141-166.

BibTeX:

@inproceedings{szabo2018benchmarking,
  title={Benchmarking morphological analyzers for the Hungarian language},
  author={Szab{\'o}, G{\'a}bor and Kov{\'a}cs, L{\'a}szl{\'o}},
  booktitle={Annales Mathematicae et Informaticae},
  volume={49},
  pages={141--166},
  year={2018},
  organization={Eszterh{\'a}zy K{\'a}roly University Institute of Mathematics and Informatics}
}