Using Pos-Tagging Tools in the Uzbek Language

Authors

  • Amirkulov Ma’rufjon Alikulovich, Tashkent State University of Uzbek Language and Literature

DOI:

https://doi.org/10.51699/cajlpc.v7i1.1435

Keywords:

PoS tagging, Uzbek language, corpus linguistics, morphological analysis, Hidden Markov Model, BERT, uznatcorpora.uz, parallel corpus, linguistic annotation

Abstract

This article examines the challenges of automatic part-of-speech (PoS) tagging for Uzbek, surveys existing approaches, and assesses the practical tools available. Because Uzbek is agglutinative, PoS tagging must take into account morphological analysis, contextual meaning, and variation in affixal forms. The paper discusses the effectiveness of rule-based and statistical PoS taggers, particularly those built on the Hidden Markov Model (HMM), and the advantages of BBPOS, a BERT-based neural tagger. It also shows how morphological analysis results obtained from the uznatcorpora.uz platform provide a solid foundation for PoS tagging in Uzbek. The findings highlight the need to create PoS-tagged texts for an Uzbek–English parallel corpus and point to the linguistic and practical value of such a corpus.
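
To make the statistical approach named in the abstract more concrete, the following is a minimal sketch of HMM-based tagging with Viterbi decoding. It is not the implementation of any tagger discussed in the article. The three-tag set, the toy sentence "men kitob o'qidim" ("I read a book"), and every probability value are assumptions chosen by hand for illustration; a real tagger would estimate them from an annotated corpus and would need far better handling of unseen affixed forms than the crude floor used here.

```python
import math

# Toy, hand-picked values (assumptions for illustration only; not estimated
# from any corpus and not taken from the article).
TAGS = ["PRON", "NOUN", "VERB"]
START = {"PRON": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "PRON": {"PRON": 0.1, "NOUN": 0.6, "VERB": 0.3},
    "NOUN": {"PRON": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"PRON": 0.3, "NOUN": 0.4, "VERB": 0.3},
}
# The Uzbek modifier letter in "o'qidim" is written with an ASCII apostrophe here.
EMIT = {
    "PRON": {"men": 0.9},
    "NOUN": {"kitob": 0.8, "men": 0.05},
    "VERB": {"o'qidim": 0.9},
}
UNSEEN = 1e-6  # crude floor for word/tag pairs missing from EMIT


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []
    # Each column maps tag -> (best log-probability of a path ending in that
    # tag, previous tag on that path).
    first = words[0]
    best = [{
        t: (math.log(START[t]) + math.log(EMIT[t].get(first, UNSEEN)), None)
        for t in TAGS
    }]
    for word in words[1:]:
        column = {}
        for t in TAGS:
            emit = math.log(EMIT[t].get(word, UNSEEN))
            # Pick the predecessor tag that maximises the path score into t.
            prev = max(TAGS, key=lambda p: best[-1][p][0] + math.log(TRANS[p][t]))
            score = best[-1][prev][0] + math.log(TRANS[prev][t]) + emit
            column[t] = (score, prev)
        best.append(column)
    # Backtrace from the highest-scoring final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi(["men", "kitob", "o'qidim"]))
```

Log-probabilities are used so that long sentences do not underflow. The fixed `UNSEEN` floor is the weakest part of the sketch: in an agglutinative language most inflected forms will be missing from any emission table, which is why morphological analysis is needed alongside the sequence model.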

References

E. B. Boltayevich, S. S. Samariddinovich, S. M. Mirdjonovna, E. Adalı, and X. Z. Yuldashevna, “POS tagging of Uzbek text using Hidden Markov Model,” in Proc. 8th Int. Conf. Computer Science and Engineering (UBMK), Sep. 2023, pp. 63–68.

E. B. Boltayevich, E. Adalı, S. M. Mirdjonovna, A. O. Xolmo’minovna, X. Z. Yuldashevna, and X. N. Uktamboy O‘g‘li, “The problem of POS tagging and stemming for agglutinative languages (Turkish, Uyghur, Uzbek languages),” in Proc. 8th Int. Conf. Computer Science and Engineering (UBMK), Sep. 2023, pp. 57–62.

B. Elov and N. Xudayberganov, “Methods of POS tagging for Uzbek language corpus texts,” Computer Linguistics: Problems, Solutions, Prospects, vol. 1, no. 1, 2024.

L. Bobojonova, A. Akhundjanova, P. Ostheimer, and S. Fellenz, “BBPOS: BERT-based part-of-speech tagging for Uzbek,” arXiv preprint arXiv:2501.10107, 2025.

D. Jurafsky and J. H. Martin, Speech and Language Processing, 4th ed. Hoboken, NJ, USA: Pearson, 2023.

C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. Bethard, and D. McClosky, “The Stanford CoreNLP natural language processing toolkit,” in Proc. ACL System Demonstrations, 2014, pp. 55–60.

K. Toutanova, D. Klein, C. D. Manning, and Y. Singer, “Feature-rich part-of-speech tagging with a cyclic dependency network,” in Proc. HLT–NAACL, 2003, pp. 252–259.

J. Hajič, “Building a syntactically annotated corpus: The Prague Dependency Treebank,” in Issues of Valency and Meaning, 1998, pp. 106–132.

J. Tiedemann, “Parallel data, tools and interfaces in OPUS,” in Proc. Int. Conf. Language Resources and Evaluation (LREC), 2012, pp. 2214–2218.

R. Sennrich, B. Haddow, and A. Birch, “Neural machine translation of rare words with subword units,” in Proc. Annu. Meeting Assoc. Computational Linguistics (ACL), 2016, pp. 1715–1725.

Y. Kim, Y. Jernite, D. Sontag, and A. M. Rush, “Character-aware neural language models,” in Proc. AAAI Conf. Artificial Intelligence, 2016, pp. 2741–2749.

B. Bohnet, “Top accuracy and fast dependency parsing is not a contradiction,” in Proc. Int. Conf. Computational Linguistics (COLING), 2010, pp. 89–97.

N. Habash, Introduction to Arabic Natural Language Processing, Synthesis Lectures on Human Language Technologies, vol. 3, no. 1, pp. 1–187, 2010.

H. Tseng, D. Jurafsky, and C. D. Manning, “Morphological normalization for English out-of-vocabulary words,” in Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP), 2005, pp. 356–363.

B. Bohnet and J. Nivre, “A transition-based system for joint part-of-speech tagging and labeled non-projective dependency parsing,” in Proc. EMNLP–CoNLL, 2012, pp. 1455–1465.

Y. Zhang and S. Clark, “A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing using beam-search,” in Proc. Conf. Empirical Methods in Natural Language Processing (EMNLP), 2008, pp. 562–571.

Published

2026-01-24

How to Cite

Ma’rufjon Alikulovich, A. (2026). Using Pos-Tagging Tools in the Uzbek Language. Central Asian Journal of Literature, Philosophy and Culture, 7(1), 269–272. https://doi.org/10.51699/cajlpc.v7i1.1435

Section

Articles