DEVELOPING AN NLP TOOL FOR LINGUISTIC ANALYSIS OF THE UZBEK LANGUAGE

Authors

Ass. professor Marhamat Haydarova Yunusovna, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi
Assistant Guzal Shikhnazarova Alisherovna, Tashkent University of Information Technologies named after Muhammad al-Khwarizmi

Keywords:

NLP, computational linguistics, tokenization

Abstract

Automatic processing of unstructured natural-language text is one of the pressing problems in the computer analysis and synthesis of texts. Text normalization can be singled out as a separate task; it usually involves tokenization, stemming, and lemmatization. Existing stemming algorithms are mostly aimed at synthetic languages, in which word forms are built predominantly from morphemes. Uzbek is an agglutinative language, characterized by polysemous affixes and function morphemes. Although Uzbek differs in many ways from languages such as English, it can nevertheless be processed successfully by stemming algorithms.
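
To make the normalization steps concrete, the following is a minimal Python sketch of tokenization followed by naive suffix-stripping stemming for Latin-script Uzbek. It is an illustration under stated assumptions, not the tool described in this paper: the function names `tokenize` and `stem`, the `min_stem` threshold, and the short suffix list (plural -lar, two possessive endings, and common case endings) are hypothetical choices made for the example, and a practical stemmer would need a far larger affix inventory and rules for ambiguous endings. Lemmatization, which requires a dictionary or morphological analysis, is not shown.

```python
import re

# Illustrative subset of Uzbek suffixes (an assumption for this sketch):
# plural -lar, possessives -imiz / -ingiz, and case endings -ning, -ni, -ga, -da, -dan.
SUFFIXES = sorted(
    ["lar", "imiz", "ingiz", "ning", "dan", "ni", "ga", "da"],
    key=len,
    reverse=True,  # try longer suffixes first
)

# A token is a run of letters, optionally joined by an apostrophe inside the word
# (as in o'zbek). The modifier letters U+02BB and U+02BC already count as letters;
# a word-final ASCII apostrophe is not kept by this simple pattern.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['\u2018\u2019][^\W\d_]+)*")


def tokenize(text):
    """Split raw text into word tokens, dropping punctuation and digits."""
    return TOKEN_RE.findall(text)


def stem(word, min_stem=3):
    """Strip known suffixes from the end of the word, one at a time.

    Agglutinative words chain several suffixes, so stripping repeats until
    no suffix matches or the remaining stem would be shorter than min_stem.
    """
    result = word.lower()
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIXES:
            if result.endswith(suffix) and len(result) - len(suffix) >= min_stem:
                result = result[: -len(suffix)]
                stripped = True
                break
    return result


if __name__ == "__main__":
    for token in tokenize("Bolalarga kitoblarimizdan, maktabda!"):
        print(token, "->", stem(token))
    # Bolalarga -> bola
    # kitoblarimizdan -> kitob
    # maktabda -> maktab
```

In this sketch the word kitoblarimizdan ("from our books") is reduced in three passes: -dan, then -imiz, then -lar. The minimum stem length is a crude guard against over-stemming short words whose final letters merely resemble a suffix.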