Using artificial intelligence to develop a machine translation system and teaching resources in the Tuvan language
DOI:
https://doi.org/10.25178/nit.2024.1.1
Keywords:
Tuvan language; artificial intelligence; machine translation; neural networks; large language models; digital presence; machine learning
Abstract
Computer technologies applied in the humanities, together with large language models built on machine learning and neural networks, have become highly sophisticated. The linguistic capabilities of large language models naturally attract researchers' interest, which reflects the relevance of using artificial intelligence to create machine translation systems and educational resources.
The article describes the experience of creating a large language model for the Tuvan language using machine learning and artificial intelligence. The authors attempted to develop a large language model capable of recognizing the Tuvan language and translating phrases between Tuvan and Russian in both directions. They also examined and tested the generation of text in Tuvan, which can be used both in language teaching and in various kinds of linguistic research.
This experience is unique since, as of now, the Tuvan language is not represented in any well-established machine translation systems. A secondary aim of the research is to analyze the level of the language's digital presence on the Internet, as well as to provide recommendations for devising an optimal algorithm for building similar systems and web services based on machine learning. The research outcomes are of practical value not only with respect to the Tuvan language but can also be extrapolated to other official languages in the Russian Federation.
For citation:
Novikova M. L. and Novikov Ph. N. Using artificial intelligence to develop a machine translation system and teaching resources in the Tuvan language. New Research of Tuva, 2024, no. 1, pp. 6–17. (In Russ.). DOI: https://doi.org/10.25178/nit.2024.1.1
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
The author(s), as license holder(s), grant rights to their work to the journal (the licensee) under a simple non-exclusive open license in accordance with Art. 1286.1, “Open license for a research work, work of literature or fine arts”, of the Civil Code of the Russian Federation.
New Research of Tuva publishes articles under the Creative Commons Attribution-NonCommercial license (CC BY-NC).
Since it is an open license, author(s) reserve the right to upload the article to their institutional repository, submit it to another journal (if it allows republications), or republish it on their own website (in full, or in part).
However, several conditions apply here:
a) The republished version must always contain the name(s) and affiliation(s) of the author(s), the original title and the hyperlink to the original version on the New Research of Tuva website;
b) It must be in open access, free of charge, and no category of readers may be advantaged in any way over the general readership;
c) Should the contribution be submitted elsewhere by its author(s) without substantial modification (30% or more of the original text unchanged), the body of the article should contain a disclaimer that the original version was published in New Research of Tuva (with a link to the respective page).
The CC-BY-NC is a non-revocable license which applies worldwide and lasts for the duration of the work’s copyright.