The structure of an entry in the National corpus of Tuvan language

Authors

  • Angyr-ool S. Dagbazhyk, Siberian Federal University

Keywords:

dictionary structure; textual corpus; dictionary; Tuvan language; electronic dictionary; Microsoft Office Access

Abstract

Contemporary information technologies and mathematical modelling have made it significantly easier to create corpora of natural languages. A corpus is an information and reference system based on a collection of digitally processed texts. It includes written and oral texts in a given language, a set of dictionaries, and markup, that is, information on the properties of the texts. It is the markup that distinguishes a corpus from an electronic library.

At present, national corpora are being set up for many languages of the Russian Federation, including those of the Turkic peoples. Faculty members and postgraduate and undergraduate students at Tuvan State University and Siberian Federal University are working on the National corpus of Tuvan language.

This article describes the structure of a dictionary entry in the National corpus of Tuvan language. The corpus database comprises the following tables: MAIN, the headword table; RUS, ENG and GER, which hold translations of the headword into three languages; and MORPHOLOGY, which holds morphological data on the headword. The database is built in Microsoft Office Access.
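The table layout described above can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module in place of Microsoft Office Access, and every column name, as well as the way the tables are linked through `main_id`, is hypothetical, since the abstract gives only the table names and their roles.

```python
import sqlite3

# Table names come from the abstract; all column names and the
# main_id link between tables are assumptions made for illustration.
SCHEMA = """
CREATE TABLE MAIN (
    id            INTEGER PRIMARY KEY,
    headword      TEXT NOT NULL,
    transcription TEXT
);
CREATE TABLE RUS (
    main_id     INTEGER NOT NULL REFERENCES MAIN(id),
    translation TEXT NOT NULL
);
CREATE TABLE ENG (
    main_id     INTEGER NOT NULL REFERENCES MAIN(id),
    translation TEXT NOT NULL
);
CREATE TABLE GER (
    main_id     INTEGER NOT NULL REFERENCES MAIN(id),
    translation TEXT NOT NULL
);
CREATE TABLE MORPHOLOGY (
    main_id INTEGER NOT NULL REFERENCES MAIN(id),
    feature TEXT NOT NULL,
    value   TEXT NOT NULL
);
"""


def create_dictionary_db(path=":memory:"):
    """Create an empty dictionary database with the five tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """Return the names of all tables in the database, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Keeping each translation language in its own table mirrors the abstract; a single translations table with a language column would be an alternative design, but it is not what the source describes.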

The corpus dictionary supports the following operations: adding, editing and removing an entry; searching for an entry (with its transcription); and setting and displaying the morphological features of a headword.
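The operations listed above might look like this in code. This is a minimal sketch, not the author's implementation: it uses Python's sqlite3 module rather than Microsoft Office Access, keeps only a MAIN and a MORPHOLOGY table, and all function and column names are hypothetical.

```python
import sqlite3


def open_db():
    """Open an in-memory database with simplified MAIN and MORPHOLOGY tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE
    conn.executescript("""
        CREATE TABLE MAIN (
            id            INTEGER PRIMARY KEY,
            headword      TEXT NOT NULL UNIQUE,
            transcription TEXT
        );
        CREATE TABLE MORPHOLOGY (
            main_id INTEGER NOT NULL REFERENCES MAIN(id) ON DELETE CASCADE,
            feature TEXT NOT NULL,
            value   TEXT NOT NULL,
            PRIMARY KEY (main_id, feature)
        );
    """)
    return conn


def add_entry(conn, headword, transcription):
    """Add an entry and return its id."""
    cur = conn.execute(
        "INSERT INTO MAIN (headword, transcription) VALUES (?, ?)",
        (headword, transcription),
    )
    return cur.lastrowid


def edit_entry(conn, entry_id, headword, transcription):
    """Overwrite the headword and transcription of an existing entry."""
    conn.execute(
        "UPDATE MAIN SET headword = ?, transcription = ? WHERE id = ?",
        (headword, transcription, entry_id),
    )


def remove_entry(conn, entry_id):
    """Remove an entry; its morphological features are removed by the cascade."""
    conn.execute("DELETE FROM MAIN WHERE id = ?", (entry_id,))


def set_feature(conn, entry_id, feature, value):
    """Set (or replace) one morphological feature of a headword."""
    conn.execute(
        "INSERT OR REPLACE INTO MORPHOLOGY (main_id, feature, value) "
        "VALUES (?, ?, ?)",
        (entry_id, feature, value),
    )


def find_entry(conn, headword):
    """Look up an entry by headword; return it with transcription and features."""
    row = conn.execute(
        "SELECT id, headword, transcription FROM MAIN WHERE headword = ?",
        (headword,),
    ).fetchone()
    if row is None:
        return None
    features = dict(conn.execute(
        "SELECT feature, value FROM MORPHOLOGY WHERE main_id = ?", (row[0],)
    ))
    return {"id": row[0], "headword": row[1],
            "transcription": row[2], "morphology": features}
```

Here `find_entry` returns the transcription and the morphological features together, which is one simple way to read "entry search (with transcription)" and "visualizing morphological features"; the actual interface in the Access application may differ.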

The project allows us to view the corpus dictionary as a multi-structure entity with a complex hierarchical organization, in which the dictionary entry is the key component. The corpus dictionary we developed can be used to study Tuvan pronunciation, orthography and word analysis, as well as to search for words and collocations in the texts included in the corpus.

References

Bavuu-Siuriun, M. V. (2010) Tuvinskii iazyk na sovremennom etape. Novye issledovaniia Tuvy, no. 3 [online] Available at: http://www.tuva.asia/journal/issue_7/2158-bavyy-suyruyn-mv.html (accessed: 12.09.2016). (In Russ.).

Salchak, A. Ya. and Baiyr-ool, A. V. (2013) Elektronnyi korpus tuvinskogo iazyka: sostoianie, problem. Mir nauki, kul'tury, obrazovanie, no. 6, pp. 408–409. (In Russ.).

Ssorina, M. S. (2011) Slovar' kak mul'tistrukturnaia organizatsiia. Iaroslavskii pedagogicheskii vestnik, no. 1, vol. 1. Gumanitarnye nauki, pp. 142–146. (In Russ.).

Stupin, L. P. (1985) Leksikografiia angliiskogo iazyka : uchebnoe posobie. Moscow, Vysshaia shkola. 185 p. (In Russ.).

Published

02.12.2016

How to Cite

Dagbazhyk, A. S. (2016) “The structure of an entry in the National corpus of Tuvan language”, The New Research of Tuva, 4. Available at: https://nit.tuva.asia/nit/article/view/612 (Accessed: 22.11.2024).

Section

Philology

Author Biography

Angyr-ool S. Dagbazhyk, Siberian Federal University

Postgraduate student, Institute of Mathematics and Fundamental Information Technology, Siberian Federal University.

Postal address: Room 34-03, 79 Svobodny Pr., 660041 Krasnoyarsk, Russian Federation.

Tel.: +7 (391) 206-21-48.

Email: angyrool-d@mail.ru

Research advisor: Doctor of Physics and Mathematics, Professor V.V. Bykova.