The structure of an entry in the National corpus of Tuvan language
Keywords:
dictionary structure; textual corpus; dictionary; Tuvan language; electronic dictionary; Microsoft Office Access
Abstract
Contemporary information technologies and mathematical modelling have made creating corpora of natural languages significantly easier. A corpus is an information and reference system based on a collection of digitally processed texts. It includes various written and oral texts in the given language, a set of dictionaries, and markup, that is, information on the properties of the texts. It is the presence of markup that distinguishes a corpus from an electronic library.
At the moment, national corpora are being set up for many languages of the Russian Federation, including those of the Turkic peoples. Faculty members, postgraduate and undergraduate students at Tuvan State University and Siberian Federal University are working on the National corpus of Tuvan language.
This article describes the structure of a dictionary entry in the National corpus of Tuvan language. The corpus database comprises the following tables: MAIN, the headword table; RUS, ENG and GER, which hold translations of the headword into three languages (Russian, English and German); and MORPHOLOGY, which contains morphological data on the headword. The database is built in Microsoft Office Access.
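The table layout described above can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for Microsoft Office Access, and every column name (id, headword, transcription, main_id, translation, feature, value) is an assumption, since the abstract names the tables but not their fields. The sample word is illustrative and is not taken from the corpus.

```python
import sqlite3

# Hypothetical columns: the abstract names only the five tables.
SCHEMA = """
CREATE TABLE MAIN (
    id            INTEGER PRIMARY KEY,
    headword      TEXT NOT NULL,
    transcription TEXT
);
CREATE TABLE RUS (main_id INTEGER NOT NULL REFERENCES MAIN(id), translation TEXT NOT NULL);
CREATE TABLE ENG (main_id INTEGER NOT NULL REFERENCES MAIN(id), translation TEXT NOT NULL);
CREATE TABLE GER (main_id INTEGER NOT NULL REFERENCES MAIN(id), translation TEXT NOT NULL);
CREATE TABLE MORPHOLOGY (
    main_id INTEGER NOT NULL REFERENCES MAIN(id),
    feature TEXT NOT NULL,
    value   TEXT NOT NULL
);
"""


def build_dictionary():
    """Create an in-memory database with the five tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load_entry(conn, headword):
    """Assemble one dictionary entry from all five tables, or None if absent."""
    row = conn.execute(
        "SELECT id, transcription FROM MAIN WHERE headword = ?", (headword,)
    ).fetchone()
    if row is None:
        return None
    entry_id, transcription = row
    entry = {"headword": headword, "transcription": transcription}
    # Table names come from a fixed tuple, so the f-string is safe here.
    for table in ("RUS", "ENG", "GER"):
        rows = conn.execute(
            f"SELECT translation FROM {table} WHERE main_id = ? ORDER BY rowid",
            (entry_id,),
        )
        entry[table] = [r[0] for r in rows]
    rows = conn.execute(
        "SELECT feature, value FROM MORPHOLOGY WHERE main_id = ?", (entry_id,)
    )
    entry["MORPHOLOGY"] = dict(rows.fetchall())
    return entry


# Illustrative data, not taken from the corpus.
conn = build_dictionary()
conn.execute("INSERT INTO MAIN (id, headword, transcription) VALUES (1, 'ном', 'nom')")
conn.execute("INSERT INTO RUS VALUES (1, 'книга')")
conn.execute("INSERT INTO ENG VALUES (1, 'book')")
conn.execute("INSERT INTO GER VALUES (1, 'Buch')")
conn.execute("INSERT INTO MORPHOLOGY VALUES (1, 'part_of_speech', 'noun')")
entry = load_entry(conn, "ном")
```

Keeping each translation language in its own table, keyed to MAIN, mirrors the layout the abstract lists; a single translations table with a language column would be an equally plausible design.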
The corpus dictionary supports the following functions: adding, editing and removing an entry; searching for an entry (with transcription); and setting and visualizing the morphological features of a headword.
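The operations listed above might look like the following sketch. It again uses Python's sqlite3 module in place of Microsoft Office Access, and the function names, column names and reduced two-table schema are hypothetical. Treating "search (with transcription)" as a lookup that matches either the headword or its transcription is one possible reading, not a confirmed description of the authors' tool.

```python
import sqlite3


def open_dictionary():
    """Create an in-memory database with simplified MAIN and MORPHOLOGY tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE MAIN (
            id            INTEGER PRIMARY KEY,
            headword      TEXT NOT NULL,
            transcription TEXT
        );
        CREATE TABLE MORPHOLOGY (
            main_id INTEGER NOT NULL REFERENCES MAIN(id),
            feature TEXT NOT NULL,
            value   TEXT NOT NULL,
            PRIMARY KEY (main_id, feature)
        );
    """)
    return conn


def add_entry(conn, headword, transcription):
    """Insert a headword and return its new id."""
    cur = conn.execute(
        "INSERT INTO MAIN (headword, transcription) VALUES (?, ?)",
        (headword, transcription),
    )
    return cur.lastrowid


def edit_entry(conn, entry_id, headword, transcription):
    """Overwrite the headword and transcription of an existing entry."""
    conn.execute(
        "UPDATE MAIN SET headword = ?, transcription = ? WHERE id = ?",
        (headword, transcription, entry_id),
    )


def remove_entry(conn, entry_id):
    """Delete an entry together with its morphological features."""
    conn.execute("DELETE FROM MORPHOLOGY WHERE main_id = ?", (entry_id,))
    conn.execute("DELETE FROM MAIN WHERE id = ?", (entry_id,))


def search_entries(conn, query):
    """Return entries whose headword or transcription equals the query."""
    rows = conn.execute(
        "SELECT id, headword, transcription FROM MAIN "
        "WHERE headword = ? OR transcription = ? ORDER BY id",
        (query, query),
    )
    return rows.fetchall()


def set_feature(conn, entry_id, feature, value):
    """Set one morphological feature, replacing any earlier value."""
    conn.execute(
        "INSERT OR REPLACE INTO MORPHOLOGY (main_id, feature, value) VALUES (?, ?, ?)",
        (entry_id, feature, value),
    )


def get_features(conn, entry_id):
    """Return the morphological features of an entry as a dict."""
    rows = conn.execute(
        "SELECT feature, value FROM MORPHOLOGY WHERE main_id = ?", (entry_id,)
    )
    return dict(rows.fetchall())


# Illustrative usage; the sample word is not taken from the corpus.
conn = open_dictionary()
entry_id = add_entry(conn, "ном", "nom")
set_feature(conn, entry_id, "part_of_speech", "noun")
found = search_entries(conn, "nom")
```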
The project allows us to view the corpus dictionary as a multi-structure entity with a complex hierarchical organization, in which the dictionary entry is the key component. The corpus dictionary we developed can be used for studying the pronunciation, orthography and word analysis of the Tuvan language, as well as for searching for words and collocations in the texts included in the corpus.
References
Bavuu-Siuriun, M. V. (2010) Tuvinskii iazyk na sovremennom etape. Novye issledovaniia Tuvy, no. 3 [online] Available at: http://www.tuva.asia/journal/issue_7/2158-bavyy-suyruyn-mv.html (access date: 12.09.2016). (In Russ.).
Salchak, A. Ya. and Baiyr-ool, A. V. (2013) Elektronnyi korpus tuvinskogo iazyka: sostoianie, problem. Mir nauki, kul'tury, obrazovanie, no. 6, pp. 408–409. (In Russ.).
Ssorina, M. S. (2011) Slovar' kak mul'tistrukturnaia organizatsiia. Iaroslavskii pedagogicheskii vestnik, no. 1, vol. 1. Gumanitarnye nauki, pp. 142–146. (In Russ.).
Stupin, L. P. (1985) Leksikografiia angliiskogo iazyka : uchebnoe posobie. Moscow, Vysshaia shkola. 185 p. (In Russ.).
Author(s), as license holder(s), grant rights to their work to the journal (the license grantee) under a simple non-exclusive open license in accordance with Art. 1286.1 «Open license for a research work, work of literature or fine arts» of the Civil Code of the Russian Federation.
New Research of Tuva publishes articles under the Creative Commons Attribution-NonCommercial license (CC BY-NC).
Since it is an open license, author(s) reserve the right to upload the article to their institutional repository, submit it to another journal (if it allows republications), or republish it on their own website (in full, or in part).
However, several conditions apply here:
a) The republished version must always contain the name(s) and affiliation(s) of the author(s), the original title and a hyperlink to the original version on the New Research of Tuva website;
b) It must be in open access and free of charge, and no category of readers may be given any advantage over the general readership;
c) Should the contribution be submitted elsewhere by its author(s) without substantial modification (30% or more of the original text unchanged), the body of the article should contain a disclaimer that the original version was published in New Research of Tuva (with a link to the respective page).
CC BY-NC is a non-revocable license that applies worldwide and lasts for the duration of the work’s copyright.