Semantic markup of nouns and adjectives for the Electronic corpus of texts in Tuvan language
Keywords:
Tuvan language; electronic database; automated search system; lexis; lexico-semantic classes and subclasses; descriptor; tag; noun; adjective; lexical compatibility
Abstract
The article reports on progress in the semantic markup of the Electronic corpus of texts in Tuvan language (ECTTL), a further stage in the ongoing work of adding Tuvan texts to the database and marking up the corpus. ECTTL is a collaborative project by researchers from Tuvan State University (the Research and Education Center of Turkic Studies and the Department of Information Technologies).
The semantic markup of Tuvan lexis will underpin a search and reference system that helps users find text snippets in ECTTL containing words with the desired meanings.
The first stage of this process is building databases of the basic lexemes of the Tuvan language. All meaningful lexemes were classified into the following semantic groups: humans, animals, objects, natural objects and phenomena, and abstract concepts. All Tuvan object nouns, as well as both descriptive and relative adjectives, were assigned to one of these lexico-semantic classes. Each class, subclass and descriptor is tagged in Tuvan, Russian and English; these tags, in turn, will help automate searching.
The databases of meaningful lexemes of the Tuvan language will also record their lexical combinations. The automated system will contain information on the semantic combinations of adjectives with nouns, adverbs with verbs, and nouns with verbs, as well as on combinations that are semantically incompatible.
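The tagging and compatibility lookup described above can be illustrated with a minimal Python sketch. It is not the ECTTL implementation: the short tag strings, the `Lexeme` record, the placeholder lemmas (`noun_1`, `adj_1`, ...) and the functions `find_by_class` and `is_compatible` are all assumptions introduced for illustration. Only the five class names come from the abstract, and no real Tuvan data is used.

```python
from dataclasses import dataclass

# The five lexico-semantic classes named in the abstract.
# The short tag strings on the left are hypothetical.
SEMANTIC_CLASSES = {
    "HUM": "humans",
    "ANIM": "animals",
    "OBJ": "objects",
    "NAT": "natural objects and phenomena",
    "ABSTR": "abstract concepts",
}


@dataclass(frozen=True)
class Lexeme:
    """One database entry: a lemma, its part of speech and its class tag."""
    lemma: str      # placeholder identifier, not a real Tuvan word
    pos: str        # "noun" or "adjective"
    class_tag: str  # key of SEMANTIC_CLASSES


# Toy lexicon with placeholder lemmas (assumed data).
LEXICON = [
    Lexeme("noun_1", "noun", "ANIM"),
    Lexeme("noun_2", "noun", "OBJ"),
    Lexeme("adj_1", "adjective", "ANIM"),
]

# Hypothetical record of adjective + noun pairs marked as incompatible.
INCOMPATIBLE_PAIRS = {("adj_1", "noun_2")}


def find_by_class(lexicon, class_tag, pos=None):
    """Return lemmas carrying the given class tag, optionally filtered by part of speech."""
    if class_tag not in SEMANTIC_CLASSES:
        raise KeyError(f"unknown class tag: {class_tag}")
    return [
        lx.lemma
        for lx in lexicon
        if lx.class_tag == class_tag and (pos is None or lx.pos == pos)
    ]


def is_compatible(adjective, noun):
    """Treat a pair as compatible unless it is explicitly listed as incompatible."""
    return (adjective, noun) not in INCOMPATIBLE_PAIRS
```

In this sketch a search for a meaning reduces to a filter on the class tag, and incompatible combinations are stored explicitly rather than inferred. A real system would also need subclasses, descriptors and the trilingual tag labels.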
The author(s), as license holder(s), grant the journal (the licensee) rights to their work under a simple non-exclusive open license in accordance with Art. 1286.1 «Open license for a research work, work of literature or fine arts» of the Civil Code of the Russian Federation.
New Research of Tuva publishes articles under the Creative Commons Attribution-NonCommercial license (CC BY-NC).
Because this is an open license, the author(s) retain the right to upload the article to their institutional repository, submit it to another journal (if it allows republication), or republish it on their own website (in full or in part).
However, several conditions apply:
a) The republished version must always contain the name(s) and affiliation(s) of the author(s), the original title and a hyperlink to the original version on the New Research of Tuva website.
b) It must be in open access and free of charge, and no category of readers may be given any advantage over the general readership.
c) If the author(s) submit the contribution elsewhere without substantial modification (30% or more of the original text unchanged), the body of the article should contain a disclaimer stating that the original version was published in New Research of Tuva, with a link to the respective page.
The CC BY-NC license is irrevocable, applies worldwide and lasts for the duration of the work’s copyright.