A metatextual markup in the national corpus of Tuvan language: the structure and functionality
Keywords:
natural language corpus; National corpus of Tuvan language; meta-markup; Tuvan language; Tuvan epic poetry
Abstract
Creating natural language corpora helps solve a number of philological and purely linguistic problems for many languages of the peoples of the Russian Federation. The National corpus of Tuvan language (http://www.tuvancorpus.ru/) is one such product, jointly developed by faculty and students at two universities in Krasnoyarsk and Kyzyl.
The article presents the corpus's meta-markup system, a component that forms the most important part of the search functionality in any corpus. Meta-markup refers to assigning parameters that characterize the text as a whole. Within a corpus, meta-markup makes it possible to search for texts and select them for inclusion in subcorpora according to the presence of one or more features. Consequently, the larger the set of such features for each text, the wider the search functionality becomes for various philological and linguistic purposes.
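As a rough illustration of how feature-based selection of this kind might work, the sketch below filters a list of text records by metadata values. The record layout, the field names (`genre`, `author_gender`, `year`), the sample values, and the function `select_subcorpus` are all hypothetical and are not taken from the corpus software.

```python
from typing import Any

# Hypothetical metadata records; field names and values are illustrative only.
TEXTS: list[dict[str, Any]] = [
    {"id": 1, "genre": "epic", "author_gender": None, "year": 1963},
    {"id": 2, "genre": "novel", "author_gender": "m", "year": 1987},
    {"id": 3, "genre": "novel", "author_gender": "f", "year": 2004},
]


def select_subcorpus(texts, **criteria):
    """Return the texts whose metadata match every given feature value."""
    return [
        t for t in texts
        if all(t.get(key) == value for key, value in criteria.items())
    ]


novels = select_subcorpus(TEXTS, genre="novel")
novels_by_women = select_subcorpus(TEXTS, genre="novel", author_gender="f")
print([t["id"] for t in novels])           # [2, 3]
print([t["id"] for t in novels_by_women])  # [3]
```

Each additional metadata field becomes one more keyword that can be passed to the filter, which is one way to picture why a richer feature set widens the range of possible subcorpora.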
The meta-markup system for the texts included in the National corpus of Tuvan language can comprise up to 18 parameters, such as the author’s name and gender, the title and creation date (year) of the text, its functional sphere, topic, subject area, time and setting of events described in it, the text’s classification by type of spoken language or literary genre and style, its source, name of the periodical it appeared in, publisher, publication date, medium, comments, as well as some features of its audience, such as age and education level.
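A minimal sketch of how a per-text metadata record with optional parameters could be modelled is given below. The class `TextMeta`, its field names, and the helper `filled_parameters` are assumptions made for illustration. They cover only some of the parameters listed above and do not reproduce the corpus's actual schema. Treating every field except the title as optional reflects the wording "up to 18 parameters", on the assumption that not every text carries every parameter.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TextMeta:
    """Hypothetical record for one text; only `title` is assumed mandatory."""
    title: str
    author_name: Optional[str] = None
    author_gender: Optional[str] = None
    created_year: Optional[int] = None
    functional_sphere: Optional[str] = None
    topic: Optional[str] = None
    genre: Optional[str] = None
    source: Optional[str] = None
    publisher: Optional[str] = None
    publication_date: Optional[str] = None
    medium: Optional[str] = None
    audience_age: Optional[str] = None
    audience_education: Optional[str] = None
    comments: Optional[str] = None


def filled_parameters(meta: TextMeta) -> list[str]:
    """List the names of the parameters that are actually set for a text."""
    return [f.name for f in fields(meta) if getattr(meta, f.name) is not None]


# Illustrative values only, not an actual corpus entry.
example = TextMeta(title="Sample text", genre="epic", medium="print")
print(filled_parameters(example))  # ['title', 'genre', 'medium']
```

A text can be selected only by the parameters that are filled in for it, so a helper of this kind could be used to check how fully a given text is described.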
New Research of Tuva publishes articles under the Creative Commons Attribution-NonCommercial license (CC BY-NC).