Anthropology Club and CASI joint session

January 22, 2014

The Kyrgyzstan Corpus Project: Building a language resource unique to Kyrgyzstan and available to all

Joshua Meyer, University of Arizona

Abstract: How do we do linguistic research? Some linguists might ask native speakers of Kyrgyz if sentence A sounds better than sentence B, or if the imaginary word /gabo/ sounds "more or less Kyrgyz" than the imaginary word /kupo/. Based on these limited interactions, the linguist will then construct a theory of the linguistic structure of Kyrgyz. That is, for some linguists, the only relevant evidence is the judgment of a native speaker.

Other researchers, however, try to get as much data as possible, and then look at statistical trends in the data. With regard to language, we can do this by creating and analyzing a corpus.

A corpus is a searchable collection of language. It can be a collection of written texts, such as books, articles, and newspapers, or it can consist primarily of recorded speech that has been transcribed. If all these texts and transcriptions are in one place, the researcher can get answers to certain questions. Maybe she wants to know how often a verb comes before an object in a language like Russian, where word order is relatively loose. With 300 million words in one place, the researcher can use a computer program to find all sentences that contain both a verb and an object, and then compare their relative order.
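As a rough illustration of that kind of query, here is a minimal Python sketch. It assumes a tiny, hand-annotated toy corpus in which every token already carries a role label ("S" for subject, "V" for verb, "O" for object). The sentences, the labels, and the function name verb_object_order are all hypothetical and are not taken from the project; a real corpus would need a tagger or parser to supply such labels.

```python
from collections import Counter

# Hypothetical toy corpus (transliterated Russian), invented for illustration.
# Each sentence is a list of (token, role) pairs; the role labels are assumed
# to come from some earlier annotation step that is not shown here.
corpus = [
    [("Masha", "S"), ("chitaet", "V"), ("knigu", "O")],  # verb before object
    [("knigu", "O"), ("chitaet", "V"), ("Masha", "S")],  # object before verb
    [("Ivan", "S"), ("vidit", "V"), ("dom", "O")],       # verb before object
    [("Ivan", "S"), ("spit", "V")],                      # no object: skipped
]


def verb_object_order(sentences):
    """Count verb-object (VO) and object-verb (OV) orders.

    Only sentences containing both a verb and an object are counted,
    and only the first verb and first object of each sentence are compared.
    """
    counts = Counter()
    for sentence in sentences:
        roles = [role for _, role in sentence]
        if "V" in roles and "O" in roles:
            order = "VO" if roles.index("V") < roles.index("O") else "OV"
            counts[order] += 1
    return counts


counts = verb_object_order(corpus)
total = sum(counts.values())
for order in ("VO", "OV"):
    print(order, counts[order], f"{counts[order] / total:.0%}")
```

On a real corpus the counting logic would stay about the same; the hard part is producing reliable verb and object labels for millions of words.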

There are lots of reasons to create and use a corpus, and this talk will cover some of them and also describe a project at AUCA's Central Asian Studies Institute to create such a resource for Kyrgyzstan. 

Why Kyrgyzstan?

a) there is no such resource available for Kyrgyz -- A corpus of Kyrgyz would allow linguists, and others, to do interesting research on Kyrgyz.

b) Kyrgyzstan, being in many ways a bilingual country, has a very special and interesting phenomenon: code-switching -- when people mix Kyrgyz and Russian in one conversation, sentence, phrase, or even word.

What can you do with a corpus?

Investigate...
a) structural properties of language
b) sociolinguistic issues
c) language variation within a population
d) collect stories, history

Bio: Joshua Meyer is currently a Ph.D. student in Theoretical Linguistics at the University of Arizona. His research interests lie in bilingualism and psycholinguistics, especially in phonological processing. Josh’s methods are mainly experimental, using data from behavioral production and perception studies. He is currently a visiting research fellow at AUCA's Central Asian Studies Institute.

