Complex Sciences. First International Conference, Complex 2009, Shanghai, China, February 23-25, 2009. Revised Papers, Part 1

Research Article

Approaching the Linguistic Complexity

  • @INPROCEEDINGS{10.1007/978-3-642-02466-5_104,
        author={Stanisław Drożdż and Jarosław Kwapień and Adam Orczyk},
        title={Approaching the Linguistic Complexity},
        booktitle={Complex Sciences. First International Conference, Complex 2009, Shanghai, China, February 23-25, 2009. Revised Papers, Part 1},
        proceedings_a={COMPLEX PART 1},
        year={2012},
        month={5},
        keywords={Complexity, natural language, Zipf law, word classes},
        doi={10.1007/978-3-642-02466-5_104}
    }
    
  • Stanisław Drożdż
    Jarosław Kwapień
    Adam Orczyk
    Year: 2012
    Approaching the Linguistic Complexity
    COMPLEX PART 1
    Springer
    DOI: 10.1007/978-3-642-02466-5_104
Stanisław Drożdż1,*, Jarosław Kwapień1, Adam Orczyk1
  • 1: Polish Academy of Sciences
*Contact email: Stanislaw.Drozdz@ifj.edu.pl

Abstract

We analyze the rank-frequency distributions of words in selected English and Polish texts and compare the scaling properties of these distributions in the two languages. We also study a few small corpora of Polish literary texts and find that, for a corpus of texts written by different authors, the basic scaling regime is broken more strongly than for a comparable corpus of texts written by a single author. Similarly, for a corpus of texts translated into Polish from other languages, the scaling regime is broken more strongly than for a comparable corpus of native Polish texts. Moreover, based on the British National Corpus, we consider the rank-frequency distributions of the grammatically basic forms of words (lemmas), each tagged with its part of speech. We find that these distributions do not scale if each part of speech is analyzed separately. The only part of speech that, taken on its own, develops a trace of scaling is the verb.
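
As a rough illustration of what a rank-frequency analysis involves, the sketch below counts word occurrences in a text, sorts the counts in descending order so that rank 1 is the most frequent word, and estimates a Zipf-type exponent as the negated least-squares slope of log frequency against log rank. The function names `rank_frequency` and `zipf_exponent`, the letter-based tokenization, and the plain least-squares fit are all assumptions made for this example; the abstract does not state how the authors tokenized the texts or fitted the scaling regime.

```python
from collections import Counter
import math
import re


def rank_frequency(text):
    """Return word counts sorted in descending order (index 0 is rank 1).

    Hypothetical helper: words are taken to be runs of Unicode letters,
    lowercased, which is an assumption and not the authors' procedure.
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    return sorted(Counter(words).values(), reverse=True)


def zipf_exponent(frequencies):
    """Estimate alpha in f(r) ~ r**(-alpha) by least squares on log-log axes.

    Hypothetical helper: an ordinary least-squares fit is a crude estimator
    and is used here only to keep the sketch short.
    """
    if len(frequencies) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx
```

Applied to a whole corpus, `zipf_exponent(rank_frequency(corpus_text))` gives a single number close to 1 when the distribution follows the classic Zipf law. A break in the scaling regime of the kind described above would show up as a change of slope at higher ranks, which a single global fit like this one would blur, so fitting separate rank ranges would be the natural next step.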