Manual of comparative linguistics
Alexander Akulov


This book presents two precise typological methods of comparative linguistics: the Prefixation Ability Index and the Verbal Grammar Correlation Index. These methods allow us to detect very distant relationships between languages by directly comparing their structures, without making reconstructions.





Manual of comparative linguistics

Prefixation Ability Index (PAI) and Verbal Grammar Correlation Index (VGCI)

Alexander Akulov


Dedicated to the memory

of Prof. Alexander B. Valiotti



Alexander Akulov, 2015



Created with the intellectual publishing system Ridero




1. Why typology, not lexis, should be the basis of the genetic classification of languages


In contemporary linguistics one can observe an obsession with proving the relationship of certain languages by comparing their lexis, and an obsession with separating typology from comparative linguistics. The main problem of all such hypotheses is that they are not based on any firmly testable methods, but merely on particular points of view and on the "I'm an artist, that's how I see it" principle. The tendency to think that typology should be separated from historical linguistics was inspired by Joseph Greenberg in the West and by Sergei Starostin and the Nostratic tradition in the USSR/Russia. Although followers of Nostratics insist that their methods differ from Greenberg's, the methods are actually almost the same: they take word lists, find some look-alike lexemes[1 - Moreover, nobody actually cares that certain lexemes can look alike purely by coincidence: the shorter a lexeme is, the higher the probability that it resembles some random lexeme of another language.] and, on the basis of these facts, draw conclusions about the genetic relationship of certain languages. Followers of Greenberg and Starostin regard typological studies as a rather useless glass bead game. Adepts of megalocomparison[2 - Megalocomparison is a term coined by J. Matisoff (Matisoff 1990) specifically to denote attempts to prove distant genetic relationship on the basis of lexical comparison, i.e. attempts to prove the genetic relationship of certain languages in Greenberg's style of so-called "mass comparison".] never consider typological items as a system; usually some randomly chosen typological items are taken out of their proper contexts. For instance, active or ergative typology, or so-called isolating or polysynthetic typology (i.e. features that are unusual for the researchers' native languages and that shock the researchers' minds), are treated as interesting exotic items, while no attention is paid to holistic and systematic analysis of language structures.
Such an approach turns typology into a curiosity shop rather than a tool of comparative linguistics, although, according to the founding fathers of linguistics, it is typology that should be the main tool of comparative linguistics. According to the mythology created by adepts of megalocomparison, comparative linguistics actually has little connection with typology and makes its statements by means of lexicostatistical hoodoo. Megalocomparativists often object to this criticism, saying that they also pay attention to structural issues and compare morphemes besides lexis. However, we know very well what megalocomparative comparison of morphemes actually means: analysis in a lexical way, i.e. only material components are compared, so there is no difference between such comparison of the material components of morphemes and comparison of lexemes. The reason is that megalocomparativists ignore the fact that any morpheme consists of three components (meaning, position and material expression) and reduce morphemes to their material implementation. Almost no attention is paid to the fact that grammar is first of all a positional distribution of certain meanings. There is a presupposition that the genetic relationship of two languages can be proved by discovering look-alike lexemes of the so-called basic vocabulary and by detecting certain regular phonetic correspondences. However, Antoine Meillet already pointed out that lexical and phonetic correspondences can appear due to borrowing and cannot be proofs of relationship:



Grammatical correspondences provide proof, and they alone prove rigorously, but only if one makes use of the details of the forms and if one establishes that certain particular grammatical forms used in the languages considered go back to a common origin. Correspondences in vocabulary never provide absolute proof, because one can never be sure that they are not due to loans (Meillet 1954: 27).


Lexical correspondences and regular phonetic correspondences can be found between any randomly chosen languages. For instance, it is possible to find some regular correspondences between Japanese and Cantonese and even to "prove" their relationship: Japanese boku 'I' (a personal pronoun used by males) : Cantonese buk 'servant; I'; Japanese bo 'stick' : Cantonese baang 'stick'; Japanese o-taku 'your family; your house; your husband' : Cantonese zaak 'house'; Japanese taku 'swamp' (in compounds) : Cantonese zaak 'swamp'; Japanese san 'three' : Cantonese sam; Japanese shin 'forest' (in compounds) : Cantonese sam 'forest'; Japanese roku 'six' : Cantonese luk; Japanese ran 'orchid' : Cantonese laan 'orchid'. If there were no other languages of the so-called Buyeo[3 - The Buyeo stock is still a hypothetical stock that includes the Japanese, Korean and Okinawan languages (the Buyeo stock is discussed in chapter 5.2).] stock[4 - In this text I intentionally use the term "stock" instead of the term "family" that is usually used in such contexts; languages are not self-replicating systems like biological ones, so I think any biological analogies should be avoided.] and no languages of the Chinese stock, we would have no way to single out these words as items borrowed from Southern Chinese dialects, since they are used as regularly and widely as words of Japanese origin. In the case of Japanese and Cantonese, we know the history of the corresponding stocks rather well and have much firm evidence that Japanese is not a relative of the Chinese stock.

If someone thinks that the example of Japanese and Cantonese is just a weird joke, they can take a look at the procedure Greenberg used to prove that the Waikuri language belonged to the Hokan stock[5 - Waikuri is an extinct language that was spoken in the southern part of Baja California. The Hokan stock is a hypothetical stock of a dozen small language families that were spoken mainly in California, Arizona and Baja California (pic. 1).]: the conclusion was based on a comparison of FOUR (!) words only (Poser, Campbell 1992: 217-218). We should also keep in mind that Greenberg did not actually care much about precise phonetic correspondences; superficial resemblance was quite sufficient for him.






Pic. 1. Map showing the location of the hypothetical Hokan stock (blue) and the Waikuri language (red)



Phonetic correspondences can exist even between completely unrelated languages, so a stock cannot be proved by regular correspondences; rather, regular correspondences should be proven by the existence of a stock, since true regular phonetic correspondences exist only inside stocks.

Moreover, Swadesh himself warned that comparison of vocabularies cannot prove the genetic relationship of languages and that other methods, i.e. analysis of structures, should be used for this purpose. Swadesh's method is a method for estimating the approximate time of divergence of languages that have already been proved to be related. However, Swadesh's warning has been thoroughly forgotten. We should also keep in mind that even the so-called basic lexicon is actually culturally determined (Hoijer 1956) and may contain borrowings (see the example of Japanese and Cantonese considered above).

Moreover, we should keep in mind that there are thousands of languages whose history is completely unknown and which are described only in their current phase; there is therefore no way to distinguish borrowings in their lexicon, and so it is completely impossible to say anything about their genetic relationship using a methodology based on lexical comparison.

A methodology that ignores structural/grammatical issues allows different scholars to reach completely different conclusions about the same language. For instance, Sumerian is thought to be a relative of the Kartvelian stock (Nicholas Marr), of the Uralic stock (Simo Parpola), of Sino-Tibetan (Jan Braun), of Mon-Khmer (Igor M. Diakonoff) or even of Basque (Aleksi Sahala). Another notable example is Ainu, which has been attributed to Altaic (James Patrie), to Austronesian (Murayama Shichiro) and to Mon-Khmer (Alexander Vovin)[6 - Due to its completely isolated position among the languages of the world, Ainu is especially attractive material for perfunctory and amateurish hypotheses.]. The most notable fact is that all such attempts coexist and are all considered by the public to be rather reliable at the same time; obviously, this looks much more like the plot of a vaudeville sketch than a serious matter of science.

Different methods can lead to different conclusions, but if people use the same methodology, they are expected to reach the same conclusions about the same material. We do not see this, which means that the methodology based on lexical comparison is not relevant for comparative linguistics.

Another weird issue is that this lexical methodology has never been tested in an appropriate way. When asked "Why did you come to the conclusion that it is possible to conclude something about the genetic relationship of certain languages by comparing lexis only?", megalocomparativists usually answer "morphology doesn't matter" and do not explain how they came to such a conclusion. They look much more like adepts of a religion than like scientists, since science always presupposes experiments and verification, while statements of the type "it is so because it is so" obviously do not belong to the field of science; they are statements of a religion.

All these facts show us that comparison of lexicon is a completely irrelevant methodology in the field of historical comparative studies of languages.

Why can we say that a language is first of all grammar, i.e. a system of grammatical meanings and their distributions, and not a heap of lexemes?

William Jones, a founding father of linguistics, already pointed out that grammar is much more important than lexis:



The Sanscrit language, whatever be its antiquity, is of a wonderful structure; more perfect than the Greek, more copious than the Latin, and more exquisitely refined than either; yet bearing to both of them a stronger affinity, both in the roots of verbs, and in the forms of grammar, than could possibly have been produced by accident; so strong, indeed, that no philologer could examine them all three without believing them to have sprung from some common source, which perhaps no longer exists. There is a similar reason, though not quite so forcible, for supposing that both the Gothick and the Celtick, though blended with a very different idiom, had the same origin with the Sanscrit; and the old Persian might be added to the same family, if this were the place for discussing any question concerning the antiquities of Persia (Jones 1798: 422-423).


The main function of any language is to be a means of communication, but in order to be able to communicate we first have to set up a system of rubrics/labels/markers; that is why the main function of any language is to rubricate/structure reality. The structural level, i.e. grammar, is the means that rubricates reality, and so it is much more important than the lexicon. I suppose we can even say that structure appeared before languages of the modern type: when the ancestors of Homo sapiens developed the ability to freely combine two signals inside one utterance, this was already a primitive form of modern language. Structure is something like a bottle, while the lexicon is the liquid/matter inside the bottle; one can put wine, water, gasoline or even sand into a bottle, but the bottle always remains a bottle.

To those who think that structure is not important I can give the following example from Japanese: Gakusei ha essei wo gugutte purinto shita 'Having googled an essay, the student printed [it]'. What makes this phrase a Japanese phrase? The Japanese words gakusei 'student' (a word of Chinese origin), essei 'essay' (a word of English origin), purinto 'print' or, maybe, the Japanese verb guguru 'to google'? One could say that this example is very special since it was constructed without so-called basic lexicon; however, such words are in everyday use, and, as noted above, it is impossible to delimit the so-called basic lexicon, since all lexis is culturally determined and borrowings can occur even inside the so-called basic lexis. Any language can potentially accept thousands of foreign words and still remain the same language as long as its structure remains the same.

All the facts considered above mean that lexical comparison should not be the basis of the genetic classification of languages, and any research on genetic affiliation should be based on comparative analysis of structures/grammar: analysis and comparison of the grammatical systems of the compared languages is an obligatory procedure for proving or testing any hypothesis about the genetic affiliation of a language. That is why two powerful typological tools are presented in this monograph.




2. Prefixation Ability Index (PAI) allows us to see whether two languages can potentially be genetically related





2.1. PAI Method







2.1.1. PAI method background


A. P. Volodin pointed out that all languages can be subdivided into two sets by the parameter of presence/absence of prefixation: one set has prefixation and the other does not (Volodin 1997: 9).

The first set was conventionally named the set of the "American type" linear model of the word form[7 - This type of linear model of the word form is called "American" since it has been described mainly on the material of Native American languages (especially those of North America).].

According to Volodin, the American-type linear model of the word form is the following:



(p) + (r) + R + (s)



The second set was conventionally named the set of the "Altai type" linear model of the word form[8 - This type of linear model of the word form is called "Altai" since it has been described mainly on the material of languages of the so-called Altaic stock.].

According to Volodin, it is the following:



(r) + R + (s)



(p: prefix; s: suffix; R: main root; r: incorporated root; brackets mean that the corresponding element can be absent or can occur several times inside a particular form).



Volodin supposed that there was a border between the two sets and that languages belonging to the same set demonstrate certain structural similarities. He also supposed that typological similarities could probably tell us something about possible routes of ethnic migrations.




2.1.2. PAI hypothesis development


Having adopted Volodin's notion of two types of linear model of the word form, for quite a long time I thought that there was a pretty strict watershed between languages that have prefixation and those that do not. For instance, I seriously thought that Japanese had no prefixes and tried to treat all Japanese prefixes as variants of certain roots, i.e. as components of compounds, until one day I finally realized that these so-called variants of roots could never actually be placed in the nuclear position, and so they all should be considered true prefixes. Thus the strict dichotomy was broken, and I had to elaborate a new theory.

Since any language actually has some ability to produce prefixation, there is no strict border between languages with prefixation and languages without it, and we should give up the idea of a strict subdivision of all existing languages into two non-intersecting sets.

Hence, linear models of word forms have the following structures:



(P) + (R) + r + (s): the linear model of the word form of the American type;



(p) + (r) + r + (S): the linear model of the word form of the Altaic type.



Capital letters mark positions that are used more often than positions marked by small letters.



Thus, there is no principal structural difference between languages of the American type and the Altaic type; the difference lies in the degree of manifestation of certain parameters. Therefore, so that our conclusions are not speculative, we should speak about the degree of prefixation producing ability / prefixation ability degree / prefixation ability index, i.e. about a certain measure of prefixation.



I suppose that each language has its own ability to produce prefixation and that this ability does not change seriously during all the stages of its history.

I also suppose that prefixation ability manifests itself in any circumstances, i.e. it is manifested by any means: by means of original morphemes existing in a certain language or by means of borrowed morphemes.

If a language has a certain prefixation ability, it shows anyway. That is why I do not distinguish between original and borrowed affixes.



Also, for the current consideration it does not matter whether a given affix is derivational or relational: if we take into account relational affixes only, then, for instance, Japanese is a language without prefixes.

That is why we should define prefixes not by their derivational or relational role but by their position inside the word form; a prefix is any morpheme that meets the following requirements:



1) it can be placed only to the left of the nuclear position;

2) it can never be placed in the nuclear position;

3) no meaningful morpheme with its clitics can be placed between this morpheme and the nucleus (i.e. no meaningful morpheme with its auxiliary morphemes can stand between the nuclear root and the prefix).



I should specifically note that there are no so-called semi-prefixes. If a morpheme can be placed in the nuclear position, it is a meaningful morpheme, and any combinations with it should be considered compounds.
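The three positional criteria, together with the rule that a morpheme able to fill the nuclear slot is not a prefix, can be sketched as a small classifier. This is only an illustration under stated assumptions: the `Morpheme` record and its fields (`positions`, `meaningful_can_intervene`) are hypothetical and presuppose that the attested positions of each morpheme relative to the nucleus have already been collected from a grammar description; the example forms are placeholders, not analyses of any real language.

```python
from dataclasses import dataclass, field


@dataclass
class Morpheme:
    """Hypothetical record of where a morpheme has been attested.

    positions: slot offsets relative to the nucleus in attested word forms
      (negative = left of the nucleus, 0 = the nuclear slot, positive = right).
    meaningful_can_intervene: True if a meaningful morpheme (with its
      auxiliary morphemes) has been attested between this morpheme and the nucleus.
    """
    form: str
    positions: set[int] = field(default_factory=set)
    meaningful_can_intervene: bool = False


def is_prefix(m: Morpheme) -> bool:
    """Apply the three positional criteria for a prefix."""
    if not m.positions:
        return False  # no attestations, nothing to classify
    # Criterion 2: a morpheme that can fill the nuclear slot is a root, and
    # combinations with it are compounds (there are no "semi-prefixes").
    if 0 in m.positions:
        return False
    # Criterion 1: it occurs only to the left of the nucleus.
    if any(p > 0 for p in m.positions):
        return False
    # Criterion 3: no meaningful morpheme may stand between it and the nucleus.
    return not m.meaningful_can_intervene


print(is_prefix(Morpheme("a-", {-1})))    # True
print(is_prefix(Morpheme("c", {-1, 0})))  # False: it can occupy the nucleus
```

Because criterion 2 is checked first, a root-like element that sometimes stands before the nucleus is never counted as a prefix, which mirrors the statement that there are no semi-prefixes.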



Thus, the following can be summarized:



1) Each language has its own ability to produce prefixation, and this ability does not change seriously during all the stages of its history.



2) Prefixation ability is manifested by any means: by means of original morphemes existing in a certain language or by means of borrowed morphemes. That is why the method does not distinguish between original and loaned affixes.



3) Genetically related languages are supposed to have rather close values of the Prefixation Ability Index.




2.1.3. PAI calculation algorithm


How can the Prefixation Ability Index (hereafter abbreviated as PAI) be measured?



The value of PAI is the proportion of prefixes among the affixes of a language.



Hence, in order to estimate the proportion/percentage of prefixes in a certain language, we should do the following:



1) Count the total number of prefixes;

2) Count the total number of affixes;

3) Calculate the ratio of the total number of prefixes to the total number of affixes.
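The three steps above amount to a single ratio. A minimal Python sketch, assuming the affix inventory has already been compiled from a grammar description as a mapping from each affix to a positional label such as "prefix" or "suffix"; the function name `pai`, the label strings and the toy inventory are hypothetical, and original and borrowed affixes are counted alike:

```python
from collections import Counter


def pai(inventory: dict[str, str]) -> float:
    """Prefixation Ability Index: share of prefixes in the affix inventory.

    inventory maps each affix of the grammar (original or borrowed alike)
    to its positional class, e.g. "prefix", "suffix", "infix".
    """
    if not inventory:
        raise ValueError("the affix inventory is empty")
    counts = Counter(inventory.values())         # step 1: prefixes counted by label
    return counts["prefix"] / len(inventory)     # steps 2-3: divide by all affixes


# Hypothetical toy inventory, not data for any real language.
toy = {
    "a-": "prefix", "b-": "prefix",
    "-c": "suffix", "-d": "suffix", "-e": "suffix",
    "-f": "suffix", "-g": "suffix", "-h": "suffix",
}
print(pai(toy))  # 0.25
```

Counting over the inventory rather than over a text means each affix contributes once, however frequent or rare its forms are in running speech.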



Why is it important to count the total number of prefixes and then calculate its ratio to the total number of affixes, rather than to estimate PAI by the frequency of prefixed forms in a random text?

A language can have quite a high value of PAI, yet in a particular text word forms with prefixes can be of low frequency. Our task is to estimate the proportion of prefixes in the grammar, not the proportion of prefixed forms in a random text. The Prefixes/Word index estimated by Greenberg was exactly such an estimation of the frequency of prefixed forms in a text (Greenberg 1960).

Of course, that index can also give some general notion of the prefixation ability of a language, though it is extremely rough and inaccurate, since a randomly chosen text may contain very few words with prefixes: the longer the text, the more precise the conclusion, but the error of such an estimation still remains very high. When we count all existing affixes of a certain language, on the other hand, the potential error is extremely low, and even if we occasionally forget some affixes, this does not seriously influence our results.

Moreover, I should note that although Greenberg did great work in the field of typology, he did not actually use those results in his research; he was an adept of megalocomparison and based his conclusions on mass comparison of lexis rather than on structural correlations. His interest in typology was a glass bead game, separated from his actual field of study.



To those who think that it is impossible to estimate the number of morphemes since a living language always changes, I would say that a living language does not invent new morphemes every day, especially auxiliary morphemes. The fact that, when learning a language, we can use descriptions of its grammar written some decades ago is the best proof that grammar is a very conservative level of any language.

Hence, we can estimate the total number of affixes of a living language as long as we can get a description of it in which all stable forms are represented. And there is no need to care about what may appear in a certain language in the future: we consider the current stage of a living language and do not care about its possible future stages, since they simply do not exist yet.

As for the possibility of counting, I would note that even the set of words is a countable set, while the set of morphemes, and especially of auxiliary morphemes, is not just countable but also finite.




2.1.4. PAI method testing: from a hypothesis toward a theory


In order to test the PAI hypothesis, I examined some languages of firmly established stocks: Austronesian, Indo-European and Afroasiatic.



2.1.4.1. PAI of languages of the Austronesian stock



Polynesian group



Eastern Polynesian subgroup



Hawaiian 0.82 (calculated after Krupa 1979)



Maori 0.88 (calculated after Krupa 1967)



Tahitian 0.66 (calculated after Arakin 1981)



Samoan-Tokelauan subgroup



Samoan 0.5 (calculated after Arakin 1973)



Tongic subgroup



Niuean 0.8 (calculated after Polinskaya 1995)



Tongan 0.78 (calculated after Fell 1918)



Philippine group



South Mindanao subgroup



Tboli 0.72 (calculated after Porter 1977)



Northern Luzon subgroup



Pangasinan 0.6 (calculated after Rayner 1923)



Malayo-Sumbawan group



Malay subgroup



Indonesian 0.53 (calculated after Ogloblin 2008)






Pic. 2. Map showing the location of the Austronesian languages mentioned in the current chapter: languages are marked in red, place names in black.



Chamic subgroup



Cham 0.6 (calculated after Aymonier 1889; Alieva, Bùi 1999)



Formosan group



Bunun 0.8 (calculated after De Busser 2009)



Eastern Barito group



Malagasy 0.74 (calculated after Arakin 1963)



2.1.4.2. PAI of languages of the Indo-European stock



Germanic group



Dutch 0.49 (calculated after Donaldson 1997)



German 0.51 (calculated after Donaldson 2007)



English 0.61 (calculated after Barkhudarov et al. 2000)



Icelandic 0.63 (calculated after Einarsson 1949)



Slavonic group



Czech 0.52 (calculated after Harkins 1952)



Polish 0.57 (calculated after Swan 2002)



Celtic group



Irish 0.67 (McGonage 2005)



Welsh 0.35 (calculated after King 2015)



Romance group



Latin 0.26 (calculated after Bennet 1913)



Spanish 0.34 (calculated after Kattán-Ibarra, Pountain 2003)



2.1.4.3. PAI of languages of the Afroasiatic stock



Semitic group



Central Semitic subgroup



Arabic (Classical) 0.26 (calculated after Yushmanov 2008)



Phoenician 0.26 (calculated after Shiftman 2010)



Eastern Semitic subgroup



Akkadian (Old Babylonian dialect) 0.2 (calculated after Kaplan 2006)



Egyptian group



Coptic (Sahidic dialect) 0.87 (calculated after Elanskaya 2010)






Pic. 3. Diagram showing PAI values of some firmly established stocks




2.1.5. PAI of a group/stock


The PAI of a group or a stock can be calculated as the arithmetic mean of the PAI values of its languages, and this is precise enough for a rough estimation.
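A minimal sketch of this group calculation, using the four Germanic values listed in 2.1.4.2; `group_pai` is a hypothetical helper name, and `statistics.fmean` from the Python standard library computes the plain arithmetic mean:

```python
from statistics import fmean


def group_pai(values: list[float]) -> float:
    """PAI of a group or stock as the plain arithmetic mean of member PAIs."""
    if not values:
        raise ValueError("no PAI values given")
    return fmean(values)


# Values as listed above for the Germanic group.
germanic = {"Dutch": 0.49, "German": 0.51, "English": 0.61, "Icelandic": 0.63}
print(round(group_pai(list(germanic.values())), 2))  # 0.56
```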



One could say that a plain arithmetic mean is quite a rough estimation and that, in order to estimate PAI more precisely, it would be better to take the PAI values of particular languages with coefficients that show the proximity of each language to the ancestor language of the stock. The coefficient of proximity is the degree of correlation of grammatical systems.



Let us test this hypothesis and see whether it is so.



For instance, in the case of Austronesian it would be something like the following:



Malagasy^PAN[9 - PAN means Proto-Austronesian; ^ is the sign of grammar/structure correlation] ≈ 0.5;



Bunun^PAN ≈ 0.8;



Philippine group^PAN ≈ 0.7;



Indonesian^PAN ≈ 0.6;



Cham^PAN ≈ 0.4;



Polynesian languages^PAN ≈ 0.5.



The indexes show the degree of proximity of languages (grammatical systems). In the current case these indexes are not results of any calculation but just approximate, speculative estimations of the degrees of proximity of modern Austronesian languages to Proto-Austronesian; it is supposed that the Formosan languages and the so-called languages of the Philippine type are the closest relatives of PAN among modern Austronesian languages.



If we take each particular PAI value with the corresponding coefficient of proximity, we get a PAI of Austronesian of about 0.44.



If we take just the arithmetic mean without proximity coefficients, we get 0.6.



0.6 is obviously closer to the real PAI values of Austronesian languages than 0.44. Hence, it is possible to state that the plain arithmetic mean is a completely sufficient way to calculate the PAI of a group/stock, while PAI calculated with proximity coefficients gives results that differ seriously from reality.




2.1.6. PAI in diachrony


It can be supposed that PAI does not change much in diachrony.



The PAI of Late Classical Chinese is 0.5 (calculated after Pulleyblank 1995).



The PAI of Contemporary Mandarin is 0.5 (calculated after Ross, Sheng Ma 2006).



The PAI of Early Old Japanese is 0.13 (calculated after Syromyatnikov 2002).



The PAI of contemporary Japanese is 0.13 (calculated after Lavrentyev 2002).



This should probably also be tested on other examples, but even these examples show that the PAI of a language remains the same at different stages of its history.




2.1.7. Summary of the PAI method


One could say that Coptic has broken our hypothesis, but actually PAI has just shown us that the Coptic (Egyptian) group and the Semitic group diverged very long ago, probably as early as the Neolithic epoch.



However, the tests have shown that the PAI values of related languages are actually rather close, i.e. they do not differ more than fourfold (pic. 3).



Thus, it is possible to say that PAI is something like a safety valve of comparative linguistics: if its values do not differ more than fourfold, then PAI has no distinguishing power, and there are actually no obstacles to a further search for potential genetic relationship; but if PAI values differ fourfold or more, then absolutely ironclad proofs of genetic relationship must be found.
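The fourfold rule can be written as a ratio test. This sketch assumes both PAI values are strictly positive (a zero value would make the ratio undefined); the helper names are hypothetical, and the example values are those given above for contemporary Japanese (0.13), Contemporary Mandarin (0.5) and Hawaiian (0.82). The function only reports whether the gap reaches fourfold; by itself it says nothing about whether two languages are related.

```python
def pai_ratio(a: float, b: float) -> float:
    """How many times the larger PAI value exceeds the smaller one."""
    if a <= 0 or b <= 0:
        raise ValueError("PAI values must be positive for a ratio test")
    return max(a, b) / min(a, b)


def needs_strong_proof(a: float, b: float, threshold: float = 4.0) -> bool:
    """True if the two PAI values differ fourfold or more."""
    return pai_ratio(a, b) >= threshold


print(needs_strong_proof(0.13, 0.5))   # False: the ratio is about 3.85
print(needs_strong_proof(0.13, 0.82))  # True: the ratio is about 6.31
```

Using `>=` matches the wording "fourfold or more", so a gap of exactly four already calls for strong proof.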



I should also specifically note that the PAI method does not require an estimation of measurement error, since PAI allows a fourfold gap between values.




2.2. Why is it possible to prove that languages are not related?


2.2.1. The root of the problem is a substitution of concepts



One could say that it is impossible to prove the unrelatedness of two languages, so I should explain why it is possible.



In contemporary comparative linguistics there is a weird presupposition that it is impossible to prove that certain languages are not genetically related. As far as I can understand, this point of view was inspired by Greenberg, as were some other obscurantist ideas of contemporary historical linguistics. It seems quite weird that it is possible to prove relatedness but not unrelatedness. Let us check whether this is so.



First of all, I should note that the statement about the impossibility of proving unrelatedness is actually a sophism based on a substitution of concepts. When they speak about proofs of relatedness, relatedness means belonging to the same stock, which is the regular and normal meaning of the concept of relatedness in linguistics; however, when they speak about unrelatedness, the meaning of relatedness suddenly changes: they start to suppose that all existing languages are actually related, since they are supposed to derive from the same proto-language that existed in a very distant epoch, and that because of this we cannot prove unrelatedness but can only state that a language does not belong to a stock.



2.2.2. Concepts of relatedness and unrelatedness from the point of view of other sciences



In order to clarify the meaning of the concept of relatedness, it is useful to pay some attention to other sciences where this concept is also used. If we take a look at, for instance, biology, physics or technical sciences, we can see that many items are classified even though they obviously have a common origin, and it is completely normal to speak about their relatedness and unrelatedness. All living beings have a common origin, and so they are all relatives at a very deep level, but this fact does not mean that they cannot be classified into kingdoms, phyla, classes, orders, suborders, families and subfamilies; the fact that the ant, the bear, the pine tree, the whale and the sparrow have a common ancestor does not mean it is impossible to distinguish a bear from a whale and a whale from a pine tree.



However, since languages are not self-replicating systems like biological ones and are closer to artifacts, any parallels between biological systems and language should always be drawn with a certain degree of caution, since they are allegories rather than analogies, while correlations between languages and some artificial items are more precise. For instance, all existing cars are derivatives of the steam engine that existed in the middle of the 19th century, but this does not mean that we cannot classify cars/engines and speak of the relatedness and unrelatedness of certain types.



These examples clearly show us the following:



1) When they say that an item is related to another, it means that both belong to the same class.



2) It is possible to speak about the relatedness and unrelatedness of certain items even though all their classes have a common origin.



2.2.3. Concepts of relatedness and unrelatedness from the point of view of set theory and abstract algebra



The concept of relatedness is actually an equivalence relation, since it meets the necessary and sufficient requirements for a binary relation to be considered an equivalence relation:



1) Reflexivity: a ~ a: a is related to a;

2) Symmetry: if a ~ b then b ~ a: if a is related to b, then b is related to a;

3) Transitivity: if a ~ b and b ~ c then a ~ c: if a is related to b and b is related to c, then a is related to c.



If an equivalence relation is defined on a set, then it necessarily partitions the elements of the set into equivalence classes, and these classes do not intersect (Hrbacek, Jech 1999).
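On a finite set, the three properties can be checked mechanically and the resulting classes constructed. The sketch below is illustrative: the assignment of languages to stocks is taken as given (the `stock_of` mapping is an assumption standing in for established classifications), relatedness is modelled simply as "belongs to the same stock", and all helper names are hypothetical.

```python
from itertools import product


def is_equivalence(elements, rel) -> bool:
    """Check reflexivity, symmetry and transitivity of rel on a finite set."""
    reflexive = all(rel(a, a) for a in elements)
    symmetric = all(rel(b, a) for a, b in product(elements, repeat=2) if rel(a, b))
    transitive = all(
        rel(a, c)
        for a, b, c in product(elements, repeat=3)
        if rel(a, b) and rel(b, c)
    )
    return reflexive and symmetric and transitive


def equivalence_classes(elements, rel) -> list[set]:
    """Partition elements into classes; valid only if rel is an equivalence."""
    classes: list[set] = []
    for x in elements:
        for cls in classes:
            if rel(x, next(iter(cls))):  # comparing with one representative suffices
                cls.add(x)
                break
        else:
            classes.append({x})
    return classes


# Illustrative assignment of languages to stocks.
stock_of = {
    "English": "Indo-European", "German": "Indo-European", "Welsh": "Indo-European",
    "Mandarin": "Sino-Tibetan", "Cantonese": "Sino-Tibetan",
    "Arabic": "Afroasiatic",
}


def related(a: str, b: str) -> bool:
    return stock_of[a] == stock_of[b]


langs = list(stock_of)
print(is_equivalence(langs, related))            # True
print(len(equivalence_classes(langs, related)))  # 3
```

Comparing a new element with a single representative of each class is only valid because the relation is transitive and symmetric; for an arbitrary relation the classes would not be well defined.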



2.2.4. Particular conclusions on the concepts of relatedness and unrelatedness for linguistics



When it is said that certain languages are genetically related (or simply related), it means that these languages belong to the same stock or even to the same group.



Taking into consideration what has been said in 2.2.2, we should keep in mind that in the case of languages there is actually no positive evidence that all languages existing nowadays originated from the same ancestor, i.e. monogenesis is still an unproved hypothesis; and even if all languages could be traced back to the same proto-language that existed in a very distant past, this would not mean that we cannot speak of their relatedness/unrelatedness.



Then, taking into consideration what has been said in 2.2.3, we can say the following:



The set of languages existing nowadays on the planet is rather well described: we know that there are about 7102 languages, about 151 stocks and 83 isolated languages (Ethnologue 2015), so, counting each isolate as a stock of its own, we can speak about 234 stocks; and we can hardly expect to discover any new, unknown languages. Thus, we can say that we have a rather complete picture of the set of languages and that there are about 234 classes of equivalence/relatedness.



If we take a stock X, we can obviously point to many languages that do not belong to it, i.e. languages that are not related to a language x (a random language of stock X). For example, in the case of the Indo-European stock, there are many languages that are not related to English: Arabic, Basque, Finnish, Georgian, Turkish, Chinese, Japanese, Hawaiian, Eskimo, Quechua and so on. In the case of the Sino-Tibetan stock, there are many languages that are not related to Chinese: Arabic, English, Eskimo, Finnish, Japanese, Turkish, Vietnamese and so on.



Thus, we can conclude the following:



1) Relatedness means that a language belongs to a given stock; unrelatedness means that it doesn't belong to that stock.



2) If a set of 234 classes/stocks has been set up, this obviously presupposes the possibility of classification, i.e. we can say whether a language belongs to a stock; moreover, we can always point to some languages which don't belong to the stock. If the possibility of proving unrelatedness is denied, then we actually can't establish the boundaries of stocks and can't distinguish one stock from another; then even a single stock could hardly have been assembled.



3) Any two randomly chosen languages are either related or not related; there can be no third variant, since relatedness/unrelatedness presupposes the existence of classes which don't intersect. If a language of X stock is related to a language of Y stock, it means that these stocks are related.



4) A possible objection is the following: one may say that it is impossible to make precise conclusions in linguistics. Actually, I don't think anyone can seriously claim this; however, if someone did, I can only point to the fact that long ago people thought precise conclusions were impossible in physics. The possibility of precise estimates and precise conclusions depends only on scholars' will and intellectual courage, not on the material itself: any material can be represented as something that can't be formalized, and many such items have already been successfully formalized.



2.2.5. An important consequence: transitivity of relatedness/unrelatedness



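If we have proven that an x language belonging to X stock is unrelated to a y language that belongs to Y stock, then x is not related to the whole Y stock. This follows from transitivity: if x were related to some language z of Y stock, then, since z is related to y, x would also be related to y, which contradicts what has been proven.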
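The consequence can be checked mechanically in a minimal Python sketch, assuming relatedness is modelled as membership in the same stock (which makes it an equivalence relation). The helpers `related` and `stock_of` and the five-language table are hypothetical illustrations chosen from the examples above.

```python
# Illustrative stock assignment (a small subset of real languages).
STOCK = {
    "English": "Indo-European",
    "Russian": "Indo-European",
    "Hindi": "Indo-European",
    "Chinese": "Sino-Tibetan",
    "Tibetan": "Sino-Tibetan",
}


def related(a: str, b: str) -> bool:
    """Relatedness modelled as membership in the same stock."""
    return STOCK[a] == STOCK[b]


def stock_of(y: str) -> list[str]:
    """All languages that fall into the same class as y."""
    return [z for z in STOCK if related(z, y)]


x, y = "Chinese", "English"
assert not related(x, y)  # premise: x and y are proven unrelated

# Consequence: x is unrelated to every member of y's stock, because
# related(x, z) together with related(z, y) would give related(x, y).
print(all(not related(x, z) for z in stock_of(y)))  # True
```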




2.3. Applying the PAI method to some unsettled hypotheses







2.3.1. PAI against Nostratic hypothesis


Basing their claims on comparison of lexis, adherents of the Nostratic hypothesis state that the Indo-European, Kartvelian, Uralic and Turkic stocks are relatives. Let's test whether, for instance, the Indo-European and Turkic stocks could be relatives.



The PAI of Indo-European is about 0.5 (calculated from the data presented in 2.1.4.2);



The PAI of the Turkic stock is about 0.012 (calculated from Yazyki mira. Tyurkskiye yazyki 1996).



The PAI values differ more than tenfold, so these stocks evidently can't be genetically related.
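The "fold" comparison used throughout this section is simply the ratio of the larger PAI value to the smaller one. A minimal Python sketch, where the name `fold_difference` is a hypothetical helper introduced here; refusing a zero value is this sketch's own choice, since such a ratio is undefined:

```python
def fold_difference(pai_a: float, pai_b: float) -> float:
    """How many times the larger PAI value exceeds the smaller one."""
    if min(pai_a, pai_b) <= 0:
        raise ValueError("fold difference is undefined when a PAI value is zero")
    return max(pai_a, pai_b) / min(pai_a, pai_b)


# Approximate values quoted in 2.3.1.
ratio = fold_difference(0.5, 0.012)  # Indo-European vs Turkic
print(f"{ratio:.1f}")  # 41.7
```

The result, roughly fortyfold, is well above the tenfold mark mentioned in the text.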




2.3.2. Does Ainu belong to the Altaic stock?


Having compared some randomly chosen lexemes, Patrie states that Ainu is a relative of Japanese and Korean and thus belongs to the Altaic stock (Patrie 1982).



Whether Japanese and Korean are part of the Altaic stock is still a debated issue, and even the relationship of Japanese and Korean is actually still questionable. However, let's accept Patrie's proposition and look at the PAI of these languages.



The PAI of Ainu is 0.75 (calculated from Tamura 2000);



The PAI of Japanese is 0.13 (see 2.1.6);



The PAI of Korean is 0.13 (calculated from Mazur 2004).



The PAI values of Ainu and Japanese, and of Ainu and Korean, differ about sixfold (0.75 / 0.13 ≈ 5.8).



In the case of the Coptic language and the Semitic group, the PAI values differ fourfold, and if there were no firm structural evidence, the relationship of Coptic and the Semitic group would be very problematic.



In the case of Ainu and the Altaic stock, the serious difference in PAI values is obviously proof of the absence of relatedness. Ainu and Korean, Ainu and Japanese are completely unrelated, like, for instance, Spanish and Basque.



Moreover, we should keep in mind that Japanese and Korean probably have the highest PAI values among the languages of the Altaic stock, so if we compare Ainu with the core languages of the Altaic stock, the difference is even more striking.



The fact that there is almost no structural correlation between Ainu and Japanese, or between Ainu and Korean, also corroborates the conclusion made with the use of PAI.




2.3.3. PAI suggests that the Buyeo stock may be real


Japanese and Korean seem to be closer relatives than has usually been thought, since their PAI values completely coincide (see 2.3.2). This agrees well with the structural and material correspondences between them.

Still, once closeness of PAI values has been found, proximity of the grammar systems must also be shown.

The question of Japanese and Korean relationship is considered in 5.2.




2.3.4. PAI against Mudrak's hypotheses


Mudrak believes that such languages as Ainu, Nivkh, Chukchi-Koryak, Itelmen and Eskimo-Aleut are genetically related (Mudrak 2013).



2.3.4.1. Could Ainu and Nivkh be relatives?



According to Mudrak, Ainu and Nivkh not only belong to that hypothetical stock but belong to the same group inside the stock (pic. 4).






Pic. 4. Scheme representing genetic relationships of Paleosiberian stock languages according to Mudrak (source: http://polit.ru/article/2013/04/22/ps_mudrak_linguistics/ accessed December 2015)



The PAI of Nivkh is 0.07 (calculated from Gruzdeva 1997);



The PAI of Ainu is 0.75.



The values differ more than tenfold (0.75 / 0.07 ≈ 10.7).



The grammars of Ainu and Nivkh also show serious differences.



The hypothesis of Nivkh and Ainu relationship is the same as, for instance, a hypothesis of a common ancestor of Estonian and Latvian put forward by Nivkh or Ainu scientists (if Nivkh or Ainu had scientists and European languages were indigenous languages). It is completely naïve and is based only on a very perfunctory impression of some cultural similarities between Sakhalin Nivkh and Sakhalin Ainu.



2.3.4.2. Could Ainu and Eskimo-Aleut be relatives?



The PAI of the Aleut group and its relatives is zero (Golovko 1997: 115; Menovschikov 1997: 77). The PAI of Ainu is 0.75. We have seen some well-assembled groups and stocks and know how much PAI values can differ when languages really form a stock. Since the mathematics we use to calculate PAI values and estimate their correlation does not allow division by zero, the ratio here cannot even be computed; we could ascribe to Aleut an obviously absurd value (for instance, 0.000001), which would make the ratio 750,000, only to show the utter absurdity of any attempt to represent Ainu and Aleut as languages belonging to the same stock.






Pic. 5. Diagram representing PAI values of languages that don't form stocks.
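The zero case for Aleut can be made explicit in code. In this minimal Python sketch, the helper `fold_difference` is hypothetical, and returning `math.inf` when exactly one value is zero (and 1.0 when both are zero) is an assumption of the sketch rather than part of the PAI method: no finite multiplier turns a zero PAI into a positive one. The second call reproduces the substitution of a tiny stand-in value described above.

```python
import math


def fold_difference(pai_a: float, pai_b: float) -> float:
    """Ratio of the larger PAI value to the smaller one.

    Assumption of this sketch: the ratio is infinite when exactly one
    value is zero, and 1.0 when both are zero (the values coincide).
    """
    low, high = sorted((pai_a, pai_b))
    if high == 0:
        return 1.0
    return math.inf if low == 0 else high / low


PAI_AINU = 0.75
PAI_ALEUT = 0.0  # as cited in 2.3.4.2

print(fold_difference(PAI_AINU, PAI_ALEUT))  # inf
# Replacing zero by an artificial tiny value only yields an enormous ratio.
print(f"{fold_difference(PAI_AINU, 0.000001):,.0f}")  # 750,000
```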



2.3.4.3. Against the term "Paleosiberian"



The term "Paleosiberian languages" was invented to designate isolated languages of Siberia and the Far East; it does not denote a hypothetical stock but is just a set of genetically unrelated languages assembled by their geographic location. Now it would be better to avoid this term, since it doesn't help analysis and discovery but only encourages megalocomparative obscurantism.



It would be better to use the term "isolated languages and stocks of Siberia and the Far East" rather than to explain every time the true meaning of the term "Paleosiberian": it looks very much like the name of a stock and sounds too mysterious and intriguing for random amateurs to understand its meaning properly.




2.3.5. Potential relatives of Ainu seem to be in the South


2.3.5.1. Ainu and Austronesian



Murayama believed that Ainu could be a distant relative of Austronesian (Murayama 1993).

Despite the naïve lexicostatistic approach, the idea can potentially be rather realistic, since the PAI of Ainu is 0.75 and the PAI of the Austronesian stock is about 0.6.



2.3.5.2. Ainu and Mon-Khmer



Vovin tried to show that Ainu was a distant relative of Austroasiatic (Vovin 1993). As in the case of Murayama, the idea isn't completely off base, since the PAI of Khmer is 0.66, which correlates well with that of Ainu. However, I have to note that such research in linguistics should be correlated with data of other sciences.



Any hypothesis about the relationship of certain languages should be correlated with the corresponding contexts and with data of other related sciences: physical anthropology, population genetics, cultural anthropology and archaeology. If a certain date has been set as the approximate time of existence of Proto-Ainu, then how can words of contemporary Ainu be found in preceding epochs? Also, if a certain ethnic group is thought to have influenced the Ainu language, then this group could hardly have influenced rice-cultivation terminology (Nonno 2015: 44).





2.3.6. Particular conclusions about the PAI method


1. PAI is something like a safety valve of comparative linguistics: if the values differ less than fourfold, there are absolutely no obstacles to further research on genetic relationship; if the values differ fourfold or more, ironclad proof of genetic relationship must be found; if the values differ sevenfold, tenfold or even more, then those languages belong to different stocks.

It is possible to say that PAI shows the direction in which the search for potential relatives of a certain language can be promising.



2. PAI can be a helpful method in areas with many isolated languages/stocks: in North America, in Papua and in Africa.
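The decision rule in item 1 can be sketched as a small classifier and applied to the pairs discussed in 2.3. This is a minimal Python illustration: the names `fold_difference` and `verdict` are hypothetical, and placing the boundary of the last band at exactly sevenfold (the text says "sevenfold, tenfold or even more") is an assumption of the sketch, as is treating a zero PAI as an infinite difference.

```python
import math

# Bands read off item 1 of 2.3.6; the exact cut at 7.0 is an assumption.
RESEARCH_OPEN = 4.0
DIFFERENT_STOCKS = 7.0


def fold_difference(pai_a: float, pai_b: float) -> float:
    """Larger PAI over smaller PAI; infinite if only one value is zero."""
    low, high = sorted((pai_a, pai_b))
    if high == 0:
        return 1.0
    return math.inf if low == 0 else high / low


def verdict(pai_a: float, pai_b: float) -> str:
    ratio = fold_difference(pai_a, pai_b)
    if ratio < RESEARCH_OPEN:
        return "no obstacle to further research"
    if ratio < DIFFERENT_STOCKS:
        return "needs ironclad proof of relationship"
    return "different stocks"


# PAI values as quoted in sections 2.3.1 to 2.3.5.
PAIRS = {
    ("Indo-European", "Turkic"): (0.5, 0.012),
    ("Ainu", "Japanese"): (0.75, 0.13),
    ("Japanese", "Korean"): (0.13, 0.13),
    ("Ainu", "Nivkh"): (0.75, 0.07),
    ("Ainu", "Aleut"): (0.75, 0.0),
    ("Ainu", "Austronesian"): (0.75, 0.6),
    ("Ainu", "Khmer"): (0.75, 0.66),
}

for (a, b), (pa, pb) in PAIRS.items():
    print(f"{a} / {b}: ratio {fold_difference(pa, pb):.1f}, {verdict(pa, pb)}")
```

For Ainu and Japanese the ratio (about 5.8) falls into the middle band, where the rule demands strong evidence rather than excluding relationship outright; the conclusion of 2.3.2 additionally rests on the absence of structural correlation noted there.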







notes








1


Also, nobody actually cares that certain lexemes can sometimes look alike just by coincidence: the shorter a lexeme is, the higher the probability that it looks like some random lexemes of other languages.




2


Megalocomparison is a term invented by J. Matisoff (Matisoff 1990) specifically to denote attempts to prove distant genetic relationship on the basis of comparison of lexis, i.e. attempts to prove genetic relationship of certain languages in the Greenberg style of so-called mass comparison.




3


The Buyeo stock is still a hypothetical stock that includes Japanese, Korean and Okinawan languages (the Buyeo stock is discussed in chapter 5.2).




4


In this text I intentionally use the term stock instead of the term family, which is usually used in such contexts; languages are not self-replicating systems like biological systems, so I think any biological analogies should be avoided.




5


Waikuri is an extinct language that was spoken in the southern part of Baja California. Hokan is a hypothetical stock of a dozen small language families that were spoken mainly in California, Arizona and Baja California (pic. 1).




6


Due to its completely isolated position among the languages of the world, Ainu is especially attractive material for perfunctory and amateurish hypotheses.




7


This type of linear model of word form is named the American type, since it has been described mainly on the material of Native American languages (especially those of North America).




8


This type of linear model of word form is named the Altai type, since this linear model has been described mainly on the material of languages of the so-called Altaic stock.




9


PAN means Proto-Austronesian; ^ is the sign of grammar/structure correlation.


