
THE PLACE OF LATIN LANGUAGES AND CULTURES ON THE INTERNET

 

3. Results of the linguistic study

The results are based on a sample of 57 terms, chosen to make the comparison between the languages as rigorous as possible. For each term, the ratio between each Latin language and English is treated as a random variable. Statistical techniques are then applied, assuming that this random variable follows a common mathematical distribution (the Gauss curve, also called the «normal» distribution).

Two Internet areas are considered for the measurements: the WWW (the Web), searched with the HotBot search engine, and the Usenet (the news), searched with the DejaNews search engine.

The chapter called Internet Methodology explains the choice of search engines and comments on the limits that the shortcomings of these engines impose on the research.

Thanks to the considerable effort put into choosing the terms for the sample (explained in the chapter Linguistic methodology), the results are statistically of high quality. This means that the dispersion of the ratios of each language to English is relatively weak. This makes it possible to build a so-called "interval of reliability", that is, a fairly narrow range with a probability of 99/100 of containing the exact value. From this point of view, the results are better for the WWW than for the Usenet, which is not surprising given the frequent use of abbreviations in newsgroups.
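As a minimal sketch of how such an interval could be computed, assume the per-term ratios are available as a list of fractions and follow a normal distribution, as the study supposes. The function name `reliability_interval` and the sample values are invented for illustration; they are not the study's data.

```python
from statistics import NormalDist, mean, stdev

def reliability_interval(ratios, level=0.99):
    """Return (mean, low, high) for the mean ratio at the given level.

    Assumes the ratios are normally distributed, as the study does.
    """
    n = len(ratios)
    m = mean(ratios)
    # Standard error of the mean, from the sample standard deviation.
    se = stdev(ratios) / n ** 0.5
    # Two-sided critical value of the standard normal (about 2.576 for 99%).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m, m - z * se, m + z * se

# Hypothetical ratios (language / English) for a handful of terms.
sample = [0.031, 0.040, 0.036, 0.042, 0.035, 0.038]
m, low, high = reliability_interval(sample)
print(f"mean = {m:.4f}, 99% interval = [{low:.4f}, {high:.4f}]")
```

The weaker the dispersion of the ratios, the smaller the standard error and the narrower the interval. With very few terms, a Student t critical value would give a somewhat wider interval than the normal one used here.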

The quality of the results (and therefore the width of the interval of reliability) varies depending on the language: it is very high for French, high for Spanish and Italian in both WWW and Usenet, high for Portuguese in WWW, and low for Rumanian and Portuguese in Usenet. For further details, see Appendix 2 and Appendix 4.

3.1 Synthesis of the results

The next table presents the average ratio between each Latin language and English, obtained by measuring the frequency of the sample terms in both areas of research. French holds the best position on the Web, closely followed by Spanish. On the other hand, its weak presence on the Usenet is worth noting.
 
 

Table 1: Means of Latin languages in comparison with English

              WWW      USENET
SPANISH       3,37%    2,41%
FRENCH        3,75%    1,44%
ITALIAN       2,00%    2,54%
PORTUGUESE    1,09%    1,12%
RUMANIAN      0,20%    0,14%


3.2 Comments upon the absolute value of the presence of English

With the results obtained so far, we can estimate the presence of Latin languages in relation to English. However, before we can give a precise figure for the absolute presence of the Latin languages on the Web, it is necessary to build a hypothesis about the presence of English. Table 2 shows the values of their absolute presence, derived from the means under several different hypotheses on the presence of English. The result written in red in the original table can be considered the most probable and realistic, as the data has been extensively crosschecked.

At present, no method is absolutely reliable, and whichever one we choose, counting multilingual pages will always be difficult. However, by crosschecking the collected data, it is possible to estimate the English presence value, at least as a rough approximation. The AltaVista language algorithm is, among others, an element that makes it possible to determine this interval (see in the L3 study the "method of the complement of the empty universe").

According to measurements carried out with this method, the percentage of pages in English can hardly be lower than 65%. It can also hardly be higher than 85%, because other important languages are in use: Japanese, German, French and Spanish may between them account for more than 15%. Around 75% would be a reasonable figure today if we consider the percentage occupied by languages representing between 0,5% and 1% each (between 7 and 10 languages, for a total of 5%), then the percentage of the languages with a very weak presence, such as Rumanian (between 10 and 15 languages at 0,15%, for a total of 2%), and finally the numerous languages whose presence is still marginal. This last figure is also the most difficult to estimate: if we take the hypothesis of 200 languages at 0,025%, we reach a total of 5%.

One of the great unknowns, whose consequences for the future remain to be evaluated, is the possible multiplication of languages on the Internet. The total number of languages spoken today is significantly higher than the number of nation-states, which is slightly below 200.
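The shares quoted above can be added up as a quick consistency check. The sketch below is only a literal reading of the figures in the text (written with decimal points); the variable names are illustrative, and treating the remainder as the share left for the other major languages is an assumption.

```python
# Shares of Web pages, in percent, as estimated in the text.
english = 75.0
mid_languages = 5.0                # 7 to 10 languages between 0.5% and 1% each
weak_languages = 2.0               # 10 to 15 languages at about 0.15% each
marginal_languages = 200 * 0.025   # hypothesis: 200 languages at 0.025% each

accounted = english + mid_languages + weak_languages + marginal_languages
# Assumption: what remains is the share left for the other major
# languages (Japanese, German, French, Spanish).
remainder = 100.0 - accounted
print(f"accounted: {accounted:.1f}%, left for major languages: {remainder:.1f}%")
```

Under these figures the groups add up to 87%, which leaves about 13% for the other major languages, close to the 15% order of magnitude mentioned in the text.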

This figure of 75% applies to the Web. For the newsgroups area, we will raise it to 80%.
 

Table 2: Absolute presence of the Latin languages in the WWW area

If ENGLISH =        90,00%   85,00%   80,00%   75,00%   70,00%   65,00%
then SPANISH =       3,03%    2,86%    2,69%    2,53%    2,36%    2,19%
then FRENCH =        3,37%    3,19%    3,00%    2,81%    2,62%    2,44%
then ITALIAN =       1,80%    1,70%    1,60%    1,50%    1,40%    1,30%
then PORTUGUESE =    0,98%    0,93%    0,87%    0,82%    0,76%    0,71%
then RUMANIAN =      0,18%    0,17%    0,16%    0,15%    0,14%    0,13%
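The values in Table 2 appear to follow from multiplying each mean ratio of Table 1 by the assumed share of English. A minimal sketch of that arithmetic, with the hypothetical helper `absolute_presence` and decimal points instead of decimal commas:

```python
# Mean ratios to English on the Web, in percent (Table 1).
WWW_RATIO = {"SPANISH": 3.37, "FRENCH": 3.75, "ITALIAN": 2.00,
             "PORTUGUESE": 1.09, "RUMANIAN": 0.20}

def absolute_presence(ratio_pct, english_pct):
    """Share of all pages, in percent, for a language whose ratio to English is ratio_pct."""
    return ratio_pct * english_pct / 100.0

# One line per hypothesis on the share of English.
for english_pct in (90, 85, 80, 75, 70, 65):
    row = {lang: absolute_presence(r, english_pct) for lang, r in WWW_RATIO.items()}
    print(english_pct, {lang: f"{v:.2f}" for lang, v in row.items()})
```

Differences of 0,01 from the published table are possible, presumably because the study worked from unrounded ratios.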


3.3 Relation between the number of speakers of Latin languages and the weight of their presence on the Web

Clearly, the absolute presence values are not perfect indicators of the vigour of a particular language on the networks. To obtain a relevant result, the values expressing the presence of languages on the Internet should be set in proportion to the values of their presence in the real world. However, attempts to measure the actual use of different languages in the world constantly run into debates about which criteria to use (see note 1), and the experts have never managed to agree on this subject. In the context of the present study, and in order to estimate the place occupied by the Latin languages on a world scale as precisely as possible, the authors have arbitrarily chosen a method explained in Appendix 3 (Number of speakers of the studied languages).

In order to make the present statistics representative of the sociolinguistic characteristics mentioned in that appendix, the authors have decided to add figures corresponding to:

  • the number of people who master one of the languages of the study because it is their mother tongue or because it is an official or taught language,
  • the number of people whose mother tongue is one of the languages of the study, but who live in a country where it is not recognized.

Table 3: Weight of the Latin languages (figures rounded to the nearest million)

                                                    English   Spanish   Portuguese   French   Italian   Romanian
Absolute presence (number of speakers, millions)    630       375       190          130      60        30
Relative presence (worldwide percentage)            10,50%    6,25%     3,17%        2,17%    1%        0,50%

The relative presence of these languages is thus calculated without taking the «multilingualism» factor into account.

On the hypothesis of a world population of 6 000 000 000, the presence of each language on the Internet is weighted by dividing its absolute presence on the Web by its relative presence in the world. A quotient equal to 1 is considered a «normal» result; a quotient lower than 1 is considered weak, and a quotient higher than 1 respectable.
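The weighting just described can be sketched as follows, using the speaker counts of Table 3 and the absolute Web presence under the English = 75% hypothesis from Table 2. The function names are hypothetical, and the figures are written with decimal points.

```python
WORLD_POPULATION_MILLIONS = 6000  # hypothesis of 6 000 000 000 people

# Speakers in millions (Table 3) and absolute Web presence in percent
# (Table 2, English = 75% hypothesis).
SPEAKERS = {"ENGLISH": 630, "SPANISH": 375, "FRENCH": 130,
            "ITALIAN": 60, "PORTUGUESE": 190, "RUMANIAN": 30}
WWW_PRESENCE = {"ENGLISH": 75.00, "SPANISH": 2.53, "FRENCH": 2.81,
                "ITALIAN": 1.50, "PORTUGUESE": 0.82, "RUMANIAN": 0.15}

def relative_presence(speakers_millions):
    """Share of the world population, in percent."""
    return 100.0 * speakers_millions / WORLD_POPULATION_MILLIONS

def weighted_presence(web_pct, speakers_millions):
    """Web presence divided by real-world presence; 1 is the 'normal' value."""
    return web_pct / relative_presence(speakers_millions)

for lang in SPEAKERS:
    w = weighted_presence(WWW_PRESENCE[lang], SPEAKERS[lang])
    print(f"{lang:<11}{relative_presence(SPEAKERS[lang]):6.2f}%  {w:5.2f}")
```

For example, English covers 10.5% of the assumed world population and 75% of Web pages, giving a quotient of about 7.14, while Italian at 1.5% of pages for 1% of the population gives 1.5.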
 

Table 4: Weighted presence of the Latin languages in the WWW area

              Absolute WWW presence    Weighted WWW presence
ENGLISH       75,00%                   7,14
SPANISH       2,53%                    0,40
FRENCH        2,81%                    1,30
ITALIAN       1,50%                    1,50
PORTUGUESE    0,82%                    0,26
RUMANIAN      0,15%                    0,30


 

3.4 Relation between the number of speakers of Latin languages and the weight of their presence on the Usenet

The next table presents the result of the statistical calculation based on the number of occurrences of the sample terms on the Usenet. The tables establishing the intervals of reliability can be found in Appendix 4.

Table 5: Absolute presence of the Latin languages on the Usenet

If ENGLISH =        90,00%   85,00%   80,00%   75,00%   70,00%   65,00%
then SPANISH =       2,17%    2,05%    1,93%    1,81%    1,69%    1,57%
then FRENCH =        1,29%    1,22%    1,15%    1,08%    1,01%    0,93%
then ITALIAN =       2,29%    2,16%    2,03%    1,91%    1,78%    1,65%
then PORTUGUESE =    1,01%    0,95%    0,90%    0,84%    0,79%    0,73%
then RUMANIAN =      0,13%    0,12%    0,11%    0,11%    0,10%    0,09%


 

Table 6: Weighted presence of the Latin languages on the Usenet

              Absolute Usenet presence    Weighted Usenet presence
ENGLISH       80,00%                      7,62
SPANISH       1,93%                       0,31
FRENCH        1,15%                       0,53
ITALIAN       2,03%                       2,03
PORTUGUESE    0,90%                       0,28
RUMANIAN      0,11%                       0,23
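As a rough cross-check, the Usenet figures can be chained from the mean ratios of Table 1 to the weighted values, assuming English at 80% of the newsgroups and a world population of 6 000 000 000. The helper `usenet_weighted` is hypothetical, and the figures are written with decimal points.

```python
ENGLISH_USENET_PCT = 80.0          # the study's hypothesis for newsgroups
WORLD_POPULATION_MILLIONS = 6000   # hypothesis of 6 000 000 000 people

# (mean ratio to English on the Usenet in percent, speakers in millions)
USENET = {"SPANISH": (2.41, 375), "FRENCH": (1.44, 130), "ITALIAN": (2.54, 60),
          "PORTUGUESE": (1.12, 190), "RUMANIAN": (0.14, 30)}

def usenet_weighted(ratio_pct, speakers_millions):
    """Return (absolute presence in percent, weighted presence) on the Usenet."""
    absolute = ratio_pct * ENGLISH_USENET_PCT / 100.0
    relative = 100.0 * speakers_millions / WORLD_POPULATION_MILLIONS
    return absolute, absolute / relative

for lang, (ratio, speakers) in USENET.items():
    absolute, weighted = usenet_weighted(ratio, speakers)
    print(f"{lang:<11}{absolute:5.2f}%  {weighted:5.2f}")
```

The output matches the published table to within 0,01; the small gap for Rumanian (about 0.22 against 0,23) is presumably due to the rounding of the published ratios.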

1. Should only first languages be considered? Should the figures for official languages be counted even if some people do not speak the language (as, for example, French in Haiti)? Should some languages be recognized as having a supranational role, and therefore the role of a common language?

Copyright © 1996-1999 AGENCE DE LA FRANCOPHONIE, UNION LATINE, FUNREDES
Created: 5 X 1998
Last Modified: 02 VII 1999
