NETWORKS AND DEVELOPMENT FOUNDATION & ASSOCIATE WRITERS - FUNREDES & ALT

Homepage Français - Castellano Search Contact us: <[email protected]>


Measuring Languages in the Internet:

A Methodology Based on Counting Word Occurrences With Search Engines



This abstract was submitted in response to the call for papers issued by the Internet Society for the INET'99 event. It was rejected, so the full paper has not been written. Nevertheless, we wanted to make our work available on the Web. If you are interested in publishing on this topic, please write to us (<[email protected]> and <[email protected]>). You may also want to visit the study mentioned in the text: http://funredes.org/LC.


Author Listing

Daniel Pimienta (<[email protected]>)
FUNREDES (Fundación Redes y Desarrollo)
Dominican Republic

Daniel Prado (<[email protected]>)
UL (Latin Union)
France



Many linguistic communities around the world want to know how strongly their language is actually represented on the Internet, and how that presence is progressing. This calls for a replicable way to measure the presence of languages and cultures. The paper presents an original methodology for such measurement. The method has been refined since 1995, when it started with French and Spanish; it has since been extended to all Latin languages and, more recently, to German.

The methodology uses the most powerful search engines to count the occurrences of a selected set of words in each of the languages chosen for the study. A number of obstacles call for a meticulous and systematic selection of the words. Once the word sample has been established according to the selection criteria, the statistical results are extremely convincing.
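
As an illustration only, the counting step can be sketched in Python. The abstract does not say how the counts are aggregated, so this sketch assumes a per-word ratio of each language's hit count to the English hit count, summarized by its mean and standard deviation. The function names, the concept keys and all the hit counts below are hypothetical placeholders, not data or results from the study.

```python
from statistics import mean, stdev


def presence_ratios(counts, language, reference="en"):
    """Return, for each concept, hits in `language` divided by hits in `reference`.

    `counts` maps a concept to a dict of {language code: hit count}, where each
    count would come from querying a search engine for that concept's word in
    that language. Concepts lacking a usable count on either side are skipped.
    """
    ratios = []
    for by_lang in counts.values():
        ref_hits = by_lang.get(reference, 0)
        if ref_hits <= 0 or language not in by_lang:
            continue
        ratios.append(by_lang[language] / ref_hits)
    return ratios


def summarize(ratios):
    """Return (mean, sample standard deviation) of the per-concept ratios."""
    spread = stdev(ratios) if len(ratios) > 1 else 0.0
    return mean(ratios), spread


# Hypothetical hit counts, invented purely to make the sketch runnable.
hypothetical_counts = {
    "monday": {"en": 1000, "fr": 30, "es": 20},
    "heart": {"en": 2000, "fr": 50, "es": 60},
    "cheese": {"en": 500, "fr": 20, "es": 10},
}

for lang in ("fr", "es"):
    avg, spread = summarize(presence_ratios(hypothetical_counts, lang))
    print(f"{lang}: mean ratio to English = {avg:.4f}, std dev = {spread:.4f}")
```

Under this assumption, a small spread relative to the mean would suggest that the selected words behave consistently across the sample, which is one way to read the need for careful word selection.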

The first results of the study, in 1995, 1996 and 1997, compared French and Spanish to English and were very approximate, but they were the first in the field. The following step brought a marked improvement through the systematic use of linguistic controls, and provides a statistically sound comparison of French, Spanish, Portuguese, Italian and Romanian to English (and hence to one another).

The current step adds German in order to check whether the method remains valid for other languages.



Copyright © 1998-1999 FUNREDES Last modified: Jul. 27, 1999
http://funredes.org/funredes/html/english/publications/measuringlanguage.html