These files describe the exact data from the Ruhlen database for the 2082 languages used in the paper ``A comparison of worldwide phonemic and genetic variation in human populations'' by N Creanza, M Ruhlen, TJ Pemberton, NA Rosenberg, MW Feldman, S Ramachandran.

Please see Supporting Information (SI) Appendix for details about data filtering and analyses.

*Version 1.0 of the package of files - created by SR and NC, January 6, 2015

Ruhlen_database_2082languages.txt - a file with 2082 rows and 737 tab-delimited columns.
The columns give the following information:
(1) Language record number in the Ruhlen database
(2) Language name in the Ruhlen database. Note that, when applicable, alternate names and dialect names were used to match Ruhlen language records to the Ethnologue (http://www.ethnologue.com/), but these are not listed in this file.
(3) ISO code (ISO 639-3 code) matching the Ruhlen language record to a language in the Ethnologue
(4) ISO A3 code (ISO 639-3 + ISO 3166-1 alpha 3 code) matching the Ruhlen language record to a language in a specific country in the Ethnologue
(5) Highest-level language classification in the Ethnologue 
(6) Current speaker population size for the language in the Ethnologue
(7) Geographic region assignment, based on the United Nations geoscheme (as described in SI Materials and Methods). Also see SI Materials and Methods (section ``Regression analyses using individual languages'') for changes made to this assignment for calculations of geographic distances
(8) Latitude, Ethnologue
(9) Longitude, Ethnologue
(10-737) presence/absence of 728 phonemes (as described in Ruhlen_database_phonemes.txt)
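
The column layout above can be read with a short script. The following is a minimal sketch in Python, not part of the original package. It assumes that the file has no header row (2082 rows for 2082 languages), that presence/absence is coded as 1/0, and that the file is UTF-8 encoded; none of these is stated explicitly above. The names LanguageRecord, parse_languages and load_languages are hypothetical.

```python
import csv
from typing import Iterable, List, NamedTuple

N_META = 9        # columns 1-9: language metadata
N_PHONEMES = 728  # columns 10-737: presence/absence of phonemes


class LanguageRecord(NamedTuple):
    record_number: str   # column 1
    name: str            # column 2
    iso: str             # column 3
    iso_a3: str          # column 4
    classification: str  # column 5
    population: str      # column 6
    region: str          # column 7
    latitude: str        # column 8
    longitude: str       # column 9
    phonemes: List[int]  # columns 10-737, assumed to be coded 1/0


def parse_languages(lines: Iterable[str]) -> List[LanguageRecord]:
    """Parse tab-delimited language rows; metadata is kept as text."""
    records = []
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        if len(row) != N_META + N_PHONEMES:
            raise ValueError(
                f"expected {N_META + N_PHONEMES} columns, got {len(row)}"
            )
        flags = [int(value) for value in row[N_META:]]
        records.append(LanguageRecord(*row[:N_META], phonemes=flags))
    return records


def load_languages(path: str) -> List[LanguageRecord]:
    """Read the language file from disk (UTF-8 is an assumption)."""
    with open(path, encoding="utf-8", newline="") as handle:
        return parse_languages(handle)
```

Calling load_languages("Ruhlen_database_2082languages.txt") should then return one record per language. Metadata fields are left as strings here because their exact formatting (for example, missing values in population size) is not described above.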

Ruhlen_database_phonemes.txt - a file with 729 rows (including 1 header row) and 8 columns.
The columns give the following information, as indicated by the header row label:
(1) Column: column number in Ruhlen_database_2082languages.txt
(2) Phoneme: phoneme presented in Unicode
(3) Number_of_occurrences: frequency with which the phoneme is observed across 2082 languages in Ruhlen_database_2082languages.txt
(4) Consonant: a boolean indicator; 1 indicates the phoneme in column (2) is a consonant
(5) Vowel: a boolean indicator; 1 indicates the phoneme in column (2) is a vowel
(6) Modified_consonant: 1 indicates the phoneme in column (2) is a modified consonant
(7) Modified_Vowel: 1 indicates the phoneme in column (2) is a modified vowel
(8) Click: 1 indicates the phoneme in column (2) is a click
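
The phoneme table can be used to look up which columns of Ruhlen_database_2082languages.txt belong to a given category, and to cross-check Number_of_occurrences against the language file. The following is a minimal sketch in Python, not part of the original package. It assumes that this file is tab-delimited, that the header labels are spelled exactly as listed above, that the Column value is 1-based, and that presence is coded as 1. The function names are hypothetical.

```python
import csv
from typing import Iterable, List, Sequence, Tuple

# Short key -> header label as listed in this README.
FLAG_FIELDS = {
    "consonant": "Consonant",
    "vowel": "Vowel",
    "modified_consonant": "Modified_consonant",
    "modified_vowel": "Modified_Vowel",
    "click": "Click",
}


def parse_phoneme_table(lines: Iterable[str]) -> List[dict]:
    """Parse the phoneme table; the first line must be the header row."""
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    table = []
    for row in reader:
        entry = {
            "column": int(row["Column"]),  # assumed 1-based
            "phoneme": row["Phoneme"],
            "occurrences": int(row["Number_of_occurrences"]),
        }
        for key, header in FLAG_FIELDS.items():
            entry[key] = row[header].strip() == "1"
        table.append(entry)
    return table


def columns_flagged(table: List[dict], flag: str) -> List[int]:
    """Return the 1-based column numbers whose indicator `flag` is set."""
    return [entry["column"] for entry in table if entry[flag]]


def count_occurrences(rows: Sequence[Sequence[str]], column: int) -> int:
    """Count language rows in which the given 1-based column equals "1"."""
    return sum(1 for row in rows if row[column - 1].strip() == "1")


def mismatched_counts(
    table: List[dict], rows: Sequence[Sequence[str]]
) -> List[Tuple[int, int, int]]:
    """List (column, listed, observed) where the two files disagree."""
    mismatches = []
    for entry in table:
        observed = count_occurrences(rows, entry["column"])
        if observed != entry["occurrences"]:
            mismatches.append((entry["column"], entry["occurrences"], observed))
    return mismatches
```

Here rows is the language file split into lists of tab-separated fields, one list per language. For example, columns_flagged(table, "click") gives the columns for clicks, and an empty result from mismatched_counts suggests that the two files are consistent under the assumptions above.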

Columns 6-8 in Ruhlen_database_phonemes.txt were used for some regression analyses (see SI Appendix section 3.2 and Table S8 for more information). See SI Appendix section 1.2 ``Modified consonants and modified vowels'' for more details about columns 6 and 7.

RuhlenSources_DataCounts.xls - a spreadsheet that uses a series of boolean variables to indicate how we identified the 2222 languages with sources in the Ruhlen database and complete phonemic data (consonants and vowels). Please see SI Appendix section 1.6 ``Filtering of languages for analysis'' for more details.
