SUBTLEX-AL: Albanian word frequencies based on film subtitles

Dr.Sc. Rrezarta Avdyli, Dr.Sc. Fernando Cuetos


Several recent studies have shown that word frequency estimates based on subtitle files explain more of the variance in word recognition performance than traditional word frequency estimates do. The present study provides such a frequency estimate for Albanian, based on more than 2 million words from film subtitles. Our results show a high correlation between reaction times (RTs) from a lexical decision (LD) study with 120 stimuli and SUBTLEX-AL, as well as a high correlation between SUBTLEX-AL and the only existing frequency list, which covers the hundred most frequent Albanian words. These findings suggest that SUBTLEX-AL is a good frequency estimate. Furthermore, it is the first frequency database for Albanian that contains more than 100 words.
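As an illustration of the kind of computation summarized above, the following minimal sketch counts word tokens in subtitle text, scales the counts to occurrences per million, and correlates log frequency with lexical decision reaction times. It is not the authors' pipeline: the tokenization rule, the function names, the toy subtitle string, and the reaction times are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def frequency_per_million(text):
    """Count word tokens and scale each count to occurrences per million tokens."""
    # Assumed tokenization: lowercase runs of Unicode letters (keeps ë, ç, etc.).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Toy subtitle text standing in for the real corpus (hypothetical data).
subtitles = "unë jam këtu dhe ti je këtu dhe ne jemi këtu"
fpm = frequency_per_million(subtitles)

# Hypothetical mean reaction times in milliseconds, not values from the study.
reaction_times = {"këtu": 520.0, "dhe": 540.0, "jam": 610.0}

words = sorted(reaction_times)
log_freq = [math.log10(fpm[w]) for w in words]
r = pearson(log_freq, [reaction_times[w] for w in words])
print(f"r = {r:.3f}")
```

With real data, a frequency estimate that predicts recognition performance well would typically yield a negative correlation, since more frequent words tend to be recognized faster.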


word frequency; subtitles; Albanian; word recognition




DOI: 10.21113/iir.v3i1.112


Copyright (c) 2016 Rrezarta Avdyli, Dr.Sc. Fernando Cuetos

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.