Keywords: Corpora, text analysis, wordlist, keyness, dispersion plot, corpus building


Aim. This paper presents and exemplifies a number of basic uses of corpus-based text analysis tools that can supplement an otherwise qualitative analysis of a text and add insight to it. I attempt to show that certain corpus tools are now easily accessible to any researcher and can be used to enrich the results of studies concerned with texts.

Methods. This paper covers the basics of corpus building, the main types of data that can be drawn from a simple corpus, and four methods that can aid text analysis: wordlists, concordances, dispersion plots and keywords. Each of these four methods is described in detail, with a number of examples of its applications and a note on its possible limitations.

Results. The examples provided suggest that even a very simple corpus analysis of a text might reveal trends and phenomena that are not noticeable through classic qualitative text analysis methods (e.g. close reading). The paper argues that corpus research can hence work as an extension of a qualitative analysis (or be its starting point) by examining the themes and keywords present in a given text, and that it can enrich the results of a qualitative study with a fresh perspective. Finally, the paper claims that basic corpus analysis can, in fact, be successfully employed by researchers who have no prior experience with statistics or corpora.


Author Biography

Jędrzej Olejniczak, University of Wrocław

1. Doctoral Student (University of Wrocław)
2. Corpus linguistics, digital humanities, translation studies, literary and academic translation, translation teaching methodology
3. Language editor: Miscellanea Posttotalitariana Wratislaviensia, Between


How to Cite
Olejniczak, J. (2018). USING CORPORA TO AID QUALITATIVE TEXT ANALYSIS: AN INTERDISCIPLINARY APPROACH. Journal of Education Culture and Society, 9(2), 154-164. Retrieved from