A vocabulary (also known as a lexicon) is a set of words, typically the set in a language or the set known to an individual. The word vocabulary originated from the Latin vocabulum, meaning "a word, name." It forms an essential component of language and communication, helping convey thoughts, ideas, emotions, and information. Vocabulary can be oral, written, or signed and can be categorized into two main types: active vocabulary (words one uses regularly) and passive vocabulary (words one recognizes but doesn't use often). An individual's vocabulary continually evolves through various methods, including direct instruction, independent reading, and natural language exposure, but it can also shrink due to forgetting, trauma, or disease. Furthermore, vocabulary is a significant focus of study across various disciplines, like linguistics, education, psychology, and artificial intelligence. Vocabulary is not limited to single words; it also encompasses multi-word units known as collocations, idioms, and other types of phraseology. Acquiring an adequate vocabulary is one of the largest challenges in learning a second language.
Definitions and usage
One's vocabulary typically refers to the set of words that an individual knows and uses in a particular language. It is a fundamental aspect of language acquisition and literacy development.
In linguistics, vocabulary refers to all the words in a language or in a person's lexical repertoire. It encompasses both a speaker's passive vocabulary, which includes the words they recognize or understand, and their active vocabulary, which includes the words they use regularly in speech and writing.
In the context of education, vocabulary refers to the body of words, including their meanings and use, that a student learns and uses. Vocabulary acquisition is a central aspect of language education, as it directly impacts reading comprehension, expressive and receptive language skills, and academic achievement.
Within psychology, especially cognitive psychology, vocabulary is understood as a measure of language processing and cognitive development. It can serve as an indicator of intellectual ability or cognitive status, with vocabulary tests often forming part of intelligence and neuropsychological assessments.
Computational Linguistics and Artificial Intelligence
In computational linguistics and artificial intelligence, a vocabulary is a predetermined set of words used for natural language processing tasks, such as speech recognition or text analysis. In machine learning models, the vocabulary is the set of unique words from the training dataset, which forms the basis for feature extraction and model training.
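The idea of a model vocabulary built from the unique words of a training dataset can be illustrated with a minimal sketch. The function names (build_vocabulary, bag_of_words), the whitespace tokenization, and the toy corpus are assumptions made for illustration, not a description of any particular system.

```python
from collections import Counter

def build_vocabulary(corpus, min_count=1):
    """Map each unique lowercase word in the corpus to an integer index."""
    counts = Counter(word for sentence in corpus for word in sentence.lower().split())
    # Sort by descending frequency, then alphabetically, so indices are reproducible.
    words = sorted((w for w, c in counts.items() if c >= min_count),
                   key=lambda w: (-counts[w], w))
    return {word: index for index, word in enumerate(words)}

def bag_of_words(sentence, vocabulary):
    """Count occurrences of each vocabulary word; words outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for word in sentence.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector

# Toy training corpus (hypothetical data).
corpus = ["the cat sat", "the dog sat down"]
vocabulary = build_vocabulary(corpus)
features = bag_of_words("the cat chased the dog", vocabulary)
print(vocabulary)  # {'sat': 0, 'the': 1, 'cat': 2, 'dog': 3, 'down': 4}
print(features)    # [0, 2, 1, 1, 0]
```

The word "chased" does not occur in the toy corpus, so it falls outside the vocabulary and contributes nothing to the feature vector.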
In semiotics, vocabulary refers to the complete set of symbols and signs in a sign system or a text, extending the definition beyond purely verbal communication to encompass other forms of symbolic communication.
Definition of "word"
Word has a variety of meanings, and our understanding of ideas such as vocabulary size differs depending on the definition used.
The most common definition equates words with lemmas (the uninflected or dictionary form; this includes walk, but not walks, walked or walking). Most of the time lemmas do not include proper nouns (names of people, places, companies, etc.). Another definition often used in research on vocabulary size is that of word family. A word family comprises all the words that can be derived from a ground word (e.g., the words effortless, effortlessly, effortful, effortfully are all part of the word family effort). Estimates of vocabulary size range from as high as 200 thousand to as low as 10 thousand, depending on the definition used.
The type-token distinction is a concept in linguistics that pertains to the counting or measuring of words in a text. It's useful for studying language and discourse, assessing complexity and richness of a vocabulary, or for certain computational applications.
The token count in a text is the total number of words, without any consideration of their uniqueness. Each individual occurrence of a word is counted separately, so if a word repeats, each instance is counted. For example, in the sentence I heard you when you called your son, there are eight tokens. This is because there are eight individual words (I, heard, you, when, you, called, your, son).
The type count includes unique words only, and usually, this is unique lemmas. If a lemma appears multiple times in a text, it is only counted once in a type count. So, in the same sentence I heard you when you called your son, there are six types. This is because there are six unique lemmas in the sentence (I, heard, you, when, called, son). The lemma you is only counted once despite appearing three times in the sentence as you, you, and your.
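The two counts for the example sentence can be reproduced with a short sketch. The lemma table below is a hand-written assumption that covers only this sentence; a real analysis would rely on a proper lemmatizer.

```python
# Hypothetical, hand-written lemma table for this one sentence.
# Mapping "heard" to "hear" or "called" to "call" would not change the counts,
# so only the entry that merges two distinct forms is listed.
LEMMAS = {"your": "you"}

def count_tokens_and_types(sentence):
    """Return (token count, type count), where types are unique lemmas."""
    tokens = sentence.lower().split()
    lemmas = [LEMMAS.get(token, token) for token in tokens]
    return len(tokens), len(set(lemmas))

token_count, type_count = count_tokens_and_types("I heard you when you called your son")
print(token_count, type_count)  # 8 6
```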
Vocabulary in an individual
Productive and receptive knowledge
The first major distinction that must be made when evaluating word knowledge is whether the knowledge is productive (also called active) or receptive (also called passive); even within those opposing categories, there is often no clear distinction. Words that are generally understood when heard or read or seen constitute a person's receptive vocabulary. These words may range from well known to barely known (see degree of knowledge below). A person's receptive vocabulary is usually the larger of the two. For example, although a young child may not yet be able to speak, write, or sign, they may be able to follow simple commands and appear to understand a good portion of the language to which they are exposed. In this case, the child's receptive vocabulary is likely tens, if not hundreds, of words, but their active vocabulary is zero. When that child learns to speak or sign, however, the child's active vocabulary begins to increase. It is also possible for the productive vocabulary to be larger than the receptive vocabulary, for example in a second-language learner who has learned words through study rather than exposure, and can produce them, but has difficulty recognizing them in conversation.
Productive vocabulary, therefore, generally refers to words that can be produced within an appropriate context and match the intended meaning of the speaker or signer. As with receptive vocabulary, however, there are many degrees at which a particular word may be considered part of an active vocabulary. Knowing how to pronounce, sign, or write a word does not necessarily mean that the word has been used correctly or that it accurately reflects the intended message, but it does reflect a minimal amount of productive knowledge.
Degree of knowledge
Within the receptive–productive distinction lies a range of abilities that are often referred to as degree of knowledge. This simply indicates that a word gradually enters a person's vocabulary over a period of time as more aspects of word knowledge are learnt. Roughly, these stages could be described as:
- Never encountered the word.
- Heard the word, but cannot define it.
- Recognizes the word due to context or tone of voice.
- Able to use the word and understand the general and/or intended meaning, but cannot clearly explain it.
- Fluent with the word – its use and definition.
Depth of knowledge
The differing degrees of word knowledge imply a greater depth of knowledge, but the process is more complex than that. There are many facets to knowing a word, some of which are not hierarchical so their acquisition does not necessarily follow a linear progression suggested by degree of knowledge. Several frameworks of word knowledge have been proposed to better operationalise this concept. One such framework includes nine facets:
- orthography – written form
- phonology – spoken form
- reference – meaning
- semantics – concept and reference
- register – appropriacy of use or register
- collocation – lexical neighbours
- word associations
- syntax – grammatical function
- morphology – word parts
Types of vocabulary
A person's reading vocabulary is all the words recognized when reading. This class of vocabulary is generally the most ample, as new words are more commonly encountered when reading than when listening.
A person's listening vocabulary comprises the words recognized when listening to speech. Cues such as the speaker's tone and gestures, the topic of discussion, and the conversation's social context may convey the meaning of an unfamiliar word.
A person's speaking vocabulary comprises the words used in speech and is generally a subset of the listening vocabulary. Due to the spontaneous nature of speech, words are often misused slightly and unintentionally, but facial expressions and tone of voice can compensate for this misuse.
The written word appears in registers as different as formal essays and social media feeds. While many written words rarely appear in speech, a person's written vocabulary is generally limited by preference and context: a writer may prefer one synonym over another, and they will be unlikely to use technical vocabulary relating to a subject in which they have no interest or knowledge.
The American philosopher Richard Rorty characterized a person's "final vocabulary" as follows:
All human beings carry about a set of words which they employ to justify their actions, their beliefs, and their lives. These are the words in which we formulate praise of our friends and contempt for our enemies, our long-term projects, our deepest self-doubts and our highest hopes… I shall call these words a person's "final vocabulary". Those words are as far as he can go with language; beyond them is only helpless passivity or a resort to force. (Contingency, Irony, and Solidarity p. 73)
During its infancy, a child instinctively builds a vocabulary. Infants imitate words that they hear and then associate those words with objects and actions. This is the listening vocabulary. The speaking vocabulary follows, as a child's thoughts become more reliant on their ability to self-express without relying on gestures or babbling. Once the reading and writing vocabularies start to develop, through questions and education, the child starts to discover the anomalies and irregularities of language.
In first grade, a child who can read learns about twice as many words as one who cannot. Generally, this gap does not narrow later. This results in a wide range of vocabulary by age five or six, when an English-speaking child will have learned about 1500 words.
Vocabulary grows throughout one's life. Between the ages of 20 and 60, people learn about 6,000 more lemmas, or roughly one every other day. An average 20-year-old knows 42,000 lemmas coming from 11,100 word families. People expand their vocabularies by, for example, reading, playing word games, and participating in vocabulary-related programs. Exposure to traditional print media teaches correct spelling and vocabulary, while exposure to text messaging leads to more relaxed word acceptability constraints.
- An extensive vocabulary aids expression and communication.
- Vocabulary size has been directly linked to reading comprehension.
- Linguistic vocabulary is synonymous with thinking vocabulary.
- A person may be judged by others based on their vocabulary.
- Wilkins (1972) said, "Without grammar, very little can be conveyed; without vocabulary, nothing can be conveyed."
Estimating average vocabulary size poses various difficulties and limitations because studies differ in their definitions and methods: what counts as a word, what it means to know a word, which sample dictionaries were used, how tests were conducted, and so on. Native speakers' vocabularies also vary widely within a language, and are dependent on the level of the speaker's education.
A 2016 study shows that 20-year-old English native speakers recognize on average 42,000 lemmas, ranging from 27,100 for the lowest 5% of the population to 51,700 lemmas for the highest 5%. These lemmas come from 6,100 word families in the lowest 5% of the population and 14,900 word families in the highest 5%. 60-year-olds know on average 6,000 lemmas more. 
According to an earlier study from 1995, junior-high students would be able to recognize the meanings of about 10,000–12,000 words, whereas for college students this number grows to about 12,000–17,000 and for elderly adults to about 17,000 or more.
For native speakers of German, average absolute vocabulary sizes range from 5,900 lemmas in first grade to 73,000 for adults.
The effects of vocabulary size on language comprehension
Knowledge of the 3,000 most frequent English word families or the 5,000 most frequent words provides 95% vocabulary coverage of spoken discourse. For minimal reading comprehension, a threshold of 3,000 word families (5,000 lexical items) has been suggested, and reading for pleasure requires 5,000 word families (8,000 lexical items). An "optimal" threshold of 8,000 word families yields a coverage of 98% (including proper nouns).
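Coverage here is the share of running words in a text that a reader already knows. The following sketch shows how such a figure could be computed; it counts surface word forms rather than word families, and the function names and the sample text are assumptions made for illustration.

```python
from collections import Counter

def coverage(tokens, known_words):
    """Return the share of running words (tokens) found in known_words."""
    if not tokens:
        return 0.0
    return sum(token in known_words for token in tokens) / len(tokens)

def words_needed(tokens, target):
    """Return how many of the most frequent word forms are needed to reach the target coverage."""
    counts = Counter(tokens)
    covered = 0
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered / len(tokens) >= target:
            return rank
    return len(counts)

# Hypothetical ten-word sample text.
sample = "the cat saw the dog and the dog saw it".split()
share = coverage(sample, {"the", "saw", "dog"})  # 7 of the 10 tokens are known
needed = words_needed(sample, 0.6)
print(share, needed)  # 0.7 3
```

Applied to a large corpus with a frequency-ranked word list, the same calculation would give the kind of coverage percentages quoted above, although the published figures are based on word families and much larger samples.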
Second language vocabulary acquisition
Learning vocabulary is one of the first steps in learning a second language, but a learner never finishes vocabulary acquisition. Whether in one's native language or a second language, the acquisition of new vocabulary is an ongoing process. There are many techniques that help one acquire new vocabulary.
Although memorization can be seen as tedious or boring, associating one word in the native language with the corresponding word in the second language until memorized is considered one of the best methods of vocabulary acquisition. By the time students reach adulthood, they generally have gathered a number of personalized memorization methods. Although many argue that memorization does not typically require the complex cognitive processing that increases retention (Sagarra and Alba, 2006), it does typically require a large amount of repetition, and spaced repetition with flashcards is an established method for memorization, particularly used for vocabulary acquisition in computer-assisted language learning. Other methods typically take more time to apply and lead to slower recall.
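Spaced repetition with flashcards can be sketched with a simple Leitner-style scheduler, in which a card moves to a box with a longer review interval after each correct answer and returns to the first box after a mistake. This is one common scheme chosen for illustration, not the method of any particular program; the interval values and all names below are assumptions.

```python
from dataclasses import dataclass

# Review intervals in days for each box; illustrative values only.
INTERVALS = [1, 2, 4, 8, 16]

@dataclass
class Flashcard:
    native: str
    foreign: str
    box: int = 0      # index into INTERVALS
    due_day: int = 0  # day number on which the card should next be reviewed

def review(card, correct, today):
    """Promote the card after a correct answer, otherwise return it to the first box."""
    if correct:
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        card.box = 0
    card.due_day = today + INTERVALS[card.box]
    return card

def due_cards(cards, today):
    """Return the cards whose review date has arrived."""
    return [card for card in cards if card.due_day <= today]

card = Flashcard(native="house", foreign="casa")
review(card, correct=True, today=0)   # moves to box 1, due on day 2
review(card, correct=True, today=2)   # moves to box 2, due on day 6
review(card, correct=False, today=6)  # back to box 0, due on day 7
print(card.box, card.due_day)  # 0 7
```

Words that are recalled correctly are therefore shown less and less often, while words that are forgotten return to daily review.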
Some words cannot be easily linked through association or other methods. When a word in the second language is phonologically or visually similar to a word in the native language, one often assumes they also share similar meanings. Though this is frequently the case, it is not always true. When faced with a false friend, memorization and repetition are the keys to mastery. If a second language learner relies solely on word associations to learn new vocabulary, that person will have a very difficult time mastering false friends. When large amounts of vocabulary must be acquired in a limited amount of time, when the learner needs to recall information quickly, when words represent abstract concepts or are difficult to picture in a mental image, or when discriminating between false friends, rote memorization is the method to use. A neural network model of novel word learning across orthographies, accounting for L1-specific memorization abilities of L2-learners has recently been introduced (Hadzibeganovic and Cannas, 2009).
The keyword method
One way of learning vocabulary is to use mnemonic devices or to create associations between words; this is known as the "keyword method" (Sagarra and Alba, 2006). It takes a long time both to implement and to recollect, but because it connects a few new and unusual ideas, it may help in learning. It also presumably does not conflict with Paivio's dual coding system, because it uses visual and verbal mental faculties. However, it is still best used for words that represent concrete things, as abstract concepts are more difficult to remember.
Several word lists have been developed to provide people with a limited vocabulary for rapid language proficiency or for effective communication. These include Basic English (850 words), Special English (1,500 words), General Service List (2,000 words), and Academic Word List. Some learner's dictionaries have developed defining vocabularies which contain only the most common and basic words. As a result, word definitions in such dictionaries can be understood even by learners with a limited vocabulary. Some publishers produce dictionaries based on word frequency or thematic groups.
Focal vocabulary is a specialized set of terms and distinctions that is particularly important to a certain group: those with a particular focus of experience or activity. A lexicon, or vocabulary, is a language's dictionary: its set of names for things, events, and ideas. Some linguists believe that lexicon influences people's perception of things, a view known as the Sapir–Whorf hypothesis. For example, the Nuer of Sudan have an elaborate vocabulary to describe cattle. The Nuer have dozens of names for cattle because of the cattle's particular histories, economies, and environments. This kind of comparison has elicited some linguistic controversy, as with the number of "Eskimo words for snow". English speakers with relevant specialised knowledge can also display elaborate and precise vocabularies for snow and cattle when the need arises.
- Bilingual lexical access
- Differences between American and British English (vocabulary)
- Language proficiency: The ability of an individual to speak or perform in an acquired language
- Longest word in English: Many of the longest words in the English language
- Mental lexicon
- "Vocabulary". Longman Dictionary of Contemporary English.
- Matthews, Peter (2014). The concise Oxford dictionary of linguistics. Oxford paperback reference (3rd ed.). Oxford: Oxford Univ. Press. ISBN 978-0-19-967512-8.
- Grabe, William; Stoller, Fredricka L. (18 January 2018), "Teaching Vocabulary for Reading Success", The TESOL Encyclopedia of English Language Teaching, Hoboken, NJ, USA: John Wiley & Sons, Inc., pp. 1–7, doi:10.1002/9781118784235.eelt0773, ISBN 9781118784228, retrieved 18 May 2023
- Corsini, Raymond J. (2002). The dictionary of psychology. New York: Brunner-Routledge. ISBN 978-1-58391-328-4.
- Collin, S. M. H. (ed.). Dictionary of Computing (6th ed.). Bloomsbury.
- Danesi, Marcel (2000). Encyclopedic dictionary of semiotics, media, and communications. Toronto studies in semiotics. Toronto: Univ. of Toronto Press. ISBN 978-0-8020-8329-6.
- Brysbaert M, Stevens M, Mandera P and Keuleers E (2016) How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant's Age. Front. Psychol. 7:1116. doi: 10.3389/fpsyg.2016.01116 
- Barnhart, Clarence L. (1968).
- The World Book Dictionary. Clarence L. Barnhart. 1968 Edition. Published by Thorndike-Barnhart, Chicago, Illinois.
- "Final vocabulary". OpenLearn. Retrieved 6 April 2019.
- "Vocabulary". Sebastian Wren, Ph.D. BalancedReading.com http://www.balancedreading.com/vocabulary.html
- Brysbaert, Marc; Stevens, Michaël; Mandera, Paweł; Keuleers, Emmanuel (29 July 2016). "How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant's Age". Frontiers in Psychology. 7: 1116. doi:10.3389/fpsyg.2016.01116. PMC 4965448. PMID 27524974.
- Joan H. Lee (2011). What does txting do 2 language: The influences of exposure to messaging and print media on acceptability constraints (PDF) (Master's thesis). University of Calgary. Retrieved 20 November 2013.
- Stahl, Steven A. Vocabulary Development. Cambridge: Brookline Books, 1999. p. 3. "The Cognitive Foundations of Learning to Read: A Framework", Southwest Educational Development Laboratory, p. 14.
- Wilkins, David A. (1972). Linguistics in Language Teaching. Cambridge, MA: MIT Press, 111.
- Goulden, Robin; Nation, Paul; Read, John (1 December 1990). "How Large Can a Receptive Vocabulary Be?" (PDF). Applied Linguistics. 11 (4): 341–363. doi:10.1093/applin/11.4.341.
- D'Anna, Catherine; Zechmeister, Eugene; Hall, James (1 March 1991). "Toward a meaningful definition of vocabulary size". Journal of Literacy Research. 23 (1): 109–122. doi:10.1080/10862969109547729. S2CID 122864817.
- Nation, I. S. P. (1993). "Using dictionaries to estimate vocabulary size: essential, but rarely followed, procedures" (PDF). Language Testing. 10 (1): 27–40. doi:10.1177/026553229301000102. S2CID 145331394.
- Milton, James; Treffers-Daller, Jeanine (29 January 2013). "Vocabulary size revisited: the link between vocabulary size and academic achievement". Applied Linguistics Review. 4 (1): 151–172. doi:10.1515/applirev-2013-0007. S2CID 59930869.
- Zechmeister, Eugene; Chronis, Andrea; Cull, William; D'Anna, Catherine; Healy, Noreen (1 June 1995). "Growth of a functionally important lexicon". Journal of Literacy Research. 27 (2): 201–212. doi:10.1080/10862969509547878. S2CID 145149827.
- Segbers, J.; Schroeder, S. (28 April 2016). "How many words do children know? A corpus-based estimation of children's total vocabulary size". Language Testing. 34 (3): 297–320. doi:10.1177/0265532216641152. S2CID 148512023.
- Adolphs, Svenja; Schmitt, Norbert (2003). "Lexical Coverage of Spoken Discourse" (PDF). Applied Linguistics. 24 (4): 425–438. doi:10.1093/applin/24.4.425.
- Laufer, Batia (1992). "How Much Lexis is Necessary for Reading Comprehension?". In Bejoint, H.; Arnaud, P. (eds.). Vocabulary and Applied Linguistics. Macmillan. pp. 126–132.
- Laufer, Batia; Ravenhorst-Kalovski, Geke C. (April 2010). "Lexical threshold revisited: Lexical text coverage, learners' vocabulary size and reading comprehension" (PDF). Reading in a Foreign Language. 22 (1): 15–30.
- Hirsh, D.; Nation, I.S.P. (1992). "What vocabulary size is needed to read unsimplified texts for pleasure?" (PDF). Reading in a Foreign Language. 8 (2): 689–696.
- Sagarra, Nuria and Alba, Matthew. (2006). "The Key Is in the Keyword: L2 Vocabulary Learning Methods With Beginning Learners of Spanish". The Modern Language Journal, 90, ii. pp. 228–243.
- Hadzibeganovic, Tarik; Cannas, Sergio A (2009). "A Tsallis' statistics-based neural network model for novel word learning". Physica A. 388 (5): 732–746. Bibcode:2009PhyA..388..732H. doi:10.1016/j.physa.2008.10.042.
- Paivio, A. (1986). Mental Representations: A Dual Coding Approach. New York: Oxford University Press.
- Bogaards, Paul (July 2010). "The evolution of learners' dictionaries and Merriam-Webster's Advanced Learner's English Dictionary" (PDF). Kernerman Dictionary News (18): 6–15.
- "The Oxford 3000". Oxford Learner's Dictionaries.
- "Clear Definitions". Macmillan Dictionary.
- Routledge Frequency Dictionaries
- (in German) Langenscheidt Grundwortschatz
- (in German) Langenscheidt Grund- und Aufbauwortschatz
- (in German) Hueber Grundwortschatz
- Miller (1989)
- Barnhart, Clarence Lewis (ed.) (1968). The World Book Dictionary. Chicago: Thorndike-Barnhart, OCLC 437494
- Flynn, James Robert (2008). Where have all the liberals gone? : race, class, and ideals in America. Cambridge University Press; 1st edition. ISBN 978-0-521-49431-1 OCLC 231580885
- Lenkeit, Roberta Edwards (2007) Introducing cultural anthropology Boston: McGraw-Hill (3rd. ed.) OCLC 64230435
- Liu, Na; Nation, I. S. P. (1985). "Factors affecting guessing vocabulary in context" (PDF). RELC Journal. 16: 33–42. doi:10.1177/003368828501600103. S2CID 145695274.
- Miller, Barbara D. (1999). Cultural Anthropology (4th ed.) Boston: Allyn and Bacon, p. 315 OCLC 39101950
- Schonell, Sir Fred Joyce, Ivor G. Meddleton and B. A. Shaw, A study of the oral vocabulary of adults : an investigation into the spoken vocabulary of the Australian worker, University of Queensland Press, Brisbane, 1956. OCLC 606593777
- West, Michael (1953). A general service list of English words, with semantic frequencies and a supplementary word-list for the writing of popular science and technology London, New York: Longman, Green OCLC 318957