Introduction
In our increasingly globalized world, it is becoming more common for children to grow up with two or more languages. Extensive research has established that children's linguistic abilities play a crucial role in their academic achievements (e.g., Cunningham & Stanovich, 1997; Walker et al., 1994; Whitehurst & Lonigan, 1998). Therefore, lexical development, which refers to the acquisition and expansion of vocabulary, holds significant importance in overall language development for both monolingual and bilingual individuals. Understanding the factors influencing lexical development in bilingual children is essential for supporting their language skills and academic success. Vocabulary knowledge is typically divided into two categories: productive and receptive knowledge (Nation & Meara, 2002). The focus is on comprehending the components of vocabulary, organizing lexical resources, and attaining proficiency and fluency in both receptive and productive language usage. Productive knowledge specifically pertains to the capacity to autonomously produce words during speaking or writing (Schmitt & Schmitt, 2020). The process extends beyond mere word familiarity, encompassing a comprehensive grasp of the multifaceted aspects of productive vocabulary acquisition and its associated constructs (Nation, 2013).
Evidence has shown that monolinguals and bilinguals use one common system of conceptual lemma representations (Costa et al., 1999; Kroll & Stewart, 1994; Lee & Williams, 2001), although the lexemes of the two languages seem to be stored in separate lexical storage systems (Kroll & Stewart, 1994; Potter, 1979; Snodgrass, 1984). This divergence in lexical storage leads to differences between monolinguals and bilinguals in lexical processing – as measured, for instance, by vocabulary tasks. One such assessment method is the naming task, which is commonly used as a diagnostic tool for evaluating vocabulary, especially in bilingual children. These tasks involve participants naming photographs or line drawings. In contrast to the assessment of receptive vocabulary, this active process of production is more demanding and requires subjects to demonstrate more aspects of knowledge. While the word comprehension process is primarily characterized by the recognition of the phonological representation and activation of the related semantic concept stored in long-term memory, the production process involves not only the selection of the appropriate lemma but also the retrieval and maintenance of the phonological representation throughout the planning and execution of the phonetic plan (Dapretto & Bjork, 2000). Consequently, production tasks provide more information about the extent to which children can actively use the words. For that reason, assessing productive vocabulary through picture-naming tasks is a widespread and effective method to evaluate the vocabulary of bilingual children (Webb, 2008).
Even highly proficient bilinguals usually perform more poorly in these tasks than monolinguals and show not only lower accuracy but also a time delay in picture naming in both the non-dominant and the dominant language (e.g., Gollan et al., 2005, 2008; Ivanova & Costa, 2008). Furthermore, bilinguals appear to have more difficulties in activating language-specific representations (Gollan & Acenas, 2004), and bilingual children show a naming disadvantage in comparison to monolingual children, which increases with age (Klassert et al., 2014). Because of these differences between monolingual and bilingual children, the use of common naming tasks might be problematic, especially when the instruments to assess vocabulary have not been specifically designed for and normed on bilingual children.
The aim of our study was to illustrate the construction of a language assessment inventory that balances well-established factors from language acquisition research in order to generate parallel and fair versions for L1 and L2. Here we focus on the effects of these factors and analyze how they interact with the linguistic background of children in the societal language, by comparing the structure and item features of the naming task for monolingual German and bilingual Turkish–German children. Using a productive vocabulary task as an example, we demonstrate how to validate a test with explanatory item-response models (De Boeck & Wilson, 2004) and how to examine its construct validity when the test is to be used not only for monolingual but also for bilingual children. In the following, we first discuss features that are known to influence the access to words, especially for bilingual children – namely, lexical category, frequency, age of acquisition, and complexity. We then address the limitations of existing diagnostic approaches in the assessment of vocabulary. Finally, we present the construction principles used in our productive vocabulary task.
Influences on lexical access and phonological encoding
Transforming an idea into spoken words first requires activating a lexical concept and retrieving a lemma from the mental lexicon (Levelt et al., 1999). Levelt (1989) defined a lemma as a lexical item that contains semantic and syntactic specification. At the end of this conceptual processing, the activation of semantically related concepts is linked to specific lexemes. The language production process is generally agreed to consist of two major components, lexical access and phonological encoding (e.g., Dell, 1986; Levelt, 1992; Levelt et al., 1999). Lexical access involves the search for and selection of the appropriate lemma within the mental lexicon (Levelt et al., 1999), and phonological encoding provides the acoustic and articulatory properties of the words (Wheeldon & Levelt, 1995). To say a word like frog, the connected lexical nodes and their syntactic features must be activated in the semantic system and the phonological information must be encoded (Caramazza, 1997; Levelt, 1989). When acquiring vocabulary, the corresponding information must be integrated into the language system, making it susceptible to various influencing factors. The following sections discuss factors that have been reported to significantly influence the word retrieval process and for which findings are available. These factors operate at various levels, as illustrated in Figure 1a, and can be broadly considered as language specific or non-language specific. At the lexeme level, for instance, language-specific information is involved, such as word length and complexity (Heuven & Dijkstra, 2010).
Age of Acquisition (AoA)
Age of Acquisition (AoA) in the context of language learning refers to the age at which a concept or a lexical item is acquired (Hernandez & Li, 2007). Early-learned words are processed differently from late-learned words (Gilhooly & Watson, 1981), but the reasons for this difference are not yet fully understood. One hypothesis, known as the semantic locus hypothesis, proposes that early-learned words have a semantic advantage over those learned later because they enter the representational network first and influence the semantic representations of subsequently learned words (Brysbaert et al., 2000). Specifically, lexical nodes that are acquired earlier have more connections within the lexical system than later established nodes, establishing a basic semantic structure that allows later word learning to accelerate (Steyvers & Tenenbaum, 2005). As a result, early-learned words are activated more frequently and are named with better accuracy and shorter response latencies in picture-naming tasks (e.g., Barry et al., 1997; Carol & White, 1973; Cuetos et al., 1999; Meschyan & Hernandez, 2002; Snodgrass & Yuditsky, 1996). Accordingly, we consider AoA to be a significant predictor of performance in naming tasks.
Frequency
In language learning, words that have been known for a longer time have been encountered more frequently, and higher word frequency is associated with greater exposure. The cumulative frequency hypothesis suggests that the total number of times a word has been encountered can account for both AoA and frequency effects (Lewis et al., 2001). However, there are also suggestions to treat the frequency effect as an independent factor that influences lexical access (Barry et al., 1997; Carol & White, 1973; Snodgrass & Yuditsky, 1996). This robust effect, first described by Oldfield and Wingfield (1965), refers to the fact that the production of less frequent words is substantially slower than the production of frequent words. Jescheniak and Levelt (1994) described a modular two-stage model in which lexeme activation is frequency-coded. According to this model, an initial stage involving the activation and selection of lemmas and their semantic-syntactic information is followed by a second stage of word form retrieval. Unlike lemmas, lexemes appear to be affected by word frequency. Familiarity with the words seems to make access to the word forms easier. The activation threshold for word forms is low for words with a high frequency and high for words with a low frequency (Jescheniak & Levelt, 1994).
Lexical category
Another aspect of language production processes is the lexical category. For example, nouns are acquired before verbs in most languages because of their referential bindings (Gentner, 1982). Children form stable object concepts during their first year of life, and the reference to objects allows a transparent semantic mapping to the perceptual world (Spelke, 1990). Empirical studies showed a processing advantage of concrete words in contrast to abstract words (De Groot et al., 1994; Schwanenflugel et al., 1988) because of their perceptual referents (Bolognesi & Steen, 2018). The nouns acquired first denote concrete concepts, and thus children may retrieve them more easily than verbs (Chiarello et al., 1999). The conceptual components of verbs seem to be more difficult to detect, combine, and organize. Thus, children learn to access them later compared to nouns (Gentner, 2006). Accordingly, toddlers’ vocabulary consists mainly of nouns and few verbs (Gentner, 1982; Nelson, 1973). This propensity to learn nouns before verbs is supported by findings that show a strong effect of lexical category. Haman et al. (2017) analyzed data from monolingual preschool children across 17 languages in lexical tasks and found accuracy for nouns to be higher than for verbs.
Complexity of words
At some point in the language acquisition process, children start to use compounds to differentiate their semantic knowledge, to build finer-grained categories, and thus to increase their vocabulary (Clark et al., 1985). Since compounds are common in the German language, they are acquired very early, from an age of about 2 years (Clark, 1998). The production of compound words, lexemes that consist of more than one stem, requires linguistic and conceptual knowledge about the constituents (Gagné & Spalding, 2009). To date, no general consensus exists about how compounds are represented in the mental lexicon (Li et al., 2017). How they are represented, however, should affect retrieval. If compounds are stored as whole words, frequency effects should emerge (Janssen et al., 2008) and their retrieval should not differ from that of simple words. However, representation of the constituents has found strong support (Marelli et al., 2014), and understanding this representation requires knowing how the constituents are processed, organized, and combined. Compounds are an important component of vocabulary, and they contribute to understanding its organization and development.
Word length
The effects mentioned above refer primarily to lexical access, whereas word length belongs more to phonological encoding (Jescheniak & Levelt, 1994). Speakers retrieve lexeme information from a syllable-based articulatory program. The phonological encoder first selects segments and then combines them into the word form (Levelt et al., 1999). Accordingly, effects of word length should be found not only on response latency but also on accuracy in speech production (Vance et al., 2005). Speech onset latencies are longer for long words than for short words because speakers must first phonologically encode the entire word and activate all articulatory programs before speaking (Meyer et al., 2003). All these findings on lexical access and phonological encoding are considered independent of the language, yet important differences across languages must still be addressed when constructing testing material and should thus also be included in the analysis.
The bilingual speech production
Strong evidence has shown that bilingual children organize semantic content comparably across their dual lexicons (Holowka et al., 2002). Based on a hierarchically related representation of the children's memory proposed by Kroll and Stewart (1994), Potter (1979) and Snodgrass (1984), we assume bilingual children to have a common semantic concept for both languages (Figure 1b). However, concrete material has a processing advantage and the processing might depend on lexical category, which means that concrete translation pairs tend to share a conceptual representation more often than abstract translation pairs (Van Hell & De Groot, 1998a). Possible reasons for this difference might be that concrete words have an imaginal referent (e.g., De Groot, 1989; Plaut & Shallice, 1993; Van Hell & De Groot, 1998a), whereas abstract words depend more on a linguistic context and are thus more language-specific (Van Hell & De Groot, 1998b). At the lexical level, researchers widely agree that units correspond to words and that mediation occurs between the conceptual-semantic level and the level of individual phonemes (Rapp & Goldrick, 2000). In contrast to monolinguals, bilinguals are assumed to have a separate lexical store for each of the two languages, and these lexicons contain language-specific information (Dijkstra & Van Heuven, 2002). The content of these stores depends largely on the input the children receive, and the input differences result in different acquisition rates for each of the languages (Rinker et al., 2017).
Accordingly, we assume AoA to be affected by the input the children receive, with strong influences on the lexicon from the environment – for example, from grandparents, friends, or caregivers. This assumption is supported by findings about a strong Turkish dominance in children raised in Germany who acquire both Turkish and German (Rinker et al., 2017). The authors examined very young children at the age of 2 and 3 years and showed a dominance of lexical items in Turkish. They attributed this dominance to more frequent exposure and a higher quality of the input (Rinker et al., 2017). Cultural aspects are assumed to play an important role in the variation of lexical content as well (Caselli et al., 1999) – for example, specific foods and drinks that are less common in the respective languages (Rinker et al., 2017).
Assessing vocabulary in bilingual children
Studies comparing the vocabulary of bilingual children across languages are relatively scarce (Bornstein et al., 2004). Moreover, only a small number of studies have investigated the development of vocabulary in bilingual children from the same perspective (Cote & Bornstein, 2014). Often, assessments are merely translated without being adapted, which leads to inadequacies such as linguistically uncontrolled test items and culturally inappropriate targets, pictures or procedures (Schaefer et al., 2019). Given the lack of appropriate testing material, practitioners and researchers often need to rely on instruments that were developed for a specific population – usually monolinguals or unselected population-representative samples – to assess lexical abilities in children with a bilingual background. In some cases, rating scales completed by educators are used to assess vocabulary. One of these instruments is the MacArthur-Bates Communicative Development Inventories (MB-CDI; Fenson et al., 2007), which examine productive vocabulary of young children by using parental vocabulary checklists. The original English version has been adapted for several other languages. According to the authors, the instrument provides a reliable and valid assessment of productive vocabulary in these languages (Fenson et al., 2007). For the Turkish language, Acarlar et al. (2009) adapted and standardized the MB-CDI on monolingual Turkish children (Türkçe İletişim Gelişimi Envanterleri; TIGE), and for a subgroup of bilingual Turkish (–German) children raised in Germany, the Türkce Ifade ve Lisan Dizelges Araştırması (TILDA; Sachse et al., 2016) was developed. Parental checklists and reports are an economical assessment tool that can certainly contribute additional information about children's vocabulary, but they suffer from the drawbacks of subjective rating scales – for example, positively biased parental ratings. Consequently, an objective view on the status of the child's language development should be substantiated by at least one reliable and valid performance assessment. Our aim was to show by way of example that the construct validity of a vocabulary test must also be checked for non-native speakers before the test can be used for multilingual children. Despite the large number of bilingual Turkish–German children, hardly any assessment tools are available to date.
Rationale of the present study
In the current study, we demonstrate how a language assessment inventory can be constructed in a balanced way to adequately capture productive vocabulary in mono- and bilingual children. We ensured the comparability of tests across multiple languages to facilitate cross-linguistic comparisons. To achieve this, we implemented a parallel construction of linguistic twin elements, characterized by consistent semantic and syntactic features. Utilizing a standardized language corpus (Corpora Collection Leipzig, 2011), we verified that both the original items and their twin counterparts exhibited comparable frequencies within their respective languages. We describe the construction process and test the construct validity of German word items by jointly modelling item parameters and person features by means of explanatory item-response models. Specifically, we examined whether observable item difficulties can be predicted by item features and the language background of 3- to 6-year-old mono- and bilingual German- and Turkish-speaking children in a German picture-naming task.
We expected the theoretically based item difficulties to be reflected in the children's performance in the productive vocabulary tests and relied primarily on the item characteristics known to affect vocabulary production. Since AoA is an important predictor of productive vocabulary skills, we expected an influence on the item difficulty in our picture-naming task, with early-acquired words being easier in terms of accuracy than words that were acquired later (Hypothesis 1a). Further, the described word-form activation threshold for low- versus high-frequency words (Jescheniak & Levelt, 1994) should result in a better performance of the children for higher-frequency words. Accordingly, word frequency should determine picture-naming accuracy as well, with highly frequent words resulting in a lower difficulty and thus higher naming accuracy (Hypothesis 1b). The performance in the naming task furthermore depends on the lexical category (Haman et al., 2017). We expected the production of nouns to be easier in terms of accuracy than the production of verbs (Hypothesis 1c). In addition, we expected word length to affect item difficulty such that longer words should be associated with lower picture-naming accuracy (Hypothesis 1d). Finally, compounds should be associated with lower picture-naming accuracy than simple word forms (Hypothesis 1e). If these item features significantly predict the empirical item difficulties, this finding would not only confirm findings on language development, but it would also be an indicator of the test's construct validity (Hartig & Frey, 2012).
We also analyzed the effects of the person characteristics age and language background. As children improve their vocabulary with age, this effect should be reflected in the data as well (Hypothesis 2). Although environmental factors play an important role in acquiring a second language, the language development process for very young children who are raised with two languages is also subject to biologically controlled mechanisms (Petitto et al., 2001). Accordingly, we expected an identical pattern in the acquisition of vocabulary regardless of language background, reflected by the same factors exhibiting significance in both populations as outlined in Hypotheses 1a to 1e. To be more precise, the test should show construct validity when being modelled for both mono- and bilingual children. However, bilingual children might start accessing the societal language later and use it less often than monolingual children. Thus, we expected a bilingual disadvantage because of the lack of content in the lexicon of these children. Consequently, bilinguals should perform worse than monolinguals, and language background should negatively affect the performance in the naming task (Hypothesis 3).
Method
Participants
Participants were 126 preschool children (62 girls) from Switzerland and Southern Germany. The 75 German monolingual and 51 Turkish–German bilingual children, aged between 33 and 78 months (M = 51.35, SD = 7.88), were recruited through daycare centers and through contacts in the Turkish-speaking community. The highest educational attainment of the parents was coded to determine their socioeconomic status, using the International Standard Classification of Education (ISCED; UNESCO Institute for Statistics, 2012) as a reference. Educational level obtained in Germany or Turkey was indicated on a 7-step scale (level 1 = primary education, level 7 = university/tertiary education). The mothers’ education ranged from level 1 to level 6 (M = 4.28, SD = 1.99). Similarly, the fathers’ education ranged from level 1 to level 7 (M = 4.89, SD = 2.36). Characteristics of the study sample are provided in Table 1. Sociodemographic data, such as age and sex of the children and information about the parents’ graduation and academic achievement, were collected via a parental questionnaire. Only children with written parental consent participated in the study. Responses of eight children were excluded from analyses because of technical problems with the audio recordings or because children terminated the test session early, amounting to a loss of 6.3% of the children's data (Table 1).
Note. The table provides information about the study sample. Standard deviations are given in brackets. SLL = Single language learners (monolinguals); DLL = Dual language learners (bilinguals).
Procedure
The productive vocabulary test was part of a cross-national study assessing language, metacognition, and socio-emotional competences in 3- to 6-year-old mono- and bilingual children. The tests were completed by the children in their homes, childcare facilities, or in an examination room of the university. Children were tested by research assistants on two separate days in individual assessment sessions of 40–60 min per day. All the bilingual children were tested by a native speaker in the dominant language. The testing was embedded in a story about the crocodile ‘Sammy’, who wanted to find a treasure and asked the children to help her. One part of the testing, including the productive vocabulary task, was computerized and conducted on a 14” Windows 10 convertible. The other part of the test consisted of offline tasks that were also accompanied by Sammy as a hand puppet.
The instruction in the productive vocabulary task was: “I am going to show you a picture and you tell me what you see in the picture. Are you ready?” The children first received two practice items. If the child had trouble understanding the instructions, they were repeated. During the test application, children were asked to specify their answer in case of underspecified responses, overspecification, part-whole problems (e.g., “bird” instead of “wings”) or noun-verb problems (e.g., “Lion” instead of “to roar”) by a standardized prompt such as “Do you know another word for it?” or “And what does she do?” If the child named the first or the second exercise object incorrectly, Sammy said: “Oops that was unfortunately not quite right. Try it again.” The task was still performed, even if the child had solved both exercise items incorrectly.
Instruments
Productive vocabulary was evaluated using a computer-based task that comprised 32 items, each varying in difficulty level based on factors such as awareness, frequency, and complexity. The task aimed to assess vocabulary knowledge across domains including activities (e.g., drink, draw), animals (e.g., cow, badger), everyday objects (e.g., sponge, dice), and other terms relevant to the children's immediate or broader life experiences. Children were presented with pictures designed specifically for this study, which they had to name (Figure 2). The scale featured an adaptive testing approach with a routing set of 16 items across different difficulty levels. The categorization into routing, basic, and advanced sets was based on information on the words’ AoA, length, class, and frequency. All children were required to complete the routing set. Depending on the number of items solved correctly in the routing set, the children proceeded with a set of advanced items (8 items), or they were assigned to the basic item set (8 items). In total, each child completed 24 items. In a one-parameter logistic (1PL) item response theory (IRT; De Boeck & Wilson, 2004; Hartig et al., 2012; Schindler et al., 2018) scaling, the scale displayed a good expected a posteriori/plausible value (EAP/PV) reliability of r = .84. An overview of the items and detailed information regarding item features can be found in the online supplementary materials.
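The routing logic of the adaptive design can be sketched as follows. This is only an illustrative sketch: the set sizes are taken from the description above, whereas the cut-off value and all function and constant names are hypothetical, because the routing threshold is not reported in this section.

```python
from typing import List

ROUTING_SIZE = 16    # from the text: routing set of 16 items
FOLLOW_UP_SIZE = 8   # from the text: basic and advanced sets of 8 items each
ASSUMED_CUTOFF = 8   # hypothetical: the actual routing threshold is not reported


def assign_follow_up_set(routing_responses: List[bool],
                         cutoff: int = ASSUMED_CUTOFF) -> str:
    """Return 'advanced' or 'basic' from the number of correct routing items."""
    if len(routing_responses) != ROUTING_SIZE:
        raise ValueError(f"expected {ROUTING_SIZE} routing responses")
    n_correct = sum(routing_responses)  # True counts as 1, False as 0
    return "advanced" if n_correct >= cutoff else "basic"


def items_administered() -> int:
    """Every child completes the routing set plus one follow-up set."""
    return ROUTING_SIZE + FOLLOW_UP_SIZE
```

Under these assumptions, every child answers 16 + 8 = 24 of the 32 items, which matches the total reported above.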
Items were selected based on the effects of AoA, frequency, lexical category, word length, and complexity of words. The items should reliably represent as wide a range as possible within these selected item features. The item characteristics were operationalized as follows. For the operationalization of frequency, we applied the normed lemma frequencies from the Childlex Corpus (Schroeder et al., 2015). Missing values were supplemented from the Core Corpus for the German Language of the 20th Century (DWDS), a large database for German word frequency standards. The frequency per one million words in the corpus was used as the basis for statistical evaluations (Geyken et al., 2006). For AoA, we used estimations from Birchenough et al. (2017). German-speaking people over 18 years of age evaluated these AoA standards on a 7-point Likert scale for 3,259 German words. We completed missing values on the basis of the Kuperman et al. (2012) AoA ratings, which are ratings for 30,000 English content words from the SUBTLEX corpus. For the length of words, we used the number of phonemes and adjusted these values for phonological processes – that is, counting the sounds that were actually spoken. In the following section, we report the procedure and results of the explanatory item-response models for response accuracy as dependent variable for mono- and bilingual children separately.
Data analysis strategy
The objective of our study was to develop a vocabulary test that incorporates established factors from language acquisition research effectively, aiming to create equitable and comparable versions for both L1 and L2. Our focus was on investigating the impact of these factors and examining their interaction with the linguistic background of the children in the societal language. We accomplished this by comparing the structure and item characteristics of the naming task between monolingual German children and bilingual Turkish–German children. We investigated whether the item characteristics can explain a substantial proportion of the variance in item difficulties with a three-step procedure (see Hartig & Frey, 2012; Hartig et al., 2012; Schindler et al., 2018). In Step 1, empirical item difficulties were estimated by using a 1PL IRT model for the accuracy data, implemented as a Generalized Linear Mixed Model (GLMM; Dixon, 2008). The items were dummy-coded, with one item serving as the reference item and therefore not represented by a dummy-coded fixed effect. The items were included as predictors (fixed effects) for logit-transformed response accuracy as dependent variable. The intercept was allowed to vary randomly between participants. The 1PL IRT model estimated in Step 1 for accuracy was as follows:
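The displayed equation is not reproduced in this version of the text. A form consistent with the verbal description of Step 1 is sketched below; the notation is an assumption and may differ from the original typesetting.

```latex
\operatorname{logit}\bigl[P(Y_{pi} = 1)\bigr]
  = \beta_0 + \sum_{k=2}^{I} \beta_k X_{ik} + b_{0p},
  \qquad b_{0p} \sim \mathcal{N}\bigl(0, \sigma_b^2\bigr)
```

Here Y_{pi} denotes the accuracy (1 = correct) of child p on item i, X_{ik} is the dummy variable that equals 1 if i = k and 0 otherwise, item 1 is taken as the reference item, β_0 is the fixed intercept, and b_{0p} is the random intercept of child p. In this parameterization the coefficients are on an easiness scale, so that larger values of β_0 + β_k correspond to higher naming accuracy for item k.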
In Step 2, response accuracy was modeled in a Generalized Linear Mixed Model (GLMM; Dixon, 2008) as a function of the item features (lexical category, AoA, frequency, word length, complexity). Equation 2 shows the estimation of the theoretically derived item-feature predictors in a maximal model (Barr et al., 2013).
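A model of this kind, with item features replacing the item dummies, can be sketched as follows. The notation is an assumption made for illustration and is not taken from the original Equation 2: q_{ik} denotes the value of feature k for item i, \gamma_k the fixed effect of that feature, and u_{kp} a participant-specific deviation from it, which a maximal random-effects structure would include.

```latex
% Assumed notation: K = 5 item features (lexical category, AoA, frequency,
% word length, complexity); q_{ik} is the value of feature k for item i.
\operatorname{logit} P(Y_{pi} = 1)
  = \gamma_0 + u_{0p} + \sum_{k=1}^{K} \left(\gamma_k + u_{kp}\right) q_{ik},
\qquad (u_{0p}, u_{1p}, \dots, u_{Kp})^{\top} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_u)
```

Under these assumptions, the predicted easiness of an item for an average participant is \gamma_0 + \sum_k \gamma_k q_{ik}, which is the quantity that can be compared with the empirical item parameters from Step 1.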
In Step 3, we assessed model fit by correlating predicted and empirical item difficulties. A significant and substantial R² would indicate that empirical item difficulties can be predicted from our selected item features. The reflection of individual differences in response accuracy can be considered as evidence of construct validity.
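The Step 3 comparison can be illustrated with a minimal Python sketch. It assumes that R² is obtained as the squared Pearson correlation between the two sets of item difficulties; the function names and all numerical values below are hypothetical and serve only to show the computation.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r_squared(empirical, predicted):
    """Share of variance in empirical difficulties matched by the predictions."""
    return pearson_r(empirical, predicted) ** 2


# Hypothetical item difficulties (logit scale), for illustration only.
empirical_difficulty = [-1.5, -0.6, 0.1, 0.8, 1.9]
predicted_difficulty = [-1.0, -0.8, 0.3, 0.5, 1.2]

print(f"R^2 = {r_squared(empirical_difficulty, predicted_difficulty):.2f}")
```

Because the correlation is squared, the result always lies between 0 and 1 and can be read as the proportion of variance in the empirical item difficulties that the item features account for.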
To analyze the effect of age (Hypothesis 2), we included centered age in months as a predictor in the model. To test whether the vocabulary item features have the same effect on item difficulty in both populations, that is, in mono- and bilingual children, we used the three-step procedure of explanatory item response modeling with response accuracy as the dependent variable for the total sample (Model 1) and for the two subgroups – monolinguals (Model 2) and bilinguals (Model 3). We included AoA, lexical category, word length, complexity, and frequency as fixed effects. Intercept and slopes for item features were allowed to vary randomly between participants (random intercept and slopes). If development had occurred in parallel in both subgroups, then both models should show significant effects. Assuming that bilingual children should perform more poorly compared to the monolingual subgroup, we relied on interaction analysis to explore whether a significant interaction existed between the individual item features and whether the child spoke one or two languages (Hypothesis 3). As item features, we again included AoA, lexical category, word length, complexity, and frequency in the model.
All models were estimated with the software packages lme4 (Bates et al., 2015) and lmerTest (Kuznetsova et al., 2017) for the statistical software R (R Core Team, 2019; Version 3.6.1). Parameters were estimated with Maximum Likelihood (ML). All significance tests were based on a Type I error probability of .05.
Results
The estimation of the empirical item difficulties by using a 1PL IRT model for the accuracy data, implemented as a Generalized Linear Mixed Model (GLMM; Dixon, 2008; Step 1), yielded a distribution of item difficulties ranging from −1.84 (“Katze”; “cat”) to 2.08 (“Korken”; “cork”). As expected and intended, the set of routing items covered a wider span of difficulty levels, whereas the basic item set was easier on average, and the advanced set was more difficult. The parameter estimates for the fixed and random effects of the GLMM with logit-transformed response accuracy as the dependent variable (Step 2) are provided in Table 2.
Note. The table displays the fixed effects and variance components in the LLTM for response accuracy in the picture-naming task. Age of acquisition in months was centered, and lexical category was dummy coded (noun = 0; verb = 1). Word length specifies the number of phonemes (adjusted for phonological processes), and complexity was also dummy coded (simplex = 0; compound = 1). Frequency is given per million tokens. *p < .05, **p < .01, ***p < .001.
AoA (Hypothesis 1a), frequency (Hypothesis 1b), lexical category (Hypothesis 1c), and the complexity of the vocabulary (Hypothesis 1e) reached significance as main effects. As expected, the earlier vocabulary was acquired, the more easily it was accessed, as indicated by higher naming accuracy (β = −0.65; z = −8.36; p < .001). Children were less accurate in naming verbs compared to nouns (β = −0.79; z = −4.41; p < .001). The higher the word frequency, the more accurately children named the word (β = 0.48; z = 6.11; p < .001). In addition, compounds (Hypothesis 1e) were named less accurately than simple words (β = −0.73; z = −2.31; p = .021). Word length (Hypothesis 1d), however, did not significantly affect naming accuracy. Overall, the regression model explained 47% of the variance in the empirical data (Step 3). In other words, item characteristics successfully predicted empirical item difficulties, which is an indicator of the scale's construct validity.
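Because these coefficients are on the logit scale, their size is easier to judge after conversion to odds ratios. The following Python sketch does this for the reported estimates. It assumes that each β refers to a one-unit change in the predictor as scaled in the model, and the baseline accuracy of 50% is a hypothetical value chosen only for illustration; the function names are likewise illustrative.

```python
import math


def odds_ratio(beta):
    """Multiplicative change in the odds of a correct response per unit of the predictor."""
    return math.exp(beta)


def shifted_probability(baseline_p, beta):
    """Probability of a correct response after adding beta to the baseline logit."""
    baseline_logit = math.log(baseline_p / (1 - baseline_p))
    return 1 / (1 + math.exp(-(baseline_logit + beta)))


# Fixed-effect estimates as reported for the total sample.
coefficients = {
    "AoA": -0.65,
    "verb (vs. noun)": -0.79,
    "frequency": 0.48,
    "compound (vs. simplex)": -0.73,
}

baseline = 0.5  # hypothetical baseline accuracy, not taken from the study

for name, beta in coefficients.items():
    print(
        f"{name}: odds ratio = {odds_ratio(beta):.2f}, "
        f"accuracy from {baseline:.0%} baseline = {shifted_probability(baseline, beta):.0%}"
    )
```

For the dummy-coded predictors the interpretation is direct: an odds ratio below 1 for verbs means that the odds of naming a verb correctly are lower than the odds of naming a noun correctly, with the other features held constant.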
We also hypothesized the person characteristic age to be an important predictor of picture-naming accuracy (Hypothesis 2). As expected, we found a significant age effect (β = 0.05; z = 3.92; p < .001), indicating that children's performance in the picture-naming task improved with age. However, no interaction between age and the item features was found. Thus, the effects of the item features remained constant across age, with the items being generally easier for older children.
We expected similar influences of item features on item difficulties for both populations – for the monolingual German and bilingual Turkish-German children. Accordingly, we computed the GLMM with response accuracy as the dependent variable for the total sample (Model 1) and for the two subgroups – monolinguals (Model 2) and bilinguals (Model 3). Table 2 shows the coefficients and variance components for response accuracy as the dependent variable. AoA, lexical category, word length, complexity, and frequency were modeled as fixed effects. In both populations, we found an effect of AoA on picture-naming performance, which means that vocabulary acquired earlier was easier to access for both the monolingual (β = −0.61; z = −7.60; p < .001) and the bilingual children (β = −0.88; z = −5.87; p < .001). Naming nouns was easier than naming verbs for monolinguals (β = −0.72; z = −3.60; p < .001) and bilinguals (β = −0.96; z = −3.01; p = .003). Somewhat unexpectedly, but in accordance with the results from Model 1, no significant effect was found for word length in either subsample. Compounds were more difficult to name than simple word forms for the bilinguals (β = −1.26; z = −2.07; p = .038), whereas no significant effect was found for the monolinguals. As expected, we found a significant effect of frequency in both subsamples, indicating that frequency was positively associated with naming accuracy for the monolinguals (β = 0.37; z = 4.37; p < .001) and for the bilinguals (β = 0.71; z = 6.27; p < .001).
To test differences between the mono- and bilingual children, we additionally modeled the interaction terms between language background and the main effects. The model revealed no significant interactions. We found only main effects of language background and of the item features lexical category (β = −0.79; z = −3.91; p < .001), AoA (β = −0.65; z = −8.10; p < .001), frequency (β = 0.39; z = 4.51; p < .001), and complexity (β = −0.64; z = −2.07; p = .039). Word length approached significance (β = 0.21; z = 1.77; p = .08) as well as frequency (β = 0.25; z = 1.84; p = .066). Overall, bilingual children performed slightly more poorly than monolingual children. These findings indicate that development in both groups is similar, albeit time-delayed in the bilingual group.
Discussion
The objective of this study was to investigate the factors that influence the performance of productive vocabulary in order to contribute to our understanding of vocabulary development in monolingual and bilingual individuals. We developed a productive vocabulary task and used explanatory item-response models to demonstrate its construct validity, making it suitable not only for monolingual but also for bilingual children. The items were selected based on various factors, including age of acquisition, frequency, lexical category, length, and complexity. Our findings suggest that the lexicons of both mono- and bilingual children are influenced by the same factors. Furthermore, the results indicate that our measurement of vocabulary is valid for bilingual children raised in a bilingual context.
As expected for the total sample, vocabulary acquired early and words that are more frequent were easier to access, as were nouns in comparison to verbs. Complexity of words negatively affected the performance in the naming task, and compounds were more difficult to name for bilingual children than for monolingual children. Furthermore, we assumed that children improve their vocabulary with age, which was confirmed by our analyses. Although the nature of the data collection is cross-sectional and thus the results simply represent an age effect, they nonetheless at least indicate parallel development in monolingual and bilingual children. Influences of the item features were expected to be similar for the subgroups, which means that effects of frequency, AoA, lexical category, complexity, and word length should be reflected in the results of both groups. Indeed, our analysis showed this similarity for frequency, AoA, and lexical category, whereas the effect of complexity reached significance only in the bilingual group.
We expected the results to depend on the language background of the children, with a bilingual disadvantage in the number of items solved (e.g., Gollan et al., 2008; Rinker et al., 2017), but further analyses between the language background and item features revealed no interactions. This finding suggests that the two language groups have no structural differences in vocabulary. The construction of this picture-naming task and its theoretically based test items with systematically varied item features allows for an interpretation in terms of individual differences in the vocabulary of mono- and bilingual children. In the remainder of the discussion, we consider the results of our analysis and underline their importance for the construction of vocabulary tests assessing the language abilities of single and dual language learners.
Dual language learners are not necessarily confronted with the same words as monolingual children, or the frequency of words can interact in a complex way depending on the topic and life circumstances (e.g., Caselli et al., 1999; Rinker et al., 2017). For example, dual language learners might be confronted with words that depict household items or body parts in the foreign language, but they might learn academic word material in institutional settings. The question remains whether words acquired early are generally more frequent. No general consensus exists on the relationship of these influential factors. In a sample of 161 students who performed different word-processing tasks in Dutch, Ghyselinck et al. (2004) showed AoA and frequency to be highly correlated. Given that the conditions of language learning in a bilingual context obviously differ from those of monolinguals, we should be careful in interpreting the relationship between AoA and frequency. The reference for the AoA in the second language is the age at which the bilingual child begins to learn this language and makes use of it in interaction with other speakers. This may differ considerably from the chronological age, which has implications for the language development process. When bilingual children start to access a second language, they can draw on existing structures of the first language, which adds new words to their lexicon that are not necessarily high-frequency words but are important for everyday life. Nonetheless, we hypothesized that words acquired early and those with high frequency in everyday usage have a semantic advantage within the lexical system (Brysbaert et al., 2000; Steyvers & Tenenbaum, 2005).
Consequently, the retrieval of these words should be easier and more accurate compared to words with later AoA or low-frequency words (Hernandez & Li, 2007; Jescheniak & Levelt, 1994), as demonstrated in our study. This effect is expected to be even more pronounced in terms of response latency, as these words are expected to have a positive impact on retrieval speed due to their more stable connections. However, studies with very young children around the age of 3 face practical limitations, and measuring retrieval speed is particularly challenging.
In principle, a difference between age and the period of acquisition of the second language can also be assumed with the influence of the lexical category (e.g., Hernandez et al., 2005; Montrul & Foote, 2014). As argued before, we expected an advantage in naming nouns over verbs, and this aligns with our findings. We found a significant effect of lexical category for monolinguals and bilinguals, albeit stronger for the bilingual group. The development seems to be similar, but the influence of the lexical category differs slightly in the bilingual context. The acquisition of verbs probably starts at a point when the children already have some syntactic structures at their disposal and their communication intentions are more complex. Accordingly, children need to access verbs, which has implications for the order of the acquisition of verbs in relation to nouns. Investigations by Naigles and Hoff-Ginsberg (1998) on the syntactic environments of verbs showed that the variety of sentence structures affected the acquisition of verbs. Verbs that appeared in many different syntactic frames were acquired earlier in comparison to verbs that children heard in fewer forms of sentence structures. Assuming that bilingual children develop in a more complex language environment, the acquisition of verbs should occur earlier than for monolingual children, which could explain our results. All the bilingual children in our study were raised with Turkish as their home language. When comparing Turkish and German, clear structural differences can be observed, posing an additional challenge for bilingual children. Turkish, being an agglutinating language, has fewer but longer words compared to German (Daller et al., 2011).
The structural disparities between these two languages specifically draw the bilingual child's attention to the word material during the language acquisition process, resulting in a positive impact on metalinguistic knowledge. Bilingual children can make use of this metalinguistic knowledge, which might facilitate the acquisition of the language, including the acquisition of verbs (Bialystok, 2001).
Interestingly, we found an effect of word complexity only for the bilingual group. These children seemed to have more problems with accessing compounds compared to the monolingual group. We suppose that monolingual children in our study were in an age range when the influences of this feature are less important. Data from Birchenough et al. (2017) and Kuperman et al. (2012) show for this group that the AoA for the compound spiderweb (“Spinnennetz”) was 4.90, and bathtub (“Badewanne”) was 3.84. The question of how compounds are stored in this context also seems to be important. Compounds are multimorphemic words that are composed of at least two free morphemes. Assuming that first compounds are stored as a whole-word representation, children should have little difficulty formulating these complex words. Access to the whole-word meaning eliminates the need to combine the meaning of the word's constituents and activate structural knowledge, which allows for more economic processing, especially for frequent word material. To our knowledge, research on the access of compounds in the second language (L2) is scarce. In principle, we expect some differences in the formation and structure of compounds in contrast to the first language (L1), which should be an additional hurdle for the bilingual child. Given the assumption of a whole-word representation for well-known and frequent compounds, the worse performance of the bilingual group might also substantiate the time-delayed development course for the bilingual child. Thus, the effect for bilinguals (and not monolinguals) in our study might indicate a lag in their linguistic development.
Notably, our analyses showed no significant effects of word length in addition to the other vocabulary features for monolingual and bilingual children. Item length was measured as the number of phonemes, adjusted for phonological processes. A consistent strong effect of word length on the encoding process can be found in the literature (Vance et al., 2005). Our null finding for this effect may be due to the age range of our sample. The effect is particularly pronounced at the very beginning of language acquisition and weakens as language acquisition progresses. The children we tested were between 3 and 6 years old. At this age, we assume that these effects are less important for the access of vocabulary. Moreover, we used relatively few compounds compared to simple words, which might explain the lack of effect. Nevertheless, age was an important factor that influenced the performance in the naming task. As expected, we found that children performed better with age, which means that the test items are sensitive to developmental differences.
The findings of the present study indicate that the structure of language acquisition develops in parallel for L1 and L2 learners. In addition to a general delay in the development of L2, further important factors influence the acquisition of vocabulary. Considering the benefits of first language competencies in the language learning process, a delayed but parallel acquisition of the second language should not necessarily be disadvantageous for the bilingual child. The lack of significant interaction effects in our findings supports structural validity and suggests that the development process is comparable. Since the children can draw on existing concepts, it makes sense, for example, when learning or supporting L2, to offer appropriate vocabulary material to augment the lexicon. The process of L1 acquisition cannot be transferred one-to-one to the acquisition of L2. Thus, it is not a question of imitating the acquisition of L1 but rather of supplementing it accordingly in the acquisition of L2.
Limitations
A cross-sectional research design cannot provide the information about the development process in the vocabulary of bilingual children that a longitudinal study could provide. In addition to methodological challenges associated with comparing test results across languages with different structures, there is a broader issue concerning the sample selection. The use of a between-subject design to compare the performance of bilingual individuals with demographically matched monolingual counterparts is a common approach in bilingualism research. While such designs can reveal fundamental performance differences between groups, they often lack sensitivity to variations within each group. Finding a homogeneous group of participants who share all demographic variables except for language experience is particularly challenging, as language experience itself correlates with other factors (Luk & Bialystok, 2013). Further data collection is required not only for accuracy but also for latency analyses to allow for conclusions on the automation and stability in the word retrieval process. Our results might have also been influenced by the circumstances stemming from the COVID-19 pandemic. During the testing period, the living conditions of many of the participating children had become more difficult. For example, they were unable to attend childcare facilities for a long period, or the conditions of the survey changed because of governmental regulations. These external variables could have adversely affected the comparability of the data collected, although we assume the effect was the same for mono- and bilingual children. Lastly, given the collection of many parameters, the test session was very long for the children. Consequently, the number of items for the productive vocabulary was small, and the data should therefore be interpreted with caution.
Conclusion
The present study corroborates the body of research on the main determinants of vocabulary acquisition. The findings further underscore that these factors seem to influence vocabulary acquisition comparably for mono- and bilingual children. The study also demonstrated that valid assessment tools can be developed to identify individual differences in both groups of children. These findings have direct practical relevance, given the increasing number of bilingual or multilingual children and the need to examine these children with appropriate tests – for example, for school aptitude diagnostics. An important point is that these tests should not only be validated for monolingual children but should also be able to map developmental differences in bilingual children, as was shown with our analysis. The results suggest that the structure of vocabulary acquisition in L2 is similar to the acquisition in L1, and children can draw on previous knowledge from L1. In this way, our findings contribute to understanding and supporting bilingual children in their language learning process.
Data availability
Raw data were generated across the universities of Würzburg (Germany), Basel (Switzerland), Bern (Switzerland), and Neuchâtel (Switzerland). Derived data supporting the findings of this study are available from the corresponding author Madlen Mangold upon reasonable request.
Acknowledgements
The study reported in the present paper was funded by the Schweizerischer Nationalfonds (SNF). We have no conflicts of interest to disclose. The authors would like to thank all members of the CROCODILE Project (https://www.crocodile-study.ch) who made this work possible, and we are grateful to the families who participated in this project.