Introduction
The acquisition of spoken language occurs relatively rapidly: within two years from birth typically developing children are able to understand basic adult language, have a substantial vocabulary and can form simple sentences. However, despite the fact that young children may easily make themselves understood by adults, their speech production is not necessarily adult-like. Rather, it takes some time to achieve adult-like production, and this seems to be the case for certain linguistic aspects more than for others. The present study is focused on young Italian speaking children’s ability to produce the appropriate duration of vowels preceding/following a consonant, either singleton or geminate, when the vowel is in a stressed or unstressed syllable. While the vast majority of studies of children’s speech production have focused on segmental phonology and on English, our study makes an important contribution in focusing on suprasegmental phonology and a language other than English. Our investigation of Italian allows us to examine certain aspects of speech that are not found in English, thereby advancing our knowledge of the complexities of human speech production more generally.
In Italian, geminate consonants (fato/fatto, ‘fate/fact’; note/notte, ‘notes/night’; biscotto ‘cookie’; mamma ‘mummy’; nanna ‘sleep’) are frequent and commonly found in children’s everyday language. Phonetically, a geminate consonant is realized as a long consonant. The young child’s speech production is generally slow; long consonants are therefore fairly frequent in the first words, even in languages in which they have no phonologically contrastive role (Payne, Post, Astruc, Prieto & Vanrell, 2012). However, in order for the child to correctly master the geminate/singleton contrast, acquisition of the phonological distinction between singletons and geminates is required.
This partly depends on characteristics of the ambient language. Vihman and Velleman (2000) measured the length of medial consonants in the early vocalizations of three groups of children, speaking English, Finnish or French. English and French do not use consonant length contrastively, whereas geminates are very frequent in Finnish. Vihman and Velleman (2000) were interested, among other things, in finding out whether the length of the medial consonant of the words/nonwords the children produced changed during development, depending on the ambient language. They found that in languages in which singleton/geminate consonants are not phonologically contrastive, like English, the mean duration of the medial consonant produced by young children decreased rapidly over development, and there was also much less variability in the duration values. In Finnish, where geminate consonants are very frequent and contrastive, there was larger variability in the duration of the medial consonants, and the mean duration increased over development. Moreover, the proportion of geminates in the Finnish children’s productions was higher than the proportion in the input language (Vihman & Velleman, 2000). A similar result was found by Vihman and Majorano (2017) in Italian infants. These differences in developmental patterns show how children, adapting to their own language, gradually master the phonological singleton/geminate distinction, when required by the characteristics of their native language.
In many languages, children face the task of mastering the variability in duration associated with the geminate/singleton consonant as well as duration variability associated with other contextual factors: vowel duration changes depending on whether the vowel precedes or follows a geminate or a singleton (Esposito & Di Benedetto, 1999; Hassan, 2003; Homme, 1981). In Italian the phonological geminate/singleton contrast is characterized not only by the consonant length, but also by the length of the preceding vowel, which is greater before singletons than before geminates. An additional contextual factor is lexical stress. In Italian, vowels in syllables exhibiting primary stress have a longer duration than those which do not carry stress.
Geminates, vowels, and stress
In Italian the geminate consonant forms both the coda of the preceding syllable and the onset of the following syllable. For example, in the word bottone (/botˈtone/, ‘button’) the first syllable is bot-, including the first part of the geminate as a coda, and the second part of the geminate is the onset of the second syllable (–to). Linguistically this has been interpreted as reflecting the phonological representation of geminates “as composed of two identical segments belonging to different syllables” (Loporcaro, 1996, p. 149). This view has been supported by several authors (Bertinetto, 1981, 1985; Loporcaro, 1996; Mioni, 1973; Vogel, 1982). Alternative views have also been proposed, whereby, for example, the geminate is formed by the same unit which is part of two different syllables, with the consonant “initiated in the first syllable and released in the second syllable” (Borowsky, Itô & Mester, 1984). According to another view, geminates are phonologically single segments entirely syllabified at the onset of the following syllable (Luschützky, 1984). While syllabification issues concerning geminates are of great interest, they are not the focus of the current study.
How are the acoustic characteristics of geminates and of their surrounding phonetic context realized during adult production in Italian? In many languages, like Finnish, Arabic, Icelandic and Italian, geminates tend to occur in middle position within a word (Maddieson, 1985; see also Khattab & Al-Tamimi, 2008 for Lebanese Arabic), where they are preceded and followed by a vowel. Esposito and Di Benedetto (1999) investigated the perceptual and production characteristics of geminates compared to singleton consonants in adult speakers of Italian. They measured different acoustic parameters relative to the stop consonants (/b/, /d/, /g/, /p/, /t/, /k/) and vowels of disyllabic nonwords in sequences like V1C1V2 and V1CC1V2 where C represents a stop consonant and CC represents the corresponding geminate. The main findings pertained to the duration of the consonant and of the vowel preceding it (V1), while there was no apparent effect of gemination on V2. They confirmed that the geminate consonant is produced with a longer duration, compared to a single consonant, and this further supports data from other studies with other types of consonants (fricative, liquid and nasals; Argiolas, Macrì & Di Benedetto, 1995; Mattei & Di Benedetto, 2000). In particular, the duration of the vowel that precedes a geminate (V1) was found to be on average 25% shorter compared to a V1 preceding a single consonant.
In general, the duration of Italian vowels depends on several factors, including lexical stress, syllable position, and type of vowel (open or closed). Maximal duration is found only in penultimate position, and in particular, vowels are long in penultimate open stressed syllables (e.g., /botˈtone/ – ‘button’; /bamˈbino/ – ‘child’; D’Imperio & Rosenthal, 1999; Krämer, 2009). Vowel duration decreases in stressed pre-penultimate open syllables, and it is short everywhere else. In particular, stressed vowels’ duration tends to be reduced by the presence of a consonant coda, compared to stressed vowels in open syllables. Overall, however, as found by Esposito and Di Benedetto (1999), the stressed vowel’s duration is shortest before intervocalic geminate consonants (Farnetani & Kori, 1986; Fava & Magno Caldognetto, 1976; Loporcaro, 1996).
These characteristics of the acoustic features of vowels and consonants that have been revealed in studies of adult speakers are not necessarily present in young children. As we noted earlier, several studies have indicated that, in general, children’s speech (be it segments, words or phrases) tends to be slower (i.e., have a longer duration) and contains more variability than adults’ (DiSimoni, 1974; Kent & Forner, 1980; Payne et al., 2012; Smith, 1978; but see Smith, 1992 and Smith, Kenney & Hussain, 1996). The speech production differences between adults and children are generally assumed to be based on differences in speech motor control abilities, sensorimotor development and other contextual factors, like familiarity (e.g., Arciuli & Colombo, 2016; Arciuli & Ballard, 2017; Ballard, Djaja, Arciuli, James & van Doorn, 2012; Schwartz, 1995). Considering the geminate/singleton consonant contrast, some of the existing cross-linguistic studies suggest that children may not have completely mastered this contrast in an adult-like way at the end of the one-word period (Khattab & Al-Tamimi, 2015; Kunnari, Nakai & Vihman, 2001), even when allowing for variability due to the frequency of exposure in the input and the amount of distinctiveness present in the adults’ language. For example, Finnish children acquire the length distinction earlier than Japanese children, because geminates are about twice as frequent in Finnish as in Japanese, and because in the latter there is greater variability and more flexibility in the length distinctions produced by speakers (Kunnari et al., 2001).
Interestingly, the acoustic measurements of the duration of geminates and singletons in adults and children speaking Jordanian Arabic words showed no age differences, even in children as young as two years: ratios between the geminates’ and the singletons’ durations did not differ significantly at the different ages (Mashaqba, Huneety, Al-Khawaldeh & Thnaibat, 2021). Also, neither adults nor children showed any sign of “temporal compensation” (i.e., shorter duration of the pre-geminate vowel; but see, for example, Aldubai, 2015, for contrasting results in a dialect of Yemeni Arabic). To our knowledge, very few studies have investigated the characteristics of Italian children’s productions of geminates and of the vowels preceding and following them.
As we have discussed, the literature on the acoustic characteristics of geminates and their vowel context shows that Italian geminate consonants are realized by adult speakers with a longer duration compared to single consonants. As mentioned, the vowel preceding a geminate tends to be on average 25% shorter than the vowel preceding a singleton (Esposito & Di Benedetto, 1999). According to Krull, Traunmüller and Bertinetto (2006), the shorter duration of V1 is an important perceptual cue to the singleton vs geminate distinction for Italian listeners, in addition or as an alternative to the duration of the consonant, which may vary depending on the rhythm of the speech context (i.e., local speech rate; Pickett, Blumstein & Burton, 1999). Bertinetto and Vivalda (1978, cited in Krull et al., 2006, p. 81) claim that the reduction of V1 is a useful cue in particular if it is a stressed vowel, while V2 is not affected by the perceptual distinction between singleton and geminate (see also Esposito & Di Benedetto, 1999). Note that the materials used by Esposito and Di Benedetto (1999) were disyllabic nonwords, in which the first syllable always carried stress, as in the majority of Italian disyllabic words (not in the case of trisyllabic and longer words).
The role of lexical stress may be important because the duration of the stressed vowel in Italian is longer, compared to the unstressed vowel (as mentioned above), while the presence of a following geminate tends to decrease the duration of V1. If V1 carries stress, its duration is increased to make it perceptible as the stressed vowel, but if it precedes a geminate its duration must also be shortened. Thus, there appear to be two opposing effects on the duration of V1 when it is a stressed vowel: a lengthening of its duration due to lexical stress, and a simultaneous reduction due to the following geminate. This marks a strong contrast with V1 preceding a singleton, because its duration is simply increased by being in a stressed syllable. We hypothesized that if the production of geminates is not completely independent of lexical stress, production of the vowels preceding and/or following the geminate should be different depending on the location of unstressed and stressed syllables close to the geminate consonants.
Next, we will consider the characteristics of lexical stress in Italian, and the development of the ability to differentiate the acoustic characteristics of stressed and unstressed syllables.
Acoustic investigations of the development of stress production
Lexical stress is based on the distinction between strong syllables (also referred to as ‘tonic’ or ‘stressed’) and weak syllables (also referred to as ‘unstressed’) within a word. Acoustically, stress can be realized as an increase in duration, fundamental frequency (pitch) and intensity of the vowel in the syllable carrying stress. Which of these acoustic characteristics are more important varies from language to language (Astruc & Prieto, 2006; see Gordon & Roettger, 2017, for a review). In some languages, the stressed syllable can be in different positions. For example, in English, stressed syllables can appear in any part of the word but the predominant pattern is on the initial syllable. In Italian, too, the position of the stressed syllable can vary, and is most frequently on one of the last three syllables. However, stress placement in these languages does not follow a strict rule and is therefore unpredictable.
In Italian, there are two main stress patterns: a dominant pattern, on the penultimate syllable, occurring on ~75% of words (e.g., bamBIno, ‘child’), and a non-dominant pattern, on the antepenultimate syllable, occurring on ~18% of words (e.g., TAvolo, ‘table’; Colombo, 1992; Spinelli, Sulpizio & Burani, 2017). Stress on the last syllable is much less frequent (about 2–3%). The main acoustic correlate of stress in Italian is the increase in duration of the vowel, in particular in open non-final syllables (Bertinetto, 1980; D’Imperio & Rosenthal, 1999). Moreover, the duration of the vowel is even longer when the tonic syllable is the penultimate syllable (caTEna, ‘chain’), compared to when it is the initial syllable (TAvolo; D’Imperio & Rosenthal, 1999).
How do the acoustic correlates of stress develop? Is the child able to master the contrastivity between stressed and unstressed syllables so as to accurately produce words that differ only in stress, and not in their phonological segments (PApa ‘pope’, paPÁ ‘daddy’; ANcora ‘anchor’, anCOra ‘still’)?
According to Schwartz, Petinou, Goffman, Lazowski, and Cartusciello (1996), who investigated several acoustic characteristics of stress in two-year-old children with novel words, English-speaking children are able to produce different forms for the stressed and the unstressed syllable already at 22 months of age (see also Olivucci, Pasqualetto, Vayra & Zmarich, 2016, who examined the recordings of Italian-speaking children). However, the amount of contrastivity between the two types of syllables (i.e., the difference in several parameters, like duration, pitch and amplitude) may be different in child speech than in adult speech (see also Höhle et al., 2009; Jusczyk, Cutler & Redanz, 1993).
Ballard et al. (2012) investigated stress contrastivity in English-speaking children (3–7 years), by measuring the normalized pairwise variability index (PVI), where the pairwise difference between successive syllables is divided by the average value of the pair in each dimension. For example, the difference between the durations of V1 and V2 in the syllables ‘po’ and ‘ta’ of the word ‘potato’ would be divided by the mean of the two vowels’ durations in order to get the PVI. Ballard et al. found that children’s productions showed less contrastivity than adults’, but only for words beginning with a weak-strong pattern, like ‘potato’. The authors noted that most multisyllabic English words have a strong-weak pattern. Thus, they hypothesized that this difference in degree of contrastivity across successive syllables might be due to the different amount of exposure to words with the weak-strong pattern, and less practice in producing that pattern. Arciuli and Ballard (2017) used the same methodology in their study of speech production in children aged 8–11 years. Results showed that even in these older children there remained aspects of speech production that were not adult-like.
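The normalized PVI described above can be illustrated with a minimal sketch. It follows the verbal definition given here (difference between successive syllables divided by the average of the pair). The function name, the scaling factor of 100 and the example durations are assumptions made for illustration, not values or code from Ballard et al.

```python
def normalized_pvi(first: float, second: float, scale: float = 100.0) -> float:
    """Normalized PVI for one pair of successive syllables.

    The difference between the two values is divided by the mean of the pair.
    `scale` is an assumed convention (100 expresses the index as a percentage).
    """
    pair_mean = (first + second) / 2.0
    if pair_mean == 0:
        raise ValueError("The mean of the pair must be non-zero.")
    return scale * (first - second) / pair_mean


# Hypothetical vowel durations (ms) for 'po' and 'ta' in 'potato' (weak-strong).
v1_ms, v2_ms = 60.0, 140.0
pvi_duration = normalized_pvi(v1_ms, v2_ms)
print(pvi_duration)  # negative: the second vowel is longer than the first
```

With this sign convention a weak-strong word yields a negative index and a strong-weak word a positive one; a value near zero indicates little contrast between the two syllables.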
This aspect of speech production was also investigated by Arciuli and Colombo (2016), who examined Italian children aged 3–5 using trisyllabic words with penultimate (paTAta ‘potato’) and with initial syllable stress (MAcchina ‘car’). As noted, words with penultimate syllable stress are much more frequent in Italian, so if the differential patterns of production for strong-weak and weak-strong syllables in English-speaking children were due to practice effects, as proposed by Ballard et al. (2012) and Arciuli and Ballard (2017), then Italian children should show the opposite pattern. That is, for Italian children contrastivity between the stressed and the unstressed syllable should be more adult-like for the most frequent stress type, the penultimate stress type, with an unstressed initial syllable, than for words with initial stress. In fact, Arciuli and Colombo (2016) found that even three-year-olds’ productions were not significantly different in degree of contrastivity from those of adults, for both types of stress pattern. Thus, in Italian children, this feature appears to develop rather early, at least by age three.
In the current study, we measured acoustically the duration of V1 before a geminate or a singleton, in stressed and unstressed syllables (and, for the sake of completeness, we also measured V2 which followed a geminate or a singleton consonant). According to the account of adult speech by Bertinetto and Vivalda (1978) there should be a greater reduction in the duration of V1 before a geminate than before a singleton in stressed syllables. This reduction may not be so great when V1 is not stressed (i.e., in penultimate stress nonwords). We were particularly interested in exploring the developmental trajectory of the ability to produce the appropriate durational contrast in the vowels surrounding the geminate/singleton. We know that the ability to produce a stress contrast is present even in Italian-speaking three-year-old children (Arciuli & Colombo, 2016), but the vowel reduction in V1 required for the singleton/geminate contrast may be more subtle, and we hypothesized that this contrast may not be mastered by young Italian-speaking children. Therefore, we included the age factor in the design. A protracted developmental trajectory would be reflected in a three-way interaction, type-of-consonant (geminate/singleton) by stress position (V1/V2) by age.
We used nonword stimuli in which a medial singleton or geminate stop consonant was preceded and followed by a stressed or unstressed vowel. Our elicitation task was nonword repetition (NWR) supported by pictures, where children orally repeated a nonword representing an unfamiliar object. Nonwords were used because this allowed for the manipulation of multiple factors: geminate/singleton and stressed/unstressed syllable. Nonwords were trisyllabic with a CVCVCV or CVCCVCV structure, in which the medial consonant could be singleton or geminate (only stop consonants were used). Four different versions of each nonword were created by crossing the two factors, geminate (yes/no) and stress (stressed/unstressed): for example, /ˈbabasi/, /ˈbabbasi/, /baˈbasi/, /baˈbbasi/. We also tested a sample of adults, to be used as a comparison group to the children, in order to examine any developmental trends towards adult-like mastery. Note that it is not possible to find picturable real words that exhibit all of the contexts we were interested in.
Method
Participants
Overall, 77 children from a pre-school in a small town in north-eastern Italy were recruited for the purposes of the research. Due to technical issues only the recordings from 74 participants were analyzed (42 female and 32 male). The age of the participants ranged from 3 to 6 years: 17 children aged 3 years (mean = 3.62, SD = 0.22), 24 children aged 4 years (mean = 4.54, SD = 0.23), 21 children aged 5 years (mean = 5.58, SD = 0.28) and 12 children aged 6 years (mean = 6.22, SD = 0.18). The recordings were made individually in a quiet room of the pre-school building. Only children whose parents signed a consent form participated in the research.
Additionally, 28 adults from north-east Italy were recruited, ranging from 19 to 62 years of age (mean = 33.89 years, SD = 13.94; 16 females and 12 males). Like the children, the adults were interviewed and recorded in an isolated and silent room.
Materials
The 20 nonwords used in the repetition task had no meaning in the Italian language. They were created according to either a CVCVCV or a CVCCVCV trisyllabic structure, half of which carried stress on the penultimate syllable and half on the initial syllable. The consonants were all stop consonants easily produced by even young children, as that would make the acoustic measurement of the onset and offset of the adjacent vowels easy. Of the 10 nonwords with initial stress, 5 had a singleton medial consonant (/ˈbabasi/, /ˈdidipo/, /ˈkeketo/, /ˈpapaso/, /ˈtitimo/) and 5 had a geminate medial consonant (/ˈbabbasi/, /ˈdiddipo/, /ˈkekketo/, /ˈpappaso/, /ˈtittimo/). Of the 10 nonwords with penultimate stress, 5 had a singleton medial consonant (/baˈbasi/, /diˈdipo/, /keˈketo/, /paˈpaso/, /tiˈtimo/) and 5 had a geminate medial consonant (/baˈbbasi/, /diˈddipo/, /keˈkketo/, /paˈppaso/, /tiˈttimo/). Each nonword was inserted in a carrier sentence “Questo si chiama bàbasi. Puoi ripetere? Bàbasi” (“This is called bàbasi. Can you repeat? Bàbasi”), which was included in a Powerpoint presentation. Each of the twenty nonwords was associated with a picture, which depicted a colourful, strange and unfamiliar object or animal. Each slide included a picture and the simultaneous presentation of the recorded sentence with the nonword associated with that picture. The presentation was manually advanced by the experimenter.
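The 2 × 2 design behind the stimulus list can be reproduced with a short sketch. It assumes that each nonword can be split into a first syllable, a medial stop and a remainder, and that the stress mark is placed as in the transcriptions above. The function and variable names are illustrative and do not come from the study’s materials.

```python
from itertools import product

# (first syllable, medial stop, remainder) for the five base nonwords listed above.
BASES = [
    ("ba", "b", "asi"),
    ("di", "d", "ipo"),
    ("ke", "k", "eto"),
    ("pa", "p", "aso"),
    ("ti", "t", "imo"),
]
STRESS_MARK = "ˈ"


def build_nonword(first, stop, rest, initial_stress, geminate):
    """Return one transcription, e.g. 'ˈbabbasi' or 'baˈbasi'."""
    medial = stop * 2 if geminate else stop
    if initial_stress:
        return STRESS_MARK + first + medial + rest
    # Penultimate stress: the mark precedes the medial consonant, as in the source.
    return first + STRESS_MARK + medial + rest


def build_stimuli():
    """Cross the five bases with stress position and consonant type."""
    stimuli = []
    for (first, stop, rest), initial_stress, geminate in product(
        BASES, (True, False), (False, True)
    ):
        stimuli.append(
            {
                "form": build_nonword(first, stop, rest, initial_stress, geminate),
                "stress": "initial" if initial_stress else "penultimate",
                "consonant": "geminate" if geminate else "singleton",
            }
        )
    return stimuli


stimuli = build_stimuli()
print(len(stimuli))
print([s["form"] for s in stimuli[:4]])
```

Each of the four stress-by-consonant cells contains five items, which gives the 20 nonwords of the repetition task.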
Procedure
The task was performed on a laptop using a Powerpoint presentation. Each picture was presented simultaneously with a recording of the sentence prompt. The recording was played through earphones with a head-mounted microphone (Reloop RUF-1NH/HS), to ensure clear and undisturbed perception of the pronunciation and clean recording of the corresponding production. Participants’ responses were audio-recorded using Audacity software. There were 20 pictures and each stimulus nonword was presented twice. A practice round with 4 filler stimuli was performed to ensure the participants had become familiar with the task. The audio files were finally converted from the file extension .aup, native to Audacity, to one compatible with the software PRAAT (.wav, 32-bit).
We used PRAAT, version 6.0.28, to undertake acoustic measurements (Boersma & Weenink, 2017). Waveforms and wide-band spectrograms with a 300-Hz bandwidth were generated for each sound file. Vowel segmentation was made considering concurrent information from amplitude traces, intensity curves and F0 contour. The onset of V1 was considered in correspondence with an increase in amplitude and appearance of the formant structure. The offset of V1 was in correspondence with a drop in amplitude and change in formant structure. The onset of V2 was in correspondence with the release burst of the preceding stop consonant and appearance of formant structure, and its end in correspondence with an intensity variation in the waveform and spectrogram (see Figure 1). All analyzed vowels were intermediate vowels.
Results
Data analyses
Perceptual evaluations were made by the first and second authors (both native speakers) in order to determine whether each production was correct. Both segmental and suprasegmental features were evaluated. Productions were considered correct if all segments were pronounced correctly, and with correct stress and distinctive single/geminate pronunciations. Evaluations of accuracy made by the first and second authors exhibited high agreement (Cohen’s κ = .88; Ranganathan, Pramesh & Aggarwal, 2017). Any contested productions were then re-examined by both authors until there was final agreement. Acoustic analyses were only undertaken on correct productions. We measured the vowels in the first two syllables of each nonword because stress fell on only one of these two syllables. Means and standard deviations of the acoustic measurements are shown in Table 1.
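For readers unfamiliar with the agreement statistic reported above, the following sketch shows how Cohen’s kappa is obtained from two raters’ judgments: observed agreement is corrected for the agreement expected by chance. The ratings below are invented for illustration and are not the study’s data; the function name is likewise an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must judge the same non-empty set of items.")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical judgments (1 = correct production, 0 = incorrect).
first_author = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
second_author = [1, 1, 0, 0, 0, 1, 0, 1, 1, 1]
print(round(cohens_kappa(first_author, second_author), 3))
```

In this toy example the raters agree on 8 of 10 items, but because chance agreement is already 0.52, kappa is considerably lower than the raw agreement rate.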
Children data
In total, we collected 1480 data points (74 children x 20 nonwords). However, the data from one three-year-old were removed because he was not able to repeat most nonwords. Also, for the analysis of vowel durations, two nonwords, both with initial stress, (dìddipo and tìttimo) were removed because they were repeated by only 14 and 7 children, respectively. As noted earlier, only accurate repetitions were analyzed acoustically. For accuracy 1379 data points were available, and for the analysis of vowel duration 1037 data points were available from the children. For the analysis of vowel duration, the data from the children aged 3–6 were sorted into two age groups: 3–4 and 5–6, so as to have a simpler statistical design (e.g., fewer contrasts between age levels). The main pattern of results was not substantially different from that obtained in the analyses with all four age groups (see Supplementary materials, https://osf.io/dbemv/?view_only=cb69224126674000a0247358b563af1b).
Adult data
In total, 560 data points were collected (28 adults x 20 nonwords). After removing incorrect productions 549 data points were available for acoustic analyses. Data and scripts are available at: https://osf.io/dbemv/?view_only=cb69224126674000a0247358b563af1b.
Analyses
Accuracy
Out of the 1379 data points available from children, 881 were considered correct (64%). Table 1 shows the trend of accuracy rate through the children’s ages for nonwords with single/geminate consonants. Accuracy was high on adult data (98%); therefore, these data were not further analyzed.
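The counts and percentages reported in this section can be checked with simple arithmetic. The sketch below uses only figures stated in the text; the helper name is illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Data points collected: participants x nonwords.
children_total = 74 * 20
adults_total = 28 * 20

# Accuracy rates from the reported counts.
children_accuracy = percent(881, 1379)  # correct child productions
adults_accuracy = percent(549, 560)     # adult productions kept after exclusions

print(children_total, adults_total)
print(round(children_accuracy, 1), round(adults_accuracy, 1))
```

The child rate rounds to 64% and the adult rate to 98%, in line with the values reported above.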
The analyses on children’s accuracy data were conducted using generalized linear mixed-effects modelling in R version 3.5.2 GUI (R Core Team, 2018), treating subjects and items as random effects and treating age group (3, 4, 5 and 6 years; between-subjects and within-items), stress position (first vs second syllable, within-subjects and items) and type of consonant (singleton/geminate, within-subjects and items) as fixed factors. The statistical model for the analysis with accuracy as dependent variable was: Children Accuracy = glmer(Accuracy ~ Age + Stress + Geminate + (1|Participant) + (1|Word)). The comparison between the model with main effects and the model including the three-way interaction was not significant. The ANOVA showed significant effects for age (χ2 = 18.83, p < .001), and type of consonant (χ2 = 7.9, p = .001). No other effect was significant. Mean accuracy rates are displayed in Table 2. The analysis of the main effect of type of consonant showed that nonwords with single consonants were produced more accurately (76%) than those with geminates (52%). The main effect of Age was further analyzed with pairwise contrasts.
Contrasts among the different age groups showed that the differences in accuracy between children aged three and four years and between five and six years were not significant. The other contrasts were significant: between three and five, p = .01; between three and six, p = .01; between four and five, p < .001; between four and six, p < .001.
For inaccurate productions, we also calculated the proportion of phonemic errors, geminate errors and stress errors, in order to see how each of these factors contributed to the probability of an incorrect production. Table 3 shows that phonemic errors (omissions, substitutions and insertions) were the most frequent at all ages, in particular for the nonwords containing the vowel /i/ (tittimo, diddipo). In a relatively high number of cases children pronounced the geminate as a singleton, according to the experimenters’ evaluations; that is, in these cases the duration of the geminate was not perceived by the evaluators (first and second authors) as adult-like. However, the proportion of each type of error appeared to remain relatively constant from three- to six-year-old children.
Analysis of vowel duration
The analyses of duration were conducted using linear mixed-effects modelling in R version 3.5.2 GUI (R Core Team, 2018), treating subjects and items as random effects and treating group (3–4, 5–6, and adults, between-subjects and within-items), stress position (first vs second syllable, within-subjects and within-items) and type of consonant (singleton/geminate, within-subjects and within-items) as fixed factors1.
The model was fit by maximum likelihood with the Laplace approximation technique. The lme4 package, version 1.1–21 (Bates, Mächler, Bolker & Walker, 2015), was used to run the linear mixed-effects model. The statistical model for the analysis of the three-way interaction with V1 and V2 duration as dependent variables was: V duration = lmer(V_Duration ~ Stress * Group * Geminate + (1|Participant) + (1|Word)). When random slopes were included in the model for the within-participants variables (Group for items; stress and geminate for participants), the model did not converge. Moreover, a comparison of the Akaike information criterion (AIC; Akaike, 1998) did not show any advantage for the full model; therefore, it was decided to keep the simplest possible model (following the advice of Barr, Levy, Scheepers & Tily, 2013).
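The logic of the AIC comparison mentioned above can be sketched as follows. The AIC penalizes the maximized log-likelihood by the number of estimated parameters, and the model with the lower value is preferred. The log-likelihoods and parameter counts below are hypothetical placeholders, not values from the study, and the function names are illustrative.

```python
def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood


def prefer_simpler(simple_model, full_model) -> bool:
    """Return True when the full model shows no AIC advantage over the simple one.

    Each model is a (log_likelihood, n_parameters) pair.
    """
    return aic(*full_model) >= aic(*simple_model)


# Hypothetical fits: random intercepts only vs. random intercepts plus slopes.
intercepts_only = (-5200.0, 15)
with_slopes = (-5198.5, 20)

print(aic(*intercepts_only), aic(*with_slopes))
print(prefer_simpler(intercepts_only, with_slopes))
```

In this made-up case the small gain in log-likelihood does not offset the five extra parameters, so the simpler model is retained.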
The mean duration of V1 and V2 based on the by-participant data is presented in Table 1 (see also Figure 2). The separate analyses of the duration of V1 and V2 showed that the model including the three-way interaction explained more variance than the model with only the main effects, χ2 = 40.71, p < .001, for V1 and χ2 = 33.59, p < .001, for V2. Separate analyses were then carried out on the data for nonwords with stress on the penultimate and on the initial syllable.
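Model comparisons of this kind are presumably likelihood-ratio tests, in which the χ2 statistic is referred to a χ2 distribution whose degrees of freedom equal the number of parameters added by the larger model. The following minimal sketch (Python, standard library only; not the authors' code) shows how a χ2 statistic maps onto a p value and why a vanishingly small p is conventionally reported as p < .001. The degrees of freedom are not reported in the text: df = 7 is our assumption, obtained by counting the interaction terms that a 2 × 3 × 2 design adds to a main-effects model.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square distribution with integer df.

    Uses Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / gamma(a + 1),
    starting from Q(1/2, x) = erfc(sqrt(x)) or Q(1, x) = exp(-x),
    where a = df / 2 and x = stat / 2.
    """
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0:
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q


def format_p(p):
    """Report very small p values as a bound rather than as zero."""
    if p < 0.001:
        return "p < .001"
    return "p = " + f"{p:.3f}".lstrip("0")


# Statistic from the V1 analysis; df = 7 is an assumption (see above).
print(format_p(chi2_sf(40.71, 7)))
```

The same function can be used to check any reported χ2 against its p value once the degrees of freedom of the comparison are known.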
Penultimate syllable stress – V1 duration
In the analysis of nonwords with penultimate stress, each factor was added to the model with just the random effects, participants and items, and the different models were compared. The model that explained the most variance included the main effect of group, χ2 = 42.27, p < .001, and the two-way interaction, χ2 = 13.69, p < .001, while the effect of geminate did not explain any further variance, χ2 = 1.64, p = .20 (see Figure 3). Planned contrasts were conducted to examine the geminate/singleton conditions in each group, but differences between singletons and geminates were not significant in any group. The interaction came about because the between-group differences in V1 duration were not the same for geminates and singletons. For geminate contexts, the duration of V1 was shorter in adults than in older children (t = 5.52, p < .0001) and in younger children (t = 9.50, p < .0001). Also, the duration of V1 was shorter in older than in younger children (t = 3.01, p = .008). For singletons the duration of V1 was again shorter for adults than for older (t = 5.15, p < .0001) and younger children (t = 5.56, p < .0001), but the difference between older and younger children was not significant (t = .08, p > .1).
Penultimate syllable stress – V2 duration
V2 is the vowel of the stressed syllable in the nonwords with penultimate stress, but it comes after the singleton/geminate consonant. Therefore, we expected no particular influence of the geminate on V2, but only an effect of group. Indeed, the effect of group was significant, χ2 = 42.49, p < .001. The effect of geminate was not significant, χ2 = .04. The interaction between group and geminate/singleton was, however, significant, χ2 = 8.54, p < .05. Planned contrasts between singletons and geminates were run separately for each group. None of the geminate/singleton contrasts was significant. When the consonant was a singleton, V2 duration was significantly longer in older children than in adults, t = 4.99, p < .0001, and adults also differed from younger children, t = 5.57, p < .0001, but older children did not differ from younger children, t = .01, p = .99. When the consonant was a geminate, the same trend appeared in the comparisons of adult-older, adult-younger, and older-younger (t = 6.40, p < .0001; t = 8.85, p < .0001; t = 1.49, p = .29, respectively). The interaction came about because the difference between adults and the child groups was larger in the geminate condition. The pattern was also reversed with respect to V1: durations were numerically longer for geminates than for singletons (see Figure 2).
The results of the analyses on V1, which is unstressed in nonwords with penultimate syllable stress, showed an interaction between type of consonant and age, mainly due to the fact that the older children’s V1 duration decreased more rapidly when preceding a geminate than a singleton (see Figure 3). For singletons, the older and younger groups’ V2 duration was the same.
We did not expect effects of the geminate condition on V2, but its duration tended to be longer than that for the singletons in all groups, and the consonant type by group interaction was again significant, showing a larger difference between the children and the adults in the geminate than in the singleton condition.
Initial syllable stress – V1 duration
The analysis of V1 duration in initial-stress nonwords showed a significant effect of group, χ2 = 71.91, p < .001, and a main effect of geminate/singleton, χ2 = 6.73, p < .01. There was also a significant interaction, χ2 = 14.39, p < .001 (see Figure 1). Planned contrasts were run between the geminate and singleton conditions in each group. V1 was significantly shorter in the geminate than in the singleton condition in adults (t = 2.34, p < .05) and in the older children (t = 2.39, p < .05), but not in the younger children (t = 1.19, p = .26).
Initial syllable stress – V2 duration
The analysis of V2 duration showed a significant effect of group (χ2 = 42.01, p < .001). V2 durations were shorter for adults than for older children (t = 3.12, p = .005) and younger children (t = 6.38, p < .0001), and shorter for older than for younger children (t = 2.60, p = .02).
The influence of geminates was clearly apparent in nonwords in which V1 carries stress (i.e., nonwords with initial stress). For V1, besides the longer durations of the vowels in children compared to adults, there was a main effect of geminates, with V1 shorter before geminates than before singletons, but only in adults and older children, not in younger children.
General Discussion
In the present study, we examined the duration of the vowel preceding and following singleton and geminate consonants in stressed/unstressed syllables in the nonword productions of children and adults. The aim was to see if children’s productions were adult-like, and if not, to examine the developmental trajectory towards mastery.
Our review of the extant literature did not allow a straightforward prediction. However, we anticipated that an adult-like production of vowels in the presence of the geminate/singleton contrast might not be found in young children due to a protracted developmental trajectory of this aspect of speech production. Previous cross-linguistic work with Catalan, English and Spanish has shown that, overall, children produce vowels with longer durations compared to adults, and suggests a delay in mastering vowel reduction in unstressed syllables (Payne et al., 2012; see also Allen & Hawkins, 1978, for English).
The acoustic analyses we conducted on the duration of the vowels preceding and following the medial consonant (V1 and V2) suggest that children progressively acquire knowledge about the vowel duration required to distinguish between singletons and geminates. This is reflected in the developmental trend of V1 duration towards an adult-like phonetic realization of geminates. Although, overall, the children’s vowel duration is longer than the adults’, there are differences depending on whether the vowel precedes or follows a singleton versus a geminate. In particular, it is interesting to note that the larger effect of gemination emerges on the stressed vowel. V1 of the nonwords with initial stress (e.g., /ˈpapaso/, /ˈpappaso/, where the stressed vowel precedes the geminate) was produced with a significantly shorter duration in the geminate than in the singleton condition by the older children and by adults, but not by the three- and four-year-old children. When V1 is unstressed, as in nonwords with penultimate stress (e.g., /paˈpaso/, /paˈppaso/), which is the most frequent stress pattern in Italian, its duration before a geminate decreases more steeply across the three age levels than when it precedes a singleton (see Figure 1). In other words, very young children do not show differences in the duration of V1 depending on the singleton/geminate condition, while older children are able to differentiate them, and more so when V1 is stressed.
Considering V2, although there is no effect of gemination in adults, confirming Esposito and Di Benedetto’s (1999) results, some effects are apparent in the children’s data. Here, the effect of the geminate/singleton status is more evident in nonwords with penultimate stress, where V2 is in the stressed syllable. Indeed, the difference between children’s and adults’ V2 duration in stressed syllables was larger after the geminate than after the singleton, as shown by the two-way interaction between type of consonant and group. This effect only occurred in penultimate stress nonwords, while the duration of V2 in initial syllable stress nonwords was not affected by the geminate/singleton status of the preceding consonant. The reason for the longer productions of V2 by younger children is not clear, and here we can only speculate that it might be related to the long consonant preceding a stressed (and therefore longer) vowel. It is worth emphasizing again that, as noted in the introduction, stress in Italian is realized mainly by an increase in the duration of the stressed vowel relative to an unstressed vowel. Thus, the present study shows an interesting interaction between gemination and stress, particularly evident when comparing adult speech with child speech. A previous study, which examined the spontaneous speech production of very young Italian children (age range 1;3–1;9), showed that the children produced a higher proportion of words with geminates than the proportion present in the adults’ child-directed speech (Vihman & Majorano, 2017). In that sample, geminates were frequent in the mothers’ speech (about one third of the words had geminates), but in the children’s spontaneous speech the frequency of words with medial geminates reached 50% (see also Caselli & Casadio, 1996; Rinaldi, Barca & Burani, 2004).
The present study, however, shows that while geminates are frequently used in spontaneous child speech, the effect of singleton/geminate status on the surrounding vowel context is not mastered in an adult-like way before six years of age. This indicates that while at the phonetic level geminates are easy for children to pronounce, the phonological distinction between geminates and singletons, which must also accommodate the vowel position context and the stress pattern, is relatively more complex, and therefore more difficult to acquire.
In addition to the acoustic data, the pattern of errors from the present study provides interesting information. The data show that overall children’s productions were less accurate on nonwords including a geminate as opposed to a singleton consonant, and this pattern remained constant across ages 3–6 (see Table 3). These data, together with the acoustic analyses, suggest that children in the age range we examined here have not yet developed adult-like mastery of the ability to produce geminates. In particular, their phonological representation of the complex relationship between gemination, stress and vowel context has not been acquired in an adult-like way. It is interesting to note that a similar pattern of errors was found for stress (although the effect was not statistically significant), with more errors on nonwords with initial stress (less frequent in Italian than stress on the penultimate syllable) than with penultimate (dominant) stress. In a previous study, Arciuli and Colombo (2016) examined the production of real words (via picture naming) in children in the 3–6 age range, but no developmental effect of stress was found. Thus, it was concluded that children had completely mastered the ability to produce adult-like stress contrastivity. Apparently, this mastery is present mostly with familiar words. Similarly, for the acquisition of the phonological representation of the geminate/singleton contrast, the ability to provide the correct vowel length contrastivity between singletons and geminates in V1 is present even in three-year-old children when they have to name familiar words, rather than repeat nonwords (Infanti, 2018).
A similar dissociation between words and nonwords has been found by Keren-Portnoy, Vihman, DePaolis, Whitaker, and Williams (2010) in 26-month-old children who were presented with real words and with nonwords including consonants that were (IN) or were not (OUT) part of the child’s production repertoire. The children’s performance was better with familiar words than nonwords, even though the real words included unfamiliar consonants, and worst with nonwords that only included unfamiliar consonants. The authors underlined the strong influence of familiarity in both perception and production with both the single phonemes and the sequences in which they were embedded.
It is possible that the mechanisms used by children to repeat nonwords might be different from those used to produce real words. Thus, the production of real words may be realized by retrieving the memory traces of specific tokens (Johnson, 1997; Pierrehumbert, 2001). This might also occur for nonwords that have just been heard, but for already acquired words there are numerous memory traces (exemplars) stored in long-term memory, and their retrieval and production may be easier because the stored phonological representation is stronger and less variable (Vihman & Keren-Portnoy, 2013). Just-heard nonwords, instead, are retrieved from short-term memory, and according to some accounts short-term phonological memory does not develop fully until age 12–13 (Baddeley, 2003; Baddeley, Gathercole & Papagno, 1998). Perhaps young children’s short-term memory traces of nonwords are not strong enough to enable them to produce a subtle distinction between nonwords with a geminate or a singleton (see also Brown & Hulme, 1995). According to other, more recent accounts, nonword repetition is strongly influenced by the lexical phonological knowledge acquired by a child (Jones, 2016) and, as geminates are, overall, less frequently found than singletons, they are also less easily produced.
Limitations and Future Directions
Most previous studies of the acquisition of geminates have focused on the acoustic characteristics of the geminate itself rather than on vowel context, and most previous studies of the acquisition of vowel production in stressed and unstressed syllables have not considered geminate/singleton contexts. Here, we showed that there are differences between the adults’ and the children’s vowel production that depend on the presence/absence of a geminate and also on lexical stress patterns. In Italian, at least, the ability to produce adult-like vowel contrasts when repeating nonwords shows a developmental trajectory and is not completely mastered by young children. A more exhaustive investigation will require an acoustic examination of children’s production of the singleton/geminate consonant itself. Moreover, analyses of the data from production of real words using picture naming might provide more specific information about the word/nonword production difference. Based on the present data, future research might also explore several potentially important issues for language acquisition, such as whether the richness of a child’s vocabulary can influence subtle phonological distinctions, and/or help the formation of strong phonological memory traces.