As one of only a few potentially universal musical behaviors, singing represents a major form of engagement with music (Singh & Mehr, Reference Singh and Mehr2023). Singing is unique as a musical production behavior as it exists widely in the general population, does not require a physical instrument, and develops alongside language production ability (Dalla Bella et al., Reference Dalla Bella, Giguère and Peretz2007). Singing also has several neurocognitive and health benefits across the lifespan. Singing competency has been associated with greater feelings of social inclusion and connectedness in children, which are important for psychological wellbeing (Papageorgi et al., Reference Papageorgi, Saunders, Himonides and Welch2022). Regular singing may confer protective benefits against neurodegeneration in older adulthood (Särkamo, Reference Särkamo2018), with recent longitudinal evidence suggesting that it may maintain or even improve specific cognitive functions, such as verbal fluency (Pentikäinen et al., Reference Pentikäinen, Kimppa, Pitkäniemi, Lahti and Särkämö2023). It may also protect against age-related physiological declines in vocal quality, such as loss of control over the vocal folds (Lortie et al., Reference Lortie, Rivard, Thibeault and Tremblay2017; McHenry & Evans, Reference McHenry and Evans2022). Thus, singing can have positive impacts across the lifespan, and understanding the mechanisms that influence its development can inform how it increases and shapes our mental and physical wellbeing.
While aesthetic evaluations of singing ability often vary, a primary component of accurate singing is intonation, or the ability to sing in tune (Watts, Barnes-Burroughs et al., Reference Watts, Barnes-Burroughs, Andrianopoulos and Carr2003). Attesting to this, under most conditions even nonmusicians in the general population can reproduce familiar tunes with a reasonable level of accuracy. However, wide individual differences are evident (Dalla Bella & Berkowska, Reference Dalla Bella and Berkowska2009; Dalla Bella et al., Reference Dalla Bella, Giguère and Peretz2007; Pfordresher, Reference Pfordresher2022) and we still have limited understanding of what predicts the ability to sing accurately.
Genetic factors likely play a crucial role in the development of singing, although the exact mechanisms involved remain unknown (Park et al., Reference Park, Lee, Kim, Ju, Shin, Hong, von Grotthuss, Lee, Park, Kim, Kim, Yoo, Cho, Sung, Lee, Kim and Seo2012; Yeom et al., Reference Yeom, Tan, Haslam, Mosing, Yap, Fraser, Hildebrand, Berkovic, McPherson, Peretz and Wilson2022). Two studies have shown that engaging in singing activities is partially heritable (20–71%), potentially reflecting predispositions for singing engagement (Coon & Carey, Reference Coon and Carey1989; Gustavson et al., Reference Gustavson, Friedman, Stallings, Reynolds, Coon, Corley, Hewitt and Gordon2021). A more recent twin study has shown that objective singing performance is also moderately heritable (∼41%), with similar estimates to what has been observed for singing engagement (Yeom et al., Reference Yeom, Tan, Haslam, Mosing, Yap, Fraser, Hildebrand, Berkovic, McPherson, Peretz and Wilson2022). Likewise, Park et al. (Reference Park, Lee, Kim, Ju, Shin, Hong, von Grotthuss, Lee, Park, Kim, Kim, Yoo, Cho, Sung, Lee, Kim and Seo2012) found that vocal pitch-matching ability is partially heritable (40%), and identified genetic loci in large families related to this skill. Notably, studies investigating the genetic bases of other musical traits in humans have identified loci that are also known to regulate song-learning and cognition in songbirds (Nair et al., Reference Nair, Kuusi, Ahvenainen, Philips and Järvelä2019; Nair et al., Reference Nair, Raijas, Ahvenainen, Philips, Ukkola-Vuoti and Järvelä2020). This provides tentative evidence for a potential shared evolutionary basis of musical traits (Savage et al., Reference Savage, Loui, Tarr, Schachner, Glowacki, Mithen and Fitch2020; Singh & Mehr, Reference Singh and Mehr2023). 
These genetic factors may influence the maturation and development of both physiological and neurological processes underlying the song system of the human brain, including control of vocal articulatory systems (Harris et al., Reference Harris, Niven, Griffin and Scott2023) and networks involved in sensorimotor integration (Tsang et al., Reference Tsang, Friendly and Trainor2011).
Formal musical training and practice have also been shown to improve singing ability (Pfordresher, Reference Pfordresher2022; Pfordresher & Demorest, Reference Pfordresher and Demorest2021), but training is not a sufficient explanatory factor on its own (Ullen et al., Reference Ullen, Hambrick and Mosing2016; Watts et al., Reference Watts, Barnes-Burroughs, Estis and Blanton2006; Watts, Murphy et al., Reference Watts, Murphy and Barnes-Burroughs2003). Prior research points to several other environmental influences that shape singing, such as socioeconomic status (McPherson et al., Reference McPherson, Osborne, Barrett, Davidson and Faulkner2015), peer influence (Demorest, Kelley et al., Reference Demorest, Kelley and Pfordresher2017; Hall, Reference Hall2005), classroom-based instruction (Demorest, Nichols et al., Reference Demorest, Nichols and Pfordresher2017) and musical enrichment during childhood (Demorest, Kelley et al., Reference Demorest, Kelley and Pfordresher2017; Persellin, Reference Persellin2006; Theorell et al., Reference Theorell, Lennartsson, Madison, Mosing and Ullén2015). Although the relative importance of these different environmental influences remains unknown, in general, richer musical and singing environments are considered to foster greater singing participation and ability. While group singing is prevalent in most cultures (Shilton et al., Reference Shilton, Passmore and Savage2023), children raised in cultures where singing is a normative social behavior generally learn to sing at a faster rate (Kreuntzer, Reference Kreuntzer2001). Similarly, sociocultural beliefs such as stereotypes around singing may influence how likely individuals are to take up singing (Harrison, Reference Harrison2007). Within Western contexts, men are often discouraged from singing at a young age on the grounds that it has been culturally identified as a feminine endeavour (Hall, Reference Hall2005; Harrison, Reference Harrison2007; Powell, Reference Powell2014). 
Since these differences in singing engagement can influence how singing ability develops over time, this may partly explain previously observed differences between male and female singing skills (Welch et al., Reference Welch, Saunders, Papageorgi, Himonides, Harrison, Welch and Adler2012).
Broadly, interactions between genetic and environmental factors are now considered to shape the development of singing ability (Yeom et al., Reference Yeom, Tan, Haslam, Mosing, Yap, Fraser, Hildebrand, Berkovic, McPherson, Peretz and Wilson2022). In the general population singing ability has been found to follow a maturational trajectory of development across the lifespan. As shown in Figure 1, singing ability rapidly increases in the early years of life and plateaus in early adulthood (Pfordresher, Reference Pfordresher2022), with pitch and interval accuracy showing a small decrease with age after this point, potentially due to weakened control of vocal muscles (Pfordresher, Reference Pfordresher2022; Pfordresher & Demorest, Reference Pfordresher and Demorest2021; Yeom et al., Reference Yeom, Tan, Haslam, Mosing, Yap, Fraser, Hildebrand, Berkovic, McPherson, Peretz and Wilson2022). However, the specific influence of genetic and environmental factors at different points of this trajectory has not been explored in detail, nor has the relative importance of different developmental periods for singing. Based on a large body of prior research (Ismail et al., Reference Ismail, Fatemi and Johnston2017), we might expect that childhood is an especially important time when neuroplasticity is greatest, potentially providing a sensitive period for singing development. Were this the case, enriched singing experiences early in life, such as regular singing in the home, classroom singing or formal training, would have the greatest impact on brain maturation, and in turn, on the developmental trajectory of singing (Figure 1). This might also account for the observation that formal training is neither sufficient, nor even necessary, to become a proficient singer (Pfordresher & Demorest, Reference Pfordresher and Demorest2021; Watts, Murphy et al., Reference Watts, Murphy and Barnes-Burroughs2003).
Both sensitive and critical periods have been observed for some musical traits and for language development. Here we use the accepted definition of sensitive periods as time windows where relevant environmental experience has the most impact on the development of a skill. In contrast, critical periods are windows where experience is necessary for development of a skill. Songbirds have critical periods early in life where species-specific songs must be learnt, which serves analogous social and communicative functions to language in humans (Bolhuis et al., Reference Bolhuis, Okanoya and Scharff2010; Doupe & Kuhl, Reference Doupe and Kuhl1998). Notably, the neurocognitive networks subserving singing and language in humans have been shown to be proximally located, with their overlap partly influenced by the perceptual or productive nature of the task and the individual’s level of speech proficiency or singing expertise (Brown et al., Reference Brown, Martinez and Parsons2006; Ozdemir et al., Reference Ozdemir, Norton and Schlaug2006; Pitkäniemi et al., Reference Pitkäniemi, Särkämö, Siponkoski, Brownsett, Copland, Sairanen and Sihvonen2023; Whitehead & Armony, Reference Whitehead and Armony2018; Wilson et al., Reference Wilson, Abbott, Lusher, Gentle and Jackson2011).
There is consensus that language acquisition in humans has a critical period (Friedmann & Rusou, Reference Friedmann and Rusou2015; Mayberry & Kluender, Reference Mayberry and Kluender2018; Werker & Hensch, Reference Werker and Hensch2015), as does the acquisition of pitch-verbal label associations necessary for the expression of absolute pitch (Bairnsfather et al., Reference Bairnsfather, Ullen, Osborne, Wilson and Mosing2022; Levitin & Zatorre, Reference Levitin and Zatorre2003; Wilson et al., Reference Wilson, Lusher, Martin, Rayner and McLachlan2012). Otherwise, critical periods in music have mainly been observed for basic sensory processes, like pitch perception (Lynch & Eilers, Reference Lynch and Eilers1992; Lynch et al., Reference Lynch, Eilers, Oller and Urbano1990) and rhythm perception (Hannon & Trehub, Reference Hannon and Trehub2005a, Reference Hannon and Trehub2005b). Both demonstrate perceptual narrowing in early life, where enculturation processes reduce infants’ sensitivity to unfamiliar tonal or metric systems (Penhune & De Villers-Sidani, Reference Penhune and De Villers-Sidani2014; Zhao et al., Reference Zhao, Llanos, Chandrasekaran and Kuhl2022). For other, more complex musical traits, recent reviews point to the importance of sensitive periods in childhood when neuroplasticity is greatest (Habib & Besson, Reference Habib and Besson2009; Penhune, Reference Penhune2020). For instance, benefits associated with early onset of music training have been reported for rhythmic synchronisation and melody discrimination (Bailey & Penhune, Reference Bailey and Penhune2012; Ireland et al., Reference Ireland, Iyer and Penhune2019), as well as closely related cognitive functions such as executive skills and motor learning (Chen et al., Reference Chen, Scheller, Wu, Hu, Peng, Liu, Liu, Zhu and Chen2021; Watanabe et al., Reference Watanabe, Savion-Lemieux and Penhune2007).
The effects of music training are also typically associated with greater changes in neural structure and function during childhood due to heightened neuroplasticity (Bailey & Penhune, Reference Bailey and Penhune2012; Penhune, Reference Penhune2020). These changes are most evident in regions activated during music perception and production tasks, such as the auditory and motor networks (Zatorre et al., Reference Zatorre, Chen and Penhune2007). However, Wesseldijk et al. (Reference Wesseldijk, Mosing and Ullen2021) recently reported that age of onset did not predict rhythmic or melodic perception after controlling for lifetime practice and familial confounding. Ireland et al. (Reference Ireland, Iyer and Penhune2019) also found no relationship between age of onset and discrimination of complex melodies or rhythms. As such, sensitive periods may not be evident for all forms of musical behaviour, highlighting that a sensitive period for singing warrants empirical investigation.
Several lines of evidence indicate a possible sensitive period for singing. First, while singing engages many regions of the brain, both structural and functional differences related to the amount of singing training have been observed (Zarate et al., Reference Zarate, Wood and Zatorre2010; Zarate & Zatorre, Reference Zarate and Zatorre2008), particularly in regions supporting sensorimotor integration of auditory and motor representations (Kleber et al., Reference Kleber, Zeitouni, Friberg and Zatorre2013; Zamorano et al., Reference Zamorano, Zatorre, Vuust, Friberg, Birbaumer and Kleber2023). Sensorimotor integration is a crucial neurocognitive process for accurate singing (Hutchins & Peretz, Reference Hutchins and Peretz2012; Pfordresher et al., Reference Pfordresher, Halpern and Greenspon2015; Tsang et al., Reference Tsang, Friendly and Trainor2011), with recent evidence suggesting that sensorimotor integration has a sensitive period for development (Penhune, Reference Penhune2020; Steele et al., Reference Steele, Bailey, Zatorre and Penhune2013; Vaquero et al., Reference Vaquero, Hartmann, Ripolles, Rojo, Sierpowska, Francois, Camara, van Vugt, Mohammadi, Samii, Munte, Rodriguez-Fornells and Altenmuller2016). Second, in a recent twin study we noted that the familial environment in early childhood may have a crucial role to play (Yeom et al., Reference Yeom, Tan, Haslam, Mosing, Yap, Fraser, Hildebrand, Berkovic, McPherson, Peretz and Wilson2022). Specifically, we found that singing with family in childhood and being surrounded by music were important predictors of singing ability, after controlling for current singing with family. Combining these lines of evidence points to the possibility of a sensitive period for singing development early in life, where genetic and environmental factors interact and drive neuroplastic changes that maximise the development of this skill (Figure 1).
Intuitively, a sensitive period for singing would manifest in childhood alongside sensitive periods for other musical behaviors when neuroplasticity is greatest (Bailey & Penhune, Reference Bailey and Penhune2012). However, to our knowledge, the existence of a sensitive period for singing ability in humans has not been empirically tested. In addition, studies that examine sensitive periods for music generally focus solely on music training and do not account for other environmental factors (Penhune, Reference Penhune2011, Reference Penhune2020), such as the familial environment. Thus, examining how early and current familial singing influence singing ability relative to vocal training is valuable for understanding how this skill is shaped throughout the lifespan.
Given that previous findings point to an early sensitive period for singing ability, we would expect early shared familial singing to predict greater singing accuracy than current engagement in familial singing. Familial environments, however, are not all equal. Parents who sing accurately may be more likely to create enriched singing environments for their children. Likewise, current familial environments will vary according to an individual’s age, living arrangements and other factors that impact opportunity, such as socioeconomic status. Since all of these factors may influence how frequently people sing with their families, here we adopted a broad view of early and current familial singing across environments and contexts to provide an initial estimate of the general influence of familial singing.
In particular, in a large, previously described sample of Australian twins we used structural equation modelling (SEM) to explore the relative contributions of early and current singing with family and vocal training to singing ability, while also accounting for sex and age. We then explored observed effects further using bivariate twin modeling. Our analyses were based on a previously validated psychometric tool, the ‘Melbourne Singing Tool’, which produces a phenotypic index that reliably captures everyday singing ability (Tan et al., Reference Tan, Peretz, McPherson and Wilson2021; Yeom et al., Reference Yeom, Tan, Haslam, Mosing, Yap, Fraser, Hildebrand, Berkovic, McPherson, Peretz and Wilson2022). We used SEM as it provides a powerful method of exploring hypothesised multivariate relationships (e.g., see Senn et al., Reference Senn, Bechtold, Hoesl, Jerjen, Kilchenmann, Rose, Baldassarre, Sigrist and Alessandri2023, for a recent example on musical groove). Using SEM, the current study expands on our previous findings by testing explicit hypotheses about the relative roles of vocal training, early and current familial singing on singing ability while accounting for their influence on each other. We hypothesised that:
- H1: Early singing with family would predict singing ability in adulthood.
- H2: Vocal training would predict singing ability in adulthood.
- H3: Early familial singing would be a stronger predictor of singing ability than current familial singing.
Materials and Methods
Participants
Data from 1189 Australian twins from a previous study (Yeom et al., Reference Yeom, Tan, Haslam, Mosing, Yap, Fraser, Hildebrand, Berkovic, McPherson, Peretz and Wilson2022), collected between 2012 and 2019, were analysed in this study. For the SEM, the twins were treated as a general cross-sectional sample. To rule out twin-specific effects, we therefore also ran our main structural model on two subsamples of statistically independent singletons. Both subsamples showed the same pattern of results as the full sample (see Supplemental Results), so we have reported results from the full sample here. Ethical approval was given by the University of Melbourne Office of Research Ethics and Integrity (ID: 1750061).
Participants were recruited with the assistance of Twins Research Australia, a national database of twins, as well as social media and print advertisements. Participants were provided with information through the Melbourne Singing Tool’s webpage and were informed that by completing the activity, they were providing informed consent to participate. Participants self-reported their age and biological sex in the first part of the tool. Based on this self-report, irrespective of age, we refer to males and females as men and women respectively from this point onwards. After removing participants with missing data on at least one variable, 1163 participants (855 self-reported female) were included in the final analyses. The mean age of the current sample was 43.5 years (SD = 16.5), with a range of 15−90 years. We used the R Shiny app pwrSEM (with 1,000 simulations; Wang & Rhemtulla, Reference Wang and Rhemtulla2021) for a priori power analysis, and estimated that a minimum of 268 participants was needed for at least 80% power to detect a minimum effect size of β = .20 for all three paths of interest, as described in further detail below (cf. Figure 2).
Materials and Procedure
Participants completed the Melbourne Singing Tool, which is an online tool designed for widescale singing research previously described by Tan et al. (Reference Tan, Peretz, McPherson and Wilson2021). In brief, the tool includes three singing tasks that capture everyday singing ability. The first asks participants to sing the familiar song ‘Happy Birthday’ on the syllable ‘dah’. The second is a vocal pitch-matching task where participants listen to individual notes and sing them back. The third is an unfamiliar tune task where participants listen to a series of 7-note tonal melodies and sing them back as accurately as they can (see Tan et al., Reference Tan, Peretz, McPherson and Wilson2021 for more details). To account for range, participants were presented stimuli in different octaves depending on their reported sex for the pitch-matching and unfamiliar tune tasks, with men presented stimuli an octave lower than women. Participants were able to start on any note for the Happy Birthday task.
From these tasks, five measures of singing performance were extracted, including three pitch accuracy measures (one for each task) and two interval accuracy measures (for the melodic tasks) (Tan et al., Reference Tan, Peretz, McPherson and Wilson2021). Pitch accuracy in this context refers to the absolute difference, or deviation, between each sung note and target note, measured in cents. Interval accuracy, on the other hand, describes the absolute difference between a sung interval (the difference between two adjacent notes) and the target interval. The five measures were extracted using the open-source program ‘TONY’, which uses the pYIN algorithm to automatically segment notes in audio and identify each note’s fundamental frequency (Mauch et al., Reference Mauch, Cannam, Bittner, Fazekas, Salamon, Dai, Bello and Dixon2015). Each participant’s audio was aurally and visually inspected for any misidentified or missing notes, and manually corrected where needed. TONY’s automated pitch segmentation method correlates very highly with manual segmentation (Tan et al., Reference Tan, Peretz, McPherson and Wilson2021). Pitch and interval accuracy measures were calculated for each note/interval in each trial (Tan et al., Reference Tan, Peretz, McPherson and Wilson2021; see Supplemental Methods for formulae), and then averaged over the task to provide a single accuracy value for each participant on the five measures described above. In previous work we showed that these five measures all load strongly onto one latent factor, which we termed the ‘Singing Phenotypic Index’ (SPI; Yeom et al., Reference Yeom, Tan, Haslam, Mosing, Yap, Fraser, Hildebrand, Berkovic, McPherson, Peretz and Wilson2022). Higher scores on the SPI represent better performance on the tasks. Tan et al. (Reference Tan, Peretz, McPherson and Wilson2021) showed high test−retest reliability for the SPI over a period of 4.5 years (r = .65−.80). 
Further details on how these measures are extracted and calculated are available in Tan et al. (Reference Tan, Peretz, McPherson and Wilson2021) and Yeom et al. (Reference Yeom, Tan, Haslam, Mosing, Yap, Fraser, Hildebrand, Berkovic, McPherson, Peretz and Wilson2022).
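The two kinds of accuracy measure described above can be illustrated with a minimal sketch. It assumes the standard definition of the cent (1200 times the base-2 logarithm of a frequency ratio) and a simple mean over notes; the authors' exact formulae are given in their Supplemental Methods and may differ in detail. The function names and the frequencies below are hypothetical and chosen for illustration only.

```python
import math

def cents(f_hz, ref_hz):
    """Signed distance from ref_hz to f_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / ref_hz)

def pitch_deviation(sung_hz, target_hz):
    """Mean absolute deviation (cents) between each sung note and its target."""
    devs = [abs(cents(s, t)) for s, t in zip(sung_hz, target_hz)]
    return sum(devs) / len(devs)

def interval_deviation(sung_hz, target_hz):
    """Mean absolute difference (cents) between sung and target intervals,
    where an interval is the distance between two adjacent notes."""
    sung_iv = [cents(b, a) for a, b in zip(sung_hz, sung_hz[1:])]
    target_iv = [cents(b, a) for a, b in zip(target_hz, target_hz[1:])]
    devs = [abs(s - t) for s, t in zip(sung_iv, target_iv)]
    return sum(devs) / len(devs)

# Hypothetical trial: three target notes and a rendition that is slightly sharp.
target = [261.63, 293.66, 329.63]
sung = [264.0, 296.0, 333.0]
print(round(pitch_deviation(sung, target), 1))
print(round(interval_deviation(sung, target), 1))
```

Both functions return deviations, so smaller values mean more accurate singing. A rendition that is transposed by a constant amount has a large pitch deviation but an interval deviation near zero, which is why the two measures carry different information.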
Participants were asked whether they had done any musical training (‘Have you ever had lessons on a musical instrument or voice?’). If they responded ‘yes’, they were then asked what their primary instrument was (i.e., the instrument they had the most engagement and experience with), as well as how many years of training they had on that instrument and what age they were when they began training. 109 (9.4%) participants reported voice as their primary instrument, 744 (64.0%) reported a nonvocal instrument, and 310 (26.7%) reported no instrumental training. We winsorized years of training to a maximum of 20 years to account for implausible training values. To create a measure of vocal training, we recoded the years of training variable for participants who reported voice as their primary instrument. Ten of the 109 participants (9.2%) had their years of training winsorized to 20 years. Participants who reported other instruments or no training were assigned a value of zero on this vocal training measure. We note that this is a relatively conservative measure of vocal training and would not include individuals who may have had vocal training but did not report voice as their primary instrument. For the familial variables, participants were asked to rate how often they sang with family in their childhood (early singing) and now (current singing) on a 5-point scale (1 = Not at all, 5 = A great deal). These two variables were selected as measures of familial environmental influence in twins, as they represented singing behaviors that twins reared in the same households would have likely experienced together (Yeom et al., Reference Yeom, Tan, Haslam, Mosing, Yap, Fraser, Hildebrand, Berkovic, McPherson, Peretz and Wilson2022).
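The recoding of the vocal training measure can be sketched as follows. This is an illustration of the rule described above, not the authors' code: the label "voice", the record layout and the example values are assumptions.

```python
MAX_YEARS = 20  # winsorizing cap stated in the text

def vocal_training_years(primary_instrument, years_of_training):
    """Years of vocal training: capped years if voice is the primary
    instrument, otherwise zero (other instruments or no training)."""
    if primary_instrument != "voice" or years_of_training is None:
        return 0
    return min(years_of_training, MAX_YEARS)

# Hypothetical records: (primary instrument, years of training)
participants = [("voice", 35), ("piano", 12), (None, None), ("voice", 4)]
print([vocal_training_years(inst, yrs) for inst, yrs in participants])
# -> [20, 0, 0, 4]
```

The second record shows why the measure is conservative: a pianist who also took singing lessons would still be coded as having zero years of vocal training.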
Statistical Analyses
All analyses were conducted in R version 4.3.1 (R Core Team, 2023). SEM was performed using the lavaan package in R (version 0.6-17; Rosseel, Reference Rosseel2012). A maximum likelihood estimator was used, with robust standard errors and Satorra-Bentler corrections (MLR in lavaan) to account for potential multivariate non-normality in the twin data. We evaluated model fit using the robust comparative fit index (CFI), robust Tucker-Lewis Index (TLI), the robust root mean square error of approximation (RMSEA) and the standardized root mean square residual (SRMR).
Figure 2 summarises the model that was tested in this analysis. To test H1, we specified a direct path between early familial singing and the SPI. We also specified a path from early familial singing to current familial singing, and then from current singing to the SPI to test H3. To test H2, we included a path from vocal training to the SPI. We included age and sex to account for demographic effects in our model. We specified paths between the two demographic variables and both early familial singing and vocal training, as well as direct paths from them to the SPI. On average, participants who reported vocal training started at 13.8 years of age (median = 11; SD = 10.5). Because vocal training therefore often began in childhood or adolescence, we specified a covariance pathway between early singing with family and vocal training to assess whether these variables were related.
Transparency and Openness
This study was not preregistered. The data for this study cannot be shared publicly, as ethical approval to openly share data was not sought at the time the study began. Therefore, the data are subject to restrictions by both our institutional review board and Twins Research Australia. Requests for the data should be directed to the corresponding author. The code used to run the analyses and create the figures is available on Figshare at the following link: https://doi.org/10.26188/23694378.
Results
Table 1 shows descriptive statistics for the variables used in the study. Table 2 shows Pearson’s correlations between all the variables in the SEM. Given the significant correlation between sex and age, we specified a covariation term between these two variables in the SEM to control for this relationship.
Note (Table 1):
a Rated on a 5-point scale, where 1 = Never, 2 = Rarely, 3 = Sometimes, 4 = Often, 5 = Always.
b Years of vocal training is winsorized to 20 years.
Note (Table 2): SPI, Singing Phenotypic Index. *p < .01; **p < .001.
a Rated on a 5-point scale, where 1 = Not at all, 5 = A great deal.
b Years of vocal training is winsorized to 20 years.
Structural Equation Model
Model fit indices suggested adequate to good fit (robust CFI = .964, robust TLI = .821, robust RMSEA = .096, SRMR = .033). This model explained 24.3% of the variance in the SPI, and is shown with standardized coefficients in Figure 3. Sex (β = −.399, p < .001, 95% CI [−.553, −.245]) and age (β = −.009, p < .001, 95% CI [−.013, −.005]) were significant predictors of early singing with family. Men sang less with their family in childhood, and participants who were older also tended to have sung less in early childhood. Age was a significant predictor of years of training, in that older participants also tended to have less vocal training (β = −.014, p = .002, 95% CI [−.023, −.005]). Sex was not a significant predictor of years of vocal training (β = .043, p = .821, 95% CI [−.333, .419]). Early singing with family significantly covaried with vocal training (β = .519, p < .001, 95% CI [.302, .737]), whereby participants who sang more with their family in childhood also had more years of vocal training.
Likewise, more early familial singing predicted more current familial singing (β = .543, p < .001, 95% CI [.497, .590]); however, current singing with family now showed a much smaller effect compared to early singing and no longer predicted the SPI, supporting H3 (β = .011, p = .677, 95% CI [−.040, .062]). Early singing with family directly predicted the SPI, in line with H1 (β = .345, p < .001, 95% CI [.293, .397]). To a lesser extent, years of vocal training also significantly predicted the SPI (β = .043, p < .001, 95% CI [.029, .058]), supporting H2. Sex also no longer predicted the SPI (β = −.072, p = .192, 95% CI [−.181, .036]) but age was a significant predictor, indicating a slight decline in singing ability with age (β = −.006, p < .001, 95% CI [−.009, −.003]). This pattern of results did not change when variables were mean-centered and scaled prior to running the model. As noted above, to rule out the possibility that twin-specific effects were driving these results we split the sample by randomized twin order and re-ran the model on each subsample. Both subsamples showed the same patterns of results (Figures S1−S2, Supplementary Material). Finally, when considering years of training on any instrument instead of vocal training only, the three hypothesised effects remained the same (Figure S3, Supplementary Material).
Exploratory Mediation Analyses
While the bivariate correlation between sex and the SPI was significant (Table 2), the SEM indicated that this association disappeared when other paths involving these variables were taken into account. We performed an exploratory mediation analysis to formally test whether early singing with family mediated the sex difference in singing ability (N = 1163, Figure 4). Percentile-based bootstrapping of standard errors was used. Early singing significantly mediated the relationship between sex and the SPI (indirect effect = −.174, bootstrapped z = −5.589, p < .001, 95% CI [−.236, −.113]). In line with the SEM shown in Figure 3, men sang less with family during childhood than women, and this difference in early singing was associated with poorer performance on the SPI by men. The direct effect of sex on the SPI was no longer significant after accounting for this indirect effect (β = −.102, bootstrapped z = −1.898, p = .06, 95% CI [−.209, .005]).
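The logic of a mediation test with a percentile bootstrap can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the analysis reported here: it uses ordinary least squares on simulated data, omits covariates, and resamples individuals rather than twin pairs. All function names, variable names and simulated values are hypothetical.

```python
import random

def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a * b.

    a: slope of the mediator m on x.
    b: slope of the outcome y on m, adjusting for x.
    """
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    a = sxm / sxx
    b = (_cov(m, y) * sxx - _cov(x, y) * sxm) / (smm * sxx - sxm ** 2)
    return a * b

def percentile_ci(x, m, y, n_boot=500, alpha=0.05, seed=1):
    """Percentile bootstrap interval, resampling individuals with replacement."""
    rng = random.Random(seed)
    n = len(x)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boots.append(indirect_effect([x[i] for i in idx],
                                     [m[i] for i in idx],
                                     [y[i] for i in idx]))
    boots.sort()
    return boots[int(alpha / 2 * n_boot)], boots[int((1 - alpha / 2) * n_boot) - 1]

# Simulated data only (not the study data): x is a 0/1 group code,
# m a mediator that is lower when x = 1, y an outcome that depends on m.
rng = random.Random(0)
x = [i % 2 for i in range(300)]
m = [3.0 - 0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.35 * mi + rng.gauss(0, 1) for mi in m]

est = indirect_effect(x, m, y)
lo, hi = percentile_ci(x, m, y)
print(f"indirect effect = {est:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

An interval that excludes zero is read as evidence that the group difference in the outcome runs at least partly through the mediator. With clustered data such as twin pairs, resampling whole pairs would be the more cautious choice.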
Bivariate Modeling of Early Familial Singing
To examine whether genetic and/or environmental factors were driving the relationship between early familial singing and objective performance, we fit an exploratory bivariate twin model with a Cholesky decomposition adjusted for age and sex. Bivariate twin models allow for the estimation of genetic and environmental influences on the covariation between two variables, which can unpack the factors that shape their association (de Vries et al., Reference de Vries, Van Beijsterveldt, Maes, Colodro-Conde and Bartels2021). Thus, applying a bivariate twin model to the relationship between early familial singing and objective ability allowed us to examine whether the observed effect from the SEM model likely reflected genuine environmental influence, or whether genetic effects were instead driving the correlation. Twin modeling was conducted using the OpenMx package (version 2.21.8; Neale et al., Reference Neale, Hunter, Pritikin, Zahery, Brick, Kirkpatrick, Estabrook, Bates, Maes and Boker2016).
Cross-twin cross-trait correlations were r = .47 (95% CI: [.41, .52]) for MZ twins and r = .43 (95% CI: [.34, .50]) for DZ twins, indicating that a bivariate ACE model was suitable for the data. The phenotypic (within-pair) correlation between early familial singing and the SPI was r = .47. The contributions of A, C and E to this phenotypic correlation were 16%, 83.3% and 0.7% respectively, suggesting that the relationship was largely explained by shared (common) environmental influences. The genetic correlation (r G = .34, 95% CI: [.34, 1]) and unshared environmental correlation (r E = .01, 95% CI: [−.09, .11]) between early familial singing and ability were low. In contrast, the shared environmental correlation was high (r C = .88, 95% CI: [.88, 1]), suggesting that the shared environmental influences on early familial singing and on subsequent ability overlapped substantially and explained most of the phenotypic relationship.
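The way A, C and E contributions to a phenotypic correlation are obtained can be sketched as follows. The sketch assumes the usual bivariate decomposition, in which each component contributes the square root of the product of the two traits' standardized variance components multiplied by the corresponding cross-trait correlation. The univariate variance components below are hypothetical placeholders and are not estimates from this study, so the output is not expected to match the percentages reported above.

```python
import math

def correlation_contributions(var1, var2, corr):
    """Split a phenotypic correlation into A, C and E parts.

    var1, var2: standardized variance components of each trait,
                e.g. {"A": .4, "C": .3, "E": .3} (each dict sums to 1).
    corr:       genetic (A), shared environmental (C) and unshared
                environmental (E) correlations between the two traits.
    """
    parts = {k: math.sqrt(var1[k] * var2[k]) * corr[k] for k in ("A", "C", "E")}
    r_ph = sum(parts.values())
    shares = {k: parts[k] / r_ph for k in parts}
    return r_ph, parts, shares

# Hypothetical inputs for illustration only.
trait1 = {"A": 0.20, "C": 0.50, "E": 0.30}
trait2 = {"A": 0.40, "C": 0.35, "E": 0.25}
corr = {"A": 0.30, "C": 0.90, "E": 0.00}

r_ph, parts, shares = correlation_contributions(trait1, trait2, corr)
print(round(r_ph, 3), {k: round(v, 3) for k, v in shares.items()})
```

The decomposition makes clear why a component can dominate the phenotypic correlation: its share depends on how much variance it explains in both traits as well as on how strongly those influences overlap.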
Nested models that constrained the A, C and E components of the phenotypic correlation to zero were then fit to test whether each covariance component was significant. Each nested model was compared to the full bivariate model using likelihood ratio tests. The contributions of A (p = .301) and E (p = .821) to the phenotypic correlation were not significant and thus could be dropped without significantly worsening model fit. However, the contribution of C was significant (p < .001). Nested bivariate AE and CE models also led to significantly worse model fit (both p < .001), indicating that while both early familial singing and objective singing ability had univariate genetic and environmental influences, only shared environmental effects influenced the relationship between them.
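The likelihood ratio test behind these comparisons can be sketched as follows. The difference in −2 log-likelihood between a nested and a full model is referred to a chi-square distribution with degrees of freedom equal to the number of constrained parameters. The sketch assumes a single constrained parameter (1 df), which is an assumption about how each covariance component was dropped; the −2LL values and the function name `lrt_p_value_1df` are hypothetical.

```python
import math


def lrt_p_value_1df(minus2ll_full, minus2ll_nested):
    """Likelihood ratio test for a nested model with one constrained parameter.

    Returns the chi-square statistic (difference in -2 log-likelihood) and its
    p value. For 1 df the chi-square survival function equals erfc(sqrt(x / 2)).
    """
    delta = max(minus2ll_nested - minus2ll_full, 0.0)  # guard against tiny negatives
    p_value = math.erfc(math.sqrt(delta / 2.0))
    return delta, p_value


# Hypothetical fit statistics, for illustration only.
delta, p_value = lrt_p_value_1df(minus2ll_full=1000.0, minus2ll_nested=1003.841)
print(f"chi-square(1) = {delta:.3f}, p = {p_value:.3f}")
```

A non-significant p value means the constrained parameter can be dropped without a significant loss of fit, whereas a significant one means the component should be retained.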
Discussion
This study showed that regular singing with family in childhood is an important predictor of objective singing ability. Using SEM, we tested the contributions of early and current singing with family on singing ability while also accounting for vocal training, age and sex. Our hypotheses that both early singing with family and vocal training would predict objective singing performance were supported, with early familial singing showing the stronger effect. Crucially, while we initially observed a significant relationship between current familial singing and objective performance, this was entirely explained by early familial singing in our structural model. This relationship remained constant when factoring in vocal-specific music training. Moreover, using an exploratory mediation analysis we showed that sex differences in objective singing performance were mediated by early familial singing. Finally, using exploratory bivariate twin modeling we showed that the effect of early familial singing on objective ability was driven mainly by common shared environmental influences.
These findings lend support to the possibility of a sensitive period of singing development. Penhune (Reference Penhune2020) argued that engagement with music during sensitive periods early in life may heighten the receptiveness of neural networks to musical experience, thereby shaping the long-term development of musical skills. Consequently, engaging in singing during a sensitive period may accelerate the development of singing, in line with evidence for other music-related behaviors (Bailey & Penhune, Reference Bailey and Penhune2012; Penhune, Reference Penhune2020). If a sensitive period for singing does exist, a key question will be determining when the period opens and closes. The exact time windows of sensitive periods for other musical skills have not been precisely mapped, in part because sensitive periods are subject to individual differences (Penhune, Reference Penhune2020). However, both prior evidence from other musical skills and the role of early familial singing observed in our data suggest that on average, this period lies in early childhood. People who sing frequently in childhood, particularly with their family, may experience important benefits for the development of their singing voice, perhaps via enhanced learning of melody and prosody (Greenspon & Montanaro, Reference Greenspon and Montanaro2023). Prosody and singing both rely on sensorimotor integration mechanisms underlying the vocal motor system (Greenspon & Montanaro, Reference Greenspon and Montanaro2023). Thus, these shared vocal mechanisms may be enhanced by early exposure to regular singing. In contrast, engagement after this sensitive period may still improve singing ability, albeit at a slower rate and to a lesser degree.
The significant covariance between early familial singing and vocal training suggests that children in early pro-singing environments are also more likely to have vocal training, which may further amplify the development of singing ability. Familial environments serve as a unique source of environmental enrichment (Demorest, Kelley et al., Reference Demorest, Kelley and Pfordresher2017; Theorell et al., Reference Theorell, Lennartsson, Madison, Mosing and Ullén2015), particularly in situations where a child is exposed to singing with family members from a very young age (Yan et al., Reference Yan, Jessani, Spelke, De Villiers, De Villiers and Mehr2021). Strikingly, our exploratory twin modeling showed that the relationship between early familial singing and objective ability was largely explained by shared environmental rather than genetic influences, and that these environmental influences were common to both variables. These results indicate that the effect of early familial singing on subsequent ability is truly environmental in nature, with family environments that encourage singing together in childhood also encouraging the development of singing ability.
Notably, early non-familial singing (e.g., in church choirs, school) may also be an important source of enrichment, which we were not able to analyse here. Similarly, the early familial environment may simultaneously influence the development of other musical skills relevant to singing, such as absolute pitch. Early environments that emphasise fixed pitch-label associations have been implicated in the development of absolute pitch (Wilson et al., Reference Wilson, Lusher, Martin, Rayner and McLachlan2012), which may subsequently alter the development of singing ability (Dohn et al., Reference Dohn, Garza-Villarreal, Ribe, Wallentin and Vuust2014). Importantly, a robust longitudinal design with individuals who start singing both early and later in life would provide a confirmatory test of the existence of a sensitive period by comparing the rate of singing development between early- and late-onset singers. Such a design would also allow careful examination of the influence of other factors, such as non-familial environments, self-reported singing ability, other musical skills, and demographic characteristics like socioeconomic status and parental musicality.
Our data also showed sex effects. On average, men performed worse on the singing tasks than women; however, this discrepancy disappeared when controlling for childhood singing with family. We interpret this in light of broader findings of gendered participation in music (Harrison, Reference Harrison2007; Powell, Reference Powell2014). It is well documented that men in Australia face several sociocultural barriers to participation in singing (Harrison, Reference Harrison2007; Powell, Reference Powell2014). Within an Australian context, sport is tightly associated with masculinity while singing is not (Powell, Reference Powell2014). Similar social pressures have been documented in other Western societies, such as those in Europe (Freer, Reference Freer2015). As a corollary, because singing is seen as a stereotypically ‘feminine’ endeavour (Freer, Reference Freer2015; Hall, Reference Hall2005), women may be encouraged to engage in it, both with family members and through vocal training, with resulting benefits for their ability. Men, on the other hand, may only be encouraged to take up singing if they already show a predisposition for it (Wesseldijk et al., Reference Wesseldijk, Mosing and Ullén2019); otherwise, social expectations around masculinity may discourage them from doing so. If men do not take up singing early in life, this will likely limit their engagement beyond this sensitive period. Therefore, the developmental trajectory of singing ability may diverge for men and women (at least within an Australian/Western context), in part due to sociocultural factors relating to gender and singing. Our results, however, may not generalise to societies where singing is not subject to similar gendered stereotypes, and thus where we may expect familial singing to be taken up equally by men and women from childhood.
There are limitations in our study. First, our sample had more women than men. We have previously noted that Twins Research Australia, who helped facilitate recruitment, have a higher proportion of women than men in their database (Murphy et al., Reference Murphy, Lam, Cutler, Tyler, Calais-Ferreira, Li, Little, Ferreira, Craig, Scurrah and Hopper2019; Yeom et al., Reference Yeom, Tan, Haslam, Mosing, Yap, Fraser, Hildebrand, Berkovic, McPherson, Peretz and Wilson2022). In keeping with the constraints on generality noted above, the higher proportion of women is also likely to be symptomatic of a lack of men who are willing to participate in music; notably, men may have been less likely to participate in a singing study. Nevertheless, with 308 men in the sample our analyses of sex effects are not underpowered. Second, our early familial singing measure is subject to retrospective bias, and the data in this study are cross-sectional. The observed sex differences may also partially influence the accuracy of our self-report items, in that men who sang less with their family may also be less inclined to recall doing so. A prospective longitudinal study beginning in childhood would allow more precise estimates of early environmental singing effects. Moreover, longitudinal designs would allow more detailed examination of how familial environments change with age, in turn impacting how often people sing in these contexts and the relative importance of other non-familial environmental effects. Equally important, asking adult participants about their familial singing at multiple timepoints would provide data on the reliability of this measure. Third, our measure of vocal training was derived from participants who reported voice as their primary instrument but does not account for participants who may have had training on multiple instruments. In particular, participants who may have received vocal training but did not report voice as their primary instrument were not captured by our measure. Separate questions exploring training on more than one instrument would address this issue. Lastly, we cannot draw strong causal inferences from our current structural model, and there might be unmeasured variables outside of the current model, such as socioeconomic status, that may alter the interpretation of the findings.
Despite these limitations, the current study provides an initial test of the magnitude of the relationships between early and current familial singing, vocal training and objective singing performance. In particular, we show that early singing with family is an important predictor of singing ability, more powerful than current singing and vocal training. Moreover, the relationship between early singing and objective ability is largely shaped by the shared environment. Thus, our findings point to a sensitive period for the development of singing ability and may partly account for previously reported sex differences, with men who described less early familial singing showing poorer singing ability. In other words, malleable environmental factors can influence both people’s ability to sing and their propensity to do so, bringing new insights into how genetic and environmental factors may interact to develop this uniquely human and sociocultural skill. Future studies using more rigorous longitudinal designs will be crucial for confirming the existence of a sensitive period and identifying its opening and closing points.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/thg.2024.30.
Acknowledgments
We first thank the twins for their participation in the original study. We also thank Trisnasari Fraser for her assistance in recruiting the twins and processing the data, and Dr. Valerie Yap for her assistance with preparing the dataset.
Author contributions
Y. T. T., G. E. M. and S. J. W. designed the research. Y. T. T. performed the research and processed the data. D. Y., N. H. and S. J. W. analysed the data. D. Y., N. H., G. E. M., and S. J. W. wrote the article.
Financial support
This work was supported through two Australian Research Council Discovery Project Grants (DP170102479, DP200100961) and an Australian Government Research Training Program Scholarship. This research was facilitated through access to Twins Research Australia, a national resource supported by a Centre of Research Excellence Grant (ID: 1079102), from the National Health and Medical Research Council.
Competing interests
None.