1. Introduction
Speakers frequently vary in their language use (Jacewicz et al., 2010; Redi & Shattuck-Hufnagel, 2001; Westbury et al., 1998). For the sake of efficient communication, it seems beneficial to correctly map inter-speaker variation onto the corresponding speakers. This mapping, which involves adjusting expectations based on a speaker’s preferences, could facilitate interpretation. Consider the term ‘probably’, for example. It can refer to different ranges of probabilities in the real world. When interpreting a sentence such as “It will probably rain tomorrow”, knowing the range of probabilities that ‘probably’ conveys for the specific speaker who uttered this sentence would help us to assess the likelihood of the event, and hence to decide whether or not to take an umbrella (see Schuster & Degen, 2020, for experimental results on speaker-specificity with ranges of probabilities).
Previous research has shown that listeners can detect variability in language use across different speakers. Listeners can recognize inter-speaker variability in phonetic characteristics (Creel et al., 2008; Eisner & McQueen, 2005; Kraljic & Samuel, 2007; McLennan & Luce, 2005), in syntactic preferences (Kamide, 2012; although see Ostrand & Ferreira, 2019), in uncertainty expressions (Schuster & Degen, 2020), in disfluencies (Orena & White, 2015; Yoon et al., 2021), and in referential communication (Grodner & Sedivy, 2011; Metzing & Brennan, 2003; Pogue et al., 2016). Furthermore, previous studies report that listeners can use their knowledge about a specific speaker to apply different processing strategies and derive different meanings (Davies et al., 2017; Khan et al., 2009; Maye et al., 2008; Skoruppa & Peperkamp, 2011; Sobel et al., 2012). In these cases, the motivation to adapt to the language use of specific speakers is clear: such adaptation facilitates the derivation of meaning from a person’s utterance.
Speaker-specificity has also been demonstrated in studies of lexical entrainment, a phenomenon in which speakers tend to align their lexical choices (Brennan & Clark, 1996; Brown-Schmidt, 2009; Garrod & Anderson, 1987; Metzing & Brennan, 2003; Wilkes-Gibbs & Clark, 1992). That is, speakers will use a certain word with one interlocutor but not with another, due to the references they previously established in conversation. For example, participants who engaged in conversation with a confederate who used the uncommon term ‘penny loafer’ for a specific shoe (in a context with more than one shoe) tended to use the same term too, even when it was not required by context. However, when introduced to a new interlocutor/confederate, they reverted to using the typical term ‘shoe’ (Brennan & Clark, 1996). These findings suggest that paying attention to speaker-specific language use and mapping inter-speaker variability may be used to establish common ground and facilitate meaning inference in an interactive setting. Moreover, it could assist the prediction of speakers’ utterances and thus make communication more efficient, either by reducing uncertainty about the utterance’s intended meaning (Bangerter et al., 2020; Brennan & Clark, 1996) or by merely making specific utterances more available in memory (Barr & Keysar, 2005; Kronmüller & Barr, 2007; Pickering & Garrod, 2004, 2006).
1.1. Recognizing non-crucial speaker-specific language use
Variation in language use does not always entail critical meaning differences. For example, in discussing the publishing of a scientific work, speaker A might refer to it as ‘a paper’, while speaker B might refer to it as ‘an article’. Yet, misunderstanding is unlikely in this case. Some studies have investigated the recognition of speaker-specific language use in cases of non-crucial variation (Kroczek & Gunter, 2017; Ostrand & Ferreira, 2019). In these studies, the researchers created inter-speaker variability at the sentence level that did not substantially alter meaning (see Footnote 1). Kroczek and Gunter (2017) found that participants learned to associate a specific order with a specific speaker (for SOV versus OSV in German), whereas Ostrand and Ferreira (2019) did not observe any speaker-specific learning or alignment (for various syntactic structures, e.g., prepositional datives versus double object).
Some critical differences between the studies could account for their opposing results. First, the experimental tasks differed substantially. Kroczek and Gunter used comprehension questions throughout the experiment, which required participants to concentrate on the thematic relations within the sentence (i.e., by asking who is the object of the sentence, e.g., “Who was seen?” for the sentences in Footnote 1). These questions were likely to draw attention to the sentence structure and thus highlight the differences between the tested structures, resulting in better identification of the inter-speaker variation. Ostrand and Ferreira used a more implicit manipulation – an interactive picture-matching task, in which participants were required to choose the correct image based on their interlocutor’s description. Because on each trial there was only one image that could have been described by each description, variation in structure was not linked to semantic/contextual variance, and therefore the sentence structure was not likely to influence success.
Second, Kroczek and Gunter contrasted a high-frequency word order (SOV) with a low-frequency word order (OSV), while Ostrand and Ferreira used structures with smaller differences in frequency. Low-frequency structures are probably less expected to be produced because speakers are sensitive to the overall frequencies of linguistic material in their language (e.g., Bod et al., 2003; Bybee & Hopper, 2001; Ellis, 2002; Garnsey et al., 1997; Lieven, 2010; MacDonald, 1994). Unexpected structures, being more salient (Rácz, 2013), may draw more attention to inter-speaker variations. This, in turn, can result in better discrimination between speakers who use them and more conventional speakers.
1.2. Factors influencing the recognition of non-crucial speaker-specific language use
As discussed above, it is possible that people recognize and map inter-speaker variation even in cases where such variation is non-meaning-crucial. We have already mentioned facilitation in communication and linguistic unexpectedness as possible explanations for successful speaker-specific recognition in these cases. Below, we discuss these in more detail.
One possible reason for mapping non-crucial inter-speaker variation in language could be that it facilitates communicative efficiency (Barr, 2004; Metzing & Brennan, 2003). Such mapping could either reduce uncertainty concerning the upcoming utterance when listening to a known speaker or increase a specific utterance’s salience in memory, thereby making it more accessible. Both views come from the literature on lexical entrainment. Although the current study concerns the phrasal level (see below), we discuss these ideas (coming from the lexical level) because of their relevance to speaker-specificity.
According to one view, interlocutors interactively create shared linguistic representations during collaborative tasks, which are sometimes referred to as conceptual pacts (Brennan & Clark, 1996). These pacts facilitate reference resolution because they reduce uncertainty in predicting an interlocutor’s utterance. This, in turn, can reduce cognitive load and thus be more efficient from the listener’s perspective. Moreover, these pacts have been claimed to be speaker-specific, such that using a new, pact-breaking utterance will be more felicitous with a ‘new’ interlocutor than with an ‘old’ one with whom the pact is established (Metzing & Brennan, 2003). This could mean that specific terms are actively stored in memory alongside the person who uttered them.
A different view posits that the alignment process is driven by automatic memory processes. Under this view, interlocutors tend to adopt each other’s terminology because it is more available in memory and is, therefore, more readily accessed (e.g., Barr & Keysar, 2005; Holtgraves & Barr, 2014; Keysar et al., 1998; Kronmüller & Barr, 2007), making the retrieval process more efficient. According to this view, learning speaker-specific language use is not conversationally motivated. This view further suggests that successful collaboration does not necessarily involve explicitly storing speaker-specific language use, because interlocutors adhere to an egocentric strategy that is based on their memory demands alone (Keysar et al., 1998). However, since speaker-specific effects have been shown to appear in later stages of processing (e.g., Kronmüller & Barr, 2007; Kronmüller & Guerra, 2020), it is possible that speaker-specific language use is somehow stored in these cases, as a by-product of the interaction.
Another possible explanation of successful speaker-specific recognition comes from Surprisal theory (Hale, 2001; Levy, 2008). This theory maintains that listeners use their linguistic experience to derive probabilities of encountering certain utterances and form expectations about the future linguistic input they will come across. That is, the more improbable (surprising) it is for a word to appear in a given context, the higher the cognitive load associated with accessing this word will be. Moreover, surprising events have been shown to be more memorable (Adler, 2008; Upala et al., 2007). Indeed, several studies have linked surprisal to memory and learning (Foster & Keane, 2019; Futrell et al., 2020; Munnich & Ranney, 2019). Among these, learning speaker-specific language use has also been studied in the context of surprisal (Lai et al., 2020). In that study, inter-speaker variation was manipulated using artificial alien languages. The researchers exposed participants to differential linguistic (morphological) input and showed that novel, and therefore unexpected, linguistic material at a later phase led to a better association of a morpheme with a specific social (alien) group. In that case, inter-speaker variation was not crucial for interpretation, but rather – as the authors suggested – learning was driven by the violation of expectations regarding the linguistic input. This process could also be linked to communicative efficiency because using words that are more predictable in a specific context facilitates processing (Futrell et al., 2020). Therefore, mapping inter-speaker variability and storing speaker-specific language use could increase communication efficiency, by means of reducing unexpectedness or surprisal.
1.3. The current study
In the current study, we ask whether listeners can differentiate between speakers based on their stylistic preferences, where no apparent critical difference in meaning is presented in the linguistic choices of the speakers. We further ask whether this mapping of non-crucial speaker-specific language use is mediated by unexpectedness. To examine this, we selected the preferences of adjective ordering in Hebrew. Adjective ordering preferences are considered robust, and according to some studies, their variation may be meaningful (for a detailed discussion see Scontras, 2023). Below we discuss these aspects in relation to Hebrew.
As mentioned above, adjective ordering preferences are considered robust in many languages (Cinque, 1994; Danks & Glucksberg, 1971; Dixon, 1982; Laenzlinger, 2005; Martin, 1969a, 1969b; Martin & Molfese, 1972; Scontras et al., 2017, 2019; Scott, 2002; Svenonius, 2008; Whorf, 1945), with some cross-linguistic variance (Cinque, 1994; Sproat & Shih, 1991). However, a recent study (Trainin & Shetreet, 2021) suggests that adjective ordering preferences in Hebrew (a post-nominal language where adjectives appear after the noun they modify) are not as robust as has been previously claimed (Shlonsky, 2004; for similar results in Spanish, another post-nominal language, see Rosales & Scontras, 2019). In production, naturalness rating, and forced-choice tasks, ordering preferences in Hebrew were shown to be significantly weaker than in English. For example, in three-adjective strings of Size, Color, and Pattern semantic classes, only two different orders were produced in English, but five different orders were produced in Hebrew, none of which was produced more than 30% of the time. Furthermore, the differences in naturalness between the least and the most natural orders in English (~4.5 versus ~6.5 respectively) were significantly larger than in Hebrew (~4.5 versus ~5.5). These results may suggest that adjective ordering in Hebrew is more susceptible to inter-speaker variation, especially with three-adjective strings.
Furthermore, it is plausible to assume that Hebrew speakers would have less-constraining predictions towards adjective ordering because there are more viable options. Therefore, the weak adjective ordering preferences in Hebrew could allow us to examine stylistic non-crucial meaning preferences.
Some research on English suggests that adjective ordering variation is linked to meaning inference (Danks & Glucksberg, 1971; Danks & Schwenk, 1972, 1974; Scontras, 2023; Whorf, 1945). Specifically, it has been shown that deviation from the common adjective order in English, by placing a certain adjective earlier in the sequence than it usually is, could implicate that some feature of the object is especially worth emphasizing (Martin, 1970; Martin & Ferb, 1973). This is mainly shown in contrastive contexts. For example, in a context with two images of large pencils, differing only in color, it is more felicitous to position the color adjective before the size adjective (i.e., say “the yellow large pencil”, instead of the preferred order of “the large yellow pencil”) than in a non-contrastive context. Because Hebrew does not have a strong preference for a certain order, the displacement may not signal such meaning variation. Furthermore, to avoid such effects in the current study, we did not use contrastive contexts.
Our stimuli were composed of nouns modified by three-adjective strings, which allowed several permutations of orders (unlike two-adjective strings, which allow only two permutations). Furthermore, the preferences for three-adjective strings in Hebrew were significantly weaker than for two-adjective strings (Trainin & Shetreet, 2021), enabling us to treat these preferences as stylistic. We selected adjectives from three semantic classes – Size, Color, and Pattern – for two main reasons. First, they are easy to depict visually, and therefore could be used for our visual task. Second, we had converging evidence from three tasks (Trainin & Shetreet, 2021) for the hierarchy of ordering preferences when using adjectives from these classes, which allowed us to examine multiple contrasts of ordering preferences. This hierarchy is given in (1). Note that we abbreviate the orders, such that S stands for size, C for color, and P for pattern, and they are ordered in the linear order of Hebrew. For example, the order Noun-Size-Color-Pattern is abbreviated as SCP.
(1) Hierarchy of adjective order preferences, based on Trainin and Shetreet (2021):
To test the effects of expectedness, we focus on four different orders, placed at the ends of the hierarchy in (1): SCP and CSP at one end, and PSC and SPC at the other. Trainin and Shetreet (2021) showed that the SCP and CSP orders were produced more often than the SPC and PSC orders (~30% of the time per order versus ~0–2% per order, respectively), and were also considered slightly more natural (~5.5 versus ~4.5 on a scale of 1–7, respectively). Therefore, the SCP and CSP orders will be considered here as more natural and more common than the SPC and PSC orders. As such, they should also be more expected. In other words, the SPC and PSC orders should have higher surprisal values, because listeners should not expect the adjectives to appear in these orders.
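As a rough illustration of this contrast, the sketch below converts the approximate production rates cited above into surprisal values in bits, using the standard definition of surprisal as the negative log probability. Treating production proportions as stand-ins for listeners' expectations is an assumption made here for illustration only, and the 2% figure is simply the upper end of the reported ~0–2% range (a rate of exactly 0 would yield infinite surprisal).

```python
import math


def surprisal_bits(probability: float) -> float:
    """Surprisal of an event in bits: -log2(p)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Approximate per-order production rates; used as illustrative proxies only,
# not as estimates of listeners' actual expectations.
common_order_rate = 0.30    # SCP or CSP, ~30% each
uncommon_order_rate = 0.02  # SPC or PSC, upper end of the ~0-2% range

common_surprisal = surprisal_bits(common_order_rate)
uncommon_surprisal = surprisal_bits(uncommon_order_rate)

print(f"common order:   {common_surprisal:.2f} bits")
print(f"uncommon order: {uncommon_surprisal:.2f} bits")
```

Under these assumed proxies, the uncommon orders carry roughly three times the surprisal of the common ones, which is the direction of the contrast the design relies on.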
Following Ostrand and Ferreira (2019), who used a task that did not draw attention to the language variation, we ensured that language use variation was not crucial for the task, by asking participants to verify whether a description in various adjective orders matched a picture with a single object (see Fig. 1). Because the visual context did not include any competitor which could drive a change in the adjective order, and because we did not ask participants about the manner of descriptions, it is unlikely that this task would make adjective ordering particularly salient. In our study, we used this verification task in an implicit exposure phase and then used an explicit recognition test in a test phase to examine whether participants mapped inter-speaker variability. In the exposure phase, participants heard two different speakers, each producing a different adjective order consistently. We created three conditions contrasting speakers using the two most natural and common orders, the two most unnatural and uncommon orders, and one common order and one uncommon order. In the test phase, the participants were asked to recognize which of the orders was produced by which speaker.
We hypothesize that if listeners are sensitive to inter-speaker variability in stylistic non-crucial language use, they should be able to correctly associate a specific adjective order with a specific speaker. If this is the case, participants should be able to map orders to speakers in all conditions. If naturalness or unexpectedness plays a role in successful speaker-specific learning, we may expect participants to have better order–speaker mapping in conditions where one of the speakers produces an unnatural order than in conditions with only natural orders. It is also possible that listeners detect inter-speaker variability only when it is associated with a crucial meaning variation, or needed for efficiency due to task demands. If so, participants should not be able to associate orders with speakers in any condition.
2. Experiment 1
To examine speaker-specific learning of non-crucial variation, we compared the ability of individuals to associate a speaker with his/her preferred adjective order under three conditions. To further test the effects of unexpectedness, each condition included two speakers, varying in the naturalness of the orders they produced: (1) the two most natural adjective orders in Hebrew (High frequency – High frequency (HH); see Table 1); (2) one of the natural orders and the most unnatural one (High frequency – Low frequency (HL)); and (3) the two most unnatural orders in Hebrew (Low frequency – Low frequency (LL)). Although no substantial difference in naturalness is introduced in either the HH or the LL condition, the LL condition includes a general violation of listeners’ expectations as they are extensively exposed to unnatural orders. Therefore, this condition allowed us to examine – provided that an effect of frequency/naturalness does exist – whether it stems from differences in naturalness between speakers or from (at least) one speaker’s deviance from common language use (rather than from deviance relative to the other speaker(s) in the context).
2.1. Methods
2.1.1. Participants
Sixty native Hebrew speakers (18–32 years old, M = 24.91, SD = 3.22; 63% (38) females), with no language, cognitive, or social impairments, were recruited via social media and were randomly assigned to one of the experimental conditions, with 20 participants in each condition. All participants gave informed consent to take part in the study. The study was approved by Tel Aviv University’s ethics committee.
2.1.2. Materials and procedure
(a) Exposure phase. Ninety-six images of objects, composed of four possible shapes (square/triangle/circle/star) in different sizes (large/small), in different colors (blue/green/red), and with different patterns (dots/checkers) were used (Fig. 1). We used the same adjective words as in Trainin and Shetreet (2021), so that we could be more confident about the differences in naturalness between the descriptions in the different adjective orders. Each image was paired with an auditory description including the three features: size, color, and pattern. Half of the descriptions matched the visual image (e.g., ‘a big blue checkered square’ for Fig. 1), and half did not (because one feature was wrong, e.g., ‘a big red checkered square’ for Fig. 1).
To make the distinction between the two speakers clear, one of the speakers was a female, and the other one was a male. Thus, half of the descriptions were spoken by a female speaker, presented to the participants by a conventional female name, Naama. The other half was spoken by a male speaker, presented to participants by a conventional male name, Yoav. Each speaker consistently produced the same adjective order throughout the experiment (SCP, CSP, SPC, or PSC, see Table 1), with Speaker A always using a different order than Speaker B.
On each trial, participants heard one of the speakers describing the image and were required to decide whether the description matched the image, by pressing F for correct descriptions or K for incorrect descriptions. For example, for Fig. 1, a matched description, where all the visual features of the image were included and correct in the SCP order, would be square big blue checkered (according to the post-nominal ordering in Hebrew), and a mismatched description, with a discrepancy in one of the visual features in the same order, would be square big blue *dotted. The task aimed to engage the participants in listening to the speakers’ utterances, without explicitly addressing the adjective ordering preferences. To ensure that participants listened to the end of each description, most incorrect descriptions were related to the feature presented by the last adjective in the string. There were four exposure blocks, two for each speaker. Each block consisted of 24 trials of image–audio pairs, all belonging to the same speaker. Blocks were interleaved such that a block presenting Speaker A was followed by a block presenting Speaker B, and so on. The order of blocks was counterbalanced for the identity of the first speaker (Female/Male) and the first order. Participants were instructed to listen to the audio clips and indicate their response using one of the keys (F/K; Fig. 1) as soon as the auditory description ended. No information was explicitly given about the speakers.
(b) Test phase. In this phase, participants were presented with written descriptions, without a picture (Fig. 2), such that the descriptions were neither correct nor incorrect, because they were grammatical and did not effectively describe any image. The descriptions were similar to those in the exposure phase, and included either the adjective order used by Speaker A or the one used by Speaker B. Participants were asked to decide which speaker had produced the written description, by pressing F for one speaker, and K for the other speaker (counterbalanced for side across participants). We did not explicitly mention that there was inter-speaker variation, but simply instructed participants to recognize which speaker used each description. Overall, there were 24 test trials, 12 in each order from the exposure phase.
Participants were randomly assigned to one of the three conditions presented in Table 1. In each condition, the identity of the first speaker (Female/Male) and the assignment of a speaker to order were counterbalanced across participants, resulting in 4 lists per condition, with 5 participants in each list (for a total of 20 participants in each condition). The order of presentation in each exposure block and in the test phase was fully randomized for each participant. The experiment was run on the ‘Pavlovia’ platform (https://pavlovia.org), using version 3.2 of PsychoPy (Peirce et al., 2019), and took approximately 15 minutes to complete.
2.2. Results and discussion
We used a binomial test for each participant separately to determine whether participants were able to learn which speaker produced which order. Participants whose accuracy rate in the test phase was significantly above chance level (at least 18 correct responses out of 24 trials) were classified as ‘Learners’, and the rest were classified as ‘Non-learners’. In all conditions, some participants were successful in learning speaker-specific preferences. However, the rates of successful learning were different across the conditions, with only 2 successful learners out of 20 in the HH condition contrasting the two natural orders (Fig. 3), 11/20 successful learners in the HL condition contrasting one of the two natural orders with the unnatural order, and 9/20 successful learners in the LL condition contrasting the two unnatural orders.
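The 18-out-of-24 cutoff can be checked with an exact binomial test against a chance level of 0.5. The sketch below is illustrative only: it assumes a two-sided test at α = .05, which the text does not state explicitly but which is consistent with the reported cutoff; the function and constant names are hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one under the null hypothesis.
    """
    pmf = [comb(trials, k) * p**k * (1 - p) ** (trials - k) for k in range(trials + 1)]
    observed = pmf[successes]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


TRIALS = 24
ALPHA = 0.05  # assumed significance level, not stated in the text

# Smallest above-chance score that reaches significance.
learner_threshold = min(
    k for k in range(TRIALS // 2 + 1, TRIALS + 1)
    if binomial_two_sided_p(k, TRIALS) < ALPHA
)

print(learner_threshold)
print(round(binomial_two_sided_p(17, TRIALS), 3), round(binomial_two_sided_p(18, TRIALS), 3))
```

Under these assumptions, 17 correct responses fall just short of significance while 18 reach it. By symmetry, 6 or fewer correct responses would count as significantly below chance.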
To verify that the learning rates significantly differed between the conditions, we conducted an equality of proportions test, using the ‘stats’ package (Version 4.2.1; R Core Team, 2022), on the proportions of successful learners in each condition (see Footnote 2). This test revealed that both the HL condition and the LL condition yielded more successful learning than the HH condition (χ²(1) = 7.29, p = 0.007 and χ²(1) = 4.51, p = 0.034, respectively; Fig. 3). The difference between the HL and the LL conditions was not significant (χ²(1) = 0.1, p = 0.75).
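For readers who want to retrace these comparisons, the following sketch computes a two-sample test of equal proportions from the learner counts reported above (2/20, 11/20, and 9/20). It is a hand-rolled illustration rather than the authors' R code, and it assumes that Yates' continuity correction was applied; the function name is hypothetical.

```python
import math


def two_proportion_chi2(success_a: int, n_a: int, success_b: int, n_b: int):
    """Chi-square test of equal proportions (df = 1) with Yates' continuity correction."""
    a, b = success_a, n_a - success_a  # group A: learners, non-learners
    c, d = success_b, n_b - success_b  # group B: learners, non-learners
    n = n_a + n_b
    corrected = max(0.0, abs(a * d - b * c) - n / 2)
    chi2 = n * corrected**2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


learners = {"HH": 2, "HL": 11, "LL": 9}  # successful learners out of 20 per condition
N_PER_CONDITION = 20

for first, second in [("HL", "HH"), ("LL", "HH"), ("HL", "LL")]:
    chi2, p = two_proportion_chi2(
        learners[first], N_PER_CONDITION, learners[second], N_PER_CONDITION
    )
    print(f"{first} vs {second}: chi2(1) = {chi2:.2f}, p = {p:.3f}")
```

With the continuity correction included, the three statistics agree with the values reported in the text to two decimal places.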
The comparison between the HH and LL conditions also allowed us to examine the effect of similarity in form (see Table 2). Both orders in each condition show some similarity in form, as they share the same adjective semantic class in the final position (SCP/CSP in HH and SPC/PSC in the LL condition; Table 1). Thus, the higher rates of learners in the LL condition compared with the HH also suggest that similarities in form do not by themselves hinder the ability to associate orders and speakers. Indeed, the LL condition yielded a significantly higher rate of learners. Because the orders were similar to the same extent, it appears that the reduced rates of learners in the HH condition were not due to similarity in form.
These results clearly show that learning speaker-specific ‘style’ is possible, albeit not in all cases. The most obvious difference between the conditions inducing higher learning rates and the one inducing lower learning rates was the inclusion of an unnatural order, regardless of how much less natural this order is compared with the other order in the context.
Yet, we would like to exercise caution concerning this conclusion because of several important points. First, our sample size was small (N = 60, with 20 participants per condition). This sample size allowed us to observe large effects, but small-medium effects may have been missed (e.g., the difference between HL and LL). Second, we measured the rates of successful learning using a binomial test for each participant, according to which participants were classified as ‘Learners’ or ‘Non-learners’. Counting the number of learners based on a binomial test dichotomizes a continuous variable. It could very well be the case that participants who were classified as ‘Non-learners’ did, in fact, detect the inter-speaker variability to some extent, but were ‘punished’ by the chance-level barrier. Finally, our test phase included only the two orders that were included in the exposure phase. Note that the task in the test phase did draw some attention to the speaker’s language use, as we asked participants to choose which description the speaker used. If so, including only the two orders used might have allowed participants to develop a strategy during the test phase, by guessing a specific order–speaker association and then consistently adhering to it. That is, participants could have succeeded in the test phase without having learned the correct association during the exposure phase. This might also be the factor that led to a few participants showing an opposite pattern in learning, performing significantly below chance level in the test phase of Experiment 1 (2 in the HH condition, 3 in the HL condition, and 1 in the LL condition). Importantly, the difficulty of distinguishing between the orders (as may be determined by their naturalness) was also likely to modulate the stability of guessing.
In other words, the more distinct these orders are from each other, the easier it is to remember and adhere to the guess, such that we expect more successful guessing rates in the HL condition (and possibly in the LL condition) than in the HH condition. To overcome these shortcomings, we conducted a second experiment.
3. Experiment 2
Experiment 2 aimed to replicate the findings from Experiment 1, using better methodological and statistical tools to examine the learning of speaker-specific non-crucial language use with adjective orders that deviate from common use. We included several modifications: (a) higher statistical power, with a much larger sample size estimated by an a-priori power analysis; (b) a more sensitive statistical analysis (see below); and (c) the addition of filler stimuli in the test phase, which consisted of descriptions in an order not included in the exposure phase, to prevent a guessing-based response strategy.
In our second, well-powered experiment, we used a different approach to analyze the data: a d’ analysis inspired by Signal Detection Theory (Swets et al., Reference Swets, Tanner and Birdsall1961). In this analysis, a d’ measure is calculated for each participant. This measure indicates how well a signal was detected by each participant: it is calculated by subtracting the z-transformed proportion of incorrect positive responses (False Alarms) from the z-transformed proportion of correct positive responses (Hits). In our case, this signal was the order–speaker association. d’ measures allow for both the overall detection of learning within each condition and the comparison between conditions. d’s that are significantly different from 0 essentially mean that the signal was detected to some degree. Furthermore, differences between conditions in d’ could indicate that the ability to detect order–speaker association is different under different conditions.
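To make the measure concrete, the following is a minimal Python sketch of a per-participant d’ computation, taken as the difference between the z-transformed hit rate and the z-transformed false-alarm rate. It is an illustration, not the analysis code used in the study: the function name `d_prime`, the response counts, and the log-linear correction for extreme rates are assumptions introduced here.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false-alarm rate) for one participant.

    A log-linear correction (add 0.5 to each count and 1 to each total) is
    ASSUMED here so that rates of exactly 0 or 1 do not give infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical participants; the counts are invented for illustration.
guesser = d_prime(hits=6, misses=6, false_alarms=6, correct_rejections=6)
learner = d_prime(hits=10, misses=2, false_alarms=2, correct_rejections=10)
print(f"guesser d' = {guesser:.2f}, learner d' = {learner:.2f}")
```

A participant who answers ‘Yes’ equally often to correct and incorrect order–speaker pairings obtains a d’ of 0, whereas a participant who accepts mostly correct pairings obtains a positive d’.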
The addition of filler trials allowed us to better mask the experimental manipulation. In these trials, participants were presented with additional adjective orders that were not included in the exposure phase. This addition could have prevented the guessing strategy discussed above in two ways. First, participants might construe the task differently in the first place, namely, as requiring them to recognize which orders they had heard before, and not the order–speaker association. Second, participants might fail to apply the guessing strategy without prior learning during the exposure phase, because the filler trials could interfere: trying to assign 3 orders to only 2 speakers could cause the strategy to break down.
3.1. Method
3.1.1. Participants
One hundred and ninety-two native Hebrew speakers (18–45 years old, M = 26.19, SD = 4.71; 63% (121) females), with no language, cognitive, or social impairments, were recruited via social media and were randomly assigned to one of the experimental conditions, with 64 participants in each condition. To determine the desired sample size for this experiment, we applied a d’ analysis to our data from Experiment 1. This analysis in itself is unreliable because the responses in Experiment 1’s test phase do not correspond to the type of responses in signal detection experiments. Nevertheless, we arbitrarily defined description assignment to the female speaker as ‘Yes’, and to the male speaker as ‘No’. Accordingly, when participants assigned the correct order to the female speaker, it was considered a Hit (because it was a positive response to a ‘true’ state of ‘Yes’), and when they assigned the order used by the male speaker to the female speaker, it was considered a False Alarm (a positive response for a ‘true’ state of ‘No’). We then calculated the d’ measure using the ‘psycho’ library (Makowski, Reference Makowski2018) in the R statistical software (Version 4.2.1; R Core Team, 2022), compared the d’ between all conditions using independent two-sample t-tests (using the ‘stats’ package), and obtained the effect sizes using the ‘effectsize’ package (Version 0.7.0.5; Ben-Shachar et al., Reference Ben-Shachar, Lüdecke and Makowski2020). Based on the calculated effect sizes, we estimated the desired sample size with an a-priori analysis using the G*Power software (Version 3.1.9.4; Faul et al., Reference Faul, Erdfelder, Lang and Buchner2007). We aimed for a sample size that would allow us to detect the smallest effect of interest, which was the difference between the HL and the LL conditions (t(37.97) = 1.43; Cohen’s d = 0.5; Cohen, Reference Cohen1988). This analysis revealed that to obtain 80% power, 64 participants should be included in each group.
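As a rough cross-check of the sample-size estimate, the per-group n for a two-sample t-test can be approximated in closed form. The Python sketch below is not the G*Power computation (which is based on the noncentral t distribution); it uses a normal approximation with a small-sample correction, and it assumes a two-sided test with α = 0.05, which is not stated in the text. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample t-test.

    Normal approximation 2 * ((z_alpha + z_power) / d) ** 2, plus the
    small-sample correction z_alpha ** 2 / 4. This is an approximation,
    not the noncentral-t procedure implemented in G*Power.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for a two-sided test (assumed)
    z_power = z(power)
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2 + z_alpha ** 2 / 4
    return ceil(n)


print(n_per_group(0.5))  # prints 64
```

Under these assumptions, the approximation yields the same 64 participants per group for Cohen’s d = 0.5 and 80% power.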
All the participants gave informed consent to participate in the study. The study was approved by Tel Aviv University’s ethics committee.
3.1.2. Materials and procedure
The exposure phase was identical to the one used in Experiment 1. The test phase was also similar, but with minor modifications to adapt it to the d’ analysis. Instead of asking participants to assign the presented description on each trial to one of the speakers (Fig. 2), we asked, “Did X say Y?” (see Fig. 4), where X was the name of one of the speakers (Naama or Yoav) and Y was one of the orders used in the exposure phase or a third order used for fillers. Participants were required to answer ‘Yes’ (the F key) or ‘No’ (the K key). Furthermore, in addition to the 24 trials in which the description was in one of the orders from the exposure phase, 6 additional filler trials included a third order which was not included in the exposure phase. This order was different for each condition and was selected so that it differed in form as much as possible from the included orders. For example, for the HH condition, the third order was Noun-Pattern-Color-Size, where the pattern adjectives appear first, as opposed to the included orders (see Table 1), in which the pattern adjectives appeared last. These trials were included to prevent participants from developing a guessing strategy, because rejecting an association between one of the orders and one of the speakers does not entail accepting it for the other speaker. Overall, each participant completed 30 trials in the test phase.
3.2. Results and discussion
We conducted two types of analysis with our d’ measures. First, to assess the overall success in order–speaker association, we conducted one-sample t-tests in which we compared the d’ in each condition to 0. Second, to assess the different learning rates across conditions, we performed a series of two-sample unpaired t-tests, comparing the d’s of the different conditions, HH, HL, and LL. The d’ for each participant was calculated using the experimental items only (i.e., excluding fillers).
One-sample t-tests revealed that d’s in the HL (M = 0.96, SD = 1.53; t(63) = 5.00, p < 0.001) and in the LL (M = 0.31, SD = 1.12; t(63) = 2.19, p = 0.03), but not in the HH condition (M = −0.07, SD = 0.85; t(63) = −0.63, p = 0.53), were significantly larger than 0. This suggests that in the HL and LL conditions, participants could – to some degree – associate a speaker with their preferred orders correctly, but in the HH condition, they could not reliably do so. Pairwise comparisons revealed that the differences in d’ between all conditions were significant (see Fig. 5), with d’s in the HL being higher than in the HH condition (t(98.27) = 4.68, p < 0.001) and in the LL condition (t(115.92) = 2.73, p = 0.007), and higher in the LL condition than in the HH condition (t(116.83) = 2.13, p = 0.036). This suggests that speaker-specific learning is facilitated both by differences in naturalness (or form; see General Discussion) and by the mere existence of deviation from common language use in the linguistic environment. Learning was virtually absent when both orders were relatively and comparably natural (HH), slightly better when both orders were comparably and relatively unnatural (LL), and considerably better when one order was natural and one was not (HL).
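The test statistics above can be approximately recovered from the reported means and standard deviations. The Python sketch below computes a one-sample t against 0 and an unequal-variance (Welch) two-sample t with its Welch–Satterthwaite degrees of freedom. The use of the Welch variant is an inference from the fractional degrees of freedom reported above, and the function names are hypothetical. Because the inputs are rounded summary statistics, the results agree with the reported values only approximately.

```python
from math import sqrt


def one_sample_t(mean, sd, n, mu0=0.0):
    """t statistic and degrees of freedom for a one-sample t-test against mu0."""
    return (mean - mu0) / (sd / sqrt(n)), n - 1


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2  # squared standard errors
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics (M, SD, n) as reported in the text.
hl = (0.96, 1.53, 64)
hh = (-0.07, 0.85, 64)

t_one, df_one = one_sample_t(*hl)
t_two, df_two = welch_t(*hl, *hh)
print(f"HL vs 0:  t({df_one}) = {t_one:.2f}")
print(f"HL vs HH: t({df_two:.2f}) = {t_two:.2f}")
```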
To be able to compare the results of Experiments 1 and 2, we further conducted the same analysis as in Experiment 1, counting the number of ‘Learners’ based on passing the threshold of chance-level performance as calculated by a binomial test (see Fig. 6). We then conducted a Chi-square test of equality of proportions and found that the number of learners significantly differed between all conditions, such that the proportion of ‘Learners’ in the HL condition (23/64) was significantly higher than the proportion of learners in the HH condition (3/64; χ2 (1) = 17.42, p < 0.001; Fig. 6) and the proportion of learners in the LL condition (11/64; χ2 (1) = 4.85; p = 0.026). Furthermore, the proportion of learners in the LL condition was significantly higher than in the HH condition (χ2 (1) = 3.93, p = 0.047).
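The two steps of this analysis can be sketched in Python as follows: classifying a participant as a ‘Learner’ with an exact binomial test against chance, and comparing learner proportions between two conditions with a chi-square test. Several details are assumptions made for illustration: the one-sided test at α = 0.05, the use of the 24 experimental trials, the Yates continuity correction, and all function names. With the continuity correction, the learner counts reported above give chi-square values that agree with the reported statistics to two decimals.

```python
from math import comb


def binom_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def is_learner(correct, n_trials=24, alpha=0.05):
    """Classify as 'Learner' if accuracy is above chance (one-sided; assumed)."""
    return binom_upper_tail(correct, n_trials) < alpha


def chi2_two_proportions(a, n_a, b, n_b):
    """Chi-square (df = 1) for a/n_a vs b/n_b with Yates' correction (assumed)."""
    observed = [[a, n_a - a], [b, n_b - b]]
    rows = [n_a, n_b]
    cols = [a + b, n_a + n_b - a - b]
    total = n_a + n_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            diff = abs(observed[i][j] - expected)
            chi2 += (diff - min(0.5, diff)) ** 2 / expected
    return chi2


# Learner counts per condition, as reported in the text (out of 64 each).
learners = {"HH": 3, "HL": 23, "LL": 11}
for first, second in [("HL", "HH"), ("HL", "LL"), ("LL", "HH")]:
    stat = chi2_two_proportions(learners[first], 64, learners[second], 64)
    print(f"{first} vs {second}: chi2(1) = {stat:.2f}")
```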
3.3. An exploratory investigation of speaker-independent learning
The inclusion of fillers in this experiment also allowed us to investigate speaker-independent learning. If participants could correctly reject the association of filler orders to both speakers, it would mean that they generalized the overall linguistic environment properly. Thus, to examine speaker-independent learning, we calculated the d’ measures for the filler trials (consisting of an order which was not used by any of the speakers) alone.
As in the analysis of speaker-specific learning, we conducted a one-sample t-test against 0 for the fillers in each condition separately. For the filler trials, d’s in all the conditions were significantly larger than 0 (HH: M = 2.15, SD = 0.79, t(63) = 21.6, p < 0.001; HL: M = 1.99, SD = 0.69, t(63) = 22.96, p < 0.001; and LL: M = 2.21, SD = 0.82; t(63) = 21.62, p < 0.001; see Fig. 7). This result indicates that participants correctly recognized that the order which was not included in the exposure phase was indeed not uttered by any of the speakers. This further suggests that participants properly recognized the overall linguistic environment in all conditions.
Taken together, these results suggest that learning speaker-specific adjective ordering preferences is – as shown in Experiment 1 – extremely hard when both speakers produce natural orders. In the HH condition, d’ was not significantly different from 0, and significantly smaller than in the HL and the LL conditions. In the other two conditions, d’ was significantly larger than 0, indicating that at the group level, these conditions yield learning, to some extent. However, the d’ in the HL condition was also significantly larger than in the LL condition, suggesting that learning is facilitated by increasing differences between the orders that listeners are exposed to (in form and in naturalness, as we discuss in the General Discussion).
Another point worth discussing is the substantial difference in learning rates between Experiment 1 and Experiment 2. One possible explanation lies in the task. In Experiment 2, participants were required to answer ‘Yes’ or ‘No’ in the test phase and not to assign a description to a speaker in a forced-choice task as in Experiment 1. This means that in Experiment 2, participants might have been less aware of the manipulation, possibly thinking they were asked to recognize which orders they had heard overall, and not to recognize the inter-speaker variation. This was not the case in Experiment 1, where each description had to be assigned to a speaker, making the task more explicit and the manipulation more salient. Moreover, the contrast between the two orders in Experiment 1 may have made the differences in orders even more salient (for the role of contrast in task success, see Shetreet & Novogrodsky, Reference Shetreet and Novogrodsky2019). Another explanation concerns the inclusion of a third order in the test phase, which was not included in the exposure phase. This may indeed have prevented participants from developing a successful guessing strategy. In Experiment 1, all the descriptions in the test phase consisted of the orders used by the speakers from the exposure phase. Thus, to be defined as a successful learner, it was possible to notice the manipulation in the test phase, pick a guess, and then be consistent. In Experiment 2, the filler trials could have confused participants who attempted this strategy, because trying to assign 3 orders to 2 speakers with no memory of the order–speaker association might increase uncertainty and make the guessing behavior less successful.
4. General discussion
In this study, we examined, in two experiments, the ability to learn speaker-specific stylistic non-meaning-crucial language use, and the role of unexpectedness in such learning. We used the weak adjective ordering preferences in Hebrew (Trainin & Shetreet, Reference Trainin and Shetreet2021), and compared speaker-specific learning rates across three adjective order pairs: (1) two natural orders in Hebrew (HH); (2) one natural order and one unnatural order (HL); and (3) two unnatural orders (LL). Our results show successful learning (in all combinations in Experiment 1 and in conditions 2 and 3 in Experiment 2). Notably, lower rates were observed when the two natural adjective orders were used (the HH condition). This indicates that listeners can learn speaker-specific language use, even when such use does not alter the interpretation of utterances, and furthermore, that this learning is modulated by unexpectedness.
One could argue that the learning observed in the current study is not simply ‘stylistic’ because it is possible that – in real-life referential communication – differences in adjective ordering entail different meanings (Danks & Glucksberg, Reference Danks and Glucksberg1971; Danks & Schwenk, Reference Danks and Schwenk1972, Reference Danks and Schwenk1974; Scontras, Reference Scontras2023; Whorf, Reference Whorf1945). As discussed in the introduction section, this is unlikely in the context of the current study. First, Hebrew has weak adjective ordering preferences, such that using a certain adjective order to emphasize a certain property of an object (Martin, Reference Martin1970; Martin & Ferb, Reference Martin and Ferb1973) may not be perceived as deviant. Second, our study used single (non-contrastive) images with banal and recurring sizes, colors, and patterns. Under these circumstances, participants were not expected to anticipate any meaning modification of the kind possibly conveyed by different adjective orders. Furthermore, if the difference in meaning was the driving force behind learning, we would expect it to operate under all conditions. Taken together, these considerations suggest that the difference between conditions could not be solely attributed to meaning differences.
The speaker-specific mapping of non-crucial language use has been demonstrated before, at the lexical level (e.g., Brennan & Clark, Reference Brennan and Clark1996; Metzing & Brennan, Reference Metzing and Brennan2003) and the sentence level (Kroczek & Gunter, Reference Kroczek and Gunter2017). In our study, both speakers used the same words (in different orders) throughout the experiment. Therefore, distinguishing between their ‘styles’ requires going beyond the lexical level. Because adjective ordering does not appear to be subject to structural constraints in Hebrew (Trainin & Shetreet, Reference Trainin and Shetreet2021), this learning appears to occur within the phrase level in this study. Importantly, we show that such learning (that extends beyond the lexical level) occurs even without drawing explicit attention to the form of the utterance (cf. Kroczek & Gunter, Reference Kroczek and Gunter2017).
4.1. Differential learning rates
One critical finding of our study is that learning rates of speaker-specific ‘stylistic’ language use were not similar in all cases. Specifically, learning rates were significantly lower when listeners were exposed to two speakers who uttered the most natural orders (the HH condition). Why would this be the case? We consider two explanatory factors: similarities in form and naturalness. The reader should note that these two factors are intertwined in our data because of the limited permutations of adjective ordering using three specific semantic classes. Unfortunately, data from Trainin and Shetreet (Reference Trainin and Shetreet2021) on the naturalness of adjective ordering does not offer a possible comparison that could completely tease these factors apart (see Footnote 3).
We first note that we cannot completely rule out the possibility that similarities in form play a role in the ability to learn speaker-specific adjective ordering preferences. Differential learning rates were observed for the condition with the two orders of different degrees of naturalness (HL) and the condition with two unnatural orders (LL) in Experiment 2, with the former showing higher learning rates. These two conditions differ in both form and naturalness, such that both factors can explain the differences in learning rates (see Footnote 3).
Yet, our findings suggest that similarities in form could not be the only explanation for the difference in learning rates. Notably, similarities in form in the semantic class in the last position occurred in two tested conditions: the HH condition comparing two natural orders (SCP & CSP) and the LL condition comparing two unnatural orders (PSC & SPC; see Table 2). Despite these similarities, these two conditions yielded differential learning rates with the condition comparing two unnatural orders showing higher learning rates than the condition with two natural orders. Thus, at the very least, our findings suggest that the presence of at least one speaker who deviates from common language use facilitates speaker-specific learning (present in both the HL and the LL conditions).
Therefore, it is likely that learning speaker-specific ‘style’ is facilitated when the style is unexpected or perceived as peculiar. Unexpected and peculiar speakers may draw more attention to themselves because the language use of such speakers might be considered somewhat infelicitous (e.g., violating the Gricean maxim of Manner; Grice, Reference Grice, Cole and Morgan1975). Listeners may try to understand why the speaker communicated in such an infelicitous manner, to better understand the communicative situation, and to better predict the speaker’s linguistic behavior in future exchanges.
The differential learning rates observed across conditions are compatible with the results of studies of non-crucial speaker-specific variation at the sentence level, despite their contrasting results. Kroczek and Gunter (Reference Kroczek and Gunter2017) used the common SOV order and the uncommon OSV order and showed evidence that listeners could learn speaker-specific preferences. Similarly to that study, we observed speaker-specific learning mainly in conditions that included an uncommon and unnatural adjective order. Ostrand and Ferreira (Reference Ostrand and Ferreira2019), on the other hand, contrasted natural structures in English (e.g., double object versus prepositional datives) and did not observe speaker-specific learning. Indeed, the condition in our study that included two natural and common adjective orders yielded virtually no learning. This suggests that the difference between the findings of these two studies could be related to the naturalness (or in other words, the expectedness status) of the linguistic material in the exposure phase.
4.2. Motivations for learning non-crucial speaker-specific language use
It has been suggested that learning speaker-specific language use is motivated by the need for efficient communication. We assume that this is the case also for learning speaker-specific stylistic preferences, as tested in the current study. Because adjective ordering preferences in Hebrew are relatively weak, it is likely that Hebrew speakers do not have clear a-priori expectations towards them. Identifying speaker-specific preferences and detecting a consistent adjective order per speaker could speed up the processing of the referential phrase and therefore aid object identification. For example, learning that a specific speaker always produces the SCP order would allow the prediction of the class of adjective that appears at a given point in the phrase, thereby facilitating the processing of the phrase (i.e., expect a size adjective in the first position, and so on).
It is important to note that the task we used did not demand communicative efficiency, because it was not interactive, and also because there was only one image on each trial, and therefore no visual search was needed for object identification. This could mean that learning speaker-specific language use is not necessarily only used for enhancing efficiency within task, but could rather be related to a more general mechanism, by which learning speaker-specific language use might reduce uncertainty in future interactions.
A second motivation that we offered for learning speaker-specific language use comes from Surprisal theory (Hale, Reference Hale2001; Levy, Reference Levy2008). According to this theory, processing difficulty depends on the probability of a certain linguistic input appearing in a specific context. Specifically, lower surprisal (i.e., higher probability) is linked to easier processing. Surprisal was found to be relevant for learning speaker-specific language use, where a better association of certain speakers to novel linguistic materials was observed when the stimuli had higher surprisal values (Lai et al., Reference Lai, Rácz and Roberts2020). The unnatural orders we used here were virtually never produced (0–2% of the time; Trainin & Shetreet, Reference Trainin and Shetreet2021), given the same visual stimuli as used in this study. Thus, given this visual context, descriptions using these adjective orders were highly unexpected, and therefore had high surprisal values. These high surprisal values were present in the HL and LL conditions, which exhibited higher rates of learning, in line with the findings of Lai et al. (Reference Lai, Rácz and Roberts2020). In other words, conditions including at least one unexpected order led to better learning than the condition including only expected orders. Thus, our findings suggest that surprisal-driven learning can occur when speakers differ in stylistic non-crucial language use.
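To illustrate the relation between production probability and surprisal, the following Python sketch computes surprisal as the negative log probability of an input in its context. The choice of base 2 (bits), the function name, and the 50% comparison rate are assumptions made for illustration; only the 2% figure is taken from the production rates cited above.

```python
from math import log2


def surprisal(probability):
    """Surprisal in bits: -log2 P(input | context)."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -log2(probability)


# 0.02 is the upper bound of the cited production rate for the unnatural
# orders; 0.50 is a HYPOTHETICAL rate for a frequently produced order.
for label, p in [("unnatural order (2%)", 0.02),
                 ("hypothetical common order (50%)", 0.50)]:
    print(f"{label}: {surprisal(p):.2f} bits")
```

An order produced 2% of the time carries more than five times the surprisal of one produced half of the time, which is the sense in which the unnatural orders were highly unexpected in this visual context.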
4.3. Implications for speaker-specificity in lexical entrainment
Previous research has indicated that listeners can track, implicitly (Pickering & Garrod, Reference Pickering and Garrod2004) or explicitly (Brennan & Clark, Reference Brennan and Clark1996; Hanna et al., Reference Hanna, Tanenhaus and Trueswell2003), speaker-specific stylistic preferences at the lexical level. As observed in studies on lexical entrainment (Brennan & Clark, Reference Brennan and Clark1996; Clark & Wilkes-Gibbs, Reference Clark and Wilkes-Gibbs1986; Metzing & Brennan, Reference Metzing and Brennan2003; Wilkes-Gibbs & Clark, Reference Wilkes-Gibbs and Clark1992), interlocutors adopt the same terminology for entities in the conversation, thus establishing common ground for communication. Our study examined the learning process itself, allowing us to examine whether inter-speaker variation is actively stored, or whether this information is transient and occurs only within-task, to assist in fulfilling the task demands. Given that participants in our study successfully recognized the association between speakers and their language use following the exposure phase, it is reasonable to conclude that the language–speaker association is not solely the result of an automatic memory mechanism (Pickering & Garrod, Reference Pickering and Garrod2004, Reference Pickering and Garrod2006), but rather that individuals actively map – under some conditions – inter-speaker stylistic variability. Importantly, in our experiment, participants learned the association of language use with specific speakers without interactively forming conceptual pacts, meaning that language–speaker mapping is not necessarily the result of an interactive process by which interlocutors jointly create a shared representation.
Supplementary material
The supplementary material for this article can be found at http://doi.org/10.1017/langcog.2023.32.
Acknowledgments
This work was funded by the Israeli Science Foundation (ISF 1824/17) and by the Alon fellowship. The authors would also like to thank Prof. Sarah Brown-Schmidt for her helpful comments on the data analysis and Shiri Hornick and Omri Kimchi-Feldhorn for contributing their voices for the auditory stimuli.
Data availability
Data and code are available at https://osf.io/sru8d/.
Competing interest
The authors declare none.