
Dominic Schmitz, Production, perception, and comprehension of subphonemic detail: Word-final /s/ in English (Studies in Laboratory Phonology 11). Berlin: Language Science Press, 2022. Pp. vi + 193. ISBN 9783985540594.

Published online by Cambridge University Press:  12 February 2024

Motoki Saito*
Affiliation:
University of Tübingen
*Department of Linguistics, University of Tübingen, Keplerstr. 2, 72074 Tübingen, Germany. motoki.saito@uni-tuebingen.de

Type
Book Review
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press

1 Summary

Morphological structures are invisible to phonology and phonetics. This is a widely accepted assumption in theoretical linguistics (e.g. Chomsky & Halle 1968) and psycholinguistics (e.g. Levelt et al. 1999). Dominic Schmitz's monograph Production, Perception, and Comprehension of Subphonemic Detail: Word-final /s/ in English challenges and tests this assumption by means of production, perception and comprehension experiments. The focus is on English word-final /s/, which can be part of a stem, namely non-morphemic /s/ (e.g. box [bɒks]), a plural inflectional suffix (e.g. books [bʊks]) or a clitic of is and has (e.g. The book's), among others. Across the production, perception and comprehension studies, the author shows that different types of /s/ are phonetically realized with different durations, and that such durational differences can be perceived and employed for comprehension.

The monograph consists of eleven chapters. In chapter 1, three main aims are set up. The first aim is to investigate whether durational differences exist among different types of /s/. The second aim is to explore how big such durational differences need to be in order to be perceived. The third aim is to test whether such durational differences affect the comprehension process.

Chapter 2 provides a series of reviews of the literature. For production, it is pointed out that previous corpus studies unanimously report the longest duration for non-morphemic /s/, middle duration for plural /s/ and the shortest duration for clitic /s/. Previous experimental studies, by contrast, provide unclear results. Schmitz claims that a potential reason for the unclear results is confounding by uncontrolled lexical factors such as frequency. This consideration led to the use of pseudowords throughout the monograph as well as real words. For perception and comprehension, he claims that there are very few studies so far that looked into perception and comprehension of morphologically different but phonologically identical segments, while he cites one study which found that 25 ms is the lower threshold for perceptibility.

Chapter 3 provides three pieces of information: (i) a description of the experimental materials; (ii) the procedure of statistical analysis; and (iii) an introduction to Linear Discriminative Learning (LDL). Regarding experimental materials, pseudowords are given the main focus along with real words in all the studies reported in the monograph, except for one of the comprehension studies, which is reported in chapter 7. This decision is based on the claim that pseudowords are free from storage effects such as frequency effects. All the pseudowords have the structure CCV(V)C(s). The onset consonant cluster and the nucleus vowel are one of /glɪ/, /prʌ/, /pliː/, /cluː/, /blaʊ/ and /gleɪ/. The offset consonant cluster can be /p(s)/, /t(s)/, /k(s)/ or /f(s)/. The word-final /s/ of these pseudowords is intended to act as both non-morphemic and plural /s/, being disambiguated by pictures of aliens and context sentences in the experiments. Real words are adopted in the perception study (chapter 6) and one of the comprehension studies (chapter 7). The real words used in these chapters all consist of one syllable with a single consonant onset, a monophthong or diphthong vowel, and a single consonant and /s/ in the coda position (e.g. mix for non-morphemic /s/ and books for plural /s/).

Linear mixed-effects models are used to investigate durational differences among different types of /s/ in chapters 4 and 5. Generalized additive mixed-effects models for beta-distribution (BGAMMs) are used to model listener sensitivity in the perception study in chapter 6. Piecewise additive mixed-effects models (PAMMs) and additive quantile regression models (QGAMs) are used to investigate reaction times and mouse-tracks in the comprehension studies in chapters 7 and 8.

For variable selection, the backward stepwise selection procedure from the maximal model is employed. Collinearity is checked and resolved by one of the following three methods: (i) variable selection by comparing performances of models of each of the correlated variables; (ii) Principal Component Analysis (PCA) to combine correlated variables; and (iii) variable removal by variance inflation factors (VIFs).

For LDL, the method of setting up form and semantic vectors adopted in Baayen et al. (2019) is explained in detail, together with the way in which associations among form dimensions and semantic dimensions are estimated. In this set-up, form vectors are coded 1 and 0 for the presence and absence of triphones, and semantic vectors are estimated by Naive Discriminative Learning for each lexome (i.e. word stems such as cat and morphological functions such as plural).
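The binary triphone coding described above can be illustrated with a minimal sketch. The toy phone strings, the use of `#` as a word-boundary symbol and the helper names `triphones` and `form_matrix` are assumptions made for illustration only; they are not taken from the monograph.

```python
import numpy as np


def triphones(word: str) -> list[str]:
    """Split a phone string into overlapping triphones, with '#' marking word boundaries."""
    padded = f"#{word}#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def form_matrix(words: list[str]) -> tuple[np.ndarray, list[str]]:
    """Build a binary word-by-triphone matrix (1 = the triphone occurs in the word)."""
    cues = sorted({tri for w in words for tri in triphones(w)})
    column = {tri: j for j, tri in enumerate(cues)}
    C = np.zeros((len(words), len(cues)), dtype=int)
    for i, w in enumerate(words):
        for tri in triphones(w):
            C[i, column[tri]] = 1
    return C, cues


# Hypothetical toy transcriptions, one character per phone.
words = ["kat", "kats"]
C, cues = form_matrix(words)
```

Each row of `C` is the form vector of one word, and each column corresponds to one triphone cue.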

The following two chapters (chapters 4 and 5) provide two successive analyses of the same production experiment. In this experiment, participants looked at pictures of aliens on a computer screen with context sentences and answered a question that elicited production of the target pseudowords. In chapter 4, the duration of the word-final /s/ of these pseudowords is analyzed in terms of the predetermined morphological status of these pseudowords. Using linear mixed-effects models with a number of covariates, it is revealed that non-morphemic /s/ is the longest, plural /s/ is in the middle and clitic /s/ is the shortest in duration. Carefully reviewing potential theories and frameworks, Schmitz concludes that no existing framework predicts these durational differences. To explore a possible account from the perspective of discriminative learning (e.g. Baayen et al. 2019), the monograph moves on to the next chapter.

In chapter 5, an LDL model is first trained, from which a series of potential measurements are derived. Three linear mixed-effects models are then fitted: (i) the model with traditional measures only, which is comparable to the model in chapter 4; (ii) the model with the LDL-derived measures in addition to the variable of coding the status of /s/; and (iii) the model with the LDL measures only. The first model showed basically the same results as in chapter 4. The second and third models suggested that greater semantic uncertainty is associated with shorter duration of /s/, while greater phonological uncertainty is associated with longer duration. In terms of model performance, the first and second models are comparable, while the third model is significantly worse than the other two models, although their performances are numerically very close to each other. From these results, Schmitz concludes that different types of /s/ are not yet fully captured by these LDL measures, while he also emphasizes that durational differences among different types of /s/ are robust.

Perceptibility of such durational differences of /s/ is tested in chapter 6 with a same–different task, where participants hear two instances of /s/ of the same or different durations and answer whether they are the same or different. While responses by participants in the same–different task are typically analyzed in terms of error-rates, Schmitz makes use of a measure from Signal Detection Theory (Macmillan & Creelman 2005). This measure is based on error-rates, but it takes into account individual differences in sensitivity to certain (durational) differences. Based on the sensitivity measurement, the author demonstrates that a durational difference of 35 ms shows an indication of an increase in perceptibility and that a durational difference of 75 ms shows a clear increase in perceptibility.

Durational differences being perceptible does not necessarily mean that such durational differences are employed for comprehension of words. The author, therefore, proceeds to test if manipulated durational differences of /s/ can affect comprehension accuracy and speed (chapters 7 and 8). In chapter 7, only real words are employed to compare non-morphemic and plural /s/. In chapter 8, pseudowords are adopted to compare plural and clitic /s/.

Duration of word-final /s/ is manipulated for all the experiment items in chapters 7 and 8, creating matched and mismatched conditions. In the matched condition, the acoustic signal of the word-final /s/ is taken from the same condition as, but a different recording from, all the other segments up to the word-final /s/ (e.g. plural stem + plural /s/). In the mismatched condition, the acoustic signal of /s/ is taken from a different condition from the other segments up to /s/ (e.g. plural stem + non-morphemic /s/). Participants listened to these manipulated recordings and answered by mouse-clicking whether the item they had heard described ‘one’ or ‘two or more’ entities.

Responses by participants are analyzed with respect to their response times (only in chapter 7) and mouse-tracks (in both chapters 7 and 8). No effect of the manipulation condition (i.e. matched vs. mismatched) is found for reaction times, while mouse-tracks are found to deviate significantly in the mismatched condition, compared to the matched condition. Based on these observations, it is concluded that durational differences of /s/ are not only perceptible but also employed for comprehension.

Chapters 9, 10 and 11 are general discussion, conclusion and supplementary material respectively. The results obtained from the production, perception and comprehension studies are interpreted to indicate that differences in morphological status are reflected in phonetic realizations (chapters 4 and 5), perceived by the listener (chapter 6) and effectively employed for comprehension (chapters 7 and 8). These findings challenge the classical view of morphology as independent from phonology and phonetics. In the literature, morphological information is mostly assumed not to affect subphonemic realizations. Perception and comprehension models operate mostly on phonemes, and therefore subphonemic differences are assumed to play no role in perception and comprehension. Schmitz claims that these findings call for the revision of most classical models that exclude effects of subphonemic details.

2 Evaluation

For the production of /s/, the monograph provides two studies to investigate durational differences among different types of /s/ (chapters 4 and 5). The second of the two studies (chapter 5) makes use of an LDL model to look for possible driving factors behind the different durations of different types of /s/. In order to integrate pseudowords, the author performs a unique modeling process. First, a form matrix and a semantic matrix are set up for real words, i.e. Crw and Srw. From these matrices, associations from forms to meanings are then estimated, i.e. Frw. This estimated association matrix is then used to create a semantic matrix for pseudowords, i.e. Spw. Its corresponding form matrix for pseudowords is created from word-forms of pseudowords, i.e. Cpw. These form matrices and semantic matrices of real words and pseudowords are stacked up as below:

$$C_{comb} = \begin{bmatrix} C_{rw} \\ C_{pw} \end{bmatrix}, \quad S_{comb} = \begin{bmatrix} S_{rw} \\ S_{pw} \end{bmatrix} \tag{1}$$

Ccomb and Scomb are then used to estimate association matrices between these two matrices: one from forms to meanings and the other from meanings to forms, namely (2):

$$F_{comb} = C^{\prime}_{comb} S_{comb}, \quad G_{comb} = S^{\prime}_{comb} C_{comb} \tag{2}$$

where C′comb and S′comb represent the Moore–Penrose generalized inverses of Ccomb and Scomb respectively. Finally, Fcomb and Gcomb are used to produce predicted semantic vectors and form vectors, namely (3):

$$C_{comb} F_{comb} = \hat{S}_{comb}, \quad S_{comb} G_{comb} = \hat{C}_{comb} \tag{3}$$

(in the monograph, is used instead of Ĉcomb)
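The sequence of steps in (1) to (3) can be sketched numerically as follows. This is a minimal illustration with random toy matrices, not a reproduction of the monograph's model: the matrix sizes and values are hypothetical, and the step that derives Spw by multiplying Cpw with Frw is an assumption about how the real-word mapping is applied to pseudowords.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: rows are words, columns are form cues (C) or semantic dimensions (S).
C_rw = rng.integers(0, 2, size=(6, 5)).astype(float)  # real-word form matrix
S_rw = rng.normal(size=(6, 3))                        # real-word semantic matrix
C_pw = rng.integers(0, 2, size=(2, 5)).astype(float)  # pseudoword form matrix

# Form-to-meaning mapping estimated on real words only.
F_rw = np.linalg.pinv(C_rw) @ S_rw
# Assumed step: pseudoword semantics are obtained by passing their forms through F_rw.
S_pw = C_pw @ F_rw

# (1) Stack real words and pseudowords.
C_comb = np.vstack([C_rw, C_pw])
S_comb = np.vstack([S_rw, S_pw])

# (2) Endstate mappings via Moore-Penrose generalized inverses.
F_comb = np.linalg.pinv(C_comb) @ S_comb  # forms -> meanings
G_comb = np.linalg.pinv(S_comb) @ C_comb  # meanings -> forms

# (3) Predicted semantic vectors and predicted form vectors.
S_hat_comb = C_comb @ F_comb
C_hat_comb = S_comb @ G_comb
```

Note that the real words and the pseudowords enter (2) as rows of the same matrices, so the estimation itself makes no distinction between them.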

This way of estimating associations between C and S is called endstate learning (Chuang & Baayen 2021). Endstate learning is closely tied to its incremental counterpart. In incremental learning, associations between forms and meanings are usually initialized to zero and updated little by little according to co-occurrences of forms and meanings, namely learning events. This process of associations being calibrated by learning experiences can be conceptually understood as an individual person being born with no knowledge about form–meaning relationships, learning form–meaning relationships little by little through learning experiences, and acquiring a certain linguistic knowledge or a lexicon. Endstate learning estimates associations that are theoretically expected to be reached ultimately after an infinite number of learning events/experiences.
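The relationship between incremental and endstate learning can be made concrete with a small sketch. It assumes an error-driven (Widrow–Hoff style) update as the incremental rule, a fixed learning rate and a toy lexicon in which every learning event occurs equally often; all of these are illustrative assumptions rather than details taken from the monograph.

```python
import numpy as np

# Hypothetical toy lexicon: 3 learning events (rows), 4 form cues, 2 semantic dimensions.
C = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])

# Endstate learning: closed-form solution via the Moore-Penrose generalized inverse.
F_endstate = np.linalg.pinv(C) @ S

# Incremental learning: associations start at zero and are nudged after every event.
F_incremental = np.zeros((C.shape[1], S.shape[1]))
learning_rate = 0.1  # assumed value
for epoch in range(2000):
    for c, s in zip(C, S):
        error = s - c @ F_incremental              # prediction error for this event
        F_incremental += learning_rate * np.outer(c, error)

# Largest remaining discrepancy between the two estimates.
max_gap = np.abs(F_incremental - F_endstate).max()
```

In this toy case every event can be fitted exactly, so the incremental estimate approaches the endstate estimate as the number of events grows. With a fixed learning rate and events that differ in frequency or order, the incremental estimate generally depends on that learning history, which is exactly the information that the endstate estimate abstracts away from.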

Consequently, the endstate estimation of associations with all the real words and all the pseudowords aggregated, namely the equations in (2) above, makes a certain assumption that all the real words and all the pseudowords are identical in their lexicality. In other words, in this hypothetical world, pseudowords are encountered as frequently as real words, hence there is essentially no distinction between real words and pseudowords. In the monograph, a single pseudoword (e.g. pleeps) can be morphemic (i.e. pleep+s) and non-morphemic (i.e. pleeps). They are only disambiguated in the experiments by pictures of aliens and context sentences. As a consequence, in the hypothetical world of the trained LDL model, the word-final /s/ cannot be a good cue for either non-morphemic meanings (e.g. singular) or morphemic meanings (e.g. plural). This could be one of the possible reasons why the measures derived from the LDL model did not show such clear predictivity, compared to traditional measures.

In fact, the LDL measures that are removed from the LDL-only model in chapter 5, either by step-wise elimination or due to collinearity issues, all involve Ŝcomb or Ĉcomb. These two matrices are based on Fcomb and Gcomb respectively, in which the word-final /s/ is likely to be a poor cue for either morphemic or non-morphemic meanings due to morphemic and non-morphemic pseudowords being aggregated together with real words. At the same time, all the LDL measures that do not involve any of Fcomb, Gcomb, Ĉcomb and Ŝcomb turn out to be significant predictors.

Therefore, the non-significant effects of the LDL measures in the production study (chapter 5) may need to be taken with caution. These LDL measures are based on the implicit assumptions that all the pseudowords occur in the same way as the real words and that, especially for the pseudowords, the word-final /s/ occurs infinitely many times with the meaning of singular and infinitely many times with the meaning of plural.

Nevertheless, it should be emphasized that the LDL measures found to be significant are mostly free from this implicit assumption. Therefore, the main findings and the main claims made in this monograph regarding the production studies are not affected so much by this issue.

In the perception study (chapter 6), it is investigated whether manipulation of /s/ duration is perceptible. In order to find out how much durational manipulation is detected by participants, taking into consideration individual differences in sensitivity to durational differences, a sensitivity measure from Signal Detection Theory is employed. This measure requires the trial-level responses to be aggregated for each participant and for each of the four conditions of durational manipulation.

Due to this aggregation, different properties of experiment items/words are not taken into account. Among them are the distinction between real words and pseudowords, biphone probability and the types of /s/. Because the items that each participant encounters are balanced and their orders are pseudo-randomized, the aggregation process is not likely to have distorted the results. Nevertheless, the aggregation process obscures effects of (sub)lexical properties. For example, higher biphone probability may make durational manipulation easier to detect in both real words and pseudowords. In addition, the loss of data points through the aggregation process may have weakened statistical power and may have led to the less clear effect of the durational manipulation of 35 ms.

However, again, this concern does not affect the main findings and the main claims of the perception study. Quite the opposite: the findings from the perception study can be interpreted to suggest that the effects of durational manipulation of 35 ms and 75 ms are found in spite of weakened statistical power, and hence are quite robust.
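The aggregation step discussed above follows from how sensitivity measures are computed: they are defined over response proportions, not over single trials. The sketch below uses the basic index d′ = z(hit rate) − z(false-alarm rate) purely as an illustration. It is not necessarily the measure used in the monograph, since same–different designs have dedicated sensitivity models. The counts, the participant label and the log-linear correction are assumptions.

```python
from statistics import NormalDist


def d_prime(hits: int, n_different: int, false_alarms: int, n_same: int) -> float:
    """Basic sensitivity index from aggregated counts.

    A 'hit' is a "different" response on a different-duration trial;
    a 'false alarm' is a "different" response on a same-duration trial.
    """
    # Log-linear correction (assumed here) keeps both rates away from 0 and 1.
    hit_rate = (hits + 0.5) / (n_different + 1)
    fa_rate = (false_alarms + 0.5) / (n_same + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Hypothetical aggregated counts: one cell per participant and durational difference (ms).
counts = {
    ("p01", 35): dict(hits=14, n_different=20, false_alarms=8, n_same=20),
    ("p01", 75): dict(hits=18, n_different=20, false_alarms=8, n_same=20),
}
sensitivity = {cell: d_prime(**c) for cell, c in counts.items()}
```

Each cell yields a single sensitivity value, so item-level properties such as lexicality or biphone probability are no longer available as predictors once the counts have been formed.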

In the comprehension studies in chapters 7 and 8, participants heard pseudowords and real words with manipulated durations of /s/ and were asked to click ‘one’ or ‘two or more’ on a screen with a mouse. Mouse-tracks by participants are analyzed as well as reaction times. For the analysis of mouse-tracks, QGAMs are employed. Owing to the heavy computational load of QGAMs, the model structure is kept to the minimum. As a consequence, the dependent variables (i.e. x- and y-coordinates) are modeled as a function of simple effects of ‘order’ (normalized time steps), ‘condition’ (matched vs. mismatched durations of /s/ with the stem), ‘item’ and ‘speaker’.

Without an interaction between order and condition, the fitted models are evaluated with respect to the estimated simple effects of condition. Since condition does not interact with order, the estimated simple effects of condition represent overall differences in x- and y-coordinates, aggregating mouse positions at different time points altogether. As a consequence, estimated effects of condition may also reflect when a mouse starts moving and how fast the mouse pointer moves, in addition to deviations of mouse-tracks. For example, suppose two mouse-tracks follow exactly the same trajectory, namely no deviation of mouse-tracks between them, but one of them starts moving immediately after the onset of the item with a very fast movement, while the other stays rather longer at the origin and starts moving much later with a slow movement. Then, the first mouse-track would be recorded closer to the goal (the top left or right corner) for more time steps, while the second mouse-track would be recorded around the origin for more time steps, leading to different quantile points (e.g. the median).
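The timing example above can be checked with a small numerical sketch. The two tracks are hypothetical one-dimensional trajectories sampled at 101 normalized time steps; the specific onsets and speeds are assumptions chosen only to make the contrast visible.

```python
import numpy as np

t = np.arange(101)  # normalized time steps

# Both tracks travel along the same path from the origin (0) to the goal (1).
fast = np.clip(t / 10, 0.0, 1.0)         # starts immediately and moves quickly
slow = np.clip((t - 50) / 50, 0.0, 1.0)  # waits at the origin, then moves slowly

# Quantiles computed over all time steps, ignoring when each position was recorded.
median_fast = np.median(fast)
median_slow = np.median(slow)
```

Although the spatial path is identical, the median of the fast track lies at the goal and the median of the slow track lies at the origin, so a time-aggregated quantile difference does not by itself indicate a deviation of the trajectory.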

Moreover, a mouse-track can make a turn and come back, for example when it overshoots the intended goal position and returns. Such a turn would aggregate the same coordinates from different time points. For example, suppose the mouse pointer in one condition stays at x = 0 at t = 0, moves to x = 1 at t = 1 and comes back to x = 0 at t = 2, while the mouse pointer in the other condition stays at x = 0 at t = 0 and 1 and moves on to x = 1 at t = 2. These two mouse-tracks are clearly different; nevertheless, no difference emerges between the two conditions, because in both the coordinate x = 0 is registered twice and the coordinate x = 1 is registered once. Given the lack of an interaction between condition and order, the time step at which each coordinate is registered is invisible to the effect of condition. The exclusion of the interaction between condition and order may be one of the possible factors that led to the occasional non-significance of the effects of condition.
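The three-step example above translates directly into a few lines of code. The arrays hold the x-coordinates at t = 0, 1 and 2 for the two conditions exactly as described; the choice of quantiles is an assumption for illustration.

```python
import numpy as np

track_a = np.array([0, 1, 0])  # overshoots to x = 1 and comes back
track_b = np.array([0, 0, 1])  # waits at x = 0, then moves to x = 1

# Ignoring time, both tracks contain the same coordinates equally often.
same_distribution = np.array_equal(np.sort(track_a), np.sort(track_b))

# Consequently, any quantile of the pooled coordinates is identical.
quantiles = [0.25, 0.5, 0.75]
q_a = np.quantile(track_a, quantiles)
q_b = np.quantile(track_b, quantiles)

# Only a time-resolved comparison reveals the difference.
per_step_difference = track_a - track_b
```

The per-step differences are non-zero at t = 1 and t = 2, whereas the pooled quantiles coincide, which is the information an interaction between condition and order would make available to the model.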

In conclusion, this monograph demonstrates, in a well-organized and concise manner, that English word-final /s/ shows systematic subphonemic durational differences according to its morphological status. It does so despite the intricacy of the topic, on the basis of carefully considered, well-balanced experimental setups and a quite robust analysis procedure. The findings of the studies presented in this monograph not only help to disentangle and resolve the contradictory and unclear results of the previous literature on this topic, but also convincingly demonstrate the necessity of reconsidering the view of morphology as a separate encapsulated module independent from phonology/phonetics. Because of its well-organized and easy-to-follow argumentation, this monograph will serve a wide audience well, including students, junior researchers and experts in the field. It helps the reader to learn the important literature and previous findings on the topic, to learn cutting-edge analysis methodologies, and to gain a new perspective on morphology, phonology/phonetics and their interactions.

References

Baayen, R. Harald, Chuang, Yu-Ying, Shafaei-Bajestan, Elnaz & Blevins, James P. 2019. The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 1–39.
Chomsky, Noam & Halle, Morris. 1968. The sound pattern of English. New York: Harper and Row.
Chuang, Yu-Ying & Baayen, R. Harald. 2021. Discriminative learning and the lexicon: NDL and LDL. In Aronoff, Mark (ed.), Oxford research encyclopedia of linguistics. Oxford: Oxford University Press.
Levelt, Willem J. M., Roelofs, Ardi & Meyer, Antje S. 1999. A theory of lexical access in speech production. Behavioral and Brain Sciences 22, 1–75.
Macmillan, Neil A. & Creelman, C. Douglas. 2005. Signal detection theory: A user's guide, 2nd edn. Mahwah, NJ: Lawrence Erlbaum Associates.