
2 - The Organization of the Mental Grammar

from Part I - Introduction

Published online by Cambridge University Press:  13 November 2025

Harry van der Hulst
Affiliation:
University of Connecticut

Summary

In this chapter, I will provide a brief outline of the structure of the mental grammar, referring for a more extensive treatment to ML, Chapter 6. This chapter then offers a conversation about what Noam Chomsky considers to be the most central linguistic argument for his Innateness Hypothesis (IH), the poverty of the stimulus argument. We then discuss some different ways in which the mental grammar could be organized. Finally, I will raise questions about what kinds of evidence could falsify the IH and whether such evidence can actually be found. In this connection, we will also ask how rich the alleged innate system needs to be.

Information

Type: Chapter
Book: Genes, Brains, Evolution and Language: The Innateness Debate Continued, pp. 17-76
Publisher: Cambridge University Press
Print publication year: 2025

2 The Organization of the Mental Grammar

Section I The Structure of the Mental Grammar

Introduction

In this first section of Chapter 2 we will outline in some detail how the mental grammar is organized internally in terms of different submodules and, within each, the basic units and the rules that account for building hierarchical structures and, in some cases, altering them.

It will be obvious to anyone that people who speak and understand a language (which one might call a certain type of behavior, namely linguistic behavior) can do so because they possess a mental grammar which is part of the knowledge that allows them to use their language. This mental grammar can be compared to a grammar book that, at the very least, contains a list of words and rules for forming sentences. We will see in this section that there is a little more to it, but first we will need to establish that to be a language user, a person needs to know more than what is covered by the mental grammar which is responsible for characterizing all and only those linguistic expressions (i.e., words and sentences) that are grammatically well-formed, in short: that are grammatical. It is tempting to think of a mental grammar as a system that “generates” linguistic expressions that can then be produced using the articulatory organs. However, language users do not only produce language; they also perceive and understand it. For this reason, we should think of the mental grammar as being neutral with respect to production and perception, its task being to characterize the grammaticality of linguistic expressions. This neutrality is maintained, even though we will in this chapter refer to grammatical rules as “structure building” or as “structure changing.” The mental grammar thus does not cover how linguistic expressions are produced as audible sound or visual sign, nor does it account for how people perceive these sounds or signs such that they can be recognized by the mind as grammatical linguistic expressions. To these ends, a mental grammar is “surrounded” by language processing systems (LP) that regulate production and perception of linguistic expressions, which is often called externalization; this means that these systems mediate between the mental grammar and the external, observable side of language. One could say that the mental grammar is like the spider in a “language web” that consists of several modules that play a role in being a language user.

In A Mind for Language (ML), chapter 2, I schematically represent the “language web” as follows. Here bear in mind that the innate universal grammar (UG) expands during the process of language acquisition into the mental grammar based on the language input (primary linguistic data, PLD) that children get. We assume that there is a mediating system, called a language acquisition device (LAD), that offers a preliminary analysis of the PLD involving grouping of percepts into categories and establishing recurrent patterns, as well as pointing the learner to aspects of those patterns that are relevant for “setting the parameters,” which means the choices that learners need to make with respect to aspects of languages in which they differ (such as “word order,” choice of speech sounds and so on). See (1).1

  (1) Language Input (PLD) + (LP (LAD (UG → MG))) ⇒ Externalization

In more recent terminology, nativists refer to the mental grammar as internal language (I-language) and its externalization as external language (E-language).

A central concern in the present book, as well as its prequel ML, is whether the various components of the language web are specific to language or fall under more general cognitive capacities. As I mention in ML, chapter 2, instead of “LAD,” I prefer LD (learning device), assuming that this mediating system is likely not specific to language, because, among other components, pattern recognition and sensitivity to frequency of linguistic units are general cognitive abilities that mediate across all kinds of perceptual input, perceptual processing, and learning. As for the language processing systems, those clearly recruit motoric activity that is not specific to producing language (whether spoken or signed) and the same holds for receiving and processing the perceptual side of language. It seems nevertheless plausible that a certain degree of specialization in the muscular and neural control of these systems has evolved during the long period that humans have used language. The big debate between empiricists and nativists is whether there actually is a spider in the language web, Universal Grammar, and if so, what exactly it contains. Empiricists have always argued that we do not need the spider at all and that the construction of mental grammars can be entirely explained in terms of general learning systems (in particular, categorization, pattern recognition, and sensitivity to frequency). I have already mentioned in Chapter 1 that nativist views on this matter have changed during the decades that have passed since Chomsky first proposed his Innateness Hypothesis (IH) for language, to a point where the content of UG may be very “minimal.” This development is central to Chapter 4.

Words and Sentences

Returning to a simple characterization of the mental grammar as a list of words and rules for forming sentences, it is noteworthy that this intuitively straightforward characterization recognizes a distinction between words and sentences. I take this distinction as fundamental, even though, as we will see, both types of linguistic expression reveal several similarities, which has led some linguists to say that no sharp distinction can be made. Whatever the outcome of the controversy about this, I find it a useful distinction to maintain, if only for didactic purposes. We will see that both words and sentences have three layers of structure that capture the sound aspect, the meaning aspect, and the “combinatorial properties” of these units, that is, the properties that indicate how words can be combined into sentences and what kind of structures they then form. We can think of a sentence as a three-layered cake (choose whichever flavors you like) and of words as slices of that cake, which thus also have three layers.2 Given the intuitive basic nature of words, I will start by explaining the three layers of this smaller unit and then turn to sentences.

The Three Layers of Words

Ask yourself what you know about a word; for example, the word cat. A brief reflection will tell you that you know how this word sounds and how you can make that sound, you know what it means, and finally, you know that it is a noun (that is, I hope you know that, but if no one ever told you this, please accept that this is the case). We have technical terms for these three layers, as in (2).

  (2)

    Figure 2.01
    A diagram showing the word cat at the center, with three branches extending from it. The top branch is labeled Phonology: Sound form, the middle branch is labeled Semantics: Meaning, and the bottom branch is labeled Syntax: Noun.

The term “syntax” means “putting together,” which here refers to putting words together into sentences; as we’ll see shortly, this is crucially dependent on knowing the word category (i.e., noun, verb, adjective, etc.).

I will now discuss the three layers of words in more detail, introducing some linguistic terminology. A very important take-home message is that each layer is hierarchically structured, which means that for each layer there is a set of basic units and structure-building rules (also often called constraints) for how the units can be combined into hierarchical structures.

The Phonological Layer

The fact that the sound form of words is structured is easy to digest when we consider how our alphabetic writing system represents words as sequences of letters. While most alphabetic writing systems are not perfect in this respect, the basic idea of such systems is that each letter represents a speech sound.3 However, I will explain that letters do not (however imperfectly) represent actual speech sounds. Rather, they represent the “mental idea” of speech sounds. We call these mental ideas, which are phonological concepts, phonemes. Phonemes are the basic units of the phonological layer of words.4 Languages differ in terms of which phonemes they use. Linguists have designed a special notation system called The International Phonetic Alphabet (IPA) that has separate symbols for all the phonemes that seem to occur in the world’s languages, although we have to add that not all languages in the world have been studied in sufficient detail. These symbols look like letters (although IPA also contains many other symbols), but do not think of them as letters that we use to write down words. Think of them as notations for phonemes (i.e., the segmental units in terms of which the phonological form of words is represented in the mental lexicon).

It is of great importance to now learn that any given phoneme can be pronounced in many different ways depending on its position within a word, especially due to neighboring phonemes. We call such positional variants allophones. A striking example in English is that the phoneme /k/ (we use slant lines when we refer to phonemes using IPA symbols) is pronounced very differently in words like keep and peek. In most dialects of English, the /k/ in keep is pronounced with “aspiration” (a little puff of air at the beginning of the following vowel), which is missing in the /k/ in peek. The same difference can be observed in the t’s and p’s in the words tip and pit. This small phonetic difference is due to an allophonic rule (which we could also call a realization rule). Aspiration is a predictable property of a /k/ (and likewise of /p/ and /t/) occurring, to be precise, at the beginning of a syllable when followed by a “stressed” vowel, that is, a vowel that is pronounced with more effort relative to other vowels in the word, such as the first vowel of the word tapestry, or the last vowel of the word harpoon, in which the /t/ and the /p/ (respectively) are realized with aspiration. We can use IPA to represent allophones, but then we put the symbols between square brackets: [kʰ], [tʰ] and [pʰ]. This means we use the same IPA symbols both for phonemes and their allophones.

The distinction between a phoneme and its realizations (allophones) leads us to make the following fundamental point. When a phoneme has, say, two allophones, like [t] and [tʰ] for the phoneme /t/, it is necessarily the case that we will never find two words that are identical except in having [t] or [tʰ] in the same position. Here it is important to understand that the allophones of a phoneme can by definition never occur in the same position in the word precisely because it is the specific position that determines the allophonic realization of a phoneme. You can compare this to the fact that Superman and Clark Kent could never be present in the same room, precisely because they are one and the same person. Make sure you get this! Phonemes are like chameleons: Their realizations adapt to the environment in which they occur. Indeed, in English, we can never have two different words such as [tɪp] and [tʰɪp], precisely because the phoneme /t/ must always be aspirated in a position where it is followed by a stressed vowel. What is possible in English is to have two different words that sound like [dɪp] and [tʰɪp] and this means that the “phones” [d] and [tʰ] must be realizations of two different phonemes, namely /d/ and /t/ (just like Superman and Batman are two different persons). We say that the difference in the realizations of these two phonemes is contrastive. When two “phones” can be used contrastively, by definition, they necessarily count as realizations of different phonemes. In English, “aspiration” is a predictable phonetic property of the phonemes /p/, /t/, and /k/ when these occur in a certain position (namely before a stressed vowel).5 However, in some other languages (like Thai), aspiration is actually contrastive, which means that in such languages, [tɪp] and [tʰɪp] could be different words, with different meanings. We say that in Thai /t/ and /tʰ/ are separate phonemes that can minimally distinguish words from each other.

A given phoneme can have more than two allophones. Take /t/ again. When placed between vowels in a word like water, the /t/ is realized as what we call a “flap,” in IPA [ɾ]. In some other cases, it would seem that a phoneme has only one allophone. We could say this for the /b/ in English. But we must always remember that every phoneme has an almost unlimited number of realizations if we take into account the fine differences that are caused by anatomical differences between people or the mood they are in, or how much alcohol they have consumed. In addition, there are minute differences that are due to their position in the word but are so small that they can hardly be heard. For example, the /k/ in English words like cool and keep is different due to the nature of the following vowel. This difference, however, is bound to occur in every language in which a /k/ occurs when it is followed by such vowels. In general, a linguist making an IPA transcription for a language that has hardly been analyzed would try to account for allophonic differences that could be contrastive (based on knowledge that they have of other languages, or what they learned in their introductory phonology course) and thus not for phonetic effects that are inevitable and universal. It is worth pointing out that a child being exposed to language input has to figure out which phones are allophones of the same phoneme and which phones count as realizations of different phonemes. It is not self-evident that they use the same strategies to do this as linguists do; see ML, chapter 10.6

When confronted with the incoming flow of speech, the child needs to figure out (after a rough segmentation of the stream of sound into “phones,” which is by no means an easy task), which phones are allophones of the same phoneme and which are not. The outcome of the process is that the child has established the inventory of phonemes and the inventory of allophonic rules.

Having thus established that the basic units of the phonological form of words are phonemes, let us now reckon with the fact that the phonological form of words is not just a linear sequence of phonemes. Rather, phonemes are hierarchically organized into units that we call syllables, such as the three syllables in a word like albatross: al/ba/tross. Why do we need syllables? It only takes a brief moment of reflection to realize that once we have established what the phonemes are in a given language (not an easy process, for sure), we quickly see that there are many restrictions (constraints) on how such phonemes can be combined to result in a phonological word form that is possible in the language at issue. For example, at the syllable beginning (in English), there can be a vowel (ink) or up to three consonants (pin, print, sprint), but which combinations of consonants can occur is severely limited (not rpint, prsint, etc.). The constraints would appear to refer to what counts as a possible syllable in a language. The manner in which phonemes are organized into syllables involves a division in two syllable parts, called the onset (O; consonants preceding the vowel) and the rhyme (R; the vowel plus following consonants); see (3). What is claimed here is that most constraints on phoneme combinations refer to the unit syllable (or its subunits onset and rhyme) as the domain within which these constraints hold.7 It is important to know that two languages can have the same set of phonemes yet differ in the constraints on combining them into syllables. For example, while the onset in English can contain up to three consonants, there are many languages, such as Japanese, in which onsets can contain only one consonant. The study of how languages can differ in their inventories of syllables reveals a lot of variety.
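To make the idea of constraints on onsets a little more concrete, here is a small illustrative sketch in Python. It is not taken from this book or from any published analysis: the constraint set is drastically simplified (real English onsets are richer), and ordinary letters stand in for phonemes.

```python
# Illustrative sketch only: a toy checker for English-like onsets. The constraint
# set is drastically simplified, and ordinary letters stand in for phonemes.

STOPS = set("ptkbdg")
FRICATIVES = set("fvszh")            # simplified
LIQUIDS_GLIDES = set("rlwj")         # "j" as in IPA /j/ (the y-sound)

def onset_ok(onset: str) -> bool:
    """Return True if the consonant string is a possible (toy) English onset."""
    if len(onset) <= 1:
        return True                  # empty onset (ink) or a single consonant (pin)
    if len(onset) == 2:
        c1, c2 = onset
        # e.g. pr-, tr-, sl-: obstruent plus liquid/glide, or s plus a stop (sp-, st-)
        return (c1 in STOPS | FRICATIVES and c2 in LIQUIDS_GLIDES) or \
               (c1 == "s" and c2 in STOPS)
    if len(onset) == 3:
        c1, c2, c3 = onset
        # spr-, str-, skr-: s plus a voiceless stop plus a liquid/glide
        return c1 == "s" and c2 in set("ptk") and c3 in LIQUIDS_GLIDES
    return False                     # four or more consonants: never a possible onset

for candidate in ["", "p", "pr", "spr", "rp", "prs"]:
    print(candidate or "(empty)", onset_ok(candidate))
# (empty) True, p True, pr True, spr True, rp False, prs False
```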

We must realize that when people speak they do not literally build their syllables. Words are stored in the lexicon with their phonemic form and syllabic organization. Nevertheless, we say that people have knowledge of what possible syllables there are in their language and thus of the rules (some would say constraints) according to which syllables are formed. Speakers of English know these rules/constraints because they can judge whether a combination is possible for their language or not. In other words, people know the structure-building rules for combining phonemes into syllables. The set of phonemes and the set of structure-building rules belong to a submodule of the grammar that is called phonology. It would be wrong to believe that once we have established what the possible syllables are in a given language, every sequence of syllables is a possible word form. There are also constraints that prohibit certain combinations of syllables, which means that these constraints refer to a unit larger than the syllable (i.e., a grouping of syllables, see below). For example, a word form like /trʌnpǝt/ in which the first syllable ends in /n/ and the second syllable starts with /p/ is not permitted in English. In comparison, trumpet /trʌmpǝt/ is fine. I will discuss the constraint that plays a role here later on.

Another constraint on word form, at least in English, is that one of the syllables must contain a stressed vowel (or, we say, one of the syllables must be stressed). Representing stress is done by grouping syllables into binary units,8 and such units into the whole word structure, while designating syllables and their groupings in terms of a “strong/weak” labeling that represents degrees of stress.

All in all, this means that a word like albatross has the hierarchical phonological structure shown in (3).

  (3) Tree diagram for the word form (i.e., phonological) layer of the word albatross9

    A phonological word tree diagram. See long description.
    Figure 2.02 Long description

    At the very top, the Phonological Word branches into two main nodes: S (Strong) and W (Weak). These nodes dominate syllable nodes (σ), each of which branches into an Onset (O) and a Rime (R), whose terminal nodes are labeled C and V. To the right of the diagram, a legend explains the symbols: S stands for Strong, which implies stressed; σ stands for syllable; O stands for Onset, R for Rime; C and V are the features for consonants and vowels. The elements at the very bottom represent sets of phonological features other than |C| and |V|.

A tree diagram is a graph that consists of lines and nodes. Each line begins and ends at a node. The lowest nodes are called terminal nodes and the highest node is the top node (or root node). Other nodes are intermediate nodes. We can also use a “mother/daughter/sister” terminology: A node that dominates lower nodes (two or just one10) is called a mother node and the nodes that are dominated by a mother are called daughter nodes (which makes these nodes sisters of each other).

While we regard phonemes as the basic building blocks out of which word forms are constructed, these units can be further analyzed into even smaller units (called phonological features, also simply called features, a term I will use here). For example, all consonants share the feature |C| and all vowels share the feature |V|, which are used in (3) to label the terminal nodes of the tree structure that dominate the actual phonemes.11 Further features differentiate the consonants and vowels into subclasses. For example, the consonants b, d, g, v, z (and a few others) are all “voiced,” that is, they share the feature |voice|. Features are, in a sense, the smallest properties out of which phonemes are constructed. Given this, think of the IPA symbols in (3) as sets of features. Distinguishing features is useful because in the case of the allophonic rule for aspiration, this rule, as we have seen, applies not only to /k/, but also to /p/ and /t/. With features, we can naturally characterize groups of phonemes that have the same distribution in word forms and that are subject to the same allophonic rules. For example, the allophonic rule for aspiration in English applies to phonemes that share the features |C|, |stop|, and |voiceless|. While features are smaller than phonemes, many phonologists believe that phonemes most directly reflect the knowledge that people have of the phonological form of words, and I will try to defend that view here.12
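As a purely illustrative sketch (my own encoding, not the author's formalism), phonemes can be modeled as sets of features, and the aspiration rule as a realization rule that targets the natural class |C|, |stop|, |voiceless| before a stressed vowel:

```python
# A purely illustrative encoding (mine, not the author's): phonemes as feature
# sets, and the English aspiration rule as a realization rule that targets the
# natural class |C|, |stop|, |voiceless| before a stressed vowel.

PHONEME_FEATURES = {
    "p": {"C", "stop", "voiceless"},
    "t": {"C", "stop", "voiceless"},
    "k": {"C", "stop", "voiceless"},
    "b": {"C", "stop", "voice"},
    "d": {"C", "stop", "voice"},
    "i": {"V"},
    "a": {"V"},
}

ASPIRATION_CLASS = {"C", "stop", "voiceless"}     # the class the rule refers to

def realize(phonemes, stressed_vowel_positions):
    """Map phonemes to phones, aspirating voiceless stops that immediately
    precede a stressed vowel (a toy version of the allophonic rule)."""
    phones = []
    for i, ph in enumerate(phonemes):
        features = PHONEME_FEATURES.get(ph, set())
        before_stressed_vowel = (i + 1) in stressed_vowel_positions
        if ASPIRATION_CLASS <= features and before_stressed_vowel:
            phones.append(ph + "ʰ")               # aspirated allophone, e.g. [tʰ]
        else:
            phones.append(ph)
    return phones

print(realize(list("tip"), {1}))   # ['tʰ', 'i', 'p']
print(realize(list("pit"), {1}))   # ['pʰ', 'i', 't']  -- final /t/ stays unaspirated
```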

In conclusion, phonetic differences between allophonic realizations of phonemes cannot be used to differentiate the form of words in the mental lexicon. They predictably arise when phonemes are pronounced in certain contexts (i.e., surrounding phonemes, stress). When perceiving speech, people are not even aware of the allophonic variants, although during language acquisition, children subconsciously “notice” them, which leads them to internalize the allophonic rules. Interestingly, picking up on such rules is much harder when we learn a foreign language later in life, which is the major cause for “speaking with an accent.” Not only do we fail to pick up on the allophonic rules of the new language, we also impose the allophonic rules of our native language when we speak the new language.

I conclude this discussion of phonology with two important points. Firstly, I repeat that we must not confuse phonemes with letters of the writing system. Secondly, while the role of phonemes is to minimally differentiate word forms that have different meanings (such as in keep, seep, beep, etc.), phonemes by themselves have no meaning. The building blocks of the word form (whether features, phonemes, syllables, or syllable groups) are as such devoid of meaning.

The Semantic Layer

We now turn to the meaning layer of words, which I will cover in less detail. The study of word meanings is called semantics. According to most linguists, word meanings can also be analyzed into smaller units called semantic concepts, for short: sememes (analogous to the term phoneme). For example, the meaning of the verb kill can be said to contain the sememes (which we usually write in capitals) CAUSE – BECOME – NOT – LIVING. Just like in the case of word forms, word meanings are specific combinations of these smaller, basic units. This then also implies that there are structure-building rules for how such sememes can be combined to make up word meanings.

  (4)

    Figure 2.03
    The linguistic tree diagram showing a hierarchical relationship between y, CAUSE, x, BECOME, NOT, and LIVING.

In (4) I not only break up the meaning of kill into four sememes, I also suggest that these units are organized in a hierarchical structure. In this structure, “y” and “x” are variables that stand for the “arguments” of the verb that will be filled in by the subject and object when this verb occurs in a sentence like the man (= y) killed the dog (= x).
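Purely for illustration (the bracketing is mine, not a claim about how sememes are mentally represented), the hierarchical structure in (4) can be written as a nested expression whose argument variables are filled in when the verb is used in a sentence:

```python
# Illustration only (my own bracketing, not a claim about mental representation):
# the sememe structure of "kill" from (4) as a nested expression, with "y" and "x"
# as argument variables filled in by the subject and object.

KILL = ("CAUSE", "y", ("BECOME", ("NOT", ("LIVING", "x"))))

def instantiate(structure, bindings):
    """Replace argument variables by their bindings, e.g. y -> THE_MAN."""
    if isinstance(structure, tuple):
        return tuple(instantiate(part, bindings) for part in structure)
    return bindings.get(structure, structure)

print(instantiate(KILL, {"y": "THE_MAN", "x": "THE_DOG"}))
# ('CAUSE', 'THE_MAN', ('BECOME', ('NOT', ('LIVING', 'THE_DOG'))))
```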

As in the case of phonemes, one might propose that sememes can be further analyzed into semantic features, but I will not go into that issue here, save for mentioning that not all semanticists would make a distinction between sememes and semantic features. Another important point is that establishing the set of sememes, whether for a given language or in general, is more difficult than establishing the set of phonemes. For many centuries, in fact, many scholars, starting with philosophers, then psychologists, and finally linguists, have tried to come up with such a set.13

We should note that the tree structure in (3) for the phonological layer captures not only the grouping of phonemes, but also their linear order. The tree structure in (4) also seems to encode a linear order of the semantic building blocks, but this is not intended. There is no reason to believe that linear order is relevant to semantic structure. Due to using a two-dimensional medium, such as a sheet of paper, we necessarily impose linearity, but the reader should think of the structure in (4) as a “mobile.” However, it is a question of some debate whether the suggested linear ordering of phonemes in (3) is truly phonologically relevant. Perhaps we have to think of linear order as a phonetic necessity which results when words are realized because we cannot pronounce all the phonemes at the same time.14 I will return to this issue of linearity and hierarchy when we consider the syntactic structure of sentences.

The Syntactic Layer

The third layer of words is called the syntactic layer. It would seem that this layer has a very simple structure. It is just a category label that represents the word class (or part of speech) of the word, but we will learn in later sections (when we consider certain types of words, called complex words) that the syntactic layer of words can be more complex, with different labels being grouped into hierarchical structures. By analogy with the terms phoneme and sememe, I will call such labels syntactemes, although this is not a commonly used term. Here too the question arises whether syntactemes can be analyzed in terms of syntactic features. Syntacticians certainly take this to be the case and I will return to this point below.

You might wonder at this point why words need to have a syntactic label. The answer is that syntactemes are required to explain how words can be combined into sentences (or, as we will see below, how parts of complex words combine). In English, we cannot combine articles like the or a with verbs or adjectives; rather articles combine with nouns (the car, a car). Even in the expensive car, the article the presupposes a noun in close proximity.15 Syntactemes are also required to regulate the order in which words occur in the sentence. In English, adjectives occur before the noun (white wine), but this is not so in all languages; in French, most adjectives occur after the noun (vin blanc). To express that, you need to be able to rely on words having a syntacteme layer of information.

The Lexicon

Words have to be learned and memorized with their three layers. The “place” in the mental grammar where we store words is called the lexicon. The inventory of words in the lexicon is not fixed; words come and go. Children learn only those words that they are exposed to, but during their lifetime they can enrich their lexicon either by learning new words from other speakers or by “copying” words from other languages or by making new words in ways that we will discuss below.

The reason why words cannot be innate and thus have to be learned is that the relation between a specific phonological form and a specific meaning of a word is usually what we call arbitrary. You have to learn that the word form dog correlates with DOG (which may very well be decomposable into smaller sememes). The arbitrariness of this relation is evidenced when we consider that in other languages the form that is associated with the same meaning can be quite different (French: chien; Spanish: perro; Italian: cane; Dutch: hond; Polish: pies; Hungarian: kutya; Turkish: köpek; Hebrew: kelev; Japanese: inu; Swahili: mbwa). Sometimes, it would seem that the word form is not arbitrary, as in the Chinese word for cat, which is something like “mao.” Here the typical noise that cats make seems to motivate the word form. In such cases we say that the word form is iconic of (or resembles) a property of cats, namely the noise that they make,16 but such cases are not very common in spoken languages.17

In a later section, we will find out that the lexicon must contain more than just the words (and perhaps not even all words).

Word Formation: Simplex and Complex Words

Consider the difference between words like wise and unwise. The word wise consists of a phonological structure with several phonemes that is associated with a meaning (WISE18) and a label Adjective. The word unwise, on the other hand, can be decomposed, not only into a phonological structure with phonemes and syllables, but also into two form–meaning packages, namely un- (which means NOT) and wise. All words that can be decomposed into such smaller packages of form and meaning are called complex, and all words that cannot are called simplex. A form–meaning package that cannot be decomposed into such smaller packages is called a morpheme. Thus, both wise and un- are morphemes, but unlike wise, we do not use un- as a word, which means that it is a “word piece” or, with a technical term, an affix. Affixes are morphemes that cannot occur by themselves as words. Another way to refer to the difference between wise and un- is to say that wise is a free morpheme (which thus means that it is at the same time a simplex word), while un- is a bound morpheme.

All languages have affixes that can be attached to words (whether simplex or complex) to form new words. Here, in (5), are some examples of affixes in English (try to think of complex words that contain these affixes).

  (5)

    Figure 2.04
    The tree diagram showing Affixes divided into Prefixes (un-, non-, over-, re-) and Suffixes (-less, -ly, -hood, -ate).

Another way to form new words is to simply combine two free morphemes: arm chair, kitchen table, iPod charger. Such complex words have a special name; they are called compounds.19 In fact, we also have a term for word formation that uses affixes: derivation. The general term for word formation, and for its study, is morphology (the analysis of complex words in terms of their morphemes).

Morphemes do not only have a phonological form and a meaning, they also have a category label. This is quite obvious for free morphemes. The word wise is an adjective. We will see in a moment that we can also assign category labels to affixes.

Checking for Grammaticality

When complex words are formed, we have to be careful, because we cannot lump together morphemes indiscriminately. We have unwise, but not unchair. Un- cannot be attached to nouns, just like re- cannot be attached to adjectives, but only to verbs (reread). As you see, this is where the syntacteme labels become relevant. Their role is to guide and restrict the combinations of morphemes into larger wholes (here: complex words). Consider the complex word readable. Read is a verb, but the whole word is an adjective because -able “makes it an adjective,” and that is, we will say, because -able is an adjective, albeit a bound adjective. (Note that the suffix -able is not the same morpheme as the free adjective able, although the two morphemes are of course historically related. You know that they are different, despite their identical spelling, because they have two different vowels in their first syllable.) However, it is not sufficient to give affixes a syntacteme label; we also need to give them a specification that indicates what they can be combined with and whether they occur before or after their “base.” We have to specify that re- needs a verb as its base. The notation for such contextual frames is [__ V], where the dash indicates the place of re- before a verb.

Importantly, affixes can be added to words that are already complex: re-read-able, where we add -able to reread. We must assume that each time only one affix is added for the following reason. To add -able, we first must add re- because we could not say that re- is attached to readable. After all, readable is an adjective and not a verb, which means that we cannot attach re- to readable. The contextual frame of re- says that it can only be attached to verbs.
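The step-by-step logic of attaching one affix at a time, with each affix checking its contextual frame, can be sketched in a few lines of Python. This is a toy encoding of my own, not the book's formalism; the affix inventory is tiny and the spelling adjustments of real English derivation are ignored.

```python
# A toy encoding (mine, not the book's formalism) of affixes with syntacteme
# labels and contextual frames, attaching one affix at a time and projecting
# the affix's category, since the affix is the head.

AFFIXES = {
    "re-":   {"category": "V", "frame": ("_", "V")},   # attaches before a verb, yields a verb
    "-able": {"category": "A", "frame": ("V", "_")},   # attaches after a verb, yields an adjective
    "-ity":  {"category": "N", "frame": ("A", "_")},   # attaches after an adjective, yields a noun
}

def attach(affix, base):
    """Attach one affix to a (form, category) pair if its contextual frame is met."""
    form, category = base
    spec = AFFIXES[affix]
    slot_before, slot_after = spec["frame"]
    required = slot_after if slot_before == "_" else slot_before
    if category != required:
        raise ValueError(f"{affix} needs a {required} base, got {category}")
    piece = affix.strip("-")
    new_form = piece + form if slot_before == "_" else form + piece
    return (new_form, spec["category"])               # the affix projects its category

word = ("read", "V")
for affix in ["re-", "-able", "-ity"]:                # one affix per step
    word = attach(affix, word)
    print(word)
# ('reread', 'V'), ('rereadable', 'A'), ('rereadableity', 'N')
# (the -able/-abil- spelling adjustment is ignored here)
# attach("re-", ("readable", "A")) would raise an error: re- needs a V base.
```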

Another illustration to show that complex words can be made more complex is that compounds can be made out of words that are themselves already compounds (as in arm chair factory, which combines arm chair with factory, a place where arm chairs are made). Multiple affixation and multiple compounding will thus give rise to hierarchical syntactic word structure. To see this, let us consider a complex word like rereadability. If the syntactic labels of each morpheme match (given the contextual frame of the affixes), we get a complex word that is well-formed from the viewpoint of its syntactic layer, which is the part of the whole word that is enclosed in the box in (6).20

  (6)

    A linguistic tree diagram showing the derivation of a noun. See long description.
    Figure 2.05 Long description

    At the top, the node is N (Noun). It branches into two constituents: an A (Adjective) node on the left and, on the right, an N node labeled N, [A _] (the suffix -ity). The A node in turn branches into a V (Verb) node on the left and, on the right, an A node labeled A, [V _] (the suffix -able). That V node branches into two further V (Verb) nodes: the first labeled V, [_ V] (the prefix re-), the second simply V (the base read). To the right of the tree diagram, three vertical labels read syntactemes, phonemes, and sememes.

The tree diagram in (6) is meant to say that when we make complex words, we combine the entire morphemes with all their three layers, in such a way that a well-formed syntactic structure is created. The syntactic labels are, so to speak, the hooks that are used to create complex words. The formation of complex words will automatically deliver a hierarchical syntactic structure that is well-formed syntactically, given that the contextual frames of affixes are observed. Also note that we must assume that the syntacteme labels of the morphemes are “projected” upward in the syntactic tree structure to the non-terminal nodes to indicate the syntactic labels of combinations of morphemes. This raises the question of which unit in a combination projects its label to the top node of the combination. This brings us to the notion of headedness. The idea is that every combination has one unit that is the head, and it is this unit that determines the category of the combination. When the two units that are combined have the same label (as in re- and read), we cannot be sure which unit is projecting its label, but because in general affixes do determine the category of the derived word, we could say that in this case re- is the head. The unit that the head combines with is called the dependent.21

It is now important to understand that because the syntactemes form just one layer of the morphemes, grouping morphemes with reference to the syntacteme labels has the automatic effect that the two other layers of morphemes are also “brought together,” as shown in (6). A logical next step is to check whether a well-formed hierarchical grouping structure can arise at the phonological and the semantic layer, given that so far it only looks like we have a simple linear sequence of phonemes or sememes at these two layers. But recall that we have seen that each layer displays its own hierarchical organization, according to the structure-building rules for each layer. The diagram in (6) for rereadability only displays the hierarchical grouping at the syntactic layer that is enclosed within the box. If we try to give a full representation of the hierarchical structure at all three layers, we arrive at a more sophisticated, multidimensional display, in which the closed triangles in (7) represent the hierarchical structure at each layer (with the internal details of those structures omitted).

  (7)

    A diagram shows the relationship between semantic, syntactic, and phonological structures. See long description.
    Figure 2.06 Long description

    In the center of the diagram there are three horizontal layers. The topmost layer, syntactemes, consists of four segments, each denoted by Sy enclosed in curly braces; above this layer, lines converge on the label Syntactic Structure. The middle layer, sememes, consists of four segments, each denoted by Se enclosed in curly braces. The bottom layer, phonemes, consists of four segments, each denoted by Ph enclosed in curly braces. Below this layer are the labels Semantic Structure on the left and Phonological Structure on the right.

We cannot assume that the grouping together of morphemes in accordance with their syntactic properties will automatically deliver well-formed hierarchical structures at the phonological and semantic levels. Rather, at these two layers a hierarchical grouping must be established that conforms to the structure-building rules that apply at each layer. For reasons of brevity, I will only consider what is required to get the phonological level in good shape. That said, I can say that while the grouping of the semantic properties of morphemes will usually follow the syntactic grouping quite closely, discrepancies can and do occur. I will consider examples of such discrepancies when we discuss sentence structure.

A Note on the Notion “Word”

If a word has three layers of structure, then the question arises what the top label is of each hierarchical structure in (7). We seem to have three different characterizations of words. The label phonological word (or Pword) is usually used for the top label of the phonological structure of simplex words (words that lack morphological structure, i.e., are monomorphemic). By analogy, we could then use the labels semantic word and syntactic word as the top labels for the other two structures.

This raises the question of what the top labels are of the three structures of complex words at all three layers. I will here assume, without further elaboration, that the same top labels are used for the structures of the syntactic and semantic layer of complex words. As for the phonological layer, certain classes of complex words with affixes and all compounds have a phonological top label that is not Pword. There is a class of derived words that, from a phonological perspective, are just like simplex words, namely words derived with so-called cohering affixes. Cohering affixes “melt” with their base into a single Pword. Examples of cohering suffixes in English are -al, -ity, and -ism, while in- is a cohering prefix. Hence a word like parental (consisting of the two morphemes parent and -al) is one Pword. However, there is a second class of affixes that are non-cohering, like -ful and -hood, which from a phonological perspective form a Pword by themselves. Hence, a word like childhood consists of two Pwords. In this respect, such words are, phonologically speaking, the same as compounds, like arm chair, which also consist of two Pwords. This shows that there can be a mismatch between what we call a “word” at different layers. Terminologically, the phonological layer of derived words with non-cohering affixes, and of compounds, is labeled a phonological group, a unit that consists of two phonological words. The reason for recognizing the phonological group as distinct from the phonological word is that the rules for assigning word stress differ (but I will not discuss these stress rules here).

Check Everything

We have now seen that with respect to words, mental grammars must contain three subsystems of structure-building rules or constraints, as shown in (8).

  (8) Word Phonology: phonemes and structure-building rules

    Word Semantics: sememes and structure-building rules

    Word Syntax: syntactemes and structure-building rules

This means that when a complex word is formed, the grammar needs to “check” whether the result is phonologically, semantically, and syntactically well-formed. A complex word is only completely well-formed (i.e., grammatical), when, for each of the three levels, it contains the right basic units (the -emes) which have been properly combined into a hierarchical structure. With respect to the syntactic layer, this is already guaranteed given that we agreed that when we make a new word, we combine the affixes with their base in accordance with their contextual frames.22

Obviously, all simplex words must be well-formed at each of their three layers to count as grammatical. This is what is guaranteed by having the relevant modules specify for each layer what the -emes are and what the structure-building rules are. Let us say that, by and large, complex words must contain the same -emes and obey the same structure-building rules as simplex words, although in addition they may be subjected to further constraints that are not relevant for simplex words. In fact, we already established that at the syntactic layer, complex words must conform to the contextual frames of affixes. As mentioned, it is also the case that at the phonological level, the stress contours of simplex words can differ from those of derived words (with non-cohering affixes) and compounds.

What about morphemes? Do morphemes also need to contain these same -emes and obey the same structure-building rules? There is a view according to which we do not really have to care about the well-formedness of morphemes. In this view, the grammar checks both simplex and complex words for grammaticality, while there is no checking for morphemes because these, as such, never leave the lexicon on their own. Free morphemes that can occur as words will automatically have to pass the checking systems for simplex words. As for bound morphemes, those that contain phonemes or phoneme combinations that will cause complex words to be ill-formed will also automatically be filtered out. It stands to reason that, over time, the morphemes used in a language will come to contain no properties that would always cause a simplex or complex word to be ill-formed. This effect is called lexicon optimization.

We will now consider how well-formedness is guaranteed at the phonological layer of complex words that are formed with cohering affixes, and what the consequence would be if, for whatever reason, the combination of morphemes created a phonological layer that could not be organized into a well-formed hierarchical phonological structure. We will see that this situation can arise and trigger the application of structure-changing rules.

Consider the complex word in (9).

  (9) parental

This word consists of the morphemes parent and -al. Assuming, as we did, that the combination is firstly driven by the requirements at the syntactic layer, the syntactic structure is then, so to speak, “imposed” on the phonological layer, which then divides the string of phonemes into two linear substrings, namely /parent/ and /al/. However, this division is not appropriate at the phonological layer, which instead demands an organization in terms of syllables. This means that syllable structure must be built on the string /parental/, respecting a very general constraint that syllables must start with a consonant if one is available. This means that /pa.ren.tal/ (where dots indicate a division between syllables) is chosen over /pa.rent.al/, which would be more in line with the syntactic structure in separating /parent/ from /al/. This shows that at the phonological layer we get a structure that groups the phonemes differently from the syntactic structure. Whereas the syntactic layer implies that in the word parental the /t/ “belongs to the noun parent,” we see that a very different grouping is required at the phonological layer where the /t/ “syllabifies” to become an onset of the word-final syllable (10).23

  (10)

    A linguistic tree diagram analyzing the word parental. See long description.
    Figure 2.07 Long description

    The top node is A (Adjective). It branches into two main parts: an N (Noun) node on the left and an A node on the right. The N node connects to the phonemes p, a, r, e, n, t; the A node connects to a, l (the suffix -al). Below the phonemes, p and a are grouped into a syllable (σ) node labeled W (Weak); r, e, and n into a syllable node labeled S (Strong); and t, a, and l into a syllable node labeled W (Weak).

This example serves to show that the hierarchical syntactic structure at the syntactic layer is not necessarily isomorphic with the hierarchical phonological structure.
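A toy syllabifier makes the point concrete. The algorithm below is an assumption of mine (the book does not give one): vowels are treated as syllable nuclei and a single consonant between two nuclei is assigned to the onset of the following syllable, which is enough to derive /pa.ren.tal/ rather than /pa.rent.al/.

```python
# A toy syllabifier (my assumption; the book gives no algorithm): vowels are
# syllable nuclei, and a single consonant before a nucleus is assigned to the
# onset of the following syllable, i.e. syllables start with a consonant
# whenever one is available.

VOWELS = set("aeiou")     # a gross simplification; "parental" is treated letter by letter

def syllabify(word: str):
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    boundaries = [0]
    for left, right in zip(nuclei, nuclei[1:]):
        # keep one consonant as the next onset; any remaining consonants
        # between the two nuclei stay in the preceding rime
        boundaries.append(right - 1 if right - left > 1 else right)
    boundaries.append(len(word))
    return [word[b:e] for b, e in zip(boundaries, boundaries[1:])]

print(syllabify("parental"))   # ['pa', 'ren', 'tal'], not ['pa', 'rent', 'al']
```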

Now consider complex words that contain the negative cohering prefix in- that combines with adjectives to form adjectives, as in (11).

  (11) indecent

    improper

    incredible

If you pay close attention to how these words are pronounced, you will note that the phonological form of the prefix varies. In the first example the consonant is [n] (as the first phoneme of the word nice). In the second example it is [m] (which we express in the way we write the word), while in the third it is [ŋ] (as in sing; a fact that is not expressed in the way we write the word). It seems reasonable to suppose that all three forms of the prefix are variants of “the same morpheme.” Let us assume that the representation of these three variants in terms of phonemes is /ɪn/, /ɪm/, /ɪŋ/ because we know from studying the language as a whole that the phones [n], [m], and [ŋ] are realizations of the three phonemes /n/, /m/, /ŋ/, that occur in words such as din, dim, and ding. (<ng> is the spelling for the phoneme /ŋ/). We call such variants of a morpheme allomorphs, which here means that they are different phonemic variants of that morpheme. When there is allomorphy, we furthermore assume that there is a basic or underlying form from which the three variants are derived by rules. In this case, we take the form with /n/ to be the basic form because this form occurs when the prefix is attached to an adjective that starts with a vowel: inedible. What causes this phenomenon of allomorphy? The answer is that there are phonological structure-building rules or constraints that disallow certain combinations of phonemes, in this case, across syllables. We will write the relevant constraints as */np/ and */nk/.24 These constraints, in fact, hold both within syllables and across syllables; we will assume that their domain is the whole word form, the Pword. You can see this when you realize that there are no simplex words like */lɪnp/ or */lɪnk/, whereas /lɪnt/, /lɪmp/, and /lɪŋk/ are all fine. This means that if we combine the prefix in- with proper or credible, the result is an ill-formed sequence of phonemes, namely /np/ and /nk/. As such, these complex words would therefore have to be rejected by the phonological word module. However, it turns out that such violations of structure-building rules and constraints can be “adjusted” or “repaired” if we replace the /n/ by other phonemes that do not violate them:

  (12)

    i/nd/ecent        *i/np/roper        *i/nk/redible
                           /m/                /ŋ/

Such repair rules are called allomorphy rules. We distinguish such rules from the structure-building rules/constraints25 (that account for well-formed phoneme combinations) because they change the phonemic makeup of morphemes. As such we also call them structure-changing rules.26

An astute reader could now ask why we do not regard the different manifestations of the phoneme /n/ in this prefix as allophones of this phoneme. The answer is that I have previously defined the allophones of a phoneme as speech sounds that cannot independently function as different phonemes. When we discussed the realizations of /k/, we noted that this phoneme can be realized as a plain [k] or as an aspirated [kʰ], but these two sounds cannot function as separate phonemes because they cannot be used to minimally differentiate words. However, because /n/, /m/, and /ŋ/ are separate phonemes in English (as demonstrated with the three words din, dim, and ding), the rule that replaces /n/ by either /m/ or /ŋ/ cannot, by definition, be an allophonic rule.
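The allomorphy (repair) rule for in- can be sketched as follows. This is a hedged illustration with a toy phoneme encoding of my own; it simply assimilates the underlying /n/ in place to a following labial or velar consonant, thereby avoiding the ill-formed clusters */np/ and */nk/.

```python
# A hedged sketch (toy encoding of mine) of the allomorphy rule for in-: the
# underlying /n/ assimilates in place to a following labial or velar consonant,
# repairing the ill-formed clusters */np/ and */nk/.

PLACE_OF = {"p": "labial", "b": "labial", "m": "labial",
            "k": "velar",  "g": "velar",  "ŋ": "velar"}
NASAL_FOR_PLACE = {"labial": "m", "velar": "ŋ"}

def attach_in(base: str) -> str:
    """Attach underlying /ɪn/ and apply the structure-changing (repair) rule."""
    prefix = "ɪn"
    place = PLACE_OF.get(base[0])
    if place in NASAL_FOR_PLACE:                  # would create */np/, */nk/, etc.
        prefix = "ɪ" + NASAL_FOR_PLACE[place]     # repair: /n/ -> /m/ or /ŋ/
    return prefix + base

# "kredible" is written with its initial phoneme /k/ rather than its spelling
for base in ["decent", "proper", "kredible", "edible"]:
    print(attach_in(base))
# ɪndecent, ɪmproper, ɪŋkredible, ɪnedible
```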

The diagram in (13) depicts the organization of the word phonology module.

  (13)

    A flowchart illustrating the components and processes within a Word Phonology Module. See long description.
    Figure 2.08 Long description

    The Word Phonology Module has two main branches: on the left, the basic units (phonemes), which lead to features; on the right, the structure-building rules or constraints, which lead to the structure-changing (repair or allomorphy) rules. Both the features and the structure-changing rules then lead to the realization/allophonic rules.

In this diagram I separated the realization rules into their own “box” because I take those rules not to be part of the mental grammar, placing them in the processing systems that account for, in this case, the realization and perception of the phonological layer of words.

We could now ask whether the semantic and syntactic word modules have the same kind of organization as the phonological module. To some extent we know that they do. We have seen that all modules have basic units (-emes) and structure-building rules/constraints, with the possibility of analyzing the -emes in terms of features. Whether we can also motivate that all modules have structure-changing rules and realization rules is a question that I will not discuss in this chapter, but see ML, chapter 6, for some tentative ideas.

The Lexicon Again

Earlier I stated that all morphemes must be stored in the lexicon because the relationship between their form and their meaning is arbitrary and thus unpredictable. Storing morphemes implies that all simplex words (which consist of one free morpheme) are automatically also stored in the lexicon.

One might now ask whether complex words, once formed, have to be entered into the lexicon. As long as these words are fully regular, speakers always have the knowledge to construct and deconstruct these words any time they are needed or occur. This being so, it is likely that over time, as complex words are frequently used and are, indeed, not “new” anymore, people will start committing them to memory and will start using and recognizing them just like they use and recognize simplex words. Perhaps a more compelling reason for assuming that complex words must sometimes be added to the lexicon is when they develop properties in either their form or meaning that cannot be derived from the morphemes that they are composed of. As an example, think of the word sleeper, which literally means “someone who sleeps.” This word is also used to refer to spies who live in a foreign country behaving as if they were regular citizens of that country. This specific meaning, however, cannot be predicted from the parts sleep and -er. Thus, the word sleeper, with this specific meaning, must be listed as such in the lexicon. Complex words can also have unpredictable properties in their phonological form. Why is the vowel in length different from the vowel in long? While adding the suffix -th to an adjective is in itself regular (cf. warmth from warm), the vowel change is unpredictable. This means that length must be stored in the lexicon. It is often assumed that all compounds must be stored as well, because their precise meanings do not follow from the meanings of their parts. We can predict that an arm chair is a kind of chair that has something to do with arms. But what that relation is, is unpredictable and has to be memorized. (In some sinister fairy tale it could mean a chair made of arms.)

Open and Closed Word Classes

When we inspect the class of complex words, we notice that only certain word categories can include complex words, specifically nouns, verbs, and adjectives. In most languages, there are morphological means to make new nouns, verbs, adjectives, and adverbs, but not articles, prepositions, pronouns, conjunctions, and so on. Accordingly, we call the first group open classes, and the latter group closed classes. Closed classes are also called functional categories. Whereas open-class words usually have clear meanings, the meanings of closed-class words are more difficult to pin down, playing a role as the “connectors” for grouping words into sentences.

Sentences

For many people perhaps the most important characteristic of language is the possibility of making sentences. Indeed, when we speak we do not just utter words on their own, although a single word can be a sentence (just like a single morpheme can be a word). A command like Sit! is a “one-word sentence.” While the basic units in word formation (aka morphology) are morphemes, the basic units of sentences are words (whether simplex or complex). However, we will see that making sentences is in many ways like making complex words: In both cases we combine meaningful units to make larger meaningful units (complex words or sentences, respectively). Let us now consider how combining words gives us sentences. The first thing to know is that sentences are not simple linear strings of words (just like complex words are not just linear strings of morphemes). Sentences have a hierarchical structure. The first step in making sentences is to form phrases, which are word groupings within sentences.

A phrase is a group of words that contains a “central” word (e.g., the man, the little man, the man on the train, etc.). These three phrases are about a man. The words that are added to the central word play the role of adding extra information to the central word. In the little man, little adds information about man, narrowing down the meaning from MAN to LITTLE MAN. Then the is added to narrow the meaning down to A SPECIFIC LITTLE MAN. Likewise, in the man on the train, the part on the train (which is itself a phrase) adds information about the man. We will see that what is central in a phrase can be defined in strictly syntactic terms: The central word is the word whose syntactic label determines the syntacteme label of the phrase. (We saw the same thing at the word level: The syntacteme label of the affix determines the syntacteme of the complex word because affixes are “category-makers,” i.e., they are heads.)

To form phrases, words, as one might now expect, cannot be combined randomly. Rather there are syntactic structure-building rules and constraints that limit how words can be combined. This is where we need to refer to the category labels of words. The man is fine, but *the eats is not; nor is the man puts. Certain words cannot combine with certain other words, and in other cases a certain word, like put, must be combined with something following it, which means that the verb put has a contextual frame. There are thus structure-building rules and constraints that regulate the formation of various types of phrases such as noun phrases, verb phrases, and so on.

Here, in (14), is an example of a sentence with its syntactic structure fully specified in the box.

  (14)

    A linguistic syntax tree diagram illustrating the phrase structure of the sentence. See long description.
    Figure 2.09 Long description

    The top node, S (Sentence), has two main branches: an NP (Noun Phrase) on the left and a VP (Verb Phrase) on the right. The NP consists of the article the and the noun children. The VP dominates the verb built, an NP containing the article the, an AP (Adjective Phrase) with the adjective huge, and the noun boat, and a PP (Prepositional Phrase) containing the preposition in and the noun backyard.

The rule system that accounts for how words can be combined in phrases does not only need to refer to the category of words. In some cases, as we just noticed, it is also necessary to know that certain words have specific demands on what they can or must occur with. You will recall that in the case of complex words, the affix was said to require a base with a specific category label. A contextual feature would specify the syntactic category of the base that the affix attaches to. At the same time, the affix, being a category-maker, also determines the syntactic category of the derived word, and we have agreed that category-makers are the head of a construction. Finally, an affix is also restricted to occurring before or after its base. We see the same system at work for phrases, especially those that have a verb as their central word. Consider another example, the verb (to) build. In the typical case, this verb requires a specification of what is built (15).27

  1. (15)

    a. *The children built

    b. The children built a boat

The first sentence is “odd” because it feels incomplete. What this means is that the verb (to) build has a contextual feature that specifies that it must occur with an NP to its right, as shown in (16).28

  1. (16)

    Figure 2.10
    The word build on the left divides into three branches on the right: Semantics: BUILD; Phonology: Phonological Word (Pword); Syntax: V, [_ NP].
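To make the three-layered entry in (16) a little more tangible, here is a rough sketch of how one might encode such an entry and check its contextual frame; the entry format and function name are invented for illustration only:

    # A sketch of lexical entries with three layers plus a contextual frame.
    # The frame "[_ NP]" is encoded as the list of categories required after the verb.
    LEXICON = {
        "build": {"semantics": "BUILD", "phonology": "Pword", "syntax": "V", "frame": ["NP"]},
        "sleep": {"semantics": "SLEEP", "phonology": "Pword", "syntax": "V", "frame": []},
    }

    def frame_satisfied(verb, following_categories):
        """Check that what follows the verb matches its contextual frame."""
        frame = LEXICON[verb]["frame"]
        return following_categories[:len(frame)] == frame

    print(frame_satisfied("build", ["NP"]))  # True:  The children built [a boat]NP
    print(frame_satisfied("build", []))      # False: *The children built
    print(frame_satisfied("sleep", []))      # True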

A sentence is well-formed if, firstly, it consists of words that are well-formed (at each of their three layers). Secondly, the grouping of words into phrases and sentences must also comply with all requirements at the syntactic, phonological, and semantic layers. At each layer, the basic units are the structures that belong to the words at the corresponding layer. This means that the sentence syntax starts with the syntactic category labels of the words. The sentence phonology starts with the complete phonological word form (notated as “Pword” in (16), but it could also be a phonological group when the word is a compound) and the sentence semantics starts with the complete word meaning of words. A principle of word integrity states that sentence-level systems do not “dip” into the structures that are internal to words. For example, when combining words, the system does not need to know how many syllables words have and what the internal syntactic structure is in case of complex words. We must thus add three more checking modules to the mental grammar, in addition to those that were listed in (8), which all must check the basic units and structure-building systems for their respective layers (17).

  1. (17) Sentence Phonology

    Sentence Semantics

    Sentence Syntax

As in the case of words, a full representation of a sentence would have a well-formed hierarchical structure for the syntactic layer, the phonological layer, and the semantic layer, but I will not spell those out for phonology and semantics here. A hierarchical structure for the syntactic layer looks like what we have in the box in (14), but below I will suggest some updates for what sentence syntactic structures look like. We must now ask whether the sentence modules also contain structure-changing rules and realization rules.

More Checking

If the combination of words into phrases and sentences is driven by their syntactic properties, we can assume, as we did for complex words, that the sentence is syntactically well-formed. We then need to check whether these combinations are also well-formed at the phonological and semantic layers. This, again, is fully parallel to what we have already seen with respect to complex words. But before we get to phonological and semantic checking, I will explain that the structures that are formed in accordance with the syntactic structure-building rules and their contextual demands may, in fact, not be syntactically well-formed after all. In some cases, “repair” is necessary. This happens when certain phrases cannot stay in the location where they have been generated in accordance with the structure-building system. To see this, consider the sentence in (18).

  1. (18) Which house does John build?

In this question sentence, the noun phrase which house occurs at the beginning of the sentence. However, syntactically (and semantically) this phrase represents the “object” of the verb build and would as such be expected to occur after build, given that build has the contextual feature [_ NP] (see (16)).

  1. (19) John builds a house

We therefore assume that the syntactic structure-building rules will place which house after the verb, as in (20).

  1. (20) John builds which house

Then the phrase which house must be moved to the beginning of the sentence. This movement is triggered by a constraint that forbids question words like which, what, who, where, and how (or rather phrases that start with such a word) to occur as a direct sister of the verb. When they are nonetheless placed in that position by the structure-building rule that obeys the contextual frame of verbs, there must be a “repair,” which in this case is to move the question word to the beginning of the sentence (21).29

  1. (21)

    Figure 2.11
    The phrase John builds which house with an arrow originating from the word which and pointing left.

The rule in question is called wh-movement, and syntacticians refer to structure-changing rules of this kind as transformational rules. It is important to see that transformations are comparable to allomorphy rules in the word phonology. Both are rule types that are structure changing and they apply in response to a constraint violation that, if not “fixed,” will make the resulting word or sentence ill-formed. The way the rule is depicted in (21) is misleading, however. Transformational rules do not operate on the linear string of words; rather, they refer to the hierarchical organization of the sentence. They are what is called structure dependent.

  1. (22)

    Figure 2.12
    A tree diagram whose top node S (Sentence) has two branches: an empty position on the left and an S node on the right. The lower S dominates NP (John), V (builds), and NP (which house).

The basic form of a sentence in which which house occurs in the position to the right of the verb is called its deep or underlying structure and the surface form (which results from applying the transformations) its surface structure.
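The contrast between operating on a string and operating on a structure can be made concrete with a small sketch (illustrative only; the tree format and function below are invented for this purpose): the underlying structure is a hierarchy of labeled nodes, and the rule fronts the whole wh-phrase, whatever its length or position, rather than the nth word of the string.

    # A sketch of structure-dependent wh-movement. A node is a (label, children)
    # pair; leaves are plain words. The rule locates the NP whose first word is a
    # wh-word and fronts that entire phrase.
    WH_WORDS = {"which", "what", "who", "where", "how"}

    underlying = ("S", [
        ("NP", ["John"]),
        ("VP", [("V", ["builds"]), ("NP", ["which", "house"])]),
    ])

    def front_wh_phrase(tree):
        """Return a surface structure with the wh-phrase moved to the front."""
        def find_and_remove(node):
            label, children = node
            new_children, found = [], None
            for child in children:
                if isinstance(child, tuple):
                    c_label, c_children = child
                    if c_label == "NP" and c_children and c_children[0] in WH_WORDS:
                        found = child          # take out the whole wh-phrase
                        continue
                    child, inner = find_and_remove(child)
                    found = found or inner
                new_children.append(child)
            return (label, new_children), found

        stripped, wh_phrase = find_and_remove(tree)
        return ("S", [wh_phrase, stripped]) if wh_phrase else tree

    print(front_wh_phrase(underlying))
    # ('S', [('NP', ['which', 'house']),
    #        ('S', [('NP', ['John']), ('VP', [('V', ['builds'])])])])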

I will now consider a different example of wh-movement. In this case, the sentence contains a so-called auxiliary verb in addition to the main verb; see (23).

  1. (23)

    Figure 2.13
    A diagram showing the transformation of John will buy what into the question What will John buy?, illustrating the movement of what and the inversion of will and John.

In this case we can see that not only does what move to the beginning of the sentence, but the subject John and the auxiliary will also appear in the opposite order.

To show exactly how the needed transformations are structure dependent we must assign a more refined underlying syntactic structure that is generated by the phrase structure rules. This involves two new category labels, namely C (for Complementizer) and T (for Tense, which can dominate, among other things, auxiliary verbs), as shown in (24).30

  1. (24)

    Figure 2.14
    A syntax tree showing the underlying structure of the question. The top node CP (Complementizer Phrase, which equals the complete sentence, also called a clause) has two branches: a C (Complementizer) node and a TP (Tense Phrase) node. The TP dominates T (Tense, here will), NP (John), V (buy), and what. A curved arrow points from within the TP to the C position, indicating movement.

The position that the wh-phrase is moved to is independently motivated because we also need such a position for embedded sentences that actually start with a complementizer word like that: (I heard) that John will buy a house.

The astute reader will have noted that the order of John and will in (23), which is also the one that a non-question sentence has (John will buy a house), is different from what we see in the wh-question sentence. This raises the question of what the underlying order should be for John and will from which the question sentence and the non-question sentence are derived. Without trying to explain this here, syntacticians argue that the order will John is the underlying order (as in 24) and this means that we need a transformation to change that order to John will in non-question sentences which “raises” John to a position to the left of will. I will not complicate things further by trying to elaborate the structure in (24) that would allow John to move higher up the tree so that it can end up to the left of will.31

One could ask why sentences that contain a wh-phrase are not built with the wh-phrase in sentence-initial position in the first place. As already discussed, this would create the problem that the verb build occurs without a following NP as required by its contextual demand. A second reason is that, in his early work, Noam Chomsky argued that structure-building rules are only responsible for “basic” sentence structures, with several other sentence types being derived from the basic structure by transformational rules. This line of reasoning was also applied to cases where two different constructions appear to have the same meaning, as in (25).

  1. (25)

    a. The boy kicked the ball (active sentence structure)
    b. The ball was kicked by the boy (passive sentence structure)

Chomsky wanted to derive both sentences from one structure, choosing one as basic and the other (in this case, the passive construction) as derived. As mentioned, this can be compared to deriving different allomorphs of a morpheme from one basic morpheme form that is listed in the lexicon.32

We have seen that the syntactic sentence module comprises structure-changing rules. I will refrain from looking into the possibility of syntactic realization rules. One possibility is that syntactic realization rules account for some optional phrase order differences. Another kind of realization rule concerns the linearization of words in a sentence. Assuming that syntactic structure merely reflects hierarchical grouping, we do need to specify how words are linearized. The required linearization rules could be regarded as realization rules.33 Below I also suggest that rules involved in inflection may be a subtype of syntactic realization rules.34 (Recall that we did not consider whether the word syntactic module needs structure-changing rules or realization rules.)

Checking the Phonological and Semantic Layer of Sentences

We now turn (albeit very briefly) to the need for checking of the phonological and semantic layers of sentences. It is easy to show that the phonological structure of a sentence can differ from its syntactic structure. An example concerns a sentence like he is sick, which syntactically groups as [[he]NP [is sick]VP]S, while it seems that the phonological grouping is ((he is) (sick)).35 This grouping is motivated by the fact that it explains the phonological “contraction” of he and is into he’s, which is preferred over the non-contracted sequence. It is said that (he’s) forms a Pword, another example of a mismatch, because at the syntactic layer this unit corresponds to two syntactic words. There are many additional reasons for saying that we need a phonological structure that is different from the syntactic structure. A classic example is that the intonation pattern (i.e., the sentence “melody”) of the sentence in (26) needs groupings that are different from the syntactic structure.36

  1. (26) Phonological structure: ((This is the cat) (that caught the rat) (that stole the cheese))

    Syntactic structure:37

    [NP[this] VP[is NP[the cat CP[that VP[caught NP[the rat CP[that VP[stole NP[the cheese]]]]]]]]]

In the phonological structure, we have three so-called intonation phrases,38 each ending with a noun that is pronounced with an elevated pitch. The syntactic structure is much more detailed, as shown in (26). The general idea is that the phonological structure is usually structurally simpler than the syntactic structure, while at the same time containing non-syntactic information such as a strong/weak labeling (not included here to keep things simple) that accounts for phrasal stress (on the phrase-final nouns cat, rat, and cheese in (26)) and sentence stress (on the last of these, cheese).39

If we assume that there is an initial phonological structure that simply copies the syntactic structure, the fact that the phonological layer requires a different kind of hierarchical structure necessitates phonological structure-changing rules at the sentence level. Do we also need phonological realization rules at the sentence level? Actually, the contraction rule that causes he is to become he’s could be regarded as a realization rule.

What about the semantic layer? We can easily show that a specific syntactic construction sometimes correlates with two different semantic structures (i.e., is semantically ambiguous). An example involves the sentence Everyone loves someone. Syntactically, this sentence has the (simplified) structure in (27).

  1. (27)

    Figure 2.15
    A syntax tree whose top node S (Sentence) has two branches: NP (Noun Phrase) and VP (Verb Phrase). The NP corresponds to the word Everyone; the VP contains V (loves) and NP (someone).

However, this sentence is semantically ambiguous between the two meanings shown in (28).

  1. (28)

    a. There is someone such that everyone loves this person

    b. Everyone loves some person (not necessarily the same person)

Semanticists (and logicians) call this a “scope ambiguity,” which means that either someone has scope over everyone or everyone has scope over someone. To account for the ambiguity, we need to assume that there are two distinct semantic structures, shown in (29); if the initial semantic structure simply copies the syntactic structure, deriving them requires structure-changing rules.

  1. (29)

    Figure 2.16
    Diagram (a) shows a tree with root S dominating someone(x) and a lower S, which in turn dominates everyone(y) and y loves x. Diagram (b) shows a tree with root S dominating everyone(y) and a lower S, which in turn dominates someone(x) and y loves x.
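In standard predicate-logic notation (added here for clarity; this notation is not used in the original discussion), the two semantic structures in (29) correspond to the two possible orders of the quantifiers:

    % Reading (a): there is one specific person whom everyone loves
    \exists x\,\forall y\; \mathrm{love}(y, x)

    % Reading (b): everyone loves some (possibly different) person
    \forall y\,\exists x\; \mathrm{love}(y, x)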

What this example shows is that the semantic structure can differ from the syntactic structure in order to unambiguously represent the logical relations that obtain between noun phrases.40 This motivates semantic structure-changing rules. I will here not consider whether we also need semantic realization rules at the sentence level.

Inflection

Most languages have a system that looks like morphological derivation in that it attaches affixes to words, while it also looks like sentence syntax in that the affixes play a role that is comparable to function words (like the, and, if, etc.). For example, the Swedish counterpart of the house is huset, where the affix -et means “the.” This kind of affixation is called inflection.41 What some languages do by using inflection, others do in the sentence syntax. To give another example of the fact that inflectional affixes have functions that can also be fulfilled by independent words, consider that, in English, the formation of the comparative form of adjectives (bigger) is inflectional, which is evidenced by the fact that the comparative meaning can also be achieved by combining words (more important). The suffix -er (inflectional) and the word more (syntactic) are functionally equivalent.42

It makes sense to separate inflectional affixes from derivational affixes, which are used to create new words. Inflectional affixes do not result in new words; they function to make words ready for syntactic use (i.e., for use in a certain syntactic structure). Another way to put this is to say that the addition of inflectional affixes to a word is determined by the syntactic context of that word. To illustrate this, consider the sentence in (30).

  1. (30) *He is smart than Pete ⇒ He is smarter than Pete

In this example, the presence of the word than requires the adjective to be a comparative (i.e., smarter). This means that comparative -er must be an inflectional suffix.

The set of inflected forms for a word is called a paradigm. English does not have a lot of inflection; in fact, it has very little. There is plural -s for nouns, as well as a “genitive” -s, as in the man’s bicycle. For verbs, there are three inflectional endings: past tense (which is also used for the past participle) -ed (walk-ed), progressive -ing (walk-ing) and third-person singular -s (walk-s). The adjective has the comparative -er (small-er) and the superlative -est (small-est). There are many languages that have dozens of inflectional endings for nouns and verbs.
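As a small illustration (a sketch that ignores regular spelling adjustments such as consonant doubling), the regular verbal paradigm just listed can be pictured as the output of a simple rule that attaches each ending to a stem:

    # A sketch: the regular inflectional paradigm of an English verb, produced by
    # attaching the endings listed above to the stem (spelling rules are ignored).
    def paradigm(stem):
        return {
            "base": stem,
            "past / past participle": stem + "ed",
            "progressive": stem + "ing",
            "3rd person singular": stem + "s",
        }

    print(paradigm("walk"))
    # {'base': 'walk', 'past / past participle': 'walked',
    #  'progressive': 'walking', '3rd person singular': 'walks'}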

The reader might ask when inflectional affixes are added if they are part of the syntax. I will not discuss this question here except to say that inflectional affixes can be analyzed as phonological realizations of syntactic features. As such, they would fall within the class of syntactic realization rules; see ML, chapter 6.

The Lexicon Once More

Sentences, including the inflectional properties of words, are formed “on line,” while speaking. The usual assumption is that sentences, once formed, need not be stored in the lexicon, unless specific sentences develop what is called an idiomatic meaning, as in “kick the bucket” or “bite the bullet.” In such cases, while the syntactic structure is regular, the meaning cannot be derived from the words and how they have been combined. We must thus include such expressions in the lexicon.

As for inflectional endings, when fully regular, inflected words do not have to be listed in the lexicon, unless the inflectional rule in question is itself irregular. This means that the plurals of nouns that do not follow the productive rule of “adding -s” have to be listed in the lexicon. This includes cases like foot/feet and child/children and several others.
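The division of labor between listing and rule application can be pictured with a minimal sketch (illustrative only): irregular plurals are looked up, and everything else is handed to the productive rule.

    # Irregular plurals are stored (listed in the lexicon); regular plurals are
    # not stored but derived by the productive "add -s" rule.
    LISTED_PLURALS = {"foot": "feet", "child": "children", "man": "men"}

    def plural(noun):
        """Use the listed form if there is one; otherwise apply the regular rule."""
        return LISTED_PLURALS.get(noun, noun + "s")

    print(plural("foot"))  # feet  (listed)
    print(plural("boat"))  # boats (derived by rule)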

In Chapter 5, we will discuss approaches that assign an even bigger role to the lexicon, which then gravitates toward incorporating regular constructions and even “construction schemas”; this reduces sentence syntax to combining (or unifying) such stored sentence fragments (supplied with words, in the case of schemas) in a bottom-up fashion.

Interim Summary: A Model of the Mental Grammar

Summarizing, we can say that the mental grammar is a system of submodules that produces linguistic expressions such as words, complex words, phrases, and sentences. Putting aside “words” that may miss a layer (of which I have not provided examples), each of these units has a phonological, a semantic, and a categorial dimension. The grammar must make sure that these three dimensions of linguistic expressions are all well-formed. To that effect, there are three types of systems and, because I have separated words from sentences, there are six systems, complemented by the lexicon (31).

  1. (31)

    Figure 2.17
    A diagram of the flow of linguistic structure: Lexicon ⇒ Words (Morphology), a box containing Word syntax, Word phonology, and Word semantics ⇒ Sentences (Syntax), a box containing Sentence syntax, Sentence phonology, and Sentence semantics.

The word modules together constitute what is called the morphological component of the mental grammar, while the term syntax is usually used for the sentence modules, even though in this chapter I am using syntax to refer to the categorial layers of both the word and the sentence module.43

I have assumed that the syntactic systems are the driving engines for the formation of complex words and phrases/sentences, which means that the two other submodules come in to check whether the phonological and the semantic layers are well-formed as well; often they are not, which then entails either rejection or repair.

Each subsystem contains a set of basic units (-emes44) as well as structure-building rules and constraints that characterize their possible combinations. Combinations of basic units that are permitted by the structure-building rules are called well-formed or grammatical. Those that are not permitted are ill-formed or ungrammatical. While users of a language do not consciously know what the primitives or the rules for combining them are, they do have the ability to say whether a structure is well-formed. For example, speakers of English will agree that lpokl is not a possible syllable and thus not a possible structure at the phonological word layer. This so-called grammaticality judgment is purely about the form and is independent of whatever the meaning or syntactic properties of this “word” might be. Linguists use such grammaticality judgments as data in their attempts to understand the relevant module, in this example the phonological layer of words.

Structural Analogy

In ML, chapter 6, I suggested that the six modules of the mental grammar have an analogous organization, meaning that the modules display a structural analogy; see (32).45

  1. (32)

    Figure 2.18
    A flowchart illustrating the components and processes within a module. The node Module has two main branches: Basic units on the left, which lead to features, and Structure-building rules on the right, which lead to structure-changing (repair) rules. Both the features and the structure-changing rules lead to the realization rules.

We have seen that all parts of this diagram apply in the word phonology module. We have also seen that parts of it also apply to the other word modules (word syntax and word semantics) although we did not give examples of structure-changing rules and realization rules for these modules. With respect to the sentence modules, we focused on sentence syntax, where we learned that, in this case, we find structure-changing rules (transformations). But we also motivated structure-changing rules for the phonological and semantic sentence layers. I nevertheless suggest that the organizational structure in (32) is the “template” on which each submodule is based, implying that each module may require structure-changing rules and realizational rules, the latter due to processing modules.

Section II Some Refinements and Questions of Innateness

Introduction

In the second section of the chapter, after discussing some additional properties of mental grammars, such as the central property of recursion, the fact that sentences have intonational melodies, and the role of co-speech gesture, we will turn in some detail to questions about the innateness of the language capacity. This section will conclude with an A and Q about the Poverty of the Stimulus Argument for Innateness.

Recursion

The approach to language and the mental grammar that Noam Chomsky has advocated attributes a central role to the syntactic structure-building rules (at both the word and sentence levels). In this chapter, I have also given these rules the “lead” in accounting for complex words and phrases/sentences, followed by the checking function of the phonological and semantic modules.

A remarkable and long-recognized structural property of human language is the possibility of allowing structure-building rules to apply recursively, which here means that structures that result from combining smaller units can themselves be made part of larger units. As a result, it is bound to happen that a complex structure of a certain type contains a smaller structure of the same type.46 Perhaps the most “spectacular” instance of recursion can be observed at the syntactic layer of sentences, especially when we see that sentences can contain structures that are themselves sentences.

  1. (33)

    a. A sentence:

      [He fell] S

    b. A sentence inside a sentence:

      [I heard [that he fell] S] S

    c. A sentence inside a sentence inside a sentence:

      [I said [that I heard [that he fell] S] S] S

    d. A sentence inside a sentence inside a sentence inside a sentence:

      [He heard [that I said [that I heard [that he fell] S] S] S] S

However, we can also demonstrate that morphology allows recursive structures, as in (34).

  1. (34) Derivation: [[[[[[act] N ive] A ate] V ion] N al] A ism] N

    Compounding: [[[[[[arm] N [chair] N] N factory] N director] N vacancy] N announcement] N

Due to the possibility of recursion, languages have the property of infinity. There is no finite number of complex words nor of sentences. Another way to put this is to say that there is no longest word or longest sentence. The possibility for a structure of some category to contain a structure of the same category is sometimes called “the Russian doll effect.”
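The unboundedness that recursion creates can be illustrated with a very small sketch (a deliberately crude simplification of the embedding pattern shown above): a rule that wraps a sentence inside another sentence can reapply to its own output indefinitely, so no output is ever the longest one.

    # A sketch of recursion: the schematic rule "S -> I said that S" can reapply
    # to its own output, so there is no longest sentence.
    def embed(sentence, depth):
        """Wrap a sentence inside another sentence `depth` times."""
        for _ in range(depth):
            sentence = f"I said that {sentence}"
        return sentence

    print(embed("he fell", 0))  # he fell
    print(embed("he fell", 1))  # I said that he fell
    print(embed("he fell", 3))  # I said that I said that I said that he fell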

It seems obvious that recursivity also occurs in semantic structure, given that the syntactic examples just provided have the form they do precisely so that they can capture the recursivity of the corresponding semantic structures, in which “propositions” contain “propositions” (where propositions are conceptual structures that capture one simple, complete thought). It is a matter of debate whether recursivity is a property that also occurs in phonological structure (which I think is the case), but I will not discuss that issue here.47

According to Noam Chomsky, recursion is the hallmark of human language. For him, and his followers, recursion is an exclusive property of hierarchical structure at the syntactic layer. He also claims that no other animal species can “wrap its mind around” recursion. As such, recursion is claimed to be an exclusive property of the human mind and even of the syntactic module. We will return to these strong claims in several subsequent chapters.

Intonation

All languages have “sentence melodies,” which we call intonation. The phonetic correlate of intonation is fundamental frequency, F0, the perceptual correlate of which is called pitch. On the articulation side, pitch variations result from different rates of vibration of the vocal cords. If pitch functions in a way that is “linguistically relevant” (i.e., contrastive), we use the term tone. If F0/pitch is relevant lexically in a language (thus distinguishing different words that are otherwise phonologically identical), the language is a so-called tone language. In a tone language, the presence of tones is contrastive in the same way that in English the difference between front and back vowels can be contrastive. All languages use intonation (use of pitch at the sentence level), but not all languages use pitch at the lexical level.

When we utter linguistic expressions, pitch properties can also reflect so-called paralinguistic aspects, caused by the age or “size” of the speaker, or by the speaker’s psychological state (“mood”) or physical state. In studying tone or intonation, it is important to abstract away from the effect that such factors have on F0.

The question now arises of how the tune (the tones) is aligned with the phonological structure of words and sentences. This is a very difficult subject. The simple part of the problem is that intonational tones that highlight what is important information in a sentence line up with the syllables that carry phrasal stress in the relevant sentence part. At this point, we need to bring in the notion of focus. Consider the sentences in (35).

  1. (35) Question: Where did you go yesterday?

    Answer: I went with my sister [to a movie].

The part in square brackets provides the important information that the questioner is after. We say that this part is in focus, and one way of marking this is with an intonational tone, as shown in (36).

  1. (36)

                                H
                                |
    I went with my sister [to a movie][+focus]

Since the speaker has here volunteered another piece of information that the question poser may not know, they could decide to put an intonational tone on sister as well, thus effectively placing that constituent also in focus:

  1. (37)

                    H                     H
                    |                     |
    I went [with my sister][+focus] [to a movie][+focus]

Let us now ask at which layer of the sentence focus-marking takes place. Even though focus seems to be an inherently semantic notion (having to do with the “information packaging” of a sentence), most frameworks assume that focus-marking occurs at the syntactic level. In that sense, focus-marking can be seen as one way in which the morphosyntactic structure serves semantics, although this seems to imply that the semantic structure must also have a way of marking what is important information.

Intonational melodies also need so-called boundary tones that mark the edges of the domains that we have called intonation phrases. I will assume that focus tones are phonological realizations of focus-marking (in which sense they are functionally similar to inflectional endings) and that boundary tones are inserted at the phonological sentence layer to mark the edges of intonation phrases, thus at the prosodic level.

The Role of Gesture

The words that make up sentences are not only aligned with an intonational tune. When people speak, they will also always gesture with their hands. This is called co-speech gesture. While co-speech gesturing is not the same for every individual or culture, and may be subdued in some “formal” situations, it has also long been known that the use of gestures can help bring “the message” across. Several scholars, such as David McNeill, Susan Goldin-Meadow, and Adam Kendon, have studied the relationship between co-speech gesture and speech intensively, analyzing the form of the gestures (handshapes and hand movements) and their meaning.48 There are specific, regular ways in which gestures can be portioned into different “phases” (like beginning, middle, and end, much like we portioned syllables in smaller units), and the alignment of these phases with the spoken words is well orchestrated. Researchers have found that there are general types of gesture that show a correspondence with the content of an utterance, and they have established that gestures are lined up temporally with the syntax of speech in non-arbitrary ways. Gesturing supports speech and is, at the same time, dependent on, or follows the structure of, the spoken language. That said, gesture researchers have shown that the hand can sometimes anticipate or, in other cases, contradict the words, which shows that gesturing can express what is on people’s mind even before they become consciously aware of it.49

Someone might say that we only have to leave out the speech that the gestures accompany, and we get a sign language. However, this is absolutely not the case. Sign language is not just gesture without speech; see ML, chapter 13. You will realize this when you turn on your TV, find a lively speaker and turn off the sound. Aside from a few signs like “thumbs up” or “OK” that can be used without speech (called emblems) and that in many cultures occur in impressive numbers, co-speech gesture needs speech to be intelligible.

Rightly or wrongly, most linguists construct their theories of the mental grammar only with reference to verbal communication, but some propose mental grammars that integrate verbal and nonverbal communication. Observe that some words are pretty useless unless they are accompanied by a pointing gesture of some sort. I’m thinking of words like this, that, here, there, called deictic words.50

What Could Be Innate and What Does That Do for Language Acquisition?

Let us ask which aspects of the mental grammar are likely candidates for being innate and how that relates to the process of language acquisition by children. The backdrop of what we are about to discuss is Chomsky’s proposal that the innate Universal Grammar can be seen as a system that contains principles and parameters. Principles capture the universals of language; they state the properties that all mental grammars display. Children do not need to learn them; they are part of their innate guide to language. Parameters state properties for which languages can differ, but here the central idea was that languages do not differ in unlimited ways. Rather, where languages differ, they seem to display one out of a limited set of choices, often only two. A child being exposed to a language will decide, given the input, which choices apply for their language for each of the parameters that are stated in UG. I refer to ML, chapter 9, for a more extensive explanation of the “Principles and Parameters” model.51

It seems reasonable to suppose that the overall architecture of the grammar, including the structure of its modules (as indicated in the previous sections), is not something that learners have to come up with themselves. In a sense, this overall structure amounts to a universal grammar plan, just like our genes specify a body plan for all human beings. The possibility that the different grammatical modules are structurally analogous suggests that the task of the language learner is made easier given the expectation that all modules have the same organization. As such, this kind of analogy contributes to solving Plato’s problem in the domain of language acquisition by explaining that the child gets a lot of information about mental grammars “for free.”

But do we have to say that the overall grammar plan, including the organization of its submodules, results from an innate, language-domain-specific guide (Universal Grammar, UG)? One might argue that the grammar design is “logical” in the sense that we would expect to find it in any system that generates complex structures of some kind. This raises the reasonable conjecture that this plan is the result of general factors that go beyond what needs to be innate specifically for language. Such general factors could be rooted in human cognitive systems that are not language-specific or that reflect principles of design that apply to all complex things, whether in the human mind or, more generally, in the mind-external world. We will return to these questions and possibilities in some detail in Chapter 4.

Turning to ways in which languages differ, we must assume that the child needs to know the features for each of the three word modules, that is for the smallest units that define the -emes in each module:

  1. (38)

    a. a finite list of semantic features

    b. a finite list of syntactic features

    c. a finite list of phonological features

The question that arises here is whether these sets are universal, in the sense of being the same for all languages, or whether, given a universally available set, the child needs to set parameters that determine which specific subset of the universal set applies to their language. Such parameters would essentially specify for each feature whether or not it is “active” in the language at hand. This means that each feature is in itself a parameter, which we might call a feature parameter.52

Both the idea that there is an innate set of features used in all languages and the idea that there are innately available feature sets from which choices are made are controversial, and views differ depending on which module is considered. There are increasingly strong claims that features may be emergent, which means that children establish them during the process of language acquisition due to their general ability to form categories based on perceptual experiences. I refer to ML, chapter 5, for a more in-depth discussion and references.

Thirdly, with respect to structure-building rules and constraints, it has been claimed that the actual rules and constraints that are used in a given language constitute a “choice” from a set of pre-given (i.e., innate) structure-building options (which we could call structure parameters). Structure-building rules and constraints are relevant at different levels in each layer. They concern how features can be combined to form -emes and how -emes can be combined to form larger units (such as the syllables that consist of phonemes). The pre-given (i.e., innate) structural parameters with their “switches” account for the fact that differences between languages appear to be limited in terms of their structural options. In this view, children need to make choices (i.e., set the parameters to a value) based on the primary linguistic data that they are exposed to. As for the format of structure-building rules and constraints, nativists assume that there are universal principles that stipulate what such rules and constraints look like and what they can do. For example, as for structure-building rules, a prevailing idea is that all structure is binary, which means that each node in a structure can have no more than two daughters.

Fourthly, with reference to structure-changing rules, we expect that, even though languages differ in the kinds of “repairs” that they have, there are also at least general formats for such rules. A general trend has been to reduce repair operations to a few very general possibilities, with potential “overapplication” being curtailed by constraints that would prevent repair rules from “making things worse.” As in the case of structure-building rules/constraints, if there are limited options for the kinds of rules that languages can have in terms of their formal operations, we could speak of rule format parameters that limit the specific rules that are required in a language as determined by the language input data. For example, with respect to allomorphy rules in word phonology, which rules are necessary for a language is dependent on the alternations that have been caused by historical processes of sound change. However, in the domain of sentence syntax, attempts have been made to reduce all structure-changing transformations to just one rule whose specific working is fully determined by the syntactic structure of any given sentence. Because such drastic proposals have not been dominant in phonology, the child needs to learn more in the domain of word phonology than in the domain of sentence syntax. The division of labor between what comes “for free” in terms of principles, what needs to be chosen in terms of parameters, and what needs to be learned by “brute force” can thus differ for the different modules and thus for the layers of linguistic expressions.53
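Schematically, parameter setting can be pictured as in the following sketch (an informal illustration; the two parameters named here, head position and wh-movement, are standard textbook examples and are not meant as an analysis of any particular language): Universal Grammar supplies the parameters and their possible values, and the learner merely selects a value for each on the basis of the input.

    # A sketch of parameter setting: UG supplies the parameters and their possible
    # values; the learner only picks a value per parameter from the input, rather
    # than inducing the whole grammar from scratch.
    PARAMETERS = {
        "head_position": {"head-initial", "head-final"},
        "wh_movement": {"fronted", "in-situ"},
    }

    def set_parameter(name, observed_value):
        """Fix one innately given switch on the basis of primary linguistic data."""
        if observed_value not in PARAMETERS[name]:
            raise ValueError(f"{observed_value!r} is not an available setting for {name}")
        return observed_value

    # An English-like setting, chosen on the basis of the input the child hears:
    grammar = {
        "head_position": set_parameter("head_position", "head-initial"),
        "wh_movement": set_parameter("wh_movement", "fronted"),
    }
    print(grammar)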

Whether language learning can be fully successful if we merely postulate general learning capacities remains a hotly debated issue. Even linguists who used to accept Chomsky’s claims about the need for language-specific innate knowledge are now saying that general learning capacities are sufficient, for example in the domain of word phonology. It is then said that the phonology is emergent, meaning established from scratch.54 We discuss such a proposal in some detail in Chapter 5.

The pivotal reason for postulating an innate universal grammar plan has always been that it explains why children can construct their mental grammars efficiently and quickly, while being exposed to primary linguistic data that underdetermine the mental grammar that children must come up with. Given the alleged “poverty of the stimulus,” Chomsky argued that children must have a built-in “manual” that prespecifies the universal properties of grammars and the limited grammatical choices that children need to make.

In this and the preceding chapter I have alluded to the fact that the content of this language manual, which was initially taken to be rather rich by Noam Chomsky, has become thinner and thinner over time, a development that eventually resulted in the idea that only the notion of recursivity needs to be specified in it. Later in this chapter, I will ask whether, whatever its content, there could ever be evidence that falsifies the claim that humans are born with a universal grammar. It is perhaps no coincidence that it has always been important for those who follow Chomsky’s IH for language to look for arguments in support of this nativist hypothesis rather than ways to falsify it. In ML, we critically examined a number of such supporting arguments, which all share the property that they are critically based on linguistic evidence. The present book examines arguments from other quarters. I refer to my summary of the linguistic arguments in Chapter 1 of the present book. The next section offers a dialogue about one of the most central linguistic arguments.

A and Q Discuss the Poverty of the Stimulus Argument for Innateness

In the following dialogue, person A is someone who accepts the nativist stance, while Q is a skeptic, perhaps even an empiricist.

A: I think that the acquisition of language is one of the most miraculous phenomena in the world.

Q: How so?

A: All over the world, thousands of linguistics professors study and work hard to describe the grammar of English. The English language has been studied for centuries. To date, however, there is no complete description of the grammar of this language. Yet, young children with their “little brains” manage to work out the grammar of English in just a couple of years. Some discussion is possible about when we can say that the acquisition process is complete. According to many linguists, the grammar is only fully in place when the child is five years old, but it is obvious that many three-year-olds know most of the relevant things already. But even if we were to assume that acquisition continues until puberty, it is still amazing that a single individual can do subconsciously what thousands of professors cannot accomplish consciously. I call that a miracle.

Q: But there are extensive books that describe “the grammar of English,” aren’t there? Are you telling me that all those books are incomplete?

A: Yes, they are very incomplete. If there was a complete description of the English grammar, it would be possible to put all the rules into a computer program and have computers that talk, read, and write. To date, that’s wishful thinking. In fact, for some time, thousands of computer programmers and computer scientists have joined the army of linguists in trying to capture the rules of English. We are still waiting for impressive results. Devices that answer questions or follow instructions do exist in increasing numbers, but it is not clear that these systems use programs that resemble what people do when they use language. They draw on very large databases to establish frequencies for which words tend to occur together in texts and then they use such frequencies to generate text that looks like human language. Such systems are not challenged to apply the rule for question formation in the right manner in sentences such as Is John still sleeping (from John is still sleeping), and this is just one of the many grammatical rules that even three-year-olds apply flawlessly.

Q: Given the success of current chatbots, it would seem that we do not need a full analysis of any language to make it possible to generate sensible text that resembles human language …

A: Yes, this has become a real discussion and we can come back to that later. For the moment, take my word for it that chatbots do not learn language in the way that children do it. There is no disagreement on that. Such bots need to be trained on billions of sentences, and children achieve a better result while being exposed to much less than that.55

I might add that a chatbot will regurgitate whatever you train it on. So, if you train it on “sentences” that are ungrammatical according to just about every linguistic theory, thus impossible or highly unlikely to occur in any human language, the bot will produce a nonhuman “language.” What this means is that whatever program the bot uses to learn “language,” this program cannot count as a theory of human language. The goal of linguistics is to find such a theory. Designers of chatbots simply do not share this goal. I think their goal is to make a lot of money. This is not to say that the bot technology does not have potential uses that can be beneficial for human societies, although we already have seen that they can also do harm by delivering nonsense, falsehoods, and “fake news” that people come to rely on.

Q: So how does a child do it?

A: That, my friend, is what most linguists take to be the central question of linguistics. You have just (re)formulated Plato’s problem (“How do people come to know so much based on so little experience?”). Focusing on knowledge of language, the answer that nativists give is that there has to be innate knowledge that helps the child along. Just to briefly come back to your chatbots, children have not been exposed to huge amounts of sentences by the time they show evidence of understanding and producing sentences.

Q: But empiricists claim that whatever infants are exposed to is enough to detect patterns, using statistical methods that are perhaps comparable to what enable chatbots to form sentences. Isn’t it possible that our species, with its powerful brain, is simply capable of discovering all the relevant basic units and rules from carefully observing the language input, without having any innate knowledge that specifically bears on language? How do we really know that a child has all this innate linguistic information?

A: I see where you are going with this … You yourself sound like an empiricist, and that’s OK. If you want to show that language can be learned from scratch, you have a better, simpler theory, because it does not need to postulate an innate system beyond general cognitive learning capacities. So, if you want to make that case, do it! Argue for it. Show how acquisition is possible based on limited input with the assumption that there is nothing except for a powerful general learning device when the child is born. But if you say that that powerful device has nothing to do with language per se, then you have to explain why this device doesn’t allow the child to do subtraction, addition, and multiplication at the age of three. These operations are simpler than most of what is going on in language. A three-year-old can make a sentence like:

  1. (39) What was it that you said that Sammy told to mommy?

This is a pretty complicated sentence because the question word “what” refers to “what Sammy told to mommy.” In other words, what is the object of told, but this verb is far removed from the beginning of the sentence. To “link” the word what to the verb told, you have to ignore the verb say. At the age that children understand the meaning of this sentence (even though they may not yet produce them; that comes a little later), they cannot tell you how much 2 plus 2 is.

It seems to me that the way we learn counting, music, reading, writing, or many other things is very different from the way we acquire language. All these areas require extensive instruction and training.56 Language learning does not involve explicit instruction at all, nor does it appear to help much when we try to correct a child’s mistake. Kids stick to their “mistakes” until they are ready to move on. Language acquisition just happens! In fact, there’s no way of preventing a child from acquiring language, except by keeping them away from all language input, which is an evil thing to do. Noam Chomsky therefore sees language “learning” (which he rather calls acquisition) as a form of biological growth or maturation. We “grow” our grammar just like we grow organs and limbs. Our genes specify the structure of these bodily parts, and sufficient food and oxygen does the rest. Likewise, we might say that our genes specify the possible shape of grammars, whereas some language input and social interaction do the rest.

By the way, I know that people argue that children are born with multiple instincts, not just a language instinct. It has been claimed that there is a math instinct and a music instinct, and so on. However, this simply reinforces the idea that language acquisition is likely also based on an instinct.

Q: That means that there are other complicated things that a child can do early on?

A: My guess is that if children were exposed to music and math early on, we would see that they can do amazing things early in life. In fact, parents sing to children and expose them to other forms of music, and this explains why children can sing (i.e., hold melodies) and, later on, whistle melodies. But a better example is vision. I guess vision is pretty complicated. Indeed, researchers who study vision also assume that lots of what it takes to process visual sensory information (such that our mind constructs a 3D visual image of the world around us) is innate. At the same time, they also say that the ability to see needs to be fine-tuned after birth on the basis of exposure to the environment.57 This is very similar to Chomsky’s story about language acquisition. In both language and vision, a lot of “structure” is already there, at birth or developing a little later based on genetic specifications and epigenetic processes, but the child needs to fine-tune the system on the basis of exposure to specific languages and visual input.

Q: Are the views that you defend here widely accepted?

A: Well, that depends on what you regard as “widely.” There are many linguists and cognitive scientists who do not buy the Chomskyan line of reasoning. They argue that postulating innate knowledge may not be necessary, at least not to the extent that Chomsky suggests in his earlier work. They claim that a general data-driven learning capacity can do more than you might think, and they design (and implement) computer models (automated learners in the form of “connectionist” networks of some kind) that are fed with language (or language-like) expressions, on the basis of which these programs extract statistical regularities and probabilities, which are then subsequently used to automatically produce new expressions. Again, we might refer to the human-like language utterances of chatbots as evidence.

It is not clear, however, that this work shows that smart computer programs can extract grammatical rules from these inputs; this line of work has not succeeded in capturing all the fine details of languages, nor is it capable of representing utterances as hierarchically structured objects rather than as linear strings of units. Due to their success, chatbots give people the impression that the system has grammatical knowledge. But that is not the case. The goal of such systems is not to model how people learn languages, but rather to produce outputs that people find useful, so that the product will sell.

Many linguists (when they are not analyzing phenomena in actual languages) take it upon themselves to construct a model (or hypothesis) of the remarkable capability that enables kids to acquire language. The dispute is about what this model looks like and to what extent it contains language-domain-specific information. Many Chomskyan linguists feel that an important part of this hypothesis is formed by a set of innate principles and parameters that are specific to human language and that leave the child little choice about what kind of grammar fits the language that they hear. Meanwhile, people studying vision do the same thing. If, at the end of the day, it turns out that the principles and parameters that we need for language acquisition are (in part) similar to those in the area of acquisition of vision, we might conclude that certain innate capabilities are indeed more general, whereas others are domain-specific. It is of course also possible that different systems use the same principles, but in a specialized form.

Chomsky’s proposal for an innate capacity consisting of principles and parameters is meant to show that it is logically possible for the child to acquire the grammar of a language. By saying that it is logically possible, Chomsky means that the innate system is rich enough to enable a child to bridge the gap between the “raw” input data and knowledge of a specific grammar in a finite number of steps. This is especially important because the “raw” data underdetermine the richness of the mental grammar that is required to speak the language. Since the input is “poor” and the output of the learning process (i.e., the mental grammar) is “rich,” it is necessary to postulate an innate system that bridges the gap:

  1. (40)

    (English) input  ⇒  Universal Grammar  ⇒  Mental Grammar (of English)
       “poor”                                      “rich”

Thus, to propose a model for the innate system is to offer a solution for the logical problem of language acquisition, and the crucial argument to motivate postulating this system is called the poverty of the stimulus argument. (I think that van der Hulst did a fine job showing in detail why the input is poor in ML, chapter 9.)

By the way, let us note here that one can propose a solution to the logical problem (in the form of a theory of Universal Grammar) without going into the specifics of the actual process (or stages) of acquisition; that is, without aiming to predict the nature and timing of the stages that children will go through on their way to “grammatical adulthood.”

Q: But we know that children go through various stages before they speak “correctly.” Is it the case that the phases that they go through tell us something about the structure of the innate Universal Grammar?

A: I think so. By studying the phases of acquisition, linguists indeed hope to get a view on the set of principles and parameters that drive the development of the grammar. Studying the stages of acquisition is also a goal in itself, of course, a part of developmental (cognitive) psychology. In addition, knowing the “normal” course of acquisition may help us to detect and understand abnormalities, which may be of great importance for providing adequate treatment.

It seems obvious that a good and complete theory of acquisition will, in fact, explain why the child goes through the stages of development, thus solving what is sometimes called the developmental problem of language acquisition.

It is very important to emphasize that children, when not yet speaking the way adults do, do not make “mistakes.” Rather, they are working with a mental grammar that is not yet like the mental grammar of their caregivers; and likely it will never be completely identical. At each stage of development, they produce utterances that are grammatical according to their mental grammar at that point.

Q: You said that language learning and other forms of learning are different. Can you elaborate on that?

A: Notice that when we consider other domains of learning – let us say, learning how to write or learning how to do subtraction – the input that we offer to the child is highly structured. Teachers and parents do everything they possibly can to explain the rules as clearly as possible. They make the child practice over and over again until they get it right.

The learning (or acquisition) of language cannot work the same way, for the simple reason that teachers and parents do not know the rules of language themselves, at least not in a conscious way. They know them subconsciously (because they speak and understand the language), but they cannot make them explicit.

There is an interesting paradox here. Children have no trouble picking up the language system, which is very, very complex indeed. On the other hand, they do have trouble learning to spell, and I can tell you that the spelling rules are much, much simpler than the rules of the grammar, even for English, which has some pretty horrendous spelling conventions. Whenever we see that something that is very hard comes easy to the child, we suspect that the child has an innate “helping hand,” a heads-up so to speak, about what to expect and how things work.

Q: But surely, adults who have been through schools and colleges can explain the rules of grammar!

A: Forget it. You might say that everyone can observe and subsequently say explicitly that in English the subject usually comes before the verb, while the object follows the verb. But such “rules” barely scratch the surface! Consider the following sentences:

  1. (41)

    Figure 2.19
    Three example sentences demonstrating pronoun reference rules in English. Sentence (a): John washed his face, with a curved line connecting John and his. Sentence (b): John’s mother washed him, with a curved line connecting John’s and him. Sentence (c): John washed him, with a curved line connecting John and him that is marked with an X. For (a) and (b), the annotation reads that his/him refers to John or someone else; for (c), it reads that him can only refer to someone else.

Words like him or his are pronouns that can refer to someone already mentioned in the sentence or to someone else altogether. Which parent can formulate the rule that will tell you why him in sentence (41c) cannot refer to John, while in (41b) it can, just like his in (41a)? Or consider the following sentences, where the pronoun he is at stake:
  1. (42)

    Figure 2.20
    Four example sentences demonstrating pronoun reference rules in English. Sentence (a): When John left, he was sad. Sentence (b): John was sad when he left. Sentence (c): When he left, John was sad. Sentence (d): He was sad when John left. In (a), (b), and (c), a curved line connects John and he, and the annotation reads that he refers to John or someone else; in (d), the line connecting He and John has a large X drawn over it, and the annotation reads that he can only refer to someone else.

Again, can you imagine a parent (or a teacher who is not a linguist) who can explain why he in the sentence in (42d) is different from he in the other three sentences? You might think that the pronoun cannot refer to John when John is in an embedded sentence that follows the main sentence, but then why can his in the following sentence do precisely that?
  (43)
    His dad was sad when John left.    (his can refer to John)

Linguists have found answers to these questions, and these answers (i.e., the rules about coreference) are not simple and obvious. (By the way, these rules are structure dependent in that they refer to the hierarchical syntactic structure of the sentence.) Interestingly and importantly, coreference rules (also called the Binding Principles) appear to be largely universal, allowing very little variation across languages.58
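To give a concrete (and deliberately simplified) sense of what “structure dependent” means here, the following sketch in Python states a toy version of Principle B over hierarchical trees rather than over the linear order of words. The tree encoding and the function names are my own illustrative choices, not the formalism syntacticians actually use.

# A minimal, hypothetical sketch of structure-dependent coreference (a toy Principle B):
# a pronoun may not corefer with an antecedent that c-commands it within its clause.
# Trees are nested lists: [label, child, child, ...]; leaves are [label, "word"].

def dominates(node, target):
    """True if target occurs properly inside node."""
    return any(child is target or (isinstance(child, list) and dominates(child, target))
               for child in node[1:])

def parent_of(root, target):
    """Return the node immediately dominating target, or None."""
    for child in root[1:]:
        if child is target:
            return root
        if isinstance(child, list):
            found = parent_of(child, target)
            if found is not None:
                return found
    return None

def c_commands(a, b, root):
    """Simplified c-command: neither node dominates the other, and the node
    immediately above a also dominates b."""
    if dominates(a, b) or dominates(b, a):
        return False
    p = parent_of(root, a)
    return p is not None and dominates(p, b)

def principle_b_allows(pronoun, antecedent, clause):
    """Toy Principle B: coreference is out if the antecedent c-commands the
    pronoun within the (simple) clause."""
    return not c_commands(antecedent, pronoun, clause)

# (41b) "John's mother washed him": John sits inside the subject NP and does
# not c-command the pronoun, so coreference is possible.
john_b, him_b = ["N", "John"], ["Pro", "him"]
s_b = ["S", ["NP", ["NP", john_b, ["Poss", "'s"]], ["N", "mother"]],
            ["VP", ["V", "washed"], him_b]]

# (41c) "John washed him": the subject John c-commands the pronoun,
# so coreference is blocked.
john_c, him_c = ["N", "John"], ["Pro", "him"]
s_c = ["S", john_c, ["VP", ["V", "washed"], him_c]]

print(principle_b_allows(him_b, john_b, s_b))  # True  -> him may refer to John
print(principle_b_allows(him_c, john_c, s_c))  # False -> him must be someone else

The only point of the toy is that the relevant notion, c-command, is defined over the tree, so “John’s mother washed him” and “John washed him” come out differently even though John precedes him in both.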

In short, it is highly unlikely that children are explicitly instructed in these matters. It is also unlikely that they have enough time and input to figure it out from scratch.

Q: Thank you for this insightful discussion. You convinced me of the fact that the knowledge that children must have (and adults of course) of their language is quite intricate. I understand why Chomsky concluded that it is really not obvious, if not entirely implausible, that children can learn all this without some sophisticated a priori abilities.

A: You’re welcome.

Q: Hold on, I have one final question. I have read or heard somewhere that Chomsky has changed his views on what exactly needs to be specified in the innate Universal Grammar. Can you say something about that?

A: Right! Chomsky’s initial idea was that UG is rich in content, which would go a long way toward explaining why children do not need much input or time to become successful language users. But that has changed. Following the general guideline that scientific theories need to be as simple as possible, Chomsky launched what he called the Minimalist Program, which aimed at reducing what is contained in UG as much as possible. In fact, some might argue that he went very far in his reduction, because it led to the hypothesis that UG merely contains a combinatorial system that permits the generation of an infinite number of linguistic expressions in terms of their syntactic features, leaving semantic and phonological properties to systems that lie outside UG. A consequence of this development was that many recurrent and even universal properties of mental grammars, no longer anchored in UG, have to be explained in other ways. Some of those explanations appeal to cognitive systems that humans have and that serve purposes other than just language. In fact, one could say that these systems are of the kind that empiricists have always taken to be responsible for learning languages. But Chomsky also appeals to explanations that refer to natural laws (i.e., laws of physics that underlie the form of practically everything in the natural world).

Q: That sounds a bit mysterious …

A: … you’re right about that too. Rather than going into this matter now, let me refer you to Chapter 4. Is that OK?

Q: Sure. And thank you for your answers. One last question: It is obvious that humans create sentences because they have something in mind that they want to express in language. At the same time when I hear a sentence, I infer what it means. Do chatbots understand language?

A: This is a very hard question, urgent as it is. Let me for now give you what ChatGPT says about this (on 12/7/2024) …

ChatGPT: No, ChatGPT doesn’t “understand” things in the way humans do. It doesn’t have consciousness, self-awareness, or subjective experiences. Instead, it processes language based on patterns it has learned from vast amounts of text data. When you ask a question or make a statement, ChatGPT generates a response by predicting the most likely sequence of words based on the patterns in the data it was trained on. In other words, ChatGPT’s responses are the result of statistical modeling, not actual comprehension or reasoning. It doesn’t know what it’s saying, but it can generate language that often seems meaningful because of the way it has been trained.

Q: Well, at least we can say that ChatGPT “knows itself.”

Section III Structural Analogy

Introduction

In this third section, I will discuss the notion of structural analogy in more detail. We then look at some different views on the organization of the mental grammar. We conclude by asking whether the Innateness Hypothesis for language can be falsified.

Analogies between Syntax and Phonology

We have discussed the minimalist view of Noam Chomsky, who now regards the innate language faculty as comprising only a formal syntactic module, which we will call formal syntax; this module is characterized by a single operation (recursive Merge) and has the function of structuring human thought. In this view, phonology is “merely” a part of the externalization system that produces language in the public domain. We must assume that the syntactic system feeds into this externalization process. An alternative view is that a conceptual system structures human thought and that delivering language in the public domain is the task of the mental grammar, which comprises two central modules: Formal Syntax and Formal Phonology.59

An argument for putting phonology and syntax side-by-side can be derived from the fact that both systems are highly analogous in terms of their internal organization. In this section, I will mention two striking parallels.

A first parallel is that in both systems two basic categories play a central role. It has been argued that from the basic syntactic categories of Noun and Verb all other word categories have developed through so-called grammaticalization processes, which are historical processes of language change that derived functional words and eventually affixes from these two word classes.60 In phonology the central categories are Vowel and Consonant. It has long been argued that in the process of first language acquisition, a distinction between these two units comes first. It is plausible that processes of language change led to the emergence of phoneme systems after a basic C/V distinction had been made. The fact that the phonological and syntactic categories that we see in modern languages have emerged from such basic units is reflected in the hierarchically structured sets of syntactic and phonological features as recognized in many current theories.61

Let us now turn to the structure that these two basic categories enter into in both domains.

A common view of syllables is that they consist of two parts, called the onset and the rhyme. The onset is formed by the consonant or consonants that precede the vowel, while the rhyme is, well, the part of the syllable that rhymes (44).

  (44)
    Syllable → Onset + Rhyme
    Onset: the consonant(s) preceding the vowel (e.g., m, r, p)
    Rhyme: the vowel plus any following consonant(s) (e.g., a + t), as in mat, rat, pat

It is striking that syllable structure and the structure of syntactic phrases are very similar, if not identical, given the common claim in formal syntax that syntactic phrases have a “two-level” structure in which the head (i.e., the central word) first combines with a complement and then secondarily with a specifier, both being syntactic phrases in their own right.62 Indeed, many writers have referred to a close correspondence between the hierarchical organization of syllables and that of simple sentences (45).63

  (45)
    Syllable → Onset + Rhyme;   Rhyme → Nucleus + Coda
    Phrase → Specifier + Core;  Core → Head (e.g., house) + Complement (e.g., on the hill)
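As a way of visualizing the claimed parallel in (45), here is a small illustrative sketch (my own, not taken from the literature) in which the same two-level, binary-branching template is filled once with phonological material and once with syntactic material; the helper names are hypothetical.

# Illustrative sketch: the same two-level template hosts a syllable and a phrase.

def constituent(label, *parts):
    """A generic node: a label plus its ordered parts."""
    return {"label": label, "parts": list(parts)}

# Syllable: Onset + Rhyme, Rhyme = Nucleus + Coda   (e.g., "mat")
syllable = constituent("Syllable",
                       constituent("Onset", "m"),
                       constituent("Rhyme",
                                   constituent("Nucleus", "a"),
                                   constituent("Coda", "t")))

# Phrase: Specifier + Core, Core = Head + Complement  (e.g., "house on the hill";
# the specifier's content word is left unspecified here)
phrase = constituent("Phrase",
                     constituent("Specifier", "..."),
                     constituent("Core",
                                 constituent("Head", "house"),
                                 constituent("Complement", "on the hill")))

def shape(node):
    """Reduce a tree to its branching skeleton, ignoring labels and content."""
    if not isinstance(node, dict):
        return "*"
    return tuple(shape(p) for p in node["parts"])

print(shape(syllable) == shape(phrase))  # True: identical two-level skeletons

Stripping away labels and content, the two trees have the same skeleton, which is the structural analogy at issue.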

The linguist Andrew Carstairs-McCarthy has suggested that perhaps the structure of phrases in formal syntax was copied from the structure of syllables, which assumes that syllable structure developed first when humans started to develop language.64 Many others believe that the opposite was the case, and that syntactic structure developed before phonological structure.65 If words came to be analyzed in terms of phonemes and syllable structure, it is very possible that the structural principles that directed the formation of syntactic structure were repurposed in the domain of word phonology, but it is of course also possible that a similar structure developed in both domains, guided by “deeper” structural principles that, at least, separated head units from dependent units.66

There are arguments in favor of and against seeing phonology as “very old.” In Chapter 8, we will see that birdsong has phonological structure. This would suggest that the possibility of assigning a compositional phonology to perceptible, phonetic events is very old in evolutionary terms. On the other hand, it has been argued that in emerging sign languages, the evidence for syntax is there before phonological compositionality develops.67

Different Views on the Organization of the Mental Grammar

In a 2002 paper in the journal Science, Chomsky surprised friends and foes by speaking out on the issue of language evolution. Thus far, he had been mostly silent on the issue or dismissive of proposals in this area. The paper was co-authored with Marc Hauser and Tecumseh Fitch.68 The authors distinguish between a narrow language faculty (NLF) and a broad language faculty (BLF). The broad faculty includes a sensory-motor system (for the perception and production of speech or sign) and a conceptual-intentional system; in short, the interpretative modules phonology and semantics, with NLF being responsible only for the syntactic engine. The BLF systems are assumed to be shared with other species, either fully or to some extent, and are said to be the result of so-called “third factors” (see Chapter 4). The question is what the NLF consists of. The authors claim that this is where we place the recursive combinatorial system (called “Recursive Merge”), which is claimed to be unique to humans and to human language. This is what Chomsky has termed The Basic Property, which makes it sound really important! But in the end, they say that recursion may have “migrated” from other mental faculties in which it originated, and could then have been adopted by other systems such as the narrow language faculty. It isn’t clear, then, whether there is anything at all that is unique to language, although the claim remains that recursion is unique to the human mind.69 Whatever the fate of the narrow language faculty, the claim in the 2002 article is that this system accounts for what is called internal language, which is essentially the language of thought. The crucial consequence for the authors is that the question of the evolution of language is about the emergence of internal language, probably due to a single mutation. External languages as means of communication are not their concern.
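To see why Recursive Merge is described as a single recursive operation that nevertheless yields unbounded structure, the following toy sketch may help; it is an illustration of the general idea only, not Chomsky’s formal definition, and the example words are arbitrary.

# A toy sketch of a binary, recursive combinatorial operation in the spirit of
# Recursive Merge (illustrative only; not Chomsky's formal definition).

def merge(x, y):
    """Combine two syntactic objects into a new, unordered object."""
    return frozenset([x, y])

# Because the output of merge is itself a possible input to merge,
# structures can be embedded without bound:
dp = merge("the", "ball")                   # {the, ball}
vp = merge("kicked", dp)                    # {kicked, {the, ball}}
tp = merge("John", vp)                      # {John, {kicked, {the, ball}}}
cp = merge("that", tp)                      # a clause embedded under "that"
bigger = merge("Mary", merge("said", cp))   # ... and embedded again

def depth(obj):
    """Levels of embedding: each application of merge adds one."""
    if isinstance(obj, frozenset):
        return 1 + max(depth(member) for member in obj)
    return 0

print(depth(tp), depth(bigger))  # 3 6

Because the output of the operation can be fed back into the same operation, embedding can in principle continue without limit, which is the recursive property at stake.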

Steven Pinker and Ray Jackendoff responded to this article and argued that much more than just recursion is special to language, but we must recognize that they are mostly talking about external language.70 For example, they show that phonology is unique to human language (although birdsong seems to have something like phonology; see Chapter 8), as well as the enormous power of word learning (number of words, abstract concepts). Turning to syntax, they also refer to special properties other than recursion (word order parameters, long-distance dependencies, etc.). Finally, there is word formation and inflectional morphology, and here we should add intonation. These two authors also sharply reject the idea that language is just an internal system for organizing thoughts, emphasizing the function of language as a system for communication. Clearly, for these authors the question of language evolution focuses on how external languages came to be, which is not to deny that humans must have a mental grammar that structures thoughts but that also, crucially, externalizes them in the public domain.

These two conceptions of the mental systems that underlie human language do not differ in terms of how many subsystems are postulated. They both need three representations (and their corresponding sets of well-formedness constraints) that are manifested in both words and sentences.

  (46)
    Conceptual structure
    Syntactic structure
    Phonological structure

In the theory of Ray Jackendoff, these three modules operate in parallel with so-called interface rules accounting for (isomorphic) correspondences (or the absence thereof) between the structures that each module builds.

  (47)
    Chomsky’s model: Formal Syntax feeds Conceptual Structure and Phonological Structure (one-way arrows).
    Jackendoff’s model: Formal Syntax stands between Conceptual Structure and Phonological Structure, connected to each by bidirectional (double-headed) arrows.

A third logical view is to propose a model that takes conceptual structure to be central, as shown in (48).

  (48)
    Conceptual Structure feeds Syntactic Structure and Phonological Structure (one-way arrows).

This third model, which is known as the Generative Semantics model,71 provides a natural basis for the commonsense idea that the formation of a sentence must start with a thought in the form of a conceptual structure. Syntactic structure in this model thus does not come “out of the blue.” (In fact, some might see it as a paradox that, if syntax is a system for organizing thoughts, it is a system for building “naked” syntactic structures that are only provided with a meaning in the interface with the conceptual system.) In line with Generative Semantics, models of sentence production that come from psycholinguistic quarters typically start with the formulation of a conceptual structure.72

If we separate the conceptual system from the system that is used to express conceptual structures in the public domain, we should recognize that the conceptual system has a “grammar” of its own. This grammar uses concepts (taken from a conceptual network) as building blocks and uses a conceptual syntax to produce conceptual structures, which are essentially what we call thoughts. In this approach, the relation between conceptual structure and formal syntax can be characterized as “faithfully copying conceptual structures” into overt syntactic structures. Building a syntactic structure starts with units that are stored in the lexicon that have a syntactic layer, a phonological layer, and a semantic layer. The semantic layer of words is copied from the conceptual structures, which can also be encoded in syntactic structures (complex words or syntactic phrases and sentences).73

  (49)
    Mental Grammar box: Lexicon (words: form, category, meaning) → Formal syntax → Syntactic structure.
    Conceptual System box: Conceptual network → Conceptual syntax → Conceptual structures, which connect to the lexicon, the formal syntax, and the syntactic structure of the mental grammar.
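For concreteness, a lexical entry of the kind assumed in (49), with a phonological form, a syntactic category, and a meaning layer pointing into the conceptual network, could be sketched as follows; the concrete values and transcriptions are merely illustrative assumptions.

# A hypothetical sketch of a three-layer lexical entry, following the labels
# "form, category, meaning" in (49); the concrete values are merely illustrative.
from dataclasses import dataclass

@dataclass
class LexicalEntry:
    form: str        # phonological layer (a rough transcription)
    category: str    # syntactic layer (N, V, ...)
    meaning: str     # semantic layer: a pointer into the conceptual network

lexicon = {
    "dog":  LexicalEntry(form="/dɔg/",  category="N", meaning="DOG"),
    "bark": LexicalEntry(form="/bɑrk/", category="V", meaning="BARK"),
}

# Building a syntactic structure starts from such entries; in the model sketched
# in the text, the meaning layer is copied from conceptual structure.
print(lexicon["dog"].category, lexicon["dog"].meaning)  # N DOG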

Once the syntactic system had emerged in our very distant ancestors, there were two mind-internal “syntactic systems,” the conceptual syntax and the formal syntax. Conceptual structures are seen as merely expressing grouping and not linear order, while syntactic structures encode both hierarchical and linear structure, the latter sometimes seen as being due to the phonological system (because linearization arises from the phonological necessity to utter words in sequence). In the model adopted in this chapter, linearization is considered to be part of the formal syntactic system, falling into the group of syntactic realization rules. While formal syntax is a module of the mental grammar, the conceptual system is not part of the mental grammar, because it is not, as such, “linguistic,” which here means that it has no direct function in the externalization of linguistic expressions, apart from being the “input” to it.

However, the diagram in (49) is not complete. The mental grammar also contains a phonological system which accounts for the phonological structures of words and sentences. In (50), I indicate that formal syntax caters foremost to the conceptual structures (from which it is “copied”), while phonology tries to faithfully encode a perceptual phonetic structure which is, while mind-internal, external to the mental grammar.74 We thus must recognize a third syntactic system: phonological syntax.75

  (50)
    A diagram labeled Mental Grammar: on the right, Formal syntax ↔ Conceptual system; on the left, Phonological syntax ↔ Phonetic system (double-headed arrows).

This model of the mental grammar captures the traditional distinction behind the so-called dual articulation of language, with (often) syntax being called the “first articulation” and phonology the “second articulation.”76

However, the diagram in (50) can be made more complete once we realize that both syntactic systems in the mental grammar can be influenced by both grammar-external systems, though one type of structure is more important for each (highlighted in bold in (51)).

  (51)
    Formal syntax and Phonological syntax (inside the Mental Grammar) are each connected by double-headed arrows to Phonetic structure (left) and Conceptual structure (right); the connection between formal syntax and conceptual structure and the connection between phonological syntax and phonetic structure are highlighted in bold as the primary ones.

The influence of conceptual structure on phonological form gives rise to what I have called iconicity, which is a resemblance between the phonological form and the conceptual structure. Especially in sign languages, as we saw in ML, chapter 13, the phonological composition of signs tends to reflect their conceptual structure when phonological building blocks are chosen that in their form resemble aspects of what the sign refers to (or rather a visual representation thereof). It has also long been argued that sentential syntax is iconic in many ways. Since syntactic structure tries to faithfully copy conceptual structure, syntax is necessarily iconic because it will resemble (i.e., is largely isomorphic to) the hierarchical organization of the conceptual structure. In addition, syntactic structure, once linearized, can be iconic of conceptual structure in other ways.77 The influence of phonetic structure on formal syntax is less obvious, although there are (phonologized) phonetic constraints on syntactic structure. An example is the prohibition against phonologically weak words occurring in certain syntactic positions, as illustrated in (52).

  (52)
    a. She gave him the book ~ She gave the book to him
    b. ?She gave him it ~ She gave it to him

If the object is a small word like it, a structure in which it comes at the end of the sentence, as in (52b), is not preferred, and some people even find it “ungrammatical.” The reason could be that ending a sentence with a word that is not capable of carrying phrasal stress is dispreferred.

The phenomenon called “heavy NP shift” involves an avoidance of phrasal constituents that do not have a certain degree of phonological complexity at the end of the sentence. Compare the sentences in (53).

  (53)
    a. He attributed the fire to a short circuit. (“basic” order)
    b. *He attributed to a short circuit the fire.
    c. He attributed [to a short circuit] [that fire in the basement].

The explanation for why [that fire in the basement] can be “shifted” to the end of the sentence, while [the fire] cannot, is believed to lie in the “heaviness” (i.e., branching) of the former constituent.

The Mental Grammar Is Not a Perfect System

The model of the mental grammar, as here explained, contains a formal syntax and a formal phonology that both diverge from the “substances” they try to faithfully represent. Formal syntax, although derived from conceptual structure, and aiming to faithfully represent it, is autonomous with respect to the conceptual structure in having its own inventory of basic units, principles of organization, and rules. The phonological system is also autonomous from the phonetic substance, while still trying to be faithful to it. It is in this sense that phonologists often say that phonology is “substance-free,” but then we could also say that syntax is “meaning-free.” Formal syntax and formal phonology have both become autonomous. While on the one hand general, third-factor principles cause both systems to have a “logical organization” (see Chapter 5), processes of language change (often contradictory in their motivations, and sometimes even random) introduce “clutter” in both systems. Both syntax and phonology have found compromises in ways to faithfully represent different aspects of the substances that “ground” them. Furthermore, both systems need to adapt to ongoing changes of the phonetic and conceptual substance. Such adaptations likely occur in the course of language acquisition, when learners have to “make sense” of the input that they are exposed to, thus causing their mental grammars to differ from those of their caregivers. Therefore, for various reasons, neither syntax nor phonology can be said to be “perfect.” This view departs from Noam Chomsky’s claim that syntax (here referring only to formal syntax) is the “perfect solution” to relating meaning to form. We return to this claim in Chapter 5.

One could ask whether the conceptual system is perfect. I don’t know the answer to that. Perfect to do what? We do not know whether the conceptual system can capture all that is going on in the subconscious ocean of the mind. There is no reason to believe that any cognitive system will be perfect if it emerged from evolutionary processes that were adaptations to a heterogeneous set of circumstances over long stretches of time. As the famous biologist François Jacob wrote:

The action of natural selection has often been compared to that of an engineer. This, however, does not seem to be a suitable comparison. First, because in contrast to what occurs in evolution, the engineer works according to a pre-conceived plan in that he foresees the product of his efforts. Second, because of the way the engineer works: to make a new product, he has at his disposal both material specially prepared to that end and machines designed solely for that task. Finally, because the objects produced by the engineer, at least by the good engineer, approach the level of perfection made possible by the technology of the time. In contrast, evolution is far from perfection. This is a point which was repeatedly stressed by Darwin who had to fight against the argument of perfect creation.78

He also notes in his article that, like everything else in nature, the human brain is the product of tinkering (p. 1166). I return to the discussion of the alleged perfection of the mental grammar in Chapter 4.

Why Would Analogies between Phonology and Syntax Exist (or Not)?

Despite what we have discussed in the previous section about a striking structural analogy between syntax and phonology, it is often said that phonological structure is fundamentally different from syntactic structure, for example, because phonological structure does not display recursivity.79 The present author has taken issue with that claim.80 Pushing the idea of structural analogy within the grammar, I have argued in this chapter (see also ML, chapter 6) that the submodules of the mental grammar are structurally analogous.81 This of course does not preclude there being differences between them. In fact, differences are to be expected, precisely because the different submodules have different functions and operate to express different kinds of substances, which causes them to have different sets of basic units (but even at this level, as just discussed, analogies exist).

Proponents of domain-specific, innate modularity, whether within the mental grammar or the mind at large, assume and sometimes argue that analogies between modules are not to be expected, precisely because each module has its own specific domain and function which, presumably, calls for domain-specific specialization and solutions. As discussed in ML, chapter 5, this idea agrees with the basic claim of evolutionary psychologists, which is that mental modules are designed to solve a specific problem. The metaphor of the Swiss army knife that is used in this approach is meant to show that different modules are shaped such that they will be optimal for the problem that they are supposed to solve. We could argue that this expectation is counterbalanced by the basic tenets of Evolutionary Developmental (EvoDevo) biology, which promotes the idea that biological variation results from the fact that a small array of so-called master genes can be expressed more times in some species than in others.82 For example, a genetic specification for growing limbs will be expressed more often in species that have more limbs than others. On the EvoDevo line of thinking, resemblances between mental modules are not analogous (i.e., similar by chance) but homologous, on the assumption that the relevant structural properties, while applying in different domains, are due to expressions of the same genes.

Instead of invoking multiple expressions of genes in order to explain the occurrence of similarities in different mental modules, we could also invoke the notion of gene duplication, which has been argued to play an important role in evolution.83 Gene duplication, like mutations in the DNA of genes, happens “accidentally,” leading to the occurrence of two identical genes in the genome of an organism. This can either be detrimental or go unnoticed in development, but in the latter case, additional mutations can lead to both genes becoming different, having different effects with perhaps differentiation and specialization in different systems. The question is whether gene duplication could have played a role in diversifying general cognitive capacities, such as categorization and pattern recognition, as well as attributing linear and hierarchical organization to patterns. It is possible that specialized forms of these capacities ended up being part of different cognitive modules, which would then cause the similarities between modules. One hypothesis about the evolution of the human mind is that the pre-human mind was a single general cognitive system that was applied to all the challenges of life, and that over the course of evolution, this holistic mind diversified into several mental modules that had specialized functions. In this scenario (proposed by Steven Mithen),84 it is possible that different modules contain copies of general cognitive capacities such as categorization, pattern recognition, and pattern organization, tailoring them to the specifics of each module. Gary Marcus supports the idea of gene duplication as playing a role in the evolution of the human mind.85

Variation in the multiple expression of the same genes, as well as the mechanism of gene duplication, is consistent with the idea just mentioned (and further discussed in Chapter 4) that all complex systems, especially “self-diversifying systems,” tend to have a specific “logical organization,” which includes William Abler’s Particulate Principle (see Chapter 4). This “third-factor” perspective suggests that perceived similarities are due to modules making use of systems that are external to them, which then allows for a more minimal characterization of these modules. Given the idea of multiple gene expression and gene copying, we can also say that similarities are due to the fact that different modules contain subsystems that, in an evolutionary sense, are descendants of older genes that have developed into specialized versions, which in the case of cognitive modules could translate to specialized neural circuits.

If we apply this line of thinking to grammatical modules, there is no reason to minimalize the various modules just because they display similarities. In this view, recursion and headedness (as well as the prevailing occurrence of binary structure) can be genuine properties of several different grammatical modules, which does not mean that these properties work in exactly the same way or with the same frequency in all modules that contain them. For example, while recursive structure is ubiquitous in syntax, its use is much less called for in phonology. The reason for this is that while syntax foremost accounts for semantic-conceptual structure, which is arguably recursive, phonology models the linear structure of speech, which explains a more important role for linear organization with limited hierarchical structure (and, according to some, no recursion at all).86

The Baldwin Effect

Let us finally ask whether the idea that most aspects of grammar are perhaps based on general cognitive capacities (i.e., capacities that are not specific to language) is incompatible with the claim that these same aspects of grammar are dependent on an innate grammar plan. I think that there is not necessarily a contradiction here, because it is conceivable that some grammatical properties, once “invented” or emerged, made their way into the innate ability for language, through what is sometimes called the Baldwin Effect.87 The Baldwin Effect is named after the psychologist James Mark Baldwin, who in 1896 suggested that what needs to be learned in one stage of evolution can come to be prespecified in an innate endowment, because people who happen to display the formerly learned knowledge or behavior spontaneously, due to mutations, have an advantage. They do not have to “waste time” on learning, which means that their chances of survival and success increase. This genetic assimilation of behavioral tasks can occur as a result of genes developing additional ways of being expressed and of gene duplication.

There may be a problem in applying this line of reasoning to properties of grammars because, if grammatical diversity has arisen only “recently,” the time for these aspects of language to become innate due to the genetic assimilation effect is most likely too short, but this is debatable given that there have been genetic changes since the emergence of Homo sapiens. If the diverse complexities that modern languages exhibit developed over the course of the last 200,000 years, this might be long enough for genetic adaptations that facilitate either learning these properties quickly or having them built-in. However, we do not have to rely on genetic causes for all properties of grammars. In fact, the basic argument of Grammaticalization Theory is that the wondrous ways in which languages differ in their syntactic and phonological patterns can be attributed to processes of language change, the causes of which are not genetic, but rather must be found in the use of language as a communication system. The built-in properties of grammars must thus be more abstract, such that they can be shared by all human languages. These properties are given in the “language manual,” which we can continue to call universal grammar, that children are born with, which we can characterize as a set of expectations or biases concerning the general organization of the mental grammar and its core modules, phonology and syntax. What these expectations are precisely will remain a topic of discussion for some time to come, but there is no point in reducing what is innate to a single operation that says nothing about the function and use of “external language.”88

Can the Innateness Hypothesis Be Falsified?

One must now ask how the Innateness Hypothesis (irrespective of how much content we give it) can be tested. When evaluating a hypothesis, we should ask what kind of predictions it makes in various domains that did not originally motivate it. We then must ask what kind of evidence could falsify these predictions. Specifying potential falsifying evidence for the Innateness Hypothesis for language is not so easy, as I will argue in this section. For this reason, most nativists have focused on arguments that aim at supporting these predictions. While it is true that showing that predictions are confirmed will lend plausibility to a hypothesis, philosophers such as Karl Popper argued that to make scientific progress, it is more important to indicate what kind of evidence would refute (or falsify) a hypothesis. Let us therefore ask what kind of evidence would refute the Innateness Hypothesis for language and then see whether such evidence is even available, at least in principle.

The IH states that children cannot acquire language (i.e., formulate a mental grammar) unless they are born with an innate UG. This hypothesis can be falsified by showing that children who are not born with an innate UG can nevertheless successfully acquire language. The problem is that there is no conceivable way to do the relevant experiment. Ignoring the obvious ethical prohibitions, the problem is that it is not possible to create a situation in which a child is exposed to primary linguistic input while having no access to the hypothesized UG. Of course, empiricists will immediately say that this “experiment” is performed every time a child is exposed to language input, because, according to them, there is no UG to begin with. An empiricist assumes that language acquisition is successful without postulating domain-specific innate knowledge. As the empiricist-minded philosopher David Hume said long ago: If human knowledge can emerge without postulating innate ideas, the demand of simplicity requires us to reject them. The problem is, however, that we cannot know, in principle, whether successful acquisition of language was in fact achieved without a UG.

For the sake of the argument, let us see why an experiment that would lead to falsification of the IH cannot succeed in principle, even if we did (forbidden) experiments with human children. To make the hypothesized UG inaccessible to the child, we would have to first know “where it is.” There are two possibilities to locate UG. The most direct evidence for the location of UG would be to find it in the genes. Let us assume that such genes have been identified. (See Chapter 7 which shows that this is not true.) In that case, the perfect (yet highly unethical) experiment would be to “knock out” (i.e., silence) the gene or genes that are responsible for the innate UG, which would predict that normal exposure to the primary linguistic data will not result in knowledge of language (i.e., a mental grammar).89 If the individual whose UG language genes are silenced does not display normal language behavior, the necessity for a UG is confirmed. However, if it is the case that our poor subject shows no negative effects and becomes a fully fluent language user, we have falsified the claim that language acquisition is only possible when the genes for UG are available (although another reason could be that we silenced the wrong genes). One might object that if the genes that underwrite UG have been identified (which is necessary if we want to silence them), there is perhaps no need to do the forbidden experiment because, in a sense, we have already delivered definitive proof that UG is real. However, as mentioned, we don’t know whether we have identified the correct UG genes, and even if these genes can be correctly identified, it does not, strictly speaking, follow that language cannot be acquired if the genes are silenced. So, our horrible experiment is still useful, but the paradox to this experiment is that, while it requires that we have identified UG genes, a result that shows that the child can form a mental grammar when the relevant genes are silenced would lead to the conclusion that these genes are not necessary. This of course raises the question of why they would be there to begin with …

Putting ethics aside (which of course we should never do), and ignoring the paradox, we cannot even do this experiment because, as we will see in Chapter 7, no “UG genes” have been found, even though genetic evidence supports that many genes do play a role in language behavior. We know that these genes have relevance for language behavior because the “knock-out experiment” is sometimes performed for us due to spontaneous gene mutations that can affect certain individuals (and that can be inherited by their offspring). As we will see in Chapter 7, evidence indicates that these genes always appear to have many other functions as well. But if that did not stop the “mad scientist” from knocking out all these genes, the problem remains that it has not been shown that the genes that have a bearing on language behavior specifically underwrite the alleged UG (i.e., the system that is necessary to construct a mental grammar). In fact, as we will see in Chapter 7, it would appear that the language-relevant genes that have been located regulate language processing and, more generally, language use, rather than the alleged UG.

Another type of forbidden experiment, which does not presuppose knowing the genes that underwrite UG, would be to paralyze (i.e., sedate) those brain areas which harbor the alleged innate UG. Sedation of brain areas is possible, even for experiments that are considered ethical, but is certainly not meant to extend over a very long period, such as the first years of life! Surgical removal of these areas would be an even more horrible experimental method (assuming that the poor subject survived the procedure). Putting that aside, if our poor subject cannot acquire language, we would confirm the Innateness Hypothesis, while this hypothesis is falsified if our subjects become linguistically successful. This approach presupposes that specific brain areas can be identified as uniquely supporting UG. As discussed in ML, chapter 11 and in Chapter 6, it can happen that brain areas that are important for language are damaged as the result of a stroke or another type of injury. Also, such areas sometimes have to be fully or partially removed surgically to cure a brain disease or diminish the effects of epilepsy. What such cases show is that older individuals who lose language abilities are not able to regain full language. This confirms that said areas are necessary to construct a mental grammar. If such individuals were able to relearn language perfectly, this, one might argue, would falsify the Innateness Hypothesis, on the assumption that the damaged or removed areas hold the innate language capacity. This conclusion is not invalidated by the fact that young children whose language areas have been damaged or removed can construct a new mental grammar in other areas of the brain, even in the other hemisphere. Here the view is that since those young people were within the critical period for language acquisition, the genes can rebuild UG in a different part of the brain; see ML, chapter 11.

The UG is supposed to remain active from birth to puberty, as per the critical period hypothesis, whether or not it has been used for first language acquisition during the critical period. If UG has been activated for that purpose, will we then say that it gives rise to the mental grammar but remains distinct from it, or do we say that UG develops into the mental grammar? With the first option, one might conclude that we have to replace our search for UG by a search for the mental grammar. However, even if UG is subsumed in the mental grammar, it would still also be independently present, ready to be used for a later instance of language acquisition (within the critical period). (Note that within Chomsky’s Minimalist Program this difference does not arise because UG, in a sense, is the mental grammar, assuming that this system is only responsible for the operation of Recursive Merge that delivers a universal, infinite set of syntactic structures.)

If, thus, we forget about genes and knock-out experiments, and we furthermore assume that UG is present in the brain, and separate from processing systems, we should be able to find it using the same methods that have been used in the localization of cognitive functions, including language functions, in the brain. In Chapter 6 we will visit the field of neurolinguistics, which has delivered many results that indicate that there are many areas of the brain that are of specific relevance for language. Although these areas typically also have other functions, it would seem that they have a lot to do with language processing, which raises the question whether there might also be areas that harbor a processing-neutral mental grammar. We will see that within this field, researchers have not initiated a direct search for UG in the brain, with the exception of some so-called biolinguists (such as Cedric Boeckx and collaborators).90 In this line of work, an explicit distinction is made between processing systems that are involved in the externalization of language, which are said to have been the focus of perhaps all neurolinguistic research, and a system that is responsible for the core of human language, the syntactic engine (Recursive Merge), which is innate and invariant across the human species. I will discuss that line of work in Chapter 7.

A nativist could now say the following. It is not necessary to talk about genes and brains. If we focus on those factors, we have failed to understand what the poverty of the stimulus argument is really trying to show. Chomsky’s key argument for his IH was (and still is) that the construction of a mental grammar is simply impossible unless we assume that children are guided by an innate “manual” that is specific for language (considering the quality of the input, the speed of acquisition, and the convergence of mental grammars of different learners). Why impossible? Well, languages have essential abstract properties that cannot be inferred from being exposed to language utterances. For example, we have seen that syntactic transformations are structure dependent, which means that they make crucial reference to the hierarchical, syntactic structure of sentences. How would a child know this? Why can learners not be caught making mistakes that suggest that they are trying a strictly linear solution to wh-movement?

The empiricist response to this point is rooted in David Hume’s simplicity argument, mentioned earlier, which is that we must only postulate innate ideas that are strictly necessary. This is of course an instance of the more general principle known as Occam’s razor: entia non sunt multiplicanda praeter necessitatem.91 We must thus try to solve the puzzle of language acquisition with minimal reliance on what an alleged language-dedicated innate system must contain to guarantee successful language acquisition. If possible, we must not rely on such a system at all if it can be shown that general learning capacities (which as such are innate) are sufficient to solve the logical problem of language acquisition. Interestingly, this is precisely what drives Noam Chomsky’s minimalist approach, although his rather extreme claim that UG only contains a recursive combinatorial operation (leaving everything else to “other factors”) brings to mind a remark that is often attributed to Albert Einstein: “Everything should be made as simple as possible, but not simpler,” although this is a popularized condensed summary of what he actually wrote in 1933: “It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic features as simple and as few as possible without having to surrender the adequate representation of a single datum of experience.”92 Critics of the minimalist approach argue that minimalist simplicity has come at the expense of narrowing down its empirical coverage. While, at least in the picture of the mental grammar that was painted in this chapter, the mental grammar accounts for all three layers of linguistic expression (phonology, semantics, and syntax), Chomsky’s minimal UG is only held responsible for the syntactic layer. Additionally, even with respect to syntax, many syntacticians argue that minimalist syntax falls short in terms of empirical coverage.93 In this book, I will not try to settle these syntactic disputes, but in Chapter 4 I will discuss those “other factors” that allegedly allow UG to be as minimal as many nativists who follow Chomsky think it could be.

In conclusion, since we do not know whether or not humans have an innate capacity for language, one would have to show that language acquisition is possible without reliance on UG in an experiment that does not involve humans. For example, one would have to write a computer program (an “artificial learner”) that can take natural language as input and produce another program (an artificial mental grammar) that can produce language output that compares in all respects to what a three- or four-year-old child can do. As discussed in ML, chapter 9, empiricists have certainly tried to write such learning programs, but to date such programs learn only partial and, some would say, rather modest aspects of natural languages. This brings us back to chatbots, which are based on computer programs and language input. These systems generate human-like language, which is apparently coherent and meaningful. Given the success of these systems, the cognitive scientist Steven T. Piantadosi has claimed that Chomsky’s Innateness Hypothesis is in fact falsified.94 As we have seen in the preceding Q and A, the question is whether the ChatGPT program that is used to generate sentences bears any resemblance to the rule system that linguists hypothesize in their models of the mental grammar. This question is difficult to answer because even chatbot designers apparently do not know why their systems are so successful in generating their output. However, Piantadosi’s claim has not remained unchallenged.95 An important difference is that (a) the training set for bots and kids is very different in “size” and (b) bots can learn anything they are trained on, while kids seem to converge on specific outputs even if the training set is impoverished (which it always is according to nativists).
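To make the contrast between such pattern-based “artificial learners” and a mental grammar concrete, here is a deliberately trivial sketch (my own toy, not a claim about how chatbots or any published learner actually work): it counts word-to-word transitions in its input and generates by always choosing the most likely next word. Nothing in it corresponds to hierarchical structure, structure dependence, or binding.

# A deliberately trivial "artificial learner": a bigram model that predicts each
# next word from raw co-occurrence counts (illustrative only).
from collections import Counter, defaultdict

def train(sentences):
    """Count word-to-word transitions in the input."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, max_len=8):
    """Generate by always choosing the most frequent next word."""
    word, output = "<s>", []
    while len(output) < max_len and counts[word]:
        word = counts[word].most_common(1)[0][0]
        output.append(word)
    return " ".join(output)

model = train(["the dog slept", "the dog slept", "the cat barked"])
print(generate(model))  # "the dog slept" -- surface co-occurrence, no hierarchy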

Conclusions

In this chapter I have outlined the internal organization of the mental grammar. We then had a conversation about the central argument that rests on the poverty of the stimulus that children are exposed to. The question as to what is contained in the innate UG was also discussed in this chapter. We continued with the question of whether the IH can be falsified, concluding that this is not possible, either in principle or because of ethical limitations on experiments that might bear on this issue. Does this mean that in evaluating this hypothesis we need to be satisfied with supporting arguments? In this book and its prequel ML, it would seem that I have indeed relied on supporting arguments from a variety of linguistic and non-linguistic areas, the latter not all totally unrelated to language. The question of how to invalidate the Innateness Hypothesis remains one that is difficult to answer until someone can show that there is a learning path that leads from the input that children are exposed to, to a complete model of the mental grammars that underlie human language. ChatGPT is not such a model.
