4 - Variation in Participants and Stimuli in Acceptability Experiments

from Part I - General Issues in Acceptability Experiments

Published online by Cambridge University Press: 16 December 2021

Grant Goodall
Affiliation: University of California, San Diego

Summary

Judgments in acceptability judgment tasks are not uniform: they vary with the conditions involved, but also across participants and across items. Some of this variation is meaningful; some is noise. This chapter discusses both types of variation and provides recommendations on how to deal with them. We show how some of the interspeaker variation stems from micro-differences between grammars. Statistical procedures such as distribution analysis or cluster analysis help in detecting such variation, and the same procedures can be used to identify variation across items. Further, we outline how to reduce variation across and within items. In particular, we recommend keeping the length and complexity of sentences constant, as well as the accessibility of NP-antecedents. The rest of the chapter deals with variation stemming from extralinguistic sources. Besides individual differences related to performance factors such as working memory, we discuss methodological artifacts like scale effects and non-cooperative behavior.
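As a rough illustration of the kind of cluster analysis the summary mentions, the following sketch splits per-participant mean ratings for a single construction into two groups with a one-dimensional k-means (k = 2). The participant IDs, the ratings, and the function name `two_means_1d` are hypothetical and are not taken from the chapter. Note that a two-cluster split always returns two groups, so the result is only a starting point for inspecting the distribution, not evidence of two grammars in itself.

```python
from statistics import mean


def two_means_1d(values, max_iter=100):
    """Split 1-D values into a low and a high cluster (Lloyd's algorithm, k = 2)."""
    lo, hi = min(values), max(values)  # initialise the two centroids at the extremes
    low, high = list(values), []
    for _ in range(max_iter):
        # Assign each value to the nearer centroid (ties go to the low cluster).
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values identical: nothing to split
            break
        new_lo, new_hi = mean(low), mean(high)
        if (new_lo, new_hi) == (lo, hi):  # centroids stable: converged
            break
        lo, hi = new_lo, new_hi
    return low, high, (lo, hi)


# Hypothetical per-participant mean ratings on a 7-point scale (illustrative data only).
ratings = {
    "p01": 6.2, "p02": 5.8, "p03": 2.1, "p04": 6.5,
    "p05": 1.9, "p06": 2.4, "p07": 6.0, "p08": 2.2,
}

low, high, centroids = two_means_1d(list(ratings.values()))
print("low cluster:", sorted(low))
print("high cluster:", sorted(high))
print("centroids:", centroids)
```

In practice one would compare the two-cluster solution against a single-cluster baseline, or simply plot the rating distribution, before treating the groups as distinct populations.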

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2021
