Testing Equivalence with Repeated Measures: Tests of the Difference Model of Two-Alternative Forced-Choice Performance

Miguel A. García-Pérez; Rocío Alcalá-Quintana

doi:10.5209/rev_SJOP.2011.v14.n2.48

Published online by Cambridge University Press: 10 January 2013

Miguel A. García-Pérez*, Universidad Complutense (Spain)
Rocío Alcalá-Quintana, Universidad Complutense (Spain)
* Correspondence concerning this article should be addressed to Miguel A. García-Pérez, Departamento de Metodología, Facultad de Psicología, Universidad Complutense, Campus de Somosaguas, 28223 Madrid (Spain). Phone: +34-913943061. Fax: +34-913943189. E-mail: miguel@psi.ucm.es

Abstract

Solving theoretical or empirical issues sometimes involves establishing the equality of two variables with repeated measures. This defies the logic of null hypothesis significance testing, which aims at assessing evidence against the null hypothesis of equality, not for it. In some contexts, equivalence is assessed through regression analysis by testing for zero intercept and unit slope (or simply for unit slope when the regression is forced through the origin). This paper shows that this approach yields highly inflated Type I error rates under the most common sampling models implied in studies of equivalence. We propose an alternative approach based on omnibus tests of equality of means and variances and on subject-by-subject analyses (where applicable), and we show that these tests have adequate Type I error rates and power. The approach is illustrated with a re-analysis of published data from a signal detection theory experiment in which several hypotheses of equivalence had been tested using only regression analysis. Some further errors and inadequacies of the original analyses are described, and further scrutiny of the data contradicts the conclusions drawn through inadequate application of regression analyses.

Solving theoretical or empirical problems sometimes requires testing the equivalence of two variables using repeated measures. The very statement of this goal challenges the logic underlying statistical hypothesis testing methods, which are designed to assess the magnitude of the evidence against the null hypothesis and in no way allow assessing the evidence in favor of it. In some applied contexts the problem has been addressed using regression methods, testing the hypothesis that the slope is 1 and the hypothesis that the intercept is 0 (or only the former when the regression is forced "through the origin"). This paper shows that this strategy entails empirical Type I error rates far above the nominal rates under any of the sampling models most commonly involved in equivalence studies. As an alternative, a strategy is proposed that is based both on omnibus tests that include tests of means and variances and on subject-by-subject analyses (when the situation permits). A simulation study with these tests shows that the empirical Type I error rate matches the nominal rate and that the power of the tests is adequate. By way of illustration, these tests are applied to re-analyze the data from a psychophysical experiment on contrast detection that the authors of the study originally analyzed only by regression, even though all the hypotheses considered implied equivalence with repeated measures. Our re-analysis allows a closer inspection of the data that reveals contradictions between the empirical characteristics of the data and the conclusions drawn through the inadequate application of regression methods. The results of this re-analysis also invalidate the conclusions drawn in the original publication.
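The contrast described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code, and its details are assumptions: the sampling model (two equally noisy measurements of a common true score, so that equivalence holds), the sample size, and the choice of Bradley and Blackwood's (1989, The American Statistician) simultaneous test of means and variances as one example of an omnibus test. All function names are hypothetical. Both tests reduce to an F statistic with 2 and n − 2 degrees of freedom, whose upper-tail probability has a closed form, so only the standard library is needed.

```python
import random

def ols_sse(x, y):
    """Residual sum of squares from the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    return syy - sxy ** 2 / sxx

def f2_pvalue(f, df2):
    """Upper-tail probability of F(2, df2); closed form for 2 numerator df."""
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)

def regression_test(x, y):
    """Joint F test of intercept = 0 and slope = 1 when regressing y on x."""
    n = len(x)
    sse = ols_sse(x, y)
    sse0 = sum((b - a) ** 2 for a, b in zip(x, y))  # residual SS under y = x
    return f2_pvalue(((sse0 - sse) / 2) / (sse / (n - 2)), n - 2)

def bradley_blackwood_test(x, y):
    """Joint F test of equal means and variances: regress D = x - y on S = x + y."""
    n = len(x)
    d = [a - b for a, b in zip(x, y)]
    s = [a + b for a, b in zip(x, y)]
    sse = ols_sse(s, d)
    sse0 = sum(v ** 2 for v in d)  # residual SS under D = 0
    return f2_pvalue(((sse0 - sse) / 2) / (sse / (n - 2)), n - 2)

def rejection_rates(n=30, reps=2000, alpha=0.05, seed=1):
    """Empirical rejection rates of both tests when equivalence is true."""
    rng = random.Random(seed)
    rej_reg = rej_bb = 0
    for _ in range(reps):
        # Assumed model: both measures share a true score and have equal noise.
        t = [rng.gauss(0, 1) for _ in range(n)]
        x = [v + rng.gauss(0, 1) for v in t]
        y = [v + rng.gauss(0, 1) for v in t]
        rej_reg += regression_test(x, y) < alpha
        rej_bb += bradley_blackwood_test(x, y) < alpha
    return rej_reg / reps, rej_bb / reps

if __name__ == "__main__":
    reg, bb = rejection_rates()
    print(f"regression (intercept 0, slope 1): {reg:.3f}")
    print(f"means-and-variances omnibus test:  {bb:.3f}")
```

Under this assumed model, measurement error in the predictor attenuates the fitted slope below 1 even though the two variables are equivalent, which is why the regression test should reject far more often than the nominal rate, whereas the test on differences and sums should stay near it.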

Keywords

statistical equivalence; repeated measures; Signal Detection Theory; Yes–No; 2AFC (two-alternative forced choice); interval bias; Standard Difference Model; order effects

Information

Type: Research Article
Information: The Spanish Journal of Psychology, Volume 14, Issue 2, November 2011, pp. 1023–1049

DOI: https://doi.org/10.5209/rev_SJOP.2011.v14.n2.48
Copyright: Copyright © Cambridge University Press 2011
