
20 - Measurement

Reliability, Construct Validation, and Scale Construction

from Part IV - Understanding What Your Data Are Telling You About Psychological Processes

Published online by Cambridge University Press: 12 December 2024

Harry T. Reis
Affiliation:
University of Rochester, New York
Tessa West
Affiliation:
New York University
Charles M. Judd
Affiliation:
University of Colorado Boulder

Summary

Adequate measurement of psychological phenomena is a fundamental aspect of theory construction and validation. Forming composite scales from individual items has a long and honored tradition, although, for predictive purposes, the power of using individual items should be considered. We outline several fundamental steps in the scale construction process, including (1) choosing between prediction and explanation; (2) specifying the construct(s) to measure; (3) choosing items thought to measure these constructs; (4) administering the items; (5) examining the structure and properties of composites of items (scales); (6) forming, scoring, and examining the scales; and (7) validating the resulting scales.
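Steps (5) and (6) above can be illustrated with a small sketch. The Python below is a hypothetical, minimal example and is not taken from the chapter: the function names `reverse_key`, `scale_scores`, and `cronbach_alpha`, the `keys` vector, and the toy responses are all assumptions made for illustration. Coefficient alpha is used only as a familiar internal-consistency estimate; it is not presented here as the chapter's recommended statistic.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def reverse_key(responses, keys, low, high):
    """Reverse negatively keyed items so all items point the same way.

    responses: list of rows, one row of item responses per respondent
    keys:      +1 for a positively keyed item, -1 for a negatively keyed one
    low, high: endpoints of the response scale (for example 1 and 5)
    """
    return [
        [x if k > 0 else (low + high - x) for x, k in zip(row, keys)]
        for row in responses
    ]


def scale_scores(keyed_items):
    """Form a simple sum score for each respondent."""
    return [sum(row) for row in keyed_items]


def cronbach_alpha(keyed_items):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance).

    Assumes at least two items, at least two respondents, and a total
    score that actually varies across respondents.
    """
    k = len(keyed_items[0])
    item_vars = [variance(col) for col in zip(*keyed_items)]
    total_var = variance(scale_scores(keyed_items))
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Toy data (invented): three respondents, two items on a 1-5 scale,
# with the second item negatively keyed.
raw = [[1, 5], [2, 4], [3, 3]]
keyed = reverse_key(raw, keys=[1, -1], low=1, high=5)
totals = scale_scores(keyed)
alpha = cronbach_alpha(keyed)
```

In this toy case the two items agree perfectly once the second is reversed, so alpha reaches its maximum of 1. With real data one would also examine the item structure, for example with factor analysis, before settling on a scoring key.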

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2024


