Chamorro-Premuzic, Winsborough, Sherman, and Hogan (2016) end their focal article with a quote worth remembering from Immanuel Kant: “Theory without data is groundless, but data without theory is just uninterpretable.” We begin with a quote even better known to industrial–organizational (I-O) psychologists in part because it has served for over 65 years as a foundational principle of our field: “There is nothing as practical as a good theory” (Lewin, 1951, p. 169).
Fundamentally, we agree with the authors that I-O psychology, as a science, has developed, through decades of theory building and empirical research, an increasingly nuanced understanding of both the general and the specific attributes that predict effective performance. Equally important, our field has, through accumulated scientific discovery, come to understand some of the boundary conditions and moderators that affect the degree to which our methods for assessing these attributes are effective predictors of performance.
Beyond summarizing these key and ongoing accomplishments in talent assessment, the authors go on to describe what we consider a regrettable return to the dustbowl empiricism that characterized our field in the past. The vastness of the data now available, and the increasing sophistication of the tools for mining those data, mean that we have the potential to enhance predictive accuracy without any organizing taxonomy, nomological network, or theoretical model to explain those predictions. Moreover, as the authors point out, there are practitioners who seem wholly unconcerned with developing such insights when armed instead with the power of thousands, or even millions, of data points. We have also observed this trend and want to press beyond the authors’ assertion that “predicting behavior is clearly a key priority in talent identification, but understanding behavior is equally important” (Chamorro-Premuzic et al., 2016, p. 634). We want to more loudly decry this indifference to the need for understanding.
Let us begin with the obvious. We are I-O psychologists, not human resource technologists or data scientists, and as such we are committed as a profession to understanding behavior at work and, in the process, contributing more broadly to the science of behavior (www.siop.org/mission.aspx). In our view, surrendering to “black box” solutions—when we don't understand why those solutions work—may be expedient in isolated cases but is simply not a long-term option for building our science. Our theories and models let us generalize to novel applications, settings, and roles. Our theories generate rational hypotheses about the critical factors likely to produce defined outcomes and therefore guide what we set out to manipulate and measure. Our theories give us insights that let us anticipate the boundary conditions within which our models are more or less likely to be predictive (Locke, 2007). What distinguishes us as advisors to organizations on talent issues is in part our grasp of the conceptual frameworks we can apply to develop those insights and produce those hypotheses a priori, in addition, of course, to the discipline and techniques for empirically testing those frameworks and hypotheses.
More practically, the models we develop as a science enable us to tell a coherent story to organizational decision makers as they consider alternative talent initiatives. Indeed, our problem with organizational decision makers is often not a preference for pure predictive power but quite the opposite, as evidenced by their embrace of “hot” constructs that make for a nice story but are not well defined or have not yet demonstrated explanatory power beyond better defined and validated existing constructs. Examples include notions like resilience, emotional intelligence, and learning agility. If we are to influence these decision makers to use effective, evidence-based approaches to talent management, we need to elaborate our models, tighten our definitions, and accumulate construct validity evidence, not push these efforts off as unimportant. The evidence-based constructs used by Hogan and his collaborators in recent years to create a compelling description of the “dark side” of leadership—constructs like derailers and overused strengths—are prime examples of the power of such explanatory frameworks to shape how managers think about themselves, their colleagues, and their bosses (e.g., Hogan, 2007; Kaplan & Kaiser, 2013).
Furthermore, we need to remember that for the people in organizations, explanation is not merely a metaphenomenon with no functional value. The reputation of an organization as an employer of choice depends in part—especially given the power of social media—on the perceived procedural justice of the process that produces a hire–not hire or promote–not promote decision (Gilliland, 1993; Hausknecht, Day, & Thomas, 2004). Having a language to explain a decision to those affected by it in terms of job-relevant constructs is critical to building perceived procedural justice (Colquitt, LePine, Piccolo, Zapata, & Rich, 2012). Beyond justice perceptions, an employee's acceptance and self-awareness have been found to be critical mediators of the impact of assessment feedback on behavior change and subsequent increases in effectiveness (e.g., Smither, London, & Reilly, 2005). An algorithm that spits out a judgment that an individual has or lacks leadership potential provides neither the affected individual nor the organization with the insight required to stimulate and direct initiatives to grow that leadership talent. For that, an explanatory framework is needed.
We as a field have seen the wasted effort and dead ends of dustbowl empiricism before. For many decades, for instance, personality constructs were seen as irrelevant to predicting organizational outcomes of interest (Guion & Gottier, 1965; Weiss & Adler, 1984). Hundreds of personality variables were correlated with multiple performance criteria with no clear model of why particular personality constructs might be conceptually linked to the behaviors producing strong job performance. In part, this dustbowl approach to research and practice was fueled by early, if primitive, versions of “big data”: personality inventories that produced, from a single tool, discrete scores on 10, 16, 32, or more predictor variables with little explicit commonality across tools, thus severely limiting their generalizability and contribution to the broader science. Not until Barrick and Mount (1991) organized this hodgepodge of findings into a more coherent framework, the five-factor model, and applied the tools of validity generalization did generalizable patterns emerge. The emergence of this framework and the pattern of findings it revealed overturned our field's perspective on personality and work behavior and revolutionized the application of personality constructs in both our theories and our talent management practices. We saw a similar trend in the use of biodata, as our field moved from decades of dustbowl empiricism, in which biodata items inexplicably gained or lost validity over time or across populations—inexplicably, because we didn't bother with explanations—to more theory-based “rainforest” approaches (Schmitt & Golubovich, 2013). With machine learning, contemporary algorithms may be able to detect the loss of validity of individual items more quickly and adjust accordingly to maintain predictive power. But we are in no better position than we were with dustbowl biodata tools to anticipate, understand, or explain what it is that we are doing.
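To make that last point concrete, consider a purely illustrative sketch of what such an adaptive adjustment might look like; the cohort size, item count, and validity threshold below are invented for the example, and nothing here describes any actual vendor's system. The algorithm simply re-estimates each item's criterion validity on a recent cohort and down-weights items whose validity has decayed, preserving prediction while telling us nothing about why the validities shifted.

# Illustrative sketch only: re-estimate each biodata item's criterion validity
# on a recent cohort and down-weight items whose validity has decayed.
# Item count, cohort size, and threshold are hypothetical.
import numpy as np

def reweight_items(item_scores: np.ndarray, criterion: np.ndarray,
                   min_validity: float = 0.10) -> np.ndarray:
    """Weight each item by its current item-criterion correlation,
    zeroing out items that fall below an arbitrary validity threshold."""
    n_items = item_scores.shape[1]
    validities = np.array([np.corrcoef(item_scores[:, j], criterion)[0, 1]
                           for j in range(n_items)])
    return np.where(np.abs(validities) >= min_validity, validities, 0.0)

# Recompute weights each quarter on the most recent cohort of hires.
rng = np.random.default_rng(0)
items = rng.normal(size=(500, 20))                       # 500 recent hires, 20 biodata items
performance = 0.4 * items[:, 0] + rng.normal(size=500)   # in this toy cohort, only item 0 remains valid
weights = reweight_items(items, performance)
composite = items @ weights                              # score used to rank new applicants

Such a routine keeps the composite predictive as item validities drift, but it offers no account of why an item worked last year and not this year, which is exactly the explanatory gap we are describing.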
We need theory and the empirical support for theory to identify causal relationships and the causal chains that link distal factors through mediating mechanisms to relevant outcomes. For example, what personality and capability dimensions affect behavior at work, how are those distal factors moderated by the climate and values of the organization to produce customer-focused behaviors, and how and when does that behavior translate into customer satisfaction and subsequent customer retention? Equipped with that understanding, organizational decision makers have a range of levers that can be adjusted to enhance desired outcomes. Relying on a black box, organizations trying to change these outcomes would, appropriately enough, be operating in the dark.
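As one way to make such a chain concrete, consider a deliberately simple, hypothetical formalization (ours, for illustration only), in which X stands for a distal disposition, W for organizational climate, M for customer-focused behavior, and Y for customer satisfaction. A first-stage moderated mediation specification of this kind makes the available levers explicit:

\begin{aligned}
M &= a_0 + a_1 X + a_2 W + a_3 (X \times W) + e_M,\\
Y &= b_0 + b_1 M + c' X + e_Y,\\
\text{conditional indirect effect of } X \text{ on } Y &= (a_1 + a_3 W)\, b_1 .
\end{aligned}

Under this (again, hypothetical) specification, selecting on X, shaping the climate W, and developing the behavior M are each identifiable levers with estimable effects on Y; a black box offers no such decomposition.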
Organizations, job candidates, and employees are not the only stakeholders with an interest in understanding the factors underlying high-stakes talent decisions. Talent practices are also subject to external legal and regulatory review, and the stakes of decisions about a person's present and future livelihood are arguably much higher than those associated with common marketing applications of web-based big data today (e.g., recommended Netflix films or Amazon books). How will we explain the new talent signals we use (e.g., liking The Godfather trilogy) to a judge? Will courts accept dustbowl empiricism as a defense of the job relatedness of talent assessment approaches when the evidence of job relatedness makes no attempt to reference job-related attributes such as ability, personality, knowledge, or experience constructs? Without an explanation of what is being measured, will courts be content with the fact that, thanks to real-time machine learning, last year not liking curly fries contributed to rejecting a candidate, whereas this year a candidate's attitude toward curly fries is irrelevant to the selection decision? Will practitioners be able to interpret for the court what 10,000+ lines of code are doing to manipulate a huge set of 5,000 data points for each of thousands of applicants to maximize predictive accuracy, with no particular rationale for which data points factor into the selection decision? Will algorithms constructed to avoid directly taking into account demographic identifiers that might create illegal discrimination (e.g., on the basis of race/ethnicity, gender, age, or health) still disadvantage—albeit blindly—other protected groups through covariates of class membership (e.g., disability, religion, or sexual orientation) embedded in the thousands of data points used? And if algorithms specifically exclude all such covariates, will predictive accuracy suffer?
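To see how this can happen even when group membership appears nowhere in the data the algorithm uses, consider a deliberately simplified, entirely hypothetical simulation (the variable names and effect sizes are invented, and no vendor's system is being described): a screening score built only from a behavioral signal that happens to be correlated with membership in a protected group will select the groups at different rates, something a routine adverse impact check such as the four-fifths rule makes visible.

# Hypothetical illustration: a screening score that never uses group membership
# can still select groups at different rates when its inputs are correlated
# with membership. All quantities here are invented for the example.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
group = rng.integers(0, 2, size=n)                     # protected-class membership; never shown to the model
signal = rng.normal(loc=0.3 * group, size=n)           # a "talent signal" correlated with membership
score = 0.8 * signal + rng.normal(scale=0.5, size=n)   # algorithmic screening score

selected = score >= np.quantile(score, 0.7)            # top 30% of applicants advance

# Four-fifths (80%) rule: compare selection rates across groups.
rate_a = selected[group == 0].mean()
rate_b = selected[group == 1].mean()
impact_ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
print(f"selection rates: {rate_a:.2f} vs. {rate_b:.2f}; impact ratio = {impact_ratio:.2f}")

In this toy example the impact ratio comes in well under the conventional .80 guideline even though the model never sees group membership; dropping the correlated signal restores parity but sacrifices whatever validity that signal carried, which is precisely the trade-off our last question raises.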
So by all means let us as a field embrace the “datification of talent” and the prospect of new technologies adding breadth and depth to our ability to assess talent. But let's use those enhanced data collection tools to deepen our understanding of behavior at work and of the constructs that explain variation in performance across people, situations, and time. In the end, better theory will make for better practice.