What is quality in the context of applied linguistics research? We all care about quality. We all want to read and produce quality research. And I think we’d all agree that there’s little point to any discipline that doesn’t insist on quality by means of peer review, scholarly mentoring, and/or other mechanisms, whether structural or more grassroots in nature. However, we, as a field, have yet to arrive at an agreed-upon understanding of this notion. This paper offers a framework for conceptualizing and assessing study quality. I don’t expect full or immediate consensus, but I hope the proposed framework will serve as a step toward understanding and operationalizing the notion of quality in a way that will support both the intellectual and ethical imperative of the scholarly discipline we call applied linguistics.
What is quality?
One (perhaps obvious) way to start addressing this question would be to turn to the Mertonian Norms of Science. Proposed in 1942 by sociologist Robert Merton, this set of four principles – communism, universalism, disinterestedness, and organized skepticism – was meant as a guide for the modern, institutionally based scientific enterprise. These norms are certainly valuable and are worthy of consideration. And striving to embrace them might satisfy – or at least open up conversations with – some of our colleagues in sociology or philosophy of science. But I don’t think the Mertonian Norms of Science can be used as a definition of quality for applied linguistics; they’re too abstract and would be very difficult – if not impossible – to operationalize.
I should mention that this is not my first attempt to define study quality. I tried to take this on in my dissertation (Plonsky, 2011) and in the articles that were based on it. In Plonsky (2013), for example, I defined quality as “the combination of (a) adherence to standards of contextually appropriate methodological rigor in research practices and (b) transparent and complete reporting of such practices” (p. 657). That definition and its corresponding operationalization served as a useful starting point and led to a number of insights into a wide range of quantitative research practices found in the field. That work has also led to dozens of methodological syntheses seeking to assess quality within and across different domains (e.g., Burston et al., 2024; Li, 2023; Sudina, 2023b). Looking back, however, that definition was much too narrow both for a concept as broad as quality and for a field as broad as applied linguistics. For example, the quality of a given study (or set of studies) can also be conceived of and assessed in terms of its contribution to society. In addition, my earlier definition, to some extent, and certainly the way I operationalized it, were focused very heavily on quantitative research.
Partly recognizing the limitations of Plonsky (2013, 2014), Sue Gass, Shawn Loewen, and I have argued more recently that a definition of quality in the context of quantitative research should include a concern with estimating the magnitude of the effects or relationships of interest as opposed to their mere presence or absence (Gass et al., 2021). The inclusion of this facet of quality can be linked directly to what Cumming (2014) referred to as “estimation thinking” (p. 8), which he contrasts with the “dichotomous thinking” manifest in the (mis)use of p-values that is so prevalent in applied linguistics (e.g., Cohen, 1997; Klein, 2013; Lindstromberg, 2022; Norris, 2015). This facet can also be tied to the notion of synthetic mindedness argued for in Norris and Ortega (2006) as a perspective on research that prioritizes the cumulative evidence available (rather than any single study) and the extent of an effect or a relationship.
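To make the contrast concrete, here is a minimal sketch, in Python with invented data, of what estimation thinking adds over a bare p-value. The group sizes, means, and standard-error approximation are illustrative assumptions, not a prescription:

```python
# A minimal sketch of estimation vs. dichotomous thinking (invented data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 40  # hypothetical per-group sample size
treatment = rng.normal(loc=0.5, scale=1.0, size=n)  # simulated gain scores
control = rng.normal(loc=0.0, scale=1.0, size=n)

# Dichotomous thinking: is the effect "there" (p < .05) or not?
t_stat, p_value = stats.ttest_ind(treatment, control)

# Estimation thinking: how large is the effect, and how precisely is it estimated?
pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
d = (treatment.mean() - control.mean()) / pooled_sd  # Cohen's d
se_d = np.sqrt((n + n) / (n * n) + d**2 / (2 * (n + n)))  # common approximation
ci_low, ci_high = d - 1.96 * se_d, d + 1.96 * se_d

print(f"p = {p_value:.3f}  (presence/absence)")
print(f"d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]  (magnitude and precision)")
```

The point of the sketch is simply that the second set of outputs carries information a reader can synthesize across studies; the first does not.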
Gass et al.’s (2021) definition of quality also expanded on the construct of transparency, making space for thorough (as opposed to selective) reporting of results and for the sharing of materials and data, whether for reanalysis, secondary analysis, replication, training, or other purposes (see Nicklin & Plonsky, 2020). Concern for reproducibility and for open science practices more generally is certainly worthwhile (Marsden et al., 2018); however, even with these additions, the definitions of study quality available to date fail to capture the full construct of interest and have focused almost exclusively on more quantitatively oriented paradigms.
A proposed framework for study quality
The framework I propose here views study quality as a multidimensional construct comprising four elements or subconstructs: (a) methodological rigor, (b) transparency, (c) societal value, and (d) ethics (see Figure 1). The first two of these overlap with previously proposed definitions. To those, however, I’ve added societal value and ethics.
At first glance, each of the four might appear distinct from the other three. However, I view them as inextricably intertwined. In some cases, the relationships among the four elements are hierarchical; for example, in order for a study to contribute meaningfully to society, it must have been designed and carried out using rigorous methods. In other cases, two or more elements simply coincide or overlap as I illustrate throughout this paper. As I introduce each of the four elements, I will also refer to some of the relevant evidence to date that has assessed them. Although there is reason to believe that we, as a field, are improving, there is also substantial evidence of a lack of quality in a number of areas.
Transparency
As I alluded to above, transparency is what allows us to evaluate – and is therefore a prerequisite for – every other facet of quality. Indeed, as argued by the Open Science Collaboration (2015), transparency is also critical to the trust that society places in scientific institutions and outputs. And from a synthetic perspective (i.e., one that looks for overarching patterns and trends across studies), transparency in terms of thorough description of procedures, analyses, and data is necessary for secondary research and for replicability and reproducibility (Hiver & Al-Hoorie, 2020; In’nami et al., 2022; Marsden, 2020; Norris & Ortega, 2000; Porte & McManus, 2019).
In recognition of the value of transparency, a number of institutional and fieldwide initiatives encouraging it have emerged in recent years. Sin Wang Chong and Meng Liu recently launched a Research Network (ReN) on Open Scholarship with the Association Internationale de Linguistique Appliquée (or “AILA,” the International Association of Applied Linguistics), for example, and the theme of the British Association for Applied Linguistics 2023 conference was “Open Applied Linguistics.” Journals have certainly led here as well. Over 20 years ago, TESOL Quarterly published guidelines tailored specifically to quantitative and qualitative research (Chapelle & Duff, 2003), which were then updated and expanded in 2016 (Mahboob et al., 2016). And nearly a decade ago, Language Learning commissioned a fairly thorough set of guidelines for reporting quantitative results (Norris et al., 2015). Another prime example of a journal-led initiative is that some titles, such as Language Testing, Language Learning, and Applied Psycholinguistics, have begun requiring authors to employ certain open science practices (Harding & Winke, 2022). Applied Psycholinguistics also recently appointed one of its Associate Editors, Amanda Huensch, as the journal’s “open science guru” (my term).
We, as a field, have also seen a number of different researcher-led initiatives toward greater transparency. I’ll name just a few that I’m familiar with, but there are surely others worthy of recognition. Kris Kyle’s suite of NLP tools (https://www.linguisticanalysistools.org/) comes immediately to mind, along with the many resources for second language (L2) speech research curated and hosted by Kazuya Saito and colleagues (http://sla-speech-tools.com/), the Task Bank (https://tblt.indiana.edu/index.html), hosted by Laura Gurzynski-Weiss, and the IRIS Database (https://www.iris-database.org/), launched in 2011 by Alison Mackey and Emma Marsden, well before anyone was talking about “open science” in applied linguistics (Marsden et al., 2016). Recognizing the importance of transparency-related practices, some authors now flag efforts such as open data and materials on their websites and CVs.
In light of these bottom-up and top-down efforts, it is not surprising that several aspects of our reporting practices have improved in recent years (e.g., Wei et al., 2019). But we have a long way to go in terms of reporting, visualizing, and sharing data (e.g., Larson-Hall, 2017; Vitta et al., 2022). Methodological syntheses that investigate reporting practices have invariably observed deficiencies in, for example, the reporting of reliability estimates (Al-Hoorie & Vitta, 2019; Sudina, 2021, 2023b), checks of statistical assumptions (Hu & Plonsky, 2021), and potential conflicts of interest (Isbell & Kim, 2023). Failing to report these types of information obstructs our ability to assess methodological rigor, thus demonstrating the link between these two elements of study quality.
There is also survey-based evidence linking transparency to the other elements of quality. In Isbell et al. (2022; N = 351), 94% of the sample admitted to one or more “questionable research practices” (QRPs), many of which concerned reporting practices. For example, 11% had not reported a finding because it ran counter to the literature, and 14% had avoided reporting a finding because it contradicted their own or a colleague’s previous research. Even more concerning, 43% had excluded nonsignificant results, and 44% had withheld methodological details in a write-up to avoid criticism (see Larsson et al., 2023, for similar prevalence estimates). Critically, these “sins of omission” and other QRPs are not just a matter of transparency; they introduce “research waste” (Macleod et al., 2014; see also Isaacs & Chalmers, in press), they distort the published record, and they pose a serious ethical dilemma for the authors and for the field.
Summing up this first of the four facets of the proposed framework of study quality: there is some momentum behind, and evidence for, recent increases in transparency. However, both synthetic and survey-based data point to the fact that deficiencies in this area are widespread, a problem I attribute at least in part to a lack of fieldwide reporting standards. Compounding my concern here is the fact that thorough and honest reporting is a prerequisite for assessing the three other elements of study quality, including methodological rigor, which I will now address.
Methodological rigor
The inclusion of methodological rigor in a framework for study quality probably seems like a foregone conclusion. Methodological flaws naturally present threats to the validity of our findings and of any corresponding inferences or implications we might draw from them. But there are aspects of this element that may be less obvious. For example, the methodological choices that we make – many of which may seem equally viable – often have direct effects on study outcomes.
As shown in numerous meta-analyses (e.g., Li, 2015; Plonsky et al., 2020), and as articulated by Vacha-Haase and Thompson (2004), “effect sizes are not magically independent of the designs that created them” (p. 478). It follows naturally, then, that our methodological decisions should be based on the quality of the evidence they provide rather than convenience or convention. There are, of course, also practical considerations that will play a role in our methodological choices. For example, we might identify a particular school as ideal for collecting data, but we cannot conduct a study there if the administration will not grant us access. Larsson et al.’s (2023) survey found that applied linguists’ choices regarding designs, samples, instruments, and analyses are regularly based on ease and familiarity. Findings like these, along with compelling arguments put forth by Kubanyiova (2008, 2013) and Ortega (2005), among others, have led me and others to view virtually all methodological choices through the lens of ethics (see Plonsky et al., in press, and Yaw et al., 2023).
The methods–ethics link is particularly striking in the context of two types of choices related to sampling. First is size. In the context of quantitative research, larger samples are needed to arrive at more accurate and stable results. As the graduate students in my classes have heard me say many times, smaller samples are too “bouncy” – a metaphor I use to emphasize the instability in quantitative outcomes when considering smaller groups of participants. Despite frequent calls to rectify the situation, small samples are exceedingly common (e.g., Hiver et al., 2022; Loewen & Hui, 2021; Nicklin & Vitta, 2021; Norouzian, 2020; Plonsky, 2013). Of course, a smaller sample can allow for a richer, fuller set of analyses when working with qualitative data. However, quantitative findings based on small samples present a direct threat to internal (and, by extension, external) validity. Publishing such results, unmitigated and unaccounted for (i.e., without sufficient recognition of the corresponding limitations and threats to validity), introduces noise and error into the published record and is, in my view, unethical.
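The “bounciness” of small samples is easy to demonstrate by simulation. The sketch below is a toy example with an invented true effect of d = 0.4; it repeatedly draws samples of 10 versus 100 participants per group and shows how much more widely the observed effect sizes scatter at the smaller size:

```python
# A toy simulation of why small samples are "bouncy" (all values invented).
import numpy as np

rng = np.random.default_rng(1)
TRUE_D = 0.4  # hypothetical true standardized group difference

def simulated_effect_sizes(n, reps=1000):
    """Simulate `reps` two-group studies with n participants per group;
    return the observed Cohen's d from each simulated study."""
    ds = []
    for _ in range(reps):
        treatment = rng.normal(TRUE_D, 1.0, n)
        control = rng.normal(0.0, 1.0, n)
        pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
        ds.append((treatment.mean() - control.mean()) / pooled_sd)
    return np.array(ds)

for n in (10, 100):
    ds = simulated_effect_sizes(n)
    print(f"n = {n:>3}: observed d from {ds.min():+.2f} to {ds.max():+.2f}, "
          f"SD of estimates = {ds.std():.2f}")
# With n = 10 per group, the observed d swings wildly around the true 0.4;
# with n = 100, the estimates cluster far more tightly around it.
```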
The second choice regarding sampling that I want to highlight concerns not how many but who is included in our samples. Here, too, methodological syntheses, meta-analyses, and three second-order reviews of sampling and participant demographics in applied linguistics have shown that “[m]ost of what we know […] pertains to formal learning by (highly literate) adolescents and adults in schools and universities” (Ortega, 2009, p. 145; Andringa & Godfroid, 2020; Bylund et al., 2023; Plonsky, 2023b; see also Bigelow & Tarone, 2004, for evidence of longstanding concerns in this area in applied linguistics, and Henrich et al., 2010, for similar concerns elsewhere in the social sciences). Also striking is our lack of empirical attention to L1–L2 pairings that don’t involve English. For example, 23 of the 27 studies in Goetze and Driver’s (2022) meta-analysis on the relationship between L2 achievement and self-efficacy were concerned with English as the target language. The fact that many meta-analyses and other secondary analyses have explicitly limited themselves to papers written in English further exacerbates this problem.
At first glance, the population of interest for a given study might seem somewhat innocuous or arbitrary. “Learners are learners,” we might rationalize. But this is not true, unless we only care about language learners from within a tiny sliver of humanity. In other words, sampling is not a neutral choice. The decisions we make regarding who to study greatly limit our ability to contribute to theory as well as to practice beyond narrow and often very privileged settings. This methodological choice has left us unable to adequately serve the scientific or practitioner communities (see again Bigelow & Tarone, 2004, among others), thus failing in two of the adjacent elements of study quality: societal value and ethics.
Before moving on, I need to recognize that studying populations beyond those we have typically sampled may require special considerations in terms of educational and research cultures, instrument (re)validation (e.g., measurement invariance; see Sudina, 2023a), and so forth. Recruiting non-convenience samples will be challenging for some, but it is the right thing to do, both for the findings we will obtain and for the societal and scientific contributions we will be able to make.
Instrumentation represents another often-overlooked aspect of methodological quality. One particular concern that I and others have relates to the validity of our tools, which cannot be assumed, given that most of what we measure is both latent and qualitative in nature. I’m not the first to make this observation. In fact, to reinforce this point, I’ve compiled a short collection of quotes that express concern over the lack of validity evidence in the field.
1. Chapelle (2021): “[scale] validation should be of central importance for the credibility of research results” (p. 11);
2. Cohen and Macaro (2013): “There is perhaps an unwritten agreement that readers will accept measures used in an SLA study at face value …” (p. 133);
3. Ellis (2021): “While researchers have always recognized this issue [validity in SLA measurement], they have largely ignored it …” (p. 197);
4. Norris and Ortega (2012): “Problematic … is the tendency to assume – rather than build an empirical case for – the validity for whatever assessment method is adopted” (pp. 574–575);
5. Schmitt (2019): “Most vocabulary tests are not validated to any great degree” (p. 268).
Are the concerns of these prominent scholars justified? This question has been addressed in at least four different ways. First, taking a synthetic approach, Sudina (2023b) found very little evidence of convergent, divergent, construct, and criterion-related validity in the context of studies of L2 anxiety and L2 willingness to communicate. Second is collecting (or reanalyzing) primary data to empirically examine the psychometric properties of scales that are in use, as exemplified by Al-Hoorie et al. (in press). Third is addressing researchers directly, as Larsson et al. (2023) did in their survey asking researchers about their engagement with 58 QRPs. The second most frequently reported QRP was “Choosing a design or instrument type that provides comparatively easy or convenient access to data instead of one that has a strong validity argument behind it.”
A fourth approach to understanding the extent to which the field has supplied a sufficient validity argument for its instruments involves a combination of corpus and synthetic methods (see Plonsky et al., 2023). With the help of several graduate students and RAs, I first collected a corpus of research published in 22 mainstream applied linguistics journals (K = 23,142). I then converted the PDFs to txt files and queried them using AntConc for terms that represent eight different facets of validity (e.g., “nomological, divergent, … + validity”). Figure 2 presents the percentage of articles in which each token appears at least once. The most frequent among these is “construct validity,” which appears in just 4% of the sample. In other words, only one in every 25 articles even mentions construct validity. I’m sure there are articles that presented evidence of their instruments’ construct validity without explicitly using the term, but the converse is also true: that a given article mentions concurrent validity, for example, does not mean that it necessarily provided evidence thereof.
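For readers who want to try something similar, here is a minimal sketch of this kind of corpus query in Python rather than AntConc. It is not the original pipeline; the folder name and the exact term list are illustrative assumptions:

```python
# A minimal sketch of the corpus query: count how many converted .txt
# articles mention each validity term at least once (assumed file layout).
from pathlib import Path

CORPUS_DIR = Path("corpus_txt")  # hypothetical folder of article .txt files
VALIDITY_TERMS = [  # illustrative stand-ins for the eight facets queried
    "construct validity", "concurrent validity", "convergent validity",
    "divergent validity", "criterion-related validity", "predictive validity",
    "nomological validity", "face validity",
]

articles = list(CORPUS_DIR.glob("*.txt"))
total = max(len(articles), 1)  # guard against an empty folder
counts = {term: 0 for term in VALIDITY_TERMS}

for path in articles:
    text = path.read_text(encoding="utf-8", errors="ignore").lower()
    for term in VALIDITY_TERMS:
        if term in text:  # each article counted once per term
            counts[term] += 1

for term, k in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{term:>26}: {k:>5} articles ({100 * k / total:.1f}%)")
```

Note that, as in the original analysis, presence of a term is only a proxy: it neither guarantees that validity evidence was provided nor captures evidence reported without the term.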
There are clear connections to be drawn between this example of (a lack of) methodological rigor and the other elements of study quality in the model I’m proposing. Perhaps most immediate is the link between rigor – which appears here in the form of addressing instrument validity – and transparency. It is incumbent upon researchers to provide explicit evidence of the validity of their measures (see, for example, Arndt, 2023; Driver, in press). Second, a lack of validity evidence for a study’s measures puts into question its potential value to scholarly and/or non-scholarly stakeholders. And third, research that is of low methodological quality is, in my view, unethical in that it wastes time, energy, money, and other resources that could be spent on producing meaningful, higher-validity evidence to inform theory and/or practice. Low quality research is also unethical in that it can mislead future empirical efforts, leading to further inefficiencies.
Societal contribution
High quality research necessarily contributes to society and to the public good. It is not enough for us to produce rigorously executed studies and to report them thoroughly. Our research needs to lead to meaningful, if incremental, advances in knowledge.
What do I mean here by “society”? I’m not saying that research can only be considered to be of high quality if it is relevant to the general public; we (academics) are part of society, too. But we are already pretty good at producing research that we, as applied linguists, care about and learn from. We need to expand our audience. Applied linguistics has a long history of borrowing from and relying on neighboring disciplines as a source of both theory and methods. This practice has served us well, but we have done very little, in my view, to give back to those same fields. Despite the broad relevance of language and the many language-related phenomena that we study, the field of applied linguistics is virtually unknown to our colleagues across campus. One exception here is the work of Kazuya Saito and colleagues, whose cross-disciplinary collaboration and exceptionally rigorous empirical efforts have led to inroads outside of applied linguistics (e.g., Saito et al., 2022). Other noteworthy examples include the work of scholars such as Scott Jarvis, Aneta Pavlenko, and Jesse Egbert, who work with and contribute in very meaningful ways to current legal scholarship and to high-stakes cases being argued in court (e.g., Pavlenko et al., 2020; Tobia et al., 2023).
If applied linguistics remains largely unknown to our university colleagues, it is virtually invisible to the general public – even to some of the nonacademic audiences our research is most relevant for, such as language teachers (Marsden & Kasprowicz, 2017; Sato & Loewen, 2022). To me, our inability to reach these audiences speaks to a lack of quality both at the level of individual studies and at the level of the field as a whole (see Coss & Hwang, 2024, for an analysis of the quantity, salience, and quality of pedagogical implications sections in 118 articles published in TESOL Quarterly).
I also want to recognize, very briefly, five public-facing projects that applied linguists have launched in recent years.
1. The OASIS Database (https://oasis-database.org/) provides freely accessible, one-page summaries of applied linguistics articles written in non-jargony prose (Marsden et al., 2018a). The repository contains over 1,600 summaries, which have been downloaded over 65,000 times as of this writing.
2. TESOLgraphics (https://tesolgraphics5.wixsite.com/tesolgraphics), currently led by Sin Wang Chong and Masatoshi Sato, provides infographic summaries of secondary research that is relevant for practitioners (see Chong, 2020). The infographics are attractive and professional and can be read in less than 5 minutes. The project directors have recently started hosting talk-show-style interviews with authors to discuss timely topics such as the use of chatbots in the L2 classroom.
3. Developed and hosted by graduate students at the University of Hawaiʻi, Multiʻōlelo is a multimedia, multilingual platform for sharing language-related projects ranging from poems to podcasts (https://multiolelo.com/). One of the founders, Huy Phung, received the American Association for Applied Linguistics (AAAL) 2022 Distinguished Service and Engaged Research Graduate Student Award for his work on Multiʻōlelo.
4. Heritage by Design (https://rcs.msu.edu/2023/05/24/heritage-by-design-podcast/) is a podcast, available on major streaming platforms, that seeks to “build up the community [of heritage speakers] and show the struggles and the beauty of being heritage by design.” The hosts – Gabriela DeRobles, Jade Wu, and Meagan Driver – are scholars but the episodes are personal, disarming, and free from the airs of “academese” (i.e., the highly specialized and dense language typically used in academic settings).
5. Háblame Bebé (https://hablamebebe.org/), launched by Melissa Baralt and collaborators, is a mobile app designed to help Hispanic mothers use more Spanish to enhance the amount and type of language input (“language nutrition”), promote bilingualism, and assess linguistic and developmental milestones (see Baralt et al., 2020).
Producing new knowledge that is meaningful and useful (i.e., that contributes to society), whether for theory advancement, for practical matters, for the public good, or to advance justice, equity, diversity, and inclusion, is also an ethical issue (see Ortega, 2012). We all use public resources of one form or another, so we owe it to society to give back. In addition, most of us in applied linguistics have undergone extensive graduate studies and specialized training. To not at least attempt to contribute to society, therefore, is a waste of those resources and, hence, a breach of ethics. Another one of the speeches I often give in my graduate classes goes something like this:
I fully hope and expect you all to publish your final papers from this class. Doing so is not only good for your careers as academics, it’s your ethical duty. If your research is well motivated and well conducted, you owe it to our community to make your findings known. Not doing so amounts to withholding potentially valuable knowledge and is unethical.
My soapbox speech (approximated here, as I’ve given the same one many times) aligns with one of Macleod et al.’s (2014) areas of “research waste,” namely “publication and dissemination of results,” discussed recently in Isaacs and Chalmers (in press). At the same time, we shouldn’t be content to spend years of our lives producing papers that live (and die?) on a server somewhere in the Pacific Ocean either. That’s wasteful too.
There are also immediate links between this facet of study quality and both rigor and transparency. As Gass et al. (1998) put it, “Respect for the field […] can only come through sound scientific progress” (p. 407). In other words, if a given study is not methodologically sound, it cannot contribute to any corner of society, scholarly or otherwise. Nor can our research contribute if the reporting is opaque or unavailable to its target audiences.
Ethics
Study quality, in my view and according to the framework I’m proposing, involves more than methodological rigor, clear reporting, and an ability to contribute to scholarly and/or lay communities. Quality research must also be ethical.
There are many obvious ways for researchers to fail to meet or to violate ethical norms. Acts such as plagiarism and data falsification are considered misconduct and are clearly wrong. They are also more common than we might expect. In their survey-based study, Isbell et al. (2022) found that 17% of the sample admitted to one or more of these forms of misconduct.
There are even more ways to find oneself in an ethical gray area, the vast majority of which are not covered by the “macro ethics” addressed by institutional review boards (in the U.S. context; “ethics boards” elsewhere; Kubanyiova, 2008). A growing body of recent work in applied linguistics has sought to catalog so-called QRPs and to assess their frequency, prevalence, and perceived severity (e.g., Larsson et al., 2023; Plonsky et al., in press; Plonsky et al., 2024; Sterling et al., in preparation). These works, I should note, build on the momentum for greater attention to ethics fostered by Peter De Costa and collaborators (e.g., De Costa, 2016, 2021), Maggie Kubanyiova (2008), Lourdes Ortega (2005), and others within applied linguistics (e.g., Sterling & Gass, 2017; see the timeline in Yaw et al., 2023), as well as many others from outside of it (e.g., Fanelli, 2009). There is also a special issue underway in Research Methods in Applied Linguistics, edited by Dan Isbell and Peter De Costa, that seeks to expand our understanding of ethical concerns in applied linguistics far beyond the QRPs I’ve largely focused on here.
Several of the findings related to QRPs have been mentioned elsewhere in this paper in relation to other areas of study quality. For example, it is ethically questionable to suppress statistically nonsignificant findings or to omit methodological complications in order to avoid receiving challenging comments during peer review, both of which are also matters of transparency. It is also questionable, in my view, to rely on public resources – whether through grants or simply by virtue of studying or working at a public institution – to produce research that fails to contribute meaningfully to the public good. I admit that it’s hard to assess whether or not we are meeting this standard. At the very least, though, as a field and as individuals, we need to take a hard look in the mirror and ask whether what we’re doing is really worthwhile and meaningful. I, myself, wonder about this all the time.
The future of quality
I’ve been thinking and writing about study quality for about 15 years. But that doesn’t mean my framework is right. In fact, as the saying goes, “All models are wrong,” including, I’m sure, the one I’ve proposed here. I invite anyone who cares about study quality to work with me to refine the model, the elements it comprises, and the different ways those elements can be assessed. The rest of that saying is, of course, “… but some are useful.” And I very much hope that part applies here as well! Before concluding, I want to lay out very briefly a few different uses that I envision for this model:
1. Graduate and ongoing professional training. Training in graduate programs in applied linguistics focuses mainly on just one facet of quality – methodological rigor – and to varying degrees of depth and breadth (Gönülal et al., 2017; Loewen et al., 2020). Most textbooks and courses in research methods do address ethics, but they tend to be limited to the practicalities of ethics boards (Wood et al., in press; see notable exceptions in Mackey, 2020, and Mackey & Gass, 2022). There is plenty of room for expanded and more explicit consideration of study quality in graduate curricula and in the professional development offerings of organizations such as AAAL.
2. Journal and fieldwide standards. An agreed-upon model of study quality in applied linguistics could also be used to develop a set of field-specific publication guidelines. Those guidelines could then be used by researcher trainers, journal reviewers, editors, and individual researchers. I would like to see such a resource – likely a living document that is frequently updated and revised – come from an established organization such as AAAL or AILA, which could draw on the expertise of its members to produce it. To date, however, the AAAL leadership has not shown real interest in developing any such standards despite calls, encouragement, and willingness to do so from its membership. Of course, well-thought-out guidelines exist in other disciplines, such as education and psychology (e.g., Appelbaum et al., 2018), but we are not education or psychology. I also feel that devising our own field-specific guidelines would contribute to our legitimacy and standing in the wider academic community.
3. Future studies of study quality. As exemplified throughout this paper, a large and growing body of synthetic and survey-based research has assessed different aspects of study quality, with a primary focus on methodological rigor and transparency. However, this work has been carried out somewhat inconsistently. I would like to see a more organized agenda, and one that addresses the other two elements of quality in this framework: ethics and societal value. Similarly, future meta-analyses, methodological syntheses, and bibliometric analyses (see Plonsky, 2023a) might consider taking up this framework as a way to decide which aspects of quality to code for. The element I think we know the least about is our value, as a field, to society. It would be useful to assess the extent to which we have contributed to other disciplines. What evidence is there that the field of applied linguistics has made meaningful and demonstrable contributions to practical realms and/or other scientific domains? Does anyone even know that we exist?
Conclusion
My main goal in writing this paper was to lay out a conceptual framework for the notion of study quality in applied linguistics. The framework is multidimensional, consisting of four subconstructs: methodological rigor, transparency, societal value, and ethics. I also believe that this model is practical (realistic), operationalizable (comprised of measurable constructs), and actionable (relevant for training and professional development).
I want to be clear, though, that I am entirely open to suggestions for how the framework could be modified, expanded, or reconceived. I’d like to think that the principles of methodological rigor, transparency, societal value, and ethics pertain to all areas of this “big tent” field of ours. But I’m happy to be told that I’m wrong in the name of arriving at a more comprehensive definition and operationalization of quality for all of applied linguistics. That precise task is, for this or any field, I believe, both an intellectual and an ethical imperative.