Academic institutions increasingly emphasize “involving undergraduates in the research process” (Boyer Commission 1998, 17). However, putting this general call into action remains challenging. We propose incorporating students directly into the data collection process as an especially promising pathway to student learning while also enhancing data quality and furthering scholarly research.
Constructing large observational datasets offers important public goods and serves as a foundation for theory generation and testing across political science topics (Lieberman 2010). The scale of these projects often requires sizable research teams of faculty members, graduate students, and undergraduate students. We suggest that undergraduate students in these contexts personally benefit and better support faculty research when they are more fully incorporated in data collection endeavors as important contributors and stakeholders in the research process. A carefully calibrated learning-centered approach provides further advantages: it minimizes concerns about the exploitation of student labor (The Guardian 2018) and supports marginalized and minoritized groups historically excluded from research opportunities (Webber, Laird, and BrckaLorenz 2013), thereby promoting a more equitable and diverse research community.
Data Laboratories (hereinafter Data Labs) are settings in which students learn about data collection through direct engagement with and contribution to a research project in a collaborative environment; they help scholars and students alike to integrate teaching and research. Our Data Labs build on laboratory-type precedents, including initiatives coordinated across multiple institutions such as the International Justice Lab, the Political Violence Lab, and the Security and Political Economy Lab; or single-institution programs such as Vanderbilt’s Research on Conflict and Collective Action Lab. Benefits of these programs, which often incorporate undergraduate students across multiple years and different stages of the research process, are well established (Becker 2020; Becker, Graham, and Zvobgo 2021). Our Data Labs adopt a flexible form, either as standalone courses or as longer research programs, but one in which student learning remains centered.
Valuable opportunities thus exist for individual scholars to develop Data Labs, which they can do even absent external funding or institutional buy-in, such as by creating and teaching a dedicated course. We envision Data Labs as a “productive learning” approach in which students are taught about data collection through collaborative engagement in the production of social science knowledge. Because direct student involvement with research has the potential to generate a greater quantity and quality of data collection, Data Labs also are a “win-win” for both student learning and scholars’ broader research trajectory.
SITUATING DATA LABS IN RESEARCH AND EDUCATION
Benefits from integrating research and education have long been recognized despite concomitant obstacles (Brew 2010). For students, exposure to real-world research provides a basis for better grasping abstract concepts and gaining practical skills (e.g., writing, data collection, labeling, and teamwork). This integration likewise enables faculty members to open new lines of inquiry, practice communicating ideas to more general audiences, and produce higher-quality research in more compressed timelines than is possible on their own. [Footnote 1]
However, challenges remain. Instructors must make decisions with inevitable tradeoffs about what can (and cannot) be covered. Generally accepted stages in the research process are as follows: (1) come up with a question (or questions); (2) use theory to generate one or more testable hypotheses; (3) collect data to test the hypotheses; (4) analyze the data to determine whether the hypotheses are supported; (5) write up the results; and, ultimately, (6) publish the final product. [Footnote 2] Whereas the exact sequence can be fluid and project dependent, these stages are useful for considering how different courses and learning opportunities connect and focus on certain stages more than others.
Data Labs fill a particular gap in existing undergraduate curricula (figure 1). Few courses appear to be solely or even significantly devoted to advancing student learning and experiences around best practices for collecting data (Stage 3). Despite differences in substance and difficulty, many introductory courses, upper-division classes, and senior seminars often share a similar basic setup. That is, students tackle a series of existing scholarly readings on the relevant subject (Stage 6), dedicating varying attention levels to the motivating research questions (Stage 1) or underlying theory (Stage 2). [Footnote 3]
Writing-intensive seminars may be oriented around a particular topic but prioritize writing skills (Stage 5). Undergraduate methods courses teach techniques for analyzing data (Stage 4) but often rely on established databases rather than instructing students on gathering their own data.
Honors theses and capstone courses represent valuable experiences, incorporating all stages: students work individually to complete an entire project from start to finish, usually over several terms. Yet, even in these contexts, students often use existing datasets, and varying space and enrollment requirements across schools mean that many students never have the opportunity for more personalized and directed exposure to the research process. As we reflected on our past experiences as thesis advisors, we noted that a major stumbling block correspondingly concerned the many considerations involving data collection (Stage 3).
By exposing students to data collection, Data Labs offer benefits not otherwise covered in many undergraduate degrees and build a skill that is particularly valuable for students throughout their academic career and beyond. First, gaining appreciation for the value and challenges of data collection provides a firmer foundation for achieving broader learning outcomes. Data collection skills can translate into students completing stronger and more ambitious honors theses and capstone projects. For example, they can use data they previously collected in the lab setting; a student in Gade’s Data Lab leveraged a portion of labeled tweets for a subsequent honors thesis that Wallace supervised. More generally, by becoming familiar with the complexity and time intensity inherent in collecting data, students are better positioned to design and complete high-quality yet feasible honors theses and capstone projects.
Skills gained in Data Labs extend to other courses: nuanced understandings of data-generation processes inform analysis in methods courses and facilitate comparing and contrasting published works. Moreover, learning to code (e.g., in R or Python) using data that students collected themselves is less daunting and more tangible; this results in greater mastery than learning to code using outside data that has little meaningful personal value.
Second, data collection skills readily transfer to learning outside of the classroom. Many jobs available to undergraduate students after graduation require collecting original data. It has been our experience that exposure to ontology development, database design, data collection protocols, working in teams, and basics of data analytics make these undergraduates more attractive for research-oriented positions. Some of our undergraduate students who went from college directly into think-tank and industry-research roles report that their Data Lab experiences were critical to their successful interviews and excelling after starting a new job. The Brookings Institution, The Carter Center, Tesla Government Inc., and other related organizations and firms all hire “research assistants”; the practical skills that those jobs require often focus on data collection. This even can happen concurrently with taking Data Labs. For example, Wallace had a lab student who put course skills into immediate effect by constructing an original dataset covering municipal eviction proceedings while interning for a nonprofit tenant-rights organization.
Third, Data Labs reflect the approach taken by many real-world data collection initiatives, which often use collaborative research teams involving faculty members, graduate students, and undergraduate students, including the Political Terror Scale, Polity5, and Varieties of Democracy. Working in larger groups also is the most common setup in many jobs that graduating students later pursue. By contrast, in many other courses, students study on their own with only brief, less-structured interactions with peers through discussion sections or in small groups on short-term assignments. Data Labs provide opportunities for training students in the social and organizational skills necessary for navigating increasingly collaborative workspaces.
Fourth, Data Labs dovetail with proven teaching philosophies that foster general student learning, particularly active learning models. This approach stresses the learning benefits from more openly and interactively engaging with course materials instead of passively absorbing information from instructors (Meyers and Jones 1993, xi). Data Labs are naturally aligned with active learning because they incorporate students directly into the research (and learning) process: students undertake data collection, identify possible challenges, and raise new questions.
Data Labs are amenable to another extension of active learning, which we conceptualize as a form of “productive learning.” A central goal of Data Labs is for students to learn through their own personal contribution to the production of original data. Students practice and obtain skills through a learning-by-doing approach: working on a real scholarly project develops a more concrete understanding of the many elements of data collection. Productive learning additionally provides students with a sense of accomplishment and inherent membership in a larger community when they realize that their individual efforts advance ongoing collective goals.
GUIDELINES FOR CREATING, ORGANIZING, AND RUNNING A DATA LAB
The centerpiece of our Data Labs is a large-scale, collaborative data collection project. Only by engaging in the various elements of collecting data can students gain a fuller appreciation of this stage of the research process. Moreover, this approach offers tremendous flexibility across several dimensions. [Footnote 4]
On the substantive front, almost any topic in political science is amenable to a Data Lab. Our prior offerings hew closely to our own interests in political violence and human rights, including militant alliance formation and infighting, treatment of prisoners of war (POWs), and violence against journalists. However, we readily envision examples in areas of inquiry ranging from political behavior (e.g., elections and social movements), to institutions (e.g., legislative activity and bureaucracies), to security (e.g., war and terrorism), and to political economy (e.g., foreign aid and business relations). Just as collected databases run the gamut topically (Lieberman 2010), we also believe that almost any scholar who conducts research that relies on collecting data (broadly conceived) is well positioned to create a Data Lab.
This approach also offers substantial room to maneuver regarding the type and difficulty of tasks assigned. Our prior offerings generally involved reading, analyzing, and labeling a wide range of raw materials: secondary historical sources, NGO reports, newspaper clippings, formerly classified archival documents, and social media posts, among others. We also easily can include visual materials (e.g., pictures and propaganda posters) and satellite images. Students need not be assigned the same tasks, which can be tailored to prior skills, experience, and/or interests. This flexibility allows Data Labs to be molded to students’ experience and to evolve as the project progresses. Allowing for a diversity of tasks further provides benefits in terms of equity by encouraging the inclusion of a broader range of students with varying abilities and differing backgrounds.
Our own approach generally focuses on qualitative labeling (for quantitative analysis) of historical materials into conceptual categories, in which students use raw sources to create datasets measuring cases across a number of variables (e.g., location of militant actors, weapons used, or casualties inflicted). Although the substance may differ, a shared characteristic is that labeling work requires a meaningful level of judgment, interpretation, and critical thinking (Schedler 2012). Developing project elements that necessitate active engagement with the materials (i.e., more than simply rote data entry) provides greater opportunities for teaching and learning. Similarly, exciting avenues exist for more qualitative modes of data collection (e.g., interviews, historical research, and discourse analysis), which means that Data Labs could be used by scholars rooted in a range of methodological orientations.
The open structure of Data Labs encourages creativity in how project-specific work is integrated into general student learning. Early weeks of the course are devoted to an overview of data collection processes with an emphasis on training for the assigned project. This includes covering labeling rubrics and accessing source materials along with instructions on labeling procedures through live demonstrations, video recordings, and handouts. Initial training requires significant upfront investment, which can be substantial compared to more standard lecture- and seminar-style courses. We have found that time devoted early in the process bears fruit because labelers produce higher-quality data (e.g., better rates of intercoder reliability) and deeper engagement—which results in longer labeler retention and thus decreased retraining costs—as well as greater student enjoyment and learning. Enlisting students in ontology and rubric development further increases their engagement. Preparation materials can be repeated or adapted in subsequent Data Labs. Furthermore, some duties can be delegated to graduate-student teaching assistants (TAs) or to returning undergraduate students who are entrusted as lead labelers for portions of data collection or lab managers for overall projects.
During subsequent weekly class sessions, Data Labs are structured around two complementary types of activities. First, regular weekly or twice-weekly team meetings convene all members of the Data Lab to discuss labeling work completed, as well as problems or concerns arising in previous weeks. Meetings serve an immediate purpose of continuing project-specific training, clarifying procedures, answering new questions, and sharing best practices.
These gatherings have another educational purpose of promoting more open discussions about labeling tasks and how they fit into larger questions of data collection. We have found it essential to foster an environment that encourages collaboration and constructive criticism in which students bring their own voice to the project. This approach facilitates lively discussions of how best to collect particular pieces of data; in our past experiences, this frequently yielded important changes to an overall project. Student contributions can shape how particular variables are measured, introduce new variables and sources to include, and generate theoretical insights. Recurring meetings reinforce for students a core theme of the research process—that is, data collection involves inevitable tradeoffs with constant reappraisal and adjustment in the face of new information as projects progress.
The second main educational component involves activities that connect project specifics to more general facets of the data collection process. Connecting abstract concepts to project-specific work provides an immediacy and tangibility that enriches student learning. These activities are most effective when integrated into regular project-specific discussions. The following are examples of activities and topics that we have incorporated:
• relating how the type of research question (Stage 1) or development of theoretical conjectures (Stage 2) affects the choice and construction of data (e.g., because rebel–rebel relationships in civil wars are not often reliably reported in major news outlets, we might consider social media data despite potential biases involved)
• the importance of concept formation and refinement (e.g., who counts as a POW and what constitutes prisoner abuse)
• choosing among different types of variables (e.g., nominal, ordinal, and interval measures for micro-events, including patrols, shootings, and bombings during “The Troubles” in Northern Ireland) while comparing their validity and reliability
• evaluating the quality and credibility of raw source materials (e.g., different monitoring organizations’ tallies of journalist killings)
• relative merits of different general data sources (e.g., government and NGO reports, newspapers, interviews, and surveys)
• research ethics (e.g., Q&A with a member of the Institutional Review Board)
• questions of trauma, both for research subjects and researchers [Footnote 5]
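The distinction among nominal, ordinal, and interval measures mentioned in the list above can be made concrete with a small labeling schema. The following Python sketch is illustrative only and is not drawn from our actual rubrics: the class names, the three-level severity scale, and the example record are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class EventType(Enum):
    """Nominal variable: categories with no inherent order (hypothetical set)."""
    PATROL = "patrol"
    SHOOTING = "shooting"
    BOMBING = "bombing"


class Severity(IntEnum):
    """Ordinal variable: levels are ordered, but the gaps between them
    are not assumed to be equal (hypothetical three-level scale)."""
    LOW = 1
    MODERATE = 2
    HIGH = 3


@dataclass(frozen=True)
class EventRecord:
    """One labeled micro-event; all field names are assumptions."""
    source_id: str          # identifier of the raw document that was labeled
    event_type: EventType   # nominal
    severity: Severity      # ordinal
    year: int               # interval: differences between years are meaningful


def years_between(earlier: EventRecord, later: EventRecord) -> int:
    """Interval measures support subtraction; nominal and ordinal ones do not."""
    return later.year - earlier.year


first = EventRecord("doc-001", EventType.PATROL, Severity.LOW, 1971)
second = EventRecord("doc-002", EventType.BOMBING, Severity.HIGH, 1974)

# Ordinal values can be ranked against each other.
more_severe = second.severity > first.severity
# Interval values can be subtracted.
gap = years_between(first, second)
```

Encoding the measurement level in the type gives students a quick check on which comparisons are legitimate: ranking two `Severity` values works, whereas trying to rank two `EventType` values raises a `TypeError`.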
Instructors also can incorporate latter stages of the research process, such as teaching students about data analysis (e.g., statistical software such as Stata and R or specific analysis techniques) as well as writing up subsequent results. Data Labs (in this imagining) ultimately are human powered. Researchers must be attentive to numerous human-based practical and logistical concerns when designing and running labs.
ISSUES TO CONSIDER AND POTENTIAL CHALLENGES
This section highlights several considerations that we have found to be notable for fostering greater efficiency, effectiveness, and equity in our own Data Labs.
Number of Students
Selecting the right combination of students, amount of training required for appropriate intercoder reliability, and support needed to be successful in their labeling roles will depend on project needs. We use “time-per-task” (e.g., minutes to label a tweet or to classify a police report) to estimate the number of students needed per project, allocating extra time for student learning, and then expecting students to perform at (vastly) disparate rates. This variation reflects differences in how students learn, anxiety about choosing the “correct” answer, and other work or personal obligations. We incentivize quality over quantity to ensure data reliability but also to stress research honesty and transparency. Along with data management skills, we emphasize time management (e.g., Pomodoro cycles).
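As a rough illustration of a time-per-task estimate, the Python sketch below converts a task count and a per-task time into a number of students. It is a back-of-the-envelope aid under stated assumptions, not a formula we prescribe: the function name, the 25% learning overhead, and the example figures are all hypothetical and should be replaced with values from a project's own pilot.

```python
import math


def students_needed(n_items, minutes_per_item, hours_per_student_per_week,
                    weeks, learning_overhead=0.25, labels_per_item=1):
    """Estimate the number of student labelers a project needs.

    learning_overhead is an assumed fraction of extra time for training,
    questions, and slower labelers; labels_per_item is how many students
    label each item (more than 1 when intercoder checks are wanted).
    """
    values = (n_items, minutes_per_item, hours_per_student_per_week,
              weeks, labels_per_item)
    if min(values) <= 0 or learning_overhead < 0:
        raise ValueError("inputs must be positive (overhead may be zero)")

    # Raw labeling time in hours across all items and repeat labels.
    labeling_hours = n_items * labels_per_item * minutes_per_item / 60
    # Pad the raw time to leave room for learning.
    total_hours = labeling_hours * (1 + learning_overhead)
    # Hours one student can contribute over the whole term.
    capacity_per_student = hours_per_student_per_week * weeks
    return math.ceil(total_hours / capacity_per_student)


# Hypothetical project: 6,000 tweets, 2 minutes each, every tweet labeled
# by two students, 5 hours per student per week over a 10-week term.
team_size = students_needed(6000, 2, 5, 10, labels_per_item=2)
```

Because students work at very different rates, the result is better read as a lower bound for planning than as a target.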
Recruitment
We suggest beginning a Data Lab as a dedicated course. Initial “trial testing” using course credit ensures that students are interested in participating as part of a research team. Special care should be taken during recruitment to avoid potential biases and to approach Data Labs as an opportunity to incorporate traditionally marginalized and minoritized groups into research opportunities. [Footnote 6]
Lead Labelers
Student “lead labelers” or graduate-student TAs can provide extra student support throughout a project. Drop-in labeling office hours can improve student productivity and labeler accuracy. A Q&A spreadsheet or Slack channel (where students add new questions, which then are answered by instructors, lead labelers, or TAs) catalogs questions in a central location, ensuring the development of common knowledge across questions and teams.
Training
Deliberate, incremental training yields long-term dividends. We implement several onboarding sessions for labelers, in which labelers and the instructor (or lead labeler) review rules and procedures with live-labeling test sets of cases. Students receive a pilot dataset that they individually label before comparing answers collectively. This process repeats until satisfactory rates of intercoder reliability are obtained. [Footnote 7] Ontology and labeling rules sometimes are updated during this process in ways that can substantially increase the consistency and quality of labeling procedures.
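One way to check intercoder reliability on a pilot set is sketched below in Python. It computes simple percent agreement and Cohen's kappa for two labelers. Cohen's kappa is chosen here only as a familiar example of a chance-corrected statistic; the text above does not specify which reliability measure or threshold to use, and the labels in the example are invented.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of items on which two labelers chose the same category."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two labelers, corrected for chance."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same category at random,
    # given each labeler's own category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both labelers used a single identical category
    return (observed - expected) / (1 - expected)


# Invented pilot labels for ten cases from two student labelers.
labeler_1 = ["abuse", "abuse", "none", "none", "abuse",
             "none", "none", "abuse", "none", "none"]
labeler_2 = ["abuse", "none", "none", "none", "abuse",
             "none", "abuse", "abuse", "none", "none"]

agreement = percent_agreement(labeler_1, labeler_2)  # 8 of 10 items match
kappa = cohens_kappa(labeler_1, labeler_2)           # roughly 0.58
```

In this invented example the two labelers agree on 80% of cases, but kappa is noticeably lower because some of that agreement would be expected by chance. Gaps of this kind are a useful prompt for collective discussion of the labeling rules.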
We generally follow a “10-to-1 rule”: for every 10 hours of student work, instructors and TAs should expect to devote approximately one hour to checking work and following up on questions. We actively use Slack or similar platforms for regular communication among the research-team members. Assessing student performance is based primarily on effort and engagement rather than timeliness or quantity, which minimizes incentive systems (e.g., strict deadlines or quotas) that inadvertently might encourage shoddy or misleading work. [Footnote 8]
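The 10-to-1 rule translates directly into a weekly supervision budget. The short Python sketch below shows the arithmetic; the function name and the example team size are hypothetical.

```python
def supervision_hours(total_student_hours, student_hours_per_supervisor_hour=10):
    """Approximate instructor/TA hours implied by the 10-to-1 rule."""
    if total_student_hours < 0 or student_hours_per_supervisor_hour <= 0:
        raise ValueError("hours must be non-negative and the ratio positive")
    return total_student_hours / student_hours_per_supervisor_hour


# Hypothetical lab: 10 students each working 5 hours per week.
weekly_student_hours = 10 * 5
weekly_supervision = supervision_hours(weekly_student_hours)  # 5.0 hours
```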
Type of Tasks
Task variety allows a broader range of students with different skills and backgrounds to participate in Data Labs while also reinforcing student experience, enjoyment, work quality, and retention. Conducting an initial assessment (through either a brief questionnaire or a one-on-one meeting) of student abilities and interests is an essential first step. During the course of a Data Lab (or series of labs), students who master tasks or achieve certain quality standards can advance to more challenging responsibilities, including training new labelers, monitoring Q&A sessions, background research, literature reviews, fact-checking, learning to code in R or Python, and performing basic data cleaning and analytics. Furthermore, this Data Lab format is well suited to a variety of other methodological approaches (either standalone or multimethod), including qualitative approaches such as case studies, archival research, interviews, and even developing experimental designs.
Time Commitment
Irrespective of the type of tasks, expectations on time commitments must be considered carefully, especially because students of varying backgrounds often have other class, employment, or personal commitments. For a full-credit course-style Data Lab, the total number of hours required across all lab-related activities (i.e., training, labeling, group meetings, and assignments) should align with department or university requirements for other courses of similar credit size. For Data Labs outside of the formal course setting, expectations should be established at the outset for a workload that fits with students’ other obligations. In general, we emphasize the importance of flexibility for time commitments to be sensitive to students with different needs but also to accommodate the widest range of students possible.
Retention
Maximizing student retention has obvious benefits: instructors do not need to spend significant time retraining labelers; data are likely to be more internally consistent; managing experienced labelers is easier; and, finally, students are likely to learn more through longer-term relationships with a project than shorter exposure to only one part of it. Incentivizing student retention can involve expanding roles and fostering growth, allowing students to use lab-generated data for their own projects (e.g., an honors thesis), and—perhaps most important—facilitating a positive collaborative learning culture in which students feel supported, valued, and heard throughout the research process.
CONCLUSION
Data Labs foster a collaborative dialogue among professors and students, attenuate hierarchies and patterns of historical exclusion, transmit valuable skills for other courses and later in the workplace, and can be genuinely enjoyable for both faculty members and students. This model likewise takes steps to avoid exploitative practices in higher education by including students as creative partners. Data Labs can fill a critical gap in many existing undergraduate curricula by exposing students to the many merits and challenges of collecting original data. Studying through lab environments facilitates student learning of this underappreciated dimension of the research process, simultaneously advancing faculty members’ own research programs while building a stronger and more equitable scholarly community.
Supplementary Materials
To view supplementary material for this article, please visit http://doi.org/10.1017/S1049096523000276.
ACKNOWLEDGMENTS
This work would not have been possible without the intellectual contributions of student founders, leaders, and researchers of the Oppression & Resistance Lab at Emory University and the Law and Violence Data Laboratory at the University of Washington. We especially thank Bernadette Bresee, Sarah K. Dreier, Ramin Farrokhi, Willa Jeffers, Bree Bang Jensen, Marcella Moriss, Arica Schuett, Ava Sharifi, Danielle Villa, and Sophia Jordán Wallace for their intellectual contributions to our understanding of Data Labs.
CONFLICTS OF INTEREST
The authors declare that there are no ethical issues or conflicts of interest in this research.