Field recordings are the data from which CA research proceeds. Keeping recordings well organized and accessible, securely backed up, and as reusable as possible is important for avoiding data loss, enabling collaboration, and ensuring future uses of the data. In this chapter, we outline some essential data management practices for backing up, encrypting, and sharing data. We explain how conversation analysts organize audiovisual files, transcripts, and metadata. We also help the reader to navigate the complexities of dealing with multiple recording sources and of choosing digital file formats and codecs. This chapter aims to support CA researchers from the first moment of having recorded some interactional field data to the point of being ready to start doing detailed forms of analysis.
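As one concrete illustration of the backup practices discussed here (a hedged sketch only; the directory layout and file patterns are hypothetical, not taken from the chapter), a backup copy of a recordings folder can be verified against the originals by comparing SHA-256 checksums:

```python
import hashlib
from pathlib import Path

def checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_backup(original_dir, backup_dir, patterns=("*.wav", "*.mp4")):
    """Compare checksums of recordings in two folders and report problems."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    for pattern in patterns:
        for original in original_dir.rglob(pattern):
            copy = backup_dir / original.relative_to(original_dir)
            if not copy.exists():
                print(f"missing in backup: {copy}")
            elif checksum(original) != checksum(copy):
                print(f"checksum mismatch: {copy}")

# Hypothetical corpus layout; point these at your own recordings and backup.
verify_backup("corpus/recordings", "backup/corpus/recordings")
```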
This chapter discusses record keeping, akin to maintaining a lab notebook. Historically, lab notebooks were analog, pen-and-paper affairs. With so much work now performed on the computer, and with most scientific instruments producing digital data directly, most record keeping is digital too. We therefore focus on strategies for establishing and maintaining records of computer-based work. Keeping good records of your work is essential. These records inform your future thoughts as you reflect on the work you have already done, acting as reminders and inspiration. They also provide important details for collaborators, and scientists working in large groups often have predefined standards for group members to use when keeping lab notebooks and the like. Computational work differs from traditional bench science, and this chapter describes practices for building good record-keeping habits in the more slippery world of computer work.
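By way of a minimal, hedged example (the file name and entry text below are invented, not taken from the chapter), timestamped entries can be appended to a plain-text notebook from within an analysis script:

```python
from datetime import datetime
from pathlib import Path

NOTEBOOK = Path("lab_notebook.md")  # hypothetical notebook file

def log_entry(text):
    """Append a timestamped entry to a plain-text lab notebook."""
    stamp = datetime.now().isoformat(timespec="minutes")
    with NOTEBOOK.open("a", encoding="utf-8") as handle:
        handle.write(f"\n## {stamp}\n{text}\n")

# Example entry; in practice this would be called at key points in a script.
log_entry("Re-ran the clustering step with k=5; results saved to results/run_07/.")
```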
To better understand and prevent research errors, we conducted a first-of-its-kind scoping review of clinical and translational research articles that were retracted because of problems in data capture, management, and/or analysis.
Methods:
The scoping review followed a preregistered protocol and used retraction notices from the Retraction Watch Database in relevant subject areas, excluding gross misconduct. Abstracts of original articles published between January 1, 2011 and January 31, 2020 were reviewed to determine if articles were related to clinical and translational research. We reviewed retraction notices and associated full texts to obtain information on who retracted the article, types of errors, authors, data types, study design, software, and data availability.
Results:
After reviewing 1,266 abstracts, we reviewed 884 associated retraction notices and 786 full-text articles. Authors initiated the retraction over half the time (58%). Problems generating or acquiring data were described in 42% of retraction notices, and problems with preparing or analyzing data in 28%. Among the full texts that we reviewed, 77% were human research, 29% were animal research, and 6% were systematic reviews or meta-analyses. Most articles collected data de novo (77%), but only 5% described the methods used for data capture and management, and only 11% described data availability. Over one-third of articles (38%) did not specify the statistical software used.
Conclusions:
Authors may improve scientific research by reporting methods for data capture and statistical software. Journals, editors, and reviewers should advocate for this documentation. Journals may help the scientific record self-correct by requiring detailed, transparent retraction notices.
The volume of data generated within the product development process requires a structured approach to coordination. Knowledge management solutions, such as ontologies, are a suitable way of linking data and representing semantic relationships. However, making all relevant data usable to ensure their target-oriented application remains a challenge. This contribution therefore presents an approach to identifying and classifying heterogeneous data in product development. In addition to this single-ontology approach, interface solutions for integrating data into an ontology are proposed.
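As a generic sketch of the kind of linkage such an ontology enables (the classes, properties, and values below are hypothetical and rely on the third-party rdflib library, not on anything from the contribution itself), heterogeneous data items can be tied to a single product component and queried together:

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical product-development vocabulary; not taken from the paper.
PD = Namespace("http://example.org/product-dev#")

g = Graph()
g.bind("pd", PD)

# Link heterogeneous data items (a CAD model and a test report) to one component.
g.add((PD.Bracket01, RDF.type, PD.Component))
g.add((PD.CadModel42, RDF.type, PD.CADModel))
g.add((PD.CadModel42, PD.describes, PD.Bracket01))
g.add((PD.TestReport7, RDF.type, PD.TestReport))
g.add((PD.TestReport7, PD.validates, PD.Bracket01))
g.add((PD.TestReport7, PD.maxLoadKilonewtons, Literal(12.5)))

# Retrieve everything linked to the component, regardless of data source.
for subject, predicate, _ in g.triples((None, None, PD.Bracket01)):
    print(subject, predicate)
```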
The Stanford Population Health Sciences Data Ecosystem was created to facilitate the use of large datasets containing health records from hundreds of millions of individuals. This necessitated technical solutions optimized for an academic medical center to manage and share high-risk data at scale. Through collaboration with internal and external partners, we have built a Data Ecosystem to host, curate, and share data with hundreds of users in a secure and compliant manner. This platform has enabled us to host unique data assets and serve the needs of researchers across Stanford University, and the technology and approach were designed to be replicable and portable to other institutions. We have found, however, that though these technological advances are necessary, they are not sufficient. Challenges around making data Findable, Accessible, Interoperable, and Reusable remain. Our experience has demonstrated that there is a high demand for access to real-world data, and that if the appropriate tools and structures are in place, translational research can be advanced considerably. Together, technological solutions, management structures, and education to support researcher, data science, and community collaborations offer more impactful processes over the long term for supporting translational research with real-world data.
Archaeologists frequently use written guidelines such as site manuals, recording forms, and digital prompts during excavations to create usable data within and across projects. Most written guidelines emphasize creating either standardized datasets or narrative summaries; however, previous research has demonstrated that the resulting datasets are often difficult to (re)use. Our study analyzed observations and interviews conducted with four archaeological excavation teams, as well as interviews with archaeological data reusers, to evaluate how archaeologists use and implement written guidelines. These excavation team and reuser experiences suggest that archaeologists need more specific best practices to create and implement written guidelines that improve the quality and usability of archaeological data. We present recommendations to improve written guidelines that focus on a project's methods, end-of-season documentation, and naming practices. We also present a Written Guidelines Checklist to help project directors improve their written guidelines before, during, and after fieldwork as part of a collaborative process. Ideally, these best practices for written guidelines will make it easier for team members and future reusers to incorporate their own and others’ archaeological data into their research.
This chapter is an introduction to Stata. We note the essential features and commands of the Stata statistical software package. Our objective is to familiarize the reader with the skills that will allow them to understand and complete the examples in the later chapters of our book. We describe the main components of the interface, followed by the main Stata file types, e.g., do-files, log files, and graphs. The third section of this chapter gives examples that readers can use to practice the most commonly used commands. Last, we summarize best practices for data management.
This chapter guides the researcher through key elements of developing a research methodology for conducting research on and at global environmental negotiations and agreement-making sites. It addresses four important components: 1) Methodological: how to develop a research project; 2) Ethical: how to reflect on and comply with ethical standards; 3) Legal: how to protect, manage, and store data; and 4) Organizational: how to prepare research on-site. We address key cross-cutting issues relevant to all chapters of the book and the central question of how to decide whether you need to be on-site to answer your research question and advance the state of the art on global environmental agreement-making. The chapter includes three main takeaways: First, the ethical, legal, and organizational aspects of this kind of research are as important as the conceptual and methodological work that prepares scholars for data collection and participant observation on-site. Second, access, funding, and data protection need to be addressed early and revisited at different stages of the research process. Third, regardless of the research puzzle and methodology, conducting research on and at negotiations will always imply a high degree of reflexivity and preparedness.
The primary objective was to analyze the impact of the national cyberattack in May 2021 on patient flow and data quality in the Paediatric Emergency Department (ED), amid the SARS-CoV-2 (COVID-19) pandemic.
Methods:
A single-site retrospective time-series analysis was conducted of three 6-week periods: before, during, and after the cyberattack outage. Initial emergent workflows are described. The analysis includes diagnoses, demographic context, key performance indicators, and the effect of the gradual return of information technology capability on ED performance. Data quality was compared using 10 data quality dimensions.
Results:
Patient visits totaled 13,390. During the system outage, patient experience times decreased significantly, from a median of 188 minutes (pre-cyberattack) down to 166 minutes, most notable for the period from registration to triage, and from clinician review to discharge (excluding admitted patients). Following system restoration, most timings increased. Data quality was significantly impacted, with data imperfections noted in 19.7% of data recorded during the system outage compared to 4.7% before and 5.1% after.
Conclusions:
There was a reduction in patient experience time, but data quality suffered greatly. A hospital’s major emergency plan should include provisions for digital disasters that address essential data requirements and data quality as well as the maintenance of patient flow.
The next generation of high-power lasers enables experiments to be repeated at orders of magnitude higher frequency than was possible with the prior generation. Facilities requiring human intervention between laser repetitions need to adapt in order to keep pace with the new laser technology. A distributed, networked control system can enable laboratory-wide automation and feedback control loops. These higher-repetition-rate experiments will create enormous quantities of data. A consistent approach to managing data can increase data accessibility, reduce repetitive data-software development, and mitigate poorly organized metadata. An opportunity arises to share knowledge of the improvements to control and data infrastructure currently being undertaken. We compare platforms and approaches to state-of-the-art control systems and data management at high-power laser facilities, and we illustrate these topics with case studies from our community.
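To make the metadata point concrete (a hedged sketch only; the schema, field names, and paths below are assumptions, not those of any particular facility), each shot's raw diagnostic file can be paired with a small, uniformly structured metadata record:

```python
import json
import time
from pathlib import Path

def save_shot_metadata(shot_id, data_file, settings, out_dir="shots"):
    """Write a JSON metadata sidecar describing one laser shot."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    record = {
        "shot_id": shot_id,
        "timestamp": time.time(),
        "settings": settings,         # e.g. laser energy, repetition rate
        "data_file": str(data_file),  # path to the raw diagnostic data
    }
    sidecar = out / f"{shot_id}.json"
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar

# Hypothetical shot identifier, data path, and settings.
save_shot_metadata("shot_000123", "raw/spectrometer_000123.h5",
                   {"energy_J": 1.2, "rep_rate_Hz": 10})
```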
High-quality data are necessary for drawing valid research conclusions, yet errors can occur during data collection and processing. These errors can compromise the validity and generalizability of findings. To achieve high data quality, one must approach data collection and management by anticipating the errors that can occur and establishing procedures to address them. This chapter presents best practices for data cleaning to minimize errors during data collection and to identify and address errors in the resulting data sets. Data cleaning begins during the early stages of study design, when data quality procedures are put in place. During data collection, the focus is on preventing errors. When entering, managing, and analyzing data, it is important to be vigilant in identifying and reconciling errors. During manuscript development, reporting, and presentation of results, all data cleaning steps taken should be documented and reported. With these steps, we can ensure the validity, reliability, and representativeness of the results of our research.
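As a small illustration of the error-identification step (a sketch only; the variable names, plausibility range, and toy data are invented, not taken from the chapter), duplicates, missing values, and out-of-range entries can be flagged with pandas:

```python
import pandas as pd

def basic_cleaning_checks(df):
    """Flag common data-entry errors: duplicates, missing values, range violations."""
    report = {
        "duplicate_ids": df[df.duplicated("participant_id", keep=False)],
        "missing_values": df[df.isna().any(axis=1)],
        # Hypothetical plausibility range for an age variable.
        "implausible_age": df[~df["age"].between(0, 120)],
    }
    for issue, rows in report.items():
        print(f"{issue}: {len(rows)} row(s)")
    return report

df = pd.DataFrame({
    "participant_id": [1, 2, 2, 4],
    "age": [34, 29, 29, 210],       # 210 is an obvious entry error
    "score": [5.1, None, 4.8, 3.9],
})
basic_cleaning_checks(df)
```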
Research is increasingly conducted through multi-institutional consortia, and best practices for establishing multi-site research collaborations must be employed to ensure efficient, effective, and productive translational research teams. In this manuscript, we describe how the Population-based Research to Optimize the Screening Process Lung Research Center (PROSPR-Lung) utilized evidence-based Science of Team Science (SciTS) best practices to establish the consortium’s infrastructure and processes to promote translational research in lung cancer screening. We provide specific, actionable examples of how we: (1) developed and reinforced a shared mission, vision, and goals; (2) maintained a transparent and representative leadership structure; (3) employed strong research support systems; (4) provided efficient and effective data management; (5) promoted interdisciplinary conversations; and (6) built a culture of trust. We offer guidance for managing a multi-site research center and data repository that may be applied to a variety of settings. Finally, we detail specific project management tools and processes used to drive collaboration, efficiency, and scientific productivity.
This chapter focuses on the mechanics of collecting and analyzing outcome data. It reviews the foundational functions of data management as they pertain to measuring outcomes. Then it discusses different data collection mechanisms such as using spreadsheets, REDCap, registries, and electronic health records. Additional considerations for data collection are outlined such as establishing the measurement timeline and ethical and legal considerations when establishing an outcome measurement program. This chapter also discusses the steps of integrating and validating data as well as extracting and analyzing outcome data. The primary audience for this chapter is individual clinicians who want to start measuring outcomes in their clinical practice.
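For readers using REDCap, one of the collection mechanisms mentioned above, a minimal sketch of exporting project records through its API is shown below; the URL and token are placeholders, and the call assumes API access has been enabled for the project.

```python
import requests

# Placeholder URL and token; substitute your institution's REDCap endpoint
# and a project API token issued by your REDCap administrator.
REDCAP_URL = "https://redcap.example.edu/api/"
API_TOKEN = "REPLACE_WITH_PROJECT_TOKEN"

def export_records():
    """Export all records from a REDCap project as a list of dicts."""
    payload = {
        "token": API_TOKEN,
        "content": "record",
        "format": "json",
        "type": "flat",
    }
    response = requests.post(REDCAP_URL, data=payload, timeout=30)
    response.raise_for_status()
    return response.json()

records = export_records()
print(f"Exported {len(records)} records")
```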
With the aim of producing a 3D representation of tumors, imaging and molecular annotation of xenografts and tumors (IMAXT) uses a large variety of modalities in order to acquire tumor samples and produce a map of every cell in the tumor and its host environment. Given the large volume and variety of data produced in the project, we developed automatic data workflows and analysis pipelines. We introduce a research methodology in which scientists connect to a cloud environment to perform analysis close to where the data are located, instead of bringing data to their local computers. Here, we present the data and analysis infrastructure, discuss the unique computational challenges, and describe the analysis chains developed and deployed to generate molecularly annotated tumor models. Registration is achieved by use of a novel technique involving spherical fiducial marks that are visible in all imaging modalities used within IMAXT. The automatic pipelines are highly optimized and allow processed datasets to be obtained several times faster than with current solutions, narrowing the gap between data acquisition and scientific exploitation.
In this book, Monika Amsler explores the historical contexts in which the Babylonian Talmud was formed in an effort to determine whether it was the result of oral transmission. Scholars have posited that the rulings and stories we find in the Talmud were passed on from one generation to the next, each generation adding their opinions and interpretations of a given subject. Yet, such an oral formation process is unheard of in late antiquity. Moreover, the model exoticizes the Talmud and disregards the intellectual world of Sassanid Persia. Rather than taking the Talmud's discursive structure as a sign for orality, Amsler interrogates the intellectual and material prerequisites of composers of such complex works, and their education and methods of large-scale data management. She also traces and highlights the marks that their working methods inevitably left in the text. Detailing how intellectual innovation was generated, Amsler's book also sheds new light on the content of the Talmud. This title is also available as Open Access on Cambridge Core.
Most archaeological investigations in the United States and other countries must comply with preservation laws, especially if they are on government property or supported by government funding. Academic and cultural resource management (CRM) studies have explored various social, temporal, and environmental contexts and produce an ever-increasing volume of archaeological data. More and more data are born digital, and many legacy data are digitized. There is a building effort to synthesize and integrate data at a massive scale and create new data standards and management systems. Taxpayer dollars often fund archaeological studies that are intended, in spirit, to promote historic preservation and provide public benefits. However, the resulting data are difficult to access and interoperationalize, and they are rarely collected and managed with their long-term security, accessibility, and ethical reuse in mind. Momentum is building toward open data and open science as well as Indigenous data sovereignty and governance. The field of archaeology is reaching a critical point where consideration of diverse constituencies, concerns, and requirements is needed to plan data collection and management approaches moving forward. This theme issue focuses on challenges and opportunities in archaeological data collection and management in academic and CRM contexts.
A stream of research on co-authorship, used as a proxy for scholars’ collaborative behavior, focuses on members of a given scientific community defined on a disciplinary and/or national basis, for which co-authorship data have to be retrieved. Recent literature has pointed out that international digital libraries provide partial coverage of scholars’ entire scientific production as well as under-coverage of the scholars in the community. Bias in retrieving co-authorship data for the community of interest can affect network construction and network measures in several ways, providing a partial picture of the real collaboration among scholars in writing papers. In this contribution, we collected bibliographic records of Italian academic statisticians from an online platform (IRIS) available at most universities. Although it guarantees a high coverage rate of our population and its scientific production, several data quality issues must be addressed. We therefore propose a web scraping procedure based on a semi-automatic tool to retrieve publication metadata, along with data management tools to detect duplicate records and to reconcile authors. Our procedure shows that collaboration is an active and increasing practice among Italian academic statisticians, with some differences according to scholars’ gender, academic rank, and university location. The heuristic procedure for addressing data quality issues in the IRIS platform can serve as a working case that can be adapted to other bibliographic archives with similar characteristics.
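The general shape of this workflow, deduplicating publication records and then building a weighted co-authorship network, can be sketched as follows (the records, author names, and deduplication key are invented for illustration and rely on the third-party networkx library, not on the paper's actual tools):

```python
from itertools import combinations
import networkx as nx

# Hypothetical publication metadata after scraping; titles and authors invented.
publications = [
    {"doi": "10.1000/abc", "authors": ["Rossi, M.", "Bianchi, L."]},
    {"doi": "10.1000/abc", "authors": ["Rossi, M.", "Bianchi, L."]},  # duplicate record
    {"doi": "10.1000/xyz", "authors": ["Rossi, M.", "Verdi, G.", "Bianchi, L."]},
]

# Deduplicate on DOI (a stand-in for the paper's record-matching step).
unique = {pub["doi"]: pub for pub in publications}.values()

# Build the co-authorship graph: one edge per pair of co-authors, weighted by count.
G = nx.Graph()
for pub in unique:
    for a, b in combinations(sorted(set(pub["authors"])), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

print(G.number_of_nodes(), "authors;", G.number_of_edges(), "collaboration ties")
```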
With the expanding adoption of technology and intelligent applications in every aspect of our lives, energy, resource, data, and product management are all improving, and modern management approaches have surged to cope with the demands of modern societies. Numerous optimization approaches and algorithms are used in the literature to perform this optimization effectively while taking its many constraints into account. Among these methods, nature-inspired meta-heuristic optimization algorithms have stood out for their dependability and superior solution quality in overcoming the numerous barriers to generation, distribution, integration, and management. Hence, this article reviews the application of nature-inspired optimization algorithms to modern management. In addition, the identified clusters introduce the top authors in this field. The results show that nature-inspired optimization algorithms contribute significantly to cost, resource, and energy efficiency, and that the genetic algorithm is the most important and widely used method in the literature.
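Since the genetic algorithm is singled out here, a minimal, self-contained sketch of the method on a toy objective may help orient readers unfamiliar with it (the parameters and objective below are illustrative only, not drawn from the review):

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=30, generations=100,
                      crossover_rate=0.9, mutation_rate=0.02):
    """Minimal binary genetic algorithm: tournament selection, crossover, mutation."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Tournament selection: pick the fittest of three random candidates.
        parents = [max(random.sample(pop, 3), key=fitness) for _ in range(pop_size)]
        children = []
        for p1, p2 in zip(parents[::2], parents[1::2]):
            c1, c2 = p1[:], p2[:]
            if random.random() < crossover_rate:     # single-point crossover
                point = random.randint(1, n_bits - 1)
                c1, c2 = p1[:point] + p2[point:], p2[:point] + p1[point:]
            for child in (c1, c2):                   # bit-flip mutation
                for i in range(n_bits):
                    if random.random() < mutation_rate:
                        child[i] = 1 - child[i]
                children.append(child)
        pop = children
        best = max(pop + [best], key=fitness)
    return best

# Toy objective: maximize the number of ones in the bit string.
solution = genetic_algorithm(fitness=sum)
print(solution, sum(solution))
```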
CHDs are the most common type of birth defect, and one in four newborns with a heart defect has a critical CHD. In Mexico, there is a lack of data available to determine its prevalence. Pulse oximetry screening programmes have been implemented worldwide, revealing areas for improvement in algorithm interpretation and data management. Our study aims to share preliminary results from a 3-year experience of a multicentre pulse oximetry screening programme that addresses critical challenges.
Materials and methods:
This retrospective study examined the reports of newborns screened from February 2016 to July 2019 at five hospitals. Two algorithms, the New Jersey and the American Academy of Pediatrics algorithms, were implemented over consecutive periods. The algorithms’ impact was assessed by calculating the false-positive rate in an eligible population.
Results:
A total of 8960 newborns were eligible for the study; of these, 32.27% were screened under the New Jersey algorithm and 67.72% under the American Academy of Pediatrics algorithm, with false-positive rates of 1% (95% CI: ±0.36%) and 0.71% (95% CI: ±0.21%), respectively. Seventy-nine newborns were referred; six were diagnosed with critical CHD and six with CHD. The estimated prevalence of critical CHD was 6.69 per 10,000 newborns (95% CI: ±5.36). Our results showed that the algorithm was not related to the observed reduction in the false-positive rate.
Discussion:
Other factors may play a role in decreasing the false-positive rate. Our experience implementing this programme was that a systematic screening process led to more reliable results, better interpretation of newborns’ reports, and better follow-up.
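The reported intervals appear consistent with a normal-approximation (Wald) confidence interval for a proportion; the short sketch below reproduces the quoted margins from the figures in the abstract (the choice of formula is our assumption, not something stated by the authors).

```python
import math

def wald_margin(p, n, z=1.96):
    """Half-width of a normal-approximation 95% CI for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

eligible = 8960
for share, fpr in [(0.3227, 0.01), (0.6772, 0.0071)]:
    n = round(eligible * share)
    print(f"n={n}, FPR={fpr:.2%} ± {wald_margin(fpr, n):.2%}")
# Prints margins of about ±0.36% and ±0.21%, matching the reported intervals.
```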
This paper outlines frameworks for reserving validation and gives the reader an overview of the techniques currently being employed. In the authors’ experience, many companies lack an embedded reserve validation framework, and reserve validation can appear piecemeal and unstructured. The paper outlines a case study demonstrating how successful machine learning techniques could become, and then discusses the implications of machine learning for the future of reserving departments, processes, data, and validation techniques. Reserving validation can take many forms, from simple checks to full independent reviews, and serves to add value to the reserving process, enhance governance, and increase confidence in and the reliability of results. The paper covers common weaknesses and their solutions and suggests a framework in which to apply validation tools. The impact of the COVID-19 pandemic on reserving validation is also covered, as are early warning indicators and the topic of IFRS 17 from the standpoint of reserving validation. The paper looks at the future of reserving validation and discusses the data challenges that must be overcome on the path to embedded reserving process validation.
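As an example of the simpler end of the validation spectrum mentioned above (a hedged sketch with invented figures, not taken from the paper), an actual-versus-expected check can flag classes of business whose paid development deviates materially from what the previous reserving exercise assumed:

```python
import pandas as pd

# Invented figures for illustration: expected claims development from last
# year's reserving exercise versus the amounts actually paid since.
ave = pd.DataFrame({
    "class": ["Motor", "Property", "Liability"],
    "expected_paid": [120.0, 80.0, 45.0],   # in millions
    "actual_paid":   [131.0, 78.5, 52.0],
})
ave["deviation_pct"] = (ave["actual_paid"] / ave["expected_paid"] - 1) * 100

# Flag classes where experience deviates from expectation by more than 10%.
ave["flag"] = ave["deviation_pct"].abs() > 10
print(ave)
```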