17 Data Loofah: A web-based app for efficiently identifying erroneous data

Published online by Cambridge University Press: 24 April 2023

Jeffrey R. Fine (University of California, Davis)
Sandra L. Taylor (University of California, Davis)

Abstract

OBJECTIVES/GOALS: Our goal was to create and deploy an intuitive, easy-to-use tool that clinical investigators can apply to their data to identify erroneous or inconsistent entries. Investigators can then correct any errors before sharing the data with their statistician for analysis.

METHODS/STUDY POPULATION: Using RStudio, we developed the Data Loofah, an interactive Shiny app that researchers and data analysts can use to examine their data. After an investigator uploads a dataset, the app reports which variables are numeric and which are categorical. For numeric variables, it reports the mean, standard deviation, median, 25th and 75th percentiles, range, and number of missing values. For categorical variables, it summarizes the count and percentage of each category. Graphical displays further aid in identifying errors. The Data Loofah is accessed through a secure, university-maintained website, with access restricted to university personnel. Supporting materials, including step-by-step instructional handouts and videos, were developed to help investigators use the app.

RESULTS/ANTICIPATED RESULTS: We will integrate the Data Loofah into the biostatistics consultative practice of our Clinical and Translational Science Program. Investigators will use the Data Loofah to pre-screen their data before sending it to a statistician, identifying errors and inconsistencies and making the necessary corrections. Statisticians will also use the Data Loofah to review data with investigators before starting analyses. Through use of the app, investigators are expected to gain a better understanding of their own data and, more generally, of the requirements for preparing data for statistical analysis. Most significantly, regular use of the Data Loofah is expected to yield higher-quality data and more efficient use of statistician resources, because less effort will be needed for data cleaning.

DISCUSSION/SIGNIFICANCE: Data cleaning is a time-consuming task, and finding data errors can be difficult for data analysts who are not familiar with the clinical variables under study. Further, failure to identify data errors can lead to erroneous results. By helping clinical investigators identify data errors themselves, the Data Loofah will improve research output.
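The METHODS/STUDY POPULATION paragraph describes the core screening logic: classify each uploaded variable as numeric or categorical, then report summary statistics for numeric variables and counts and percentages for categorical ones. The Data Loofah itself is an R Shiny app whose code is not given in this abstract, so the following is only a minimal Python sketch of that logic under our own assumptions. A column is treated as numeric if every non-missing value parses as a number; the missing-value codes in `MISSING_CODES` are hypothetical; and quartiles use the `inclusive` method of `statistics.quantiles`, which may not match the app's definition. All function names and the sample data are illustrative.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical set of codes treated as missing; the app's actual rules are not described.
MISSING_CODES = {"", "NA", "N/A", "."}

# Made-up example data: an implausible age (230), inconsistent sex codes (F vs f),
# and one missing systolic blood pressure value.
SAMPLE = """age,sex,sbp
34,F,120
29,M,
41,f,135
230,F,128
"""


def is_missing(value):
    return value.strip() in MISSING_CODES


def parses_as_number(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def classify(values):
    """Assumed rule: numeric if every non-missing value parses as a number."""
    present = [v for v in values if not is_missing(v)]
    if present and all(parses_as_number(v) for v in present):
        return "numeric"
    return "categorical"


def summarize_numeric(values):
    """Mean, SD, median, quartiles, range, and missing count (needs >= 1 value)."""
    nums = sorted(float(v) for v in values if not is_missing(v))
    n = len(nums)
    q25 = q75 = None
    if n >= 2:  # quartiles and SD need at least two observations
        q25, _, q75 = statistics.quantiles(nums, n=4, method="inclusive")
    return {
        "n_missing": len(values) - n,
        "mean": statistics.mean(nums),
        "sd": statistics.stdev(nums) if n >= 2 else None,
        "median": statistics.median(nums),
        "q25": q25,
        "q75": q75,
        "min": nums[0],
        "max": nums[-1],
    }


def summarize_categorical(values):
    """Count and percentage of each non-missing level, most frequent first."""
    present = [v.strip() for v in values if not is_missing(v)]
    total = len(present)
    levels = {
        level: (count, 100 * count / total)
        for level, count in Counter(present).most_common()
    }
    return {"n_missing": len(values) - total, "levels": levels}


def screen_csv(text):
    """Classify and summarize every column of a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    report = {}
    for column in reader.fieldnames or []:
        # Short rows yield None in DictReader; treat them as empty (missing).
        values = [row.get(column) or "" for row in rows]
        kind = classify(values)
        if kind == "numeric":
            summary = summarize_numeric(values)
        else:
            summary = summarize_categorical(values)
        report[column] = {"type": kind, **summary}
    return report


if __name__ == "__main__":
    for column, info in screen_csv(SAMPLE).items():
        print(column, info)
```

In this made-up sample, the maximum age of 230 and the separate `F` and `f` levels for `sex` are the kind of implausible values and inconsistent codes these summaries are meant to surface, and the blank blood pressure entry shows up in `n_missing` for `sbp`. Reporting summaries rather than applying automatic corrections leaves the judgment about what counts as an error to the investigator, who knows the clinical variables.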

Type
Biostatistics, Epidemiology, and Research Design
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives licence (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is unaltered and is properly cited. The written permission of Cambridge University Press must be obtained for commercial re-use or in order to create a derivative work.
Copyright
© The Author(s), 2023. The Association for Clinical and Translational Science