Published online by Cambridge University Press: 24 April 2023
OBJECTIVES/GOALS: The goal was to create and deploy an intuitive, easy-to-use tool that clinical investigators can apply to their data to identify erroneous or inconsistent entries. Investigators can then correct any errors before sharing the data with their statistician for analysis.
METHODS/STUDY POPULATION: Using RStudio, we developed the Data Loofah, an interactive Shiny app that researchers and data analysts can use to examine data. After an investigator uploads data, the app reports which variables are numeric and which are categorical. For numeric variables, it reports the mean, standard deviation, median, 25th and 75th percentiles, range, and number of missing values. For categorical variables, it summarizes counts and percentages. Graphical displays further aid in identifying errors. The Data Loofah is hosted on a secure, university-maintained website, with access restricted to university personnel. Supporting materials, consisting of step-by-step instructional handouts and videos, were developed to assist investigators in using the app.
RESULTS/ANTICIPATED RESULTS: We will integrate the Data Loofah into our Clinical and Translational Science Program’s biostatistics consultative practice. Investigators will use the Data Loofah to pre-screen their data before sending it to a statistician, to identify errors and inconsistencies, and to make the necessary corrections. Statisticians will also use the Data Loofah to review data with investigators before starting analyses. Through use of this app, investigators are expected to develop a better understanding of their own data and, more generally, of the requirements for preparing data for statistical analysis. Most significantly, regular use of the Data Loofah is expected to result in higher-quality data and more efficient use of statistician resources, because less effort will be spent on data cleaning.
DISCUSSION/SIGNIFICANCE: Data cleaning is a time-consuming task, and finding data errors can be difficult for data analysts who are unfamiliar with the clinical variables under study. Further, failure to identify data errors can lead to erroneous results. By helping clinical investigators identify data errors themselves, the Data Loofah is expected to improve the quality of research output.
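The per-variable summaries described under METHODS can be illustrated with a short sketch. This is not the Data Loofah's code (the app itself is an R Shiny application); it is a minimal Python approximation using only the standard library. The function name `summarize_column`, the set of missing-value markers, the rule for deciding that a column is numeric, and the quantile interpolation method are all assumptions made for illustration.

```python
import statistics
from collections import Counter

# Assumption: blank cells and the string "NA" count as missing values.
MISSING = {"", "NA"}


def _to_float(value):
    """Return value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except ValueError:
        return None


def summarize_column(values):
    """Summarize one column of raw string values (hypothetical helper).

    A column is treated as numeric when every non-missing value parses
    as a number; otherwise it is treated as categorical.
    """
    present = [v for v in values if v not in MISSING]
    n_missing = len(values) - len(present)
    numbers = [_to_float(v) for v in present]

    if present and all(x is not None for x in numbers):
        if len(numbers) >= 2:
            # "inclusive" interpolation is an assumed choice of method.
            q25, median, q75 = statistics.quantiles(
                numbers, n=4, method="inclusive"
            )
            sd = statistics.stdev(numbers)
        else:
            q25 = median = q75 = numbers[0]
            sd = None  # standard deviation is undefined for one value
        return {
            "type": "numeric",
            "mean": statistics.mean(numbers),
            "sd": sd,
            "median": median,
            "q25": q25,
            "q75": q75,
            "min": min(numbers),
            "max": max(numbers),
            "missing": n_missing,
        }

    counts = Counter(present)
    # Percentages are taken over non-missing values (an assumption).
    percent = {k: round(100 * c / len(present), 1) for k, c in counts.items()}
    return {
        "type": "categorical",
        "counts": dict(counts),
        "percent": percent,
        "missing": n_missing,
    }


# Made-up example columns: an implausible age and an inconsistent code.
age_summary = summarize_column(["34", "41", "", "29", "410", "38"])
sex_summary = summarize_column(["F", "M", "f", "F", "NA", "M"])
```

In this made-up example, the maximum in `age_summary` exposes an implausible age, and the counts in `sex_summary` list a lowercase `f` as a separate category. These are the kinds of entry errors that such summaries are meant to surface for an investigator.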