Using Hadoop Distributed and Deduplicated File System (HD2FS) in Astronomy

Published online by Cambridge University Press: 23 December 2021

Paul Bartus*
Affiliation:
School of Computer Science and Mathematics Lake Superior State University Sault Ste. Marie, Michigan, USA, 49783 email: pbartus@lssu.edu

Abstract

In recent years, the amount of data has skyrocketed. As a consequence, data has become more expensive to store than to generate. The storage needs of astronomical data follow the same trend. Storage systems in astronomy contain redundant copies of data, both as identical files and as identical regions within files. We propose the use of the Hadoop Distributed and Deduplicated File System (HD2FS) in astronomy. HD2FS is a deduplication storage system created to improve storage capacity and efficiency in distributed file systems without compromising input/output performance. HD2FS can be developed by modifying existing storage system environments such as the Hadoop Distributed File System. By taking advantage of deduplication technology, we can better manage the underlying redundancy of astronomical data and reduce the space needed to store these files, thus allowing for more capacity per volume.
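
To make the idea of file-level and sub-file deduplication concrete, the following is a minimal sketch and not the HD2FS implementation. It assumes fixed-size chunks identified by SHA-256 digests and an in-memory store; the class name `DedupStore`, its methods, the chunk size, and the example file names are all hypothetical and chosen only for illustration.

```python
import hashlib


class DedupStore:
    """Toy in-memory chunk store (illustrative only, not HD2FS)."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size  # assumed fixed chunk size in bytes
        self.chunks = {}              # SHA-256 hex digest -> chunk bytes, stored once
        self.files = {}               # file name -> ordered list of chunk digests

    def put(self, name, data):
        """Split data into fixed-size chunks and store each unique chunk once."""
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if the chunk already exists
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Rebuild a file from its list of chunk digests."""
        return b"".join(self.chunks[d] for d in self.files[name])

    def logical_size(self):
        """Bytes the files would occupy without deduplication."""
        return sum(len(self.chunks[d]) for ds in self.files.values() for d in ds)

    def physical_size(self):
        """Bytes actually held after deduplication."""
        return sum(len(c) for c in self.chunks.values())


store = DedupStore(chunk_size=4)
store.put("a.fits", b"AAAABBBBCCCC")
store.put("b.fits", b"AAAABBBBCCCC")  # identical file: adds no new chunks
store.put("c.fits", b"AAAADDDD")      # shares one sub-file region with a.fits

print(store.logical_size())   # 32
print(store.physical_size())  # 16
```

In this toy case the three files occupy 32 logical bytes but only 16 bytes of unique chunks are held. A real distributed system would also need to track chunk placement across nodes and references for safe deletion, which this sketch does not attempt.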

Type
Poster Paper
Copyright
© The Author(s), 2021. Published by Cambridge University Press on behalf of International Astronomical Union
