Learnings from developing an applied data science curricula for undergraduate and graduate students

Published online by Cambridge University Press:  24 February 2020

Roger H. French*
Affiliation:
SDLE Research Center, Case Western Reserve University, Cleveland, OH 44106; Dept. of Materials Science & Engineering, Case Western Reserve University, Cleveland, OH 44106; Dept. of Macromolecular Science & Engineering, Case Western Reserve University, Cleveland, OH 44106; Dept. of Computer & Data Sciences, Case Western Reserve University, Cleveland, OH 44106
Laura S. Bruckman
Affiliation:
SDLE Research Center, Case Western Reserve University, Cleveland, OH 44106; Dept. of Materials Science & Engineering, Case Western Reserve University, Cleveland, OH 44106

Abstract

Data science has advanced significantly in recent years and allows scientists to apply large-scale data analysis techniques using open-source coding frameworks. Data science is a tool that should be taught to science and engineering students alongside their chosen domain knowledge. An applied data science minor teaches students to understand data and data handling as well as statistics and model development. Teaching these skills will improve the reproducibility and openness of research, allow for greater interdisciplinarity, and enable more analyses focused on critical scientific challenges.

Type: Articles
Copyright: © Materials Research Society 2020
