
As I was consolidating and mapping out learning progressions for folks coming from pure mathematics, I compiled this list of Data Science reference materials. Enjoy!
Update: I use Zotero to compile this list and keep an online library of papers here. Please contact me to join the group and share your collection of papers for collaboration!
Textbooks
Aggarwal, C. C. (2018). Neural Networks and Deep Learning: A Textbook. Springer International Publishing. https://doi.org/10.1007/978-3-319-94463-0
Al Sarkhi, A., & Talburt, J. (2019). The Journal of Computing Sciences in Colleges Papers of the 17th Annual CCSC Mid-South Conference. https://doi.org/10.13140/RG.2.2.29810.12481
Chollet, F. (2018). Deep learning with Python (1st ed.). Manning Publications Co.
Dunn, P. K., & Smyth, G. K. (2018). Generalized Linear Models With Examples in R. Springer New York. https://doi.org/10.1007/978-1-4419-0118-7
Efron, B., & Hastie, T. (n.d.). Computer Age Statistical Inference.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.
Géron, A. (2017). Hands-on machine learning with scikit-learn and TensorFlow: Concepts, tools, and techniques to build intelligent systems (1st ed.). O’Reilly Media, Inc.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Irizarry, R. A. (n.d.). Introduction to Data Science. Retrieved February 23, 2022, from https://rafalab.github.io/dsbook/
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 103). Springer New York. https://doi.org/10.1007/978-1-4614-7138-7
Langtangen, H. P. (Ed.). (2008). Python Scripting for Computational Science (Vol. 3). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-73916-6
Mahoney, M. W., Duchi, J., & Gilbert, A. C. (Eds.). (2018). The mathematics of data. American Mathematical Society.
Mills, C. W. (1997). The racial contract. Cornell University Press.
Minow, M. (1991). Making All the Difference. https://www.cornellpress.cornell.edu/book/9780801499777/making-all-the-difference/
Morgan, S. L., & Winship, C. (2014). Counterfactuals and Causal Inference: Methods and Principles for Social Research (2nd ed.). Cambridge University Press. https://doi.org/10.1017/CBO9781107587991
Ross, S. M. (2010). A first course in probability (8th ed.). Pearson Prentice Hall.
Vershynin, R. (n.d.). High-Dimensional Probability.
Young, I. M. (2002). Inclusion and Democracy. Oxford University Press.
Online Resources
AutoPEP8—Packages—Package Control. (n.d.). Retrieved March 10, 2022, from https://packagecontrol.io/packages/AutoPEP8
Beautiful Soup Documentation—Beautiful Soup 4.9.0 documentation. (n.d.). Retrieved March 10, 2022, from https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Harvard Transdisciplinary Research in Energetics and Cancer Center. (2012, September 10). Definitions. https://www.hsph.harvard.edu/trec/about-us/definitions/
Configuring a Python Application on CircleCI – CircleCI. (n.d.). Retrieved March 10, 2022, from https://circleci.com/docs/2.0/language-python/
Data Gymnasia. (n.d.). Mathigon. Retrieved February 23, 2022, from https://mathigon.org/data-gymnasia
Data Mining: Algorithms, Geometry, and Probability. (n.d.). Retrieved February 24, 2022, from https://www.cs.utah.edu/~jeffp/DMBook/DM-AGP.html
Department of Computer and Data Sciences < Case Western Reserve University. (n.d.). Retrieved March 10, 2022, from https://bulletin.case.edu/schoolofengineering/compdatasci/#courseinventory
Dive into Deep Learning—Dive into Deep Learning 0.17.2 documentation. (n.d.). Retrieved February 23, 2022, from http://d2l.ai/
Evans, J. (2014, July 29). What is Transdisciplinarity? – Purdue Polytechnic Institute. https://polytechnic.purdue.edu/blog/what-transdisciplinarity
gitignore.io—Create Useful .gitignore Files For Your Project. (n.d.). Retrieved March 10, 2022, from https://www.toptal.com/developers/gitignore
Install Ubuntu desktop. (n.d.). Ubuntu. Retrieved March 10, 2022, from https://ubuntu.com/tutorials/install-ubuntu-desktop
Interactive network visualizations—Pyvis 0.1.3.1 documentation. (n.d.). Retrieved March 10, 2022, from https://pyvis.readthedocs.io/en/latest/index.html
Interpretivism—SAGE Research Methods. (n.d.). Retrieved February 25, 2022, from https://methods.sagepub.com/book/key-concepts-in-ethnography/n21.xml
Jupyter/IPython Notebook Quick Start Guide—Jupyter/IPython Notebook Quick Start Guide 0.1 documentation. (n.d.). Retrieved March 10, 2022, from https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/
Laplacian Matrices | An Introduction to Algebraic Graph Theory. (n.d.). Retrieved March 10, 2022, from https://www.geneseo.edu/~aguilar/public/notes/Graph-Theory-HTML/ch4-laplacian-matrices.html
Mathematical Foundations for Data Analysis. (n.d.). Math for Data. Retrieved February 24, 2022, from https://mathfordata.github.io/
Matplotlib—Visualization with Python. (n.d.). Retrieved March 10, 2022, from https://matplotlib.org/
Mining of Massive Datasets. (n.d.). Retrieved February 24, 2022, from http://www.mmds.org/
Python Data Science Handbook. (n.d.). Retrieved February 18, 2022, from https://jakevdp.github.io/PythonDataScienceHandbook/
Real Python. (n.d.). Setting Up Sublime Text 3 for Full Stack Python Development. Retrieved March 10, 2022, from https://realpython.com/setting-up-sublime-text-3-for-full-stack-python-development/
Reflexivity. (n.d.). Retrieved February 25, 2022, from https://warwick.ac.uk/fac/soc/ces/research/current/socialtheory/maps/reflexivity/
Requests: HTTP for Humans™—Requests 2.27.1 documentation. (n.d.). Retrieved March 10, 2022, from https://docs.python-requests.org/en/latest/
Watson, S. (2021). Data 1050 Course Cheatsheet. DATA 1050. https://data1050.github.io/docs/cheatsheets/DATA_1050_Cheatsheet.pdf
Journal Articles
Brunsdon, C., & Comber, A. (2021). Opening practice: Supporting reproducibility and critical spatial data science. Journal of Geographical Systems, 23(4), 477–496. https://doi.org/10.1007/s10109-020-00334-2
Cao, L. (2018). Data Science: A Comprehensive Overview. ACM Computing Surveys, 50(3), 1–42. https://doi.org/10.1145/3076253
Darwin Holmes, A. G. (2020). Researcher Positionality—A Consideration of Its Influence and Place in Qualitative Research—A New Researcher Guide. Shanlax International Journal of Education, 8(4), 1–10. https://doi.org/10.34293/education.v8i4.3232
DasGupta, A. (n.d.). Probability for Statistics and Machine Learning.
De Veaux, R. D., Agarwal, M., Averett, M., Baumer, B. S., Bray, A., Bressoud, T. C., Bryant, L., Cheng, L. Z., Francis, A., Gould, R., Kim, A. Y., Kretchmar, M., Lu, Q., Moskol, A., Nolan, D., Pelayo, R., Raleigh, S., Sethi, R. J., Sondjaja, M., … Ye, P. (2017). Curriculum Guidelines for Undergraduate Programs in Data Science. Annual Review of Statistics and Its Application, 4(1), 15–30. https://doi.org/10.1146/annurev-statistics-060116-053930
Developing a Master’s Degree Program in Data Science. (2021). Issues in Information Systems. https://doi.org/10.48009/3_iis_2021_58-68
Elo, S., Kääriäinen, M., Kanste, O., Pölkki, T., Utriainen, K., & Kyngäs, H. (2014). Qualitative Content Analysis: A Focus on Trustworthiness. SAGE Open, 4(1), 2158244014522633. https://doi.org/10.1177/2158244014522633
Evans, T. M., Bira, L., Gastelum, J. B., Weiss, L. T., & Vanderford, N. L. (2018). Evidence for a mental health crisis in graduate education. Nature Biotechnology, 36(3), 282–284. https://doi.org/10.1038/nbt.4089
Feelders, A., Daniels, H., & Holsheimer, M. (2000). Methodological and practical aspects of data mining. Information & Management, 37(5), 271–281. https://doi.org/10.1016/S0378-7206(99)00051-8
Incidence matrix. (2021). In Wikipedia. https://en.wikipedia.org/w/index.php?title=Incidence_matrix&oldid=1053190313
Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 2053951714528481. https://doi.org/10.1177/2053951714528481
Kross, S., Peng, R. D., Caffo, B. S., Gooding, I., & Leek, J. T. (2017). The democratization of data science education [Preprint]. PeerJ Preprints. https://doi.org/10.7287/peerj.preprints.3195v1
Leonelli, S. (2021). Data Science in Times of Pan(dem)ic. Harvard Data Science Review. https://doi.org/10.1162/99608f92.fbb1bdd6
Leslie, D. (2021). The Arc of the Data Scientific Universe. Harvard Data Science Review. https://doi.org/10.1162/99608f92.938a18d7
Leung, L. (2015). Validity, reliability, and generalizability in qualitative research. Journal of Family Medicine and Primary Care, 4(3), 324. https://doi.org/10.4103/2249-4863.161306
Mahoney, M. W. (2019). The Difficulties of Addressing Interdisciplinary Challenges at the Foundations of Data Science. ArXiv:1909.03033 [Cs]. http://arxiv.org/abs/1909.03033
Martinez, I., Viles, E., & Olaizola, I. G. (2021). Data Science Methodologies: Current Challenges and Future Approaches. Big Data Research, 24, 100183. https://doi.org/10.1016/j.bdr.2020.100183
Martinez-Plumed, F., Contreras-Ochando, L., Ferri, C., Hernandez-Orallo, J., Kull, M., Lachiche, N., Ramirez-Quintana, M. J., & Flach, P. (2021). CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories. IEEE Transactions on Knowledge and Data Engineering, 33(8), 3048–3061. https://doi.org/10.1109/TKDE.2019.2962680
McPhee, C., & Bliemel, M. (2018). Editorial: Transdisciplinary Innovation. Technology Innovation Management Review, 8(8), 5.
Meng, X.-L. (2021). Data Science: A Happy Marriage of Quantitative and Qualitative Thinking? Harvard Data Science Review. https://doi.org/10.1162/99608f92.cee621a9
Newman, M. E. J. (2003). The Structure and Function of Complex Networks. SIAM Review, 45(2), 167–256. https://doi.org/10.1137/S003614450342480
Nolan, D., & Stoudt, S. (2021). The Promise of Portfolios: Training Modern Data Scientists. Harvard Data Science Review. https://doi.org/10.1162/99608f92.3c097160
Pedaste, M., Mäeots, M., Siiman, L., Jong, T., Riesen, S., Kamp, E., Manoli, C., Zacharia, Z., & Tsourlidaki, E. (2015). Phases of inquiry-based learning: Definitions and the inquiry cycle. Educational Research Review, 14. https://doi.org/10.1016/j.edurev.2015.02.003
Regnault, A., Willgoss, T., & Barbic, S. (2018). Towards the use of mixed methods inquiry as best practice in health outcomes research. Journal of Patient-Reported Outcomes, 2(1), 19. https://doi.org/10.1186/s41687-018-0043-8
Tang, R., & Sae-Lim, W. (2016). Data science programs in U.S. higher education: An exploratory content analysis of program description, curriculum structure, and course focus. Education for Information, 32(3), 269–290. https://doi.org/10.3233/EFI-160977
Tanweer, A., Gade, E. K., Krafft, P. M., & Dreier, S. K. (2021). Why the Data Revolution Needs Qualitative Thinking. Harvard Data Science Review. https://doi.org/10.1162/99608f92.eee0b0da
Warfa, A.-R. M. (2016). Mixed-Methods Design in Biology Education Research: Approach and Uses. CBE—Life Sciences Education, 15(4), rm5. https://doi.org/10.1187/cbe.16-01-0022
Woolston, C. (n.d.). Why mental health matters.
Woolston, C. (2018). Feeling overwhelmed by academia? You are not alone. Nature, 557(7703), 129–131. https://doi.org/10.1038/d41586-018-04998-1
Conference Papers
Demchenko, Y., Belloum, A., Los, W., Wiktorski, T., Manieri, A., Brocks, H., Becker, J., Heutelbeck, D., Hemmje, M., & Brewer, S. (2016). EDISON Data Science Framework: A Foundation for Building Data Science Profession for Research and Industry. 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 620–626. https://doi.org/10.1109/CloudCom.2016.0107
Kross, S., & Guo, P. J. (2019). Practitioners Teaching Data Science in Industry and Academia: Expectations, Workflows, and Challenges. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–14. https://doi.org/10.1145/3290605.3300493
Li, D., Milonas, E., & Zhang, Q. (2021). Content Analysis of Data Science Graduate Programs in the U.S. 2021 ASEE Virtual Annual Conference Content Access Proceedings, 36841. https://doi.org/10.18260/1-2--36841
Milonas, E., Li, D., & Zhang, Q. (2021). Content Analysis of Two-year and Four-year Data Science Programs in the United States. 2021 ASEE Virtual Annual Conference Content Access Proceedings, 36842. https://doi.org/10.18260/1-2--36842
Salloum, M., Jeske, D., Ma, W., Papalexakis, V., Shelton, C., Tsotras, V., & Zhou, S. (2021). Developing an Interdisciplinary Data Science Program. Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, 509–515. https://doi.org/10.1145/3408877.3432454
Blog Posts
Chan, T. (2022a, February 17). From Pure Math to Data Science. Medium. https://mathtodata.medium.com/from-pure-math-to-data-science-293883864cb2
Chan, T. (2022b, February 28). Mixed Methods Data Science: Qualitative Sensibilities. Medium. https://mathtodata.medium.com/data-science-curriculum-starting-point-for-pure-mathematicians-347efe61f743
Chan, T. (2022c, March 10). Data Science Project Based Learning. Medium. https://mathtodata.medium.com/data-science-project-based-learning-afd2bd6f8f11
Dolon, B. (2020, June 9). You Don’t Always Have to Loop Through Rows in Pandas! Medium. https://towardsdatascience.com/you-dont-always-have-to-loop-through-rows-in-pandas-22a970b347ac
Ganegedara, T. (2021, November 15). Intuitive Guide to Latent Dirichlet Allocation. Medium. https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-latent-dirichlet-allocation-437c81220158
Gour, R. (2019, April 20). Why Choose Data Science for Your Career. Towards Data Science. https://towardsdatascience.com/why-choose-data-science-for-your-career-ca38db0c28d4
To stay in academia or not, that is the question | MIT Graduate Admissions. (n.d.). Retrieved February 15, 2022, from https://gradadmissions.mit.edu/blog/stay-academia-or-not-question
What is a community of practice? (n.d.). Community of Practice. Retrieved February 15, 2022, from https://www.communityofpractice.ca/background/what-is-a-community-of-practice/