References

Open Educational Resource Logic Model

As I was consolidating and mapping out learning progressions for folks coming from pure mathematics, I compiled this list of Data Science reference materials. Enjoy!

Update: I use Zotero to compile this list and keep an online library of the papers here. Please contact me to join the group and share your own collection of papers for collaboration!

Textbooks

Aggarwal, C. C. (2018). Neural Networks and Deep Learning: A Textbook. Springer International Publishing. https://doi.org/10.1007/978-3-319-94463-0

Al Sarkhi, A., & Talburt, J. (2019). The Journal of Computing Sciences in Colleges Papers of the 17th Annual CCSC Mid-South Conference. https://doi.org/10.13140/RG.2.2.29810.12481

Chollet, F. (2018). Deep learning with Python (1st ed.). Manning Publications Co.

Dunn, P. K., & Smyth, G. K. (2018). Generalized Linear Models With Examples in R. Springer New York. https://doi.org/10.1007/978-1-4419-0118-7

Efron, B., & Hastie, T. (n.d.). Computer Age Statistical Inference.

Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge University Press.

Géron, A. (2017). Hands-on machine learning with scikit-learn and TensorFlow: Concepts, tools, and techniques to build intelligent systems (1st ed.). O’Reilly Media, Inc.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.

Irizarry, R. A. (n.d.). Introduction to Data Science. Retrieved February 23, 2022, from https://rafalab.github.io/dsbook/

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 103). Springer New York. https://doi.org/10.1007/978-1-4614-7138-7

Langtangen, H. P. (Ed.). (2008). Python Scripting for Computational Science (Vol. 3). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-73916-6

Mahoney, M. W., Duchi, J., Gilbert, A. C., Institute for Advanced Study (Princeton, N.J.), Society for Industrial and Applied Mathematics, & Park City Mathematics Institute (Eds.). (2018). The mathematics of data. American Mathematical Society.

Mills, C. W. (1997). The racial contract. Cornell University Press.

Minow, M. (1991). Making All the Difference. https://www.cornellpress.cornell.edu/book/9780801499777/making-all-the-difference/

Morgan, S. L., & Winship, C. (2014). Counterfactuals and Causal Inference: Methods and Principles for Social Research (2nd ed.). Cambridge University Press; Cambridge Core. https://doi.org/10.1017/CBO9781107587991

Ross, S. M. (2010). A first course in probability (8th ed.). Pearson Prentice Hall.

Vershynin, R. (n.d.). High-Dimensional Probability.

Young, I. M. (2002). Inclusion and Democracy. Oxford University Press.

Online Resources

AutoPEP8—Packages—Package Control. (n.d.). Retrieved March 10, 2022, from https://packagecontrol.io/packages/AutoPEP8

Beautiful Soup Documentation—Beautiful Soup 4.9.0 documentation. (n.d.). Retrieved March 10, 2022, from https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Harvard Transdisciplinary Research in Energetics and Cancer Center. (2012, September 10). Definitions. https://www.hsph.harvard.edu/trec/about-us/definitions/

Configuring a Python Application on CircleCI – CircleCI. (n.d.). Retrieved March 10, 2022, from https://circleci.com/docs/2.0/language-python/

Data Gymnasia. (n.d.). Mathigon. Retrieved February 23, 2022, from https://mathigon.org/data-gymnasia

Data Mining: Algorithms, Geometry, and Probability. (n.d.). Retrieved February 24, 2022, from https://www.cs.utah.edu/~jeffp/DMBook/DM-AGP.html

Department of Computer and Data Sciences < Case Western Reserve University. (n.d.). Retrieved March 10, 2022, from https://bulletin.case.edu/schoolofengineering/compdatasci/#courseinventory

Dive into Deep Learning—Dive into Deep Learning 0.17.2 documentation. (n.d.). Retrieved February 23, 2022, from http://d2l.ai/

Evans, J. (2014, July 29). What is Transdisciplinarity? – Purdue Polytechnic Institute. https://polytechnic.purdue.edu/blog/what-transdisciplinarity

gitignore.io—Create Useful .gitignore Files For Your Project. (n.d.). Retrieved March 10, 2022, from https://www.toptal.com/developers/gitignore

Install Ubuntu desktop. (n.d.). Ubuntu. Retrieved March 10, 2022, from https://ubuntu.com/tutorials/install-ubuntu-desktop

Interactive network visualizations—Pyvis 0.1.3.1 documentation. (n.d.). Retrieved March 10, 2022, from https://pyvis.readthedocs.io/en/latest/index.html

Interpretivism—SAGE Research Methods. (n.d.). Retrieved February 25, 2022, from https://methods.sagepub.com/book/key-concepts-in-ethnography/n21.xml

Jupyter/IPython Notebook Quick Start Guide—Jupyter/IPython Notebook Quick Start Guide 0.1 documentation. (n.d.). Retrieved March 10, 2022, from https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/

Laplacian Matrices | An Introduction to Algebraic Graph Theory. (n.d.). Retrieved March 10, 2022, from https://www.geneseo.edu/~aguilar/public/notes/Graph-Theory-HTML/ch4-laplacian-matrices.html

Mathematical Foundations for Data Analysis. (n.d.). Math for Data. Retrieved February 24, 2022, from https://mathfordata.github.io/

Matplotlib—Visualization with Python. (n.d.). Retrieved March 10, 2022, from https://matplotlib.org/

Mining of Massive Datasets. (n.d.). Retrieved February 24, 2022, from http://www.mmds.org/

Python Data Science Handbook | Python Data Science Handbook. (n.d.). Retrieved February 18, 2022, from https://jakevdp.github.io/PythonDataScienceHandbook/

Real Python. (n.d.). Setting Up Sublime Text 3 for Full Stack Python Development. Retrieved March 10, 2022, from https://realpython.com/setting-up-sublime-text-3-for-full-stack-python-development/

Reflexivity. (n.d.). Retrieved February 25, 2022, from https://warwick.ac.uk/fac/soc/ces/research/current/socialtheory/maps/reflexivity/

Requests: HTTP for Humans™—Requests 2.27.1 documentation. (n.d.). Retrieved March 10, 2022, from https://docs.python-requests.org/en/latest/

Watson, S. (2021). Data 1050 Course Cheatsheet. DATA 1050. https://data1050.github.io/docs/cheatsheets/DATA_1050_Cheatsheet.pdf

Journal Articles

Brunsdon, C., & Comber, A. (2021). Opening practice: Supporting reproducibility and critical spatial data science. Journal of Geographical Systems, 23(4), 477–496. https://doi.org/10.1007/s10109-020-00334-2

Cao, L. (2018). Data Science: A Comprehensive Overview. ACM Computing Surveys, 50(3), 1–42. https://doi.org/10.1145/3076253

Darwin Holmes, A. G. (2020). Researcher Positionality—A Consideration of Its Influence and Place in Qualitative Research—A New Researcher Guide. Shanlax International Journal of Education, 8(4), 1–10. https://doi.org/10.34293/education.v8i4.3232

DasGupta, A. (n.d.). Probability for Statistics and Machine Learning.

De Veaux, R. D., Agarwal, M., Averett, M., Baumer, B. S., Bray, A., Bressoud, T. C., Bryant, L., Cheng, L. Z., Francis, A., Gould, R., Kim, A. Y., Kretchmar, M., Lu, Q., Moskol, A., Nolan, D., Pelayo, R., Raleigh, S., Sethi, R. J., Sondjaja, M., … Ye, P. (2017). Curriculum Guidelines for Undergraduate Programs in Data Science. Annual Review of Statistics and Its Application, 4(1), 15–30. https://doi.org/10.1146/annurev-statistics-060116-053930

Developing a Master’s Degree Program in Data Science. (2021). Issues in Information Systems. https://doi.org/10.48009/3_iis_2021_58-68

Elo, S., Kääriäinen, M., Kanste, O., Pölkki, T., Utriainen, K., & Kyngäs, H. (2014). Qualitative Content Analysis: A Focus on Trustworthiness. SAGE Open, 4(1), 2158244014522633. https://doi.org/10.1177/2158244014522633

Evans, T. M., Bira, L., Gastelum, J. B., Weiss, L. T., & Vanderford, N. L. (2018). Evidence for a mental health crisis in graduate education. Nature Biotechnology, 36(3), 282–284. https://doi.org/10.1038/nbt.4089

Feelders, A., Daniels, H., & Holsheimer, M. (2000). Methodological and practical aspects of data mining. Information & Management, 37(5), 271–281. https://doi.org/10.1016/S0378-7206(99)00051-8

Incidence matrix. (2021). In Wikipedia. https://en.wikipedia.org/w/index.php?title=Incidence_matrix&oldid=1053190313

Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 2053951714528481. https://doi.org/10.1177/2053951714528481

Kross, S., Peng, R. D., Caffo, B. S., Gooding, I., & Leek, J. T. (2017). The democratization of data science education [Preprint]. PeerJ Preprints. https://doi.org/10.7287/peerj.preprints.3195v1

Leonelli, S. (2021). Data Science in Times of Pan(dem)ic. Harvard Data Science Review. https://doi.org/10.1162/99608f92.fbb1bdd6

Leslie, D. (2021). The Arc of the Data Scientific Universe. Harvard Data Science Review. https://doi.org/10.1162/99608f92.938a18d7

Leung, L. (2015). Validity, reliability, and generalizability in qualitative research. Journal of Family Medicine and Primary Care, 4(3), 324. https://doi.org/10.4103/2249-4863.161306

Mahoney, M. W. (2019). The Difficulties of Addressing Interdisciplinary Challenges at the Foundations of Data Science. ArXiv:1909.03033 [Cs]. http://arxiv.org/abs/1909.03033

Martinez, I., Viles, E., & Olaizola, I. G. (2021). Data Science Methodologies: Current Challenges and Future Approaches. Big Data Research, 24, 100183. https://doi.org/10.1016/j.bdr.2020.100183

Martinez-Plumed, F., Contreras-Ochando, L., Ferri, C., Hernandez-Orallo, J., Kull, M., Lachiche, N., Ramirez-Quintana, M. J., & Flach, P. (2021). CRISP-DM Twenty Years Later: From Data Mining Processes to Data Science Trajectories. IEEE Transactions on Knowledge and Data Engineering, 33(8), 3048–3061. https://doi.org/10.1109/TKDE.2019.2962680

McPhee, C., & Bliemel, M. (2018). Editorial: Transdisciplinary Innovation. Technology Innovation Management Review, 8(8), 5.

Meng, X.-L. (2021). Data Science: A Happy Marriage of Quantitative and Qualitative Thinking? Harvard Data Science Review. https://doi.org/10.1162/99608f92.cee621a9

Newman, M. E. J. (2003). The Structure and Function of Complex Networks. SIAM Review, 45(2), 167–256. https://doi.org/10.1137/S003614450342480

Nolan, D., & Stoudt, S. (2021). The Promise of Portfolios: Training Modern Data Scientists. Harvard Data Science Review. https://doi.org/10.1162/99608f92.3c097160

Pedaste, M., Mäeots, M., Siiman, L., Jong, T., Riesen, S., Kamp, E., Manoli, C., Zacharia, Z., & Tsourlidaki, E. (2015). Phases of inquiry-based learning: Definitions and the inquiry cycle. Educational Research Review, 14. https://doi.org/10.1016/j.edurev.2015.02.003

Regnault, A., Willgoss, T., & Barbic, S. (2018). Towards the use of mixed methods inquiry as best practice in health outcomes research. Journal of Patient-Reported Outcomes, 2(1), 19. https://doi.org/10.1186/s41687-018-0043-8

Tang, R., & Sae-Lim, W. (2016). Data science programs in U.S. higher education: An exploratory content analysis of program description, curriculum structure, and course focus. Education for Information, 32(3), 269–290. https://doi.org/10.3233/EFI-160977

Tanweer, A., Gade, E. K., Krafft, P. M., & Dreier, S. K. (2021). Why the Data Revolution Needs Qualitative Thinking. Harvard Data Science Review. https://doi.org/10.1162/99608f92.eee0b0da

Warfa, A.-R. M. (2016). Mixed-Methods Design in Biology Education Research: Approach and Uses. CBE—Life Sciences Education, 15(4), rm5. https://doi.org/10.1187/cbe.16-01-0022

Woolston, C. (n.d.). Why mental health matters.

Woolston, C. (2018). Feeling overwhelmed by academia? You are not alone. Nature, 557(7703), 129–131. https://doi.org/10.1038/d41586-018-04998-1

Conference Papers

Demchenko, Y., Belloum, A., Los, W., Wiktorski, T., Manieri, A., Brocks, H., Becker, J., Heutelbeck, D., Hemmje, M., & Brewer, S. (2016). EDISON Data Science Framework: A Foundation for Building Data Science Profession for Research and Industry. 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 620–626. https://doi.org/10.1109/CloudCom.2016.0107

Kross, S., & Guo, P. J. (2019). Practitioners Teaching Data Science in Industry and Academia: Expectations, Workflows, and Challenges. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–14. https://doi.org/10.1145/3290605.3300493

Li, D., Milonas, E., & Zhang, Q. (2021). Content Analysis of Data Science Graduate Programs in the U.S. 2021 ASEE Virtual Annual Conference Content Access Proceedings, 36841. https://doi.org/10.18260/1-2--36841

Milonas, E., Li, D., & Zhang, Q. (2021). Content Analysis of Two-year and Four-year Data Science Programs in the United States. 2021 ASEE Virtual Annual Conference Content Access Proceedings, 36842. https://doi.org/10.18260/1-2--36842

Salloum, M., Jeske, D., Ma, W., Papalexakis, V., Shelton, C., Tsotras, V., & Zhou, S. (2021). Developing an Interdisciplinary Data Science Program. Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, 509–515. https://doi.org/10.1145/3408877.3432454

Blog Posts

Chan, T. (2022a, February 17). From Pure Math to Data Science. Medium. https://mathtodata.medium.com/from-pure-math-to-data-science-293883864cb2

Chan, T. (2022b, February 28). Mixed Methods Data Science: Qualitative Sensibilities. Medium. https://mathtodata.medium.com/data-science-curriculum-starting-point-for-pure-mathematicians-347efe61f743

Chan, T. (2022c, March 10). Data Science Project Based Learning. Medium. https://mathtodata.medium.com/data-science-project-based-learning-afd2bd6f8f11

Dolon, B. (2020, June 9). You Don’t Always Have to Loop Through Rows in Pandas! Medium. https://towardsdatascience.com/you-dont-always-have-to-loop-through-rows-in-pandas-22a970b347ac

Ganegedara, T. (2021, November 15). Intuitive Guide to Latent Dirichlet Allocation. Medium. https://towardsdatascience.com/light-on-math-machine-learning-intuitive-guide-to-latent-dirichlet-allocation-437c81220158

Gour, R. (2019, April 20). Why Choose Data Science for Your Career. Towards Data Science. https://towardsdatascience.com/why-choose-data-science-for-your-career-ca38db0c28d4

To stay in academia or not, that is the question | MIT Graduate Admissions. (n.d.). Retrieved February 15, 2022, from https://gradadmissions.mit.edu/blog/stay-academia-or-not-question

What is a community of practice? (n.d.). Community of Practice. Retrieved February 15, 2022, from https://www.communityofpractice.ca/background/what-is-a-community-of-practice/