Source: https://drive.google.com/file/d/1yFuP9fMxoUEDPV9HPh2UJbBGU4S8otKU/view
The Future of Data Science
Where We Are Today
Advice from Vicki Boykis - ML Engineer at Mozilla AI
- Learn SQL
- Learn a programming language extremely well and learn programming concepts
- Learn how to work in the cloud
- This stuff is really hard for everyone, and there are a million things it seems like you have to know. Don’t get discouraged.
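As a small illustration of the first two points, here is a minimal sketch that runs SQL from Python using the standard-library sqlite3 module. The table name visits, its columns, and every row are made up for this example; none of it comes from the talk.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient TEXT, clinic TEXT, cost REAL)")

# Hypothetical rows, invented for this example.
rows = [
    ("a", "north", 120.0),
    ("b", "north", 80.0),
    ("c", "south", 200.0),
    ("d", "south", 100.0),
    ("e", "south", 60.0),
]
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)

# Aggregate per clinic: count the visits and average the cost.
query = """
    SELECT clinic, COUNT(*) AS n_visits, AVG(cost) AS avg_cost
    FROM visits
    GROUP BY clinic
    ORDER BY clinic
"""
summary = conn.execute(query).fetchall()
conn.close()

for clinic, n_visits, avg_cost in summary:
    print(f"{clinic}: {n_visits} visits, average cost {avg_cost:.2f}")
```

SQLite is used here only because it ships with Python and needs no server; the same SELECT / GROUP BY pattern carries over to other relational databases.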
Where are we going?
What you and your friends/colleagues would do is not always reflective of what the world thinks or would do.
Algorithms are fragile and powerful
Algorithms often affect our privacy
What to learn outside of class going forward
- Start working with AWS (S3 buckets, EC2 instances, EBS drives)
- Learn how to use Linux and the terminal (commands like pwd, cd, ls, …)
- Get comfortable with git and GitHub
- Get good at Python or R, and at SQL, and understand how an RDBMS works
- Learn how the web works (HTTP status codes, front end vs. back end, how web apps and websites work, what the web is, Ruby on Rails or Python Flask)
- Begin working on a small project that interests you
- Grab some data and just start to play with it: learn to make plots, massage the data, build simple models, and analyze them. TIME AT KEYBOARD: reps, reps, reps…
- Linear Regression, Decision Trees, PCA, Naive Bayes, General linear models
- Neural networks: PyTorch and/or Keras, CPU first, then GPU + cloud
- Docker containers
- Begin to discover and learn about a ‘domain’ i.e. healthcare, finance, etc.
Books!
General Data Science/Engineering Knowledge "being informed"
- Factfulness - Hans Rosling
- Algorithms to Live By - Brian Christian, Tom Griffiths
- The Model Thinker - Scott Page
- The Signal and the Noise - Nate Silver
- Calling Bullshit - Carl Bergstrom, Jevin West
- Confident Data Skills - Kirill Eremenko (skip second half)
- The Black Swan - Nassim Nicholas Taleb
- Smart Cities - Anthony Townsend
- Superforecasting - Philip Tetlock, Dan Gardner
- The Mythical Man-Month - Frederick Brooks Jr.
- Complex Adaptive Systems - John Miller, Scott Page
- Chaos - James Gleick
- The Information - James Gleick
- The Phoenix Project - Gene Kim, Kevin Behr, George Spafford
- Data Science for Business - Foster Provost, Tom Fawcett
SQL Warrior or I really love RDBs
- SQL Queries for Mere Mortals - John Viescas
- Practical SQL - Anthony DeBarros
- Database Systems: Design, Implementation, and Management - Carlos Coronel, Steven Morris
- SQL QuickStart Guide - Walter Shields
- Database Design for Mere Mortals - Michael Hernandez
- SQL Performance Explained - Markus Winand
- Database Systems: The Complete Book - Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom
Data Wrangling/EDA
- Python for Data Analysis - Wes McKinney
- Data Science from Scratch - Joel Grus
- Exploratory Data Analysis - John Tukey
- Python Data Science Handbook - Jake VanderPlas
- Data Analysis with Python and PySpark - Jonathan Rioux
- Web Scraping with Python - Ryan Mitchell
Data Visualizations
- Fundamentals of Data Visualization - Claus Wilke
- Data Visualization - Kieran Healy
- The Visual Display of Quantitative Information - Edward Tufte
- Storytelling with Data - Cole Nussbaumer Knaflic
- Show Me the Numbers - Stephen Few
- Envisioning Information - Edward Tufte
- The Truthful Art - Alberto Cairo
- The Grammar of Graphics - Leland Wilkinson
- Interactive Data Visualization for the Web - Scott Murray
- Visualization Analysis & Design - Tamara Munzner
I have a need for Stats and Probability
- Practical Statistics for Data Scientists - Peter Bruce, Andrew Bruce
- Statistics - David Freedman, Robert Pisani, Roger Purves
- Naked Statistics - Charles Wheelan
- The Manga Guide to Regression Analysis - Shin Takahashi, Iroha Inoue
- Computer Age Statistical Inference - Bradley Efron, Trevor Hastie
- Linear Regression and Correlation - Scott Hartshorn
- The Art of Statistics - David Spiegelhalter
- Causal Inference in Statistics - Judea Pearl, Madelyn Glymour, Nicholas Jewell
- Data Analysis Using Regression and Multilevel/Hierarchical Models - Andrew Gelman, Jennifer Hill
- All of Statistics: A Concise Course in Statistical Inference - Larry Wasserman
- Statistical Rethinking - Richard McElreath
- Regression Modeling Strategies - Frank Harrell Jr.
- An Introduction to Generalized Linear Models - Annette Dobson, Adrian Barnett
Machine Learning and Deep Learning
- Python Machine Learning - Sebastian Raschka, Vahid Mirjalili
- An Introduction to Statistical Learning - Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
- The Hundred-Page Machine Learning Book - Andriy Burkov
- Deep Learning for Coders with fastai & PyTorch - Jeremy Howard, Sylvain Gugger
- Deep Learning with Python - Francois Chollet
- Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow - Aurelien Geron
- Artificial Intelligence - Stuart Russell, Peter Norvig
- Applied Predictive Modeling - Max Kuhn, Kjell Johnson
- Pattern Recognition and Machine Learning - Christopher Bishop
- The Elements of Statistical Learning - Trevor Hastie, Robert Tibshirani, Jerome Friedman
- Mathematics for Machine Learning - Marc Deisenroth, A. Aldo Faisal, Cheng Soon Ong
- Machine Learning - Kevin Murphy
- Deep Learning - Ian Goodfellow, Yoshua Bengio, Aaron Courville
Production Systems in ML and building scalable systems
- Designing Machine Learning Systems - Chip Huyen
- Building Machine Learning Powered Applications - Emmanuel Ameisen
- Building Intelligent Systems - Geoff Hulten
- Web Scalability for Startup Engineers - Artur Ejsmont
- Data Science on AWS - Chris Fregly, Antje Barth
- Designing Data-Intensive Applications - Martin Kleppmann
- System Design Interview - Alex Xu
- Clean Architecture - Robert Martin
If you remember anything from this course…
Ethics should always be a priority in your work.
Data wrangling is a puzzle and a big part of the job. When done well, it’s not boring!
Data science is a competitive but rewarding field. You have a chance to make a big difference!
Your grade in this course is probably not predictive of future success.