Source: https://drive.google.com/file/d/1yFuP9fMxoUEDPV9HPh2UJbBGU4S8otKU/view

The Future of Data Science

Where We Are Today

Advice from Vicki Boykis - ML Engineer at Mozilla AI

  1. Learn SQL
  2. Learn a programming language extremely well and learn programming concepts
  3. Learn how to work in the cloud
  4. This stuff is really hard for everyone, and there are a million things it seems like you have to know. Don’t get discouraged.

Where are we going?

What you and your friends/colleagues would do is not always reflective of what the world thinks or would do.

Algorithms are fragile & powerful

Algorithms often affect our privacy

What to learn outside of class going forward

  1. Start working with AWS (S3 buckets, EC2 instances, EBS drives)
  2. Learn how to use Linux and the terminal (commands like pwd, cd, ls,…)
  3. Get comfortable with Git and GitHub
  4. Get good at Python or R and SQL, and understand how an RDBMS works
  5. Learn how the web works (HTTP status codes, front end vs. back end, how web apps and websites work, what web frameworks such as Ruby on Rails or Python's Flask are)
  6. Begin working on a small project that interests you
  • Grab some data and just start to play with it, learn to make plots, massage the data, build simple models and analyze them. TIME AT KEYBOARD reps, reps, reps…
  7. Linear regression, decision trees, PCA, Naive Bayes, general linear models
  8. Neural networks with PyTorch and/or Keras: CPU first, then GPU + cloud
  9. Docker containers
  10. Begin to discover and learn about a ‘domain’, e.g. healthcare, finance, etc.
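
As one way to picture the "grab some data and play with it" advice above, here is a minimal sketch using only the Python standard library. The data set (hours studied vs. exam score) is made up for illustration, and the helper names load_rows, fit_line, and r_squared are hypothetical, not part of the course material. It drops a row with a missing value, fits a straight line by ordinary least squares, and reports how well the line fits.

```python
import csv
import io
import statistics

# Made-up data for illustration only: hours studied vs. exam score.
# One row has a missing score so there is something to clean.
RAW_CSV = """hours,score
1,52
2,55
3,
4,65
5,70
6,74
"""


def load_rows(text):
    """Parse CSV text and drop rows with a missing value (the 'massage' step)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        if record["hours"] and record["score"]:
            rows.append((float(record["hours"]), float(record["score"])))
    return rows


def fit_line(rows):
    """Ordinary least squares fit of y = intercept + slope * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in rows)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(rows, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    ys = [y for _, y in rows]
    y_mean = statistics.fmean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in rows)
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


rows = load_rows(RAW_CSV)
slope, intercept = fit_line(rows)
fit = r_squared(rows, slope, intercept)
print(f"rows kept: {len(rows)}")
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={fit:.3f}")
```

Writing the fit by hand once is a reasonable rep before switching to a library such as scikit-learn; the same three steps (load, clean, model) carry over to a real data set.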

Books!

General Data Science/Engineering Knowledge "being informed"

  • Factfulness - Hans Rosling
  • Algorithms to Live By - Brian Christian, Tom Griffiths
  • The Model Thinker - Scott Page
  • The Signal and the Noise - Nate Silver
  • Calling Bullshit - Carl Bergstrom, Jevin West
  • Confident Data Skills - Kirill Eremenko (skip second half)
  • The Black Swan - Nassim Nicholas Taleb
  • Smart Cities - Anthony Townsend
  • Superforecasting - Philip Tetlock, Dan Gardner
  • The Mythical Man-Month - Frederick Brooks Jr.
  • Complex Adaptive Systems - John Miller, Scott Page
  • Chaos - James Gleick
  • The Information - James Gleick
  • The Phoenix Project - Gene Kim, Kevin Behr, George Spafford
  • Data Science for Business - Foster Provost, Tom Fawcett

SQL Warrior or I really love RDBs

  • SQL Queries for Mere Mortals - John Viescas
  • Practical SQL - Anthony DeBarros
  • Database Systems: Design, Implementation, and Management - Carlos Coronel, Steven Morris
  • SQL QuickStart Guide - Walter Shields
  • Database Design for Mere Mortals - Michael Hernandez
  • SQL Performance Explained - Markus Winand
  • Database Systems: The Complete Book - Hector Garcia-Molina, Jeffrey Ullman, Jennifer Widom

Data Wrangling/EDA

  • Python for Data Analysis - Wes McKinney
  • Data Science from Scratch - Joel Grus
  • Exploratory Data Analysis - John Tukey
  • Python Data Science Handbook - Jake VanderPlas
  • Data Analysis with Python and PySpark - Jonathan Rioux
  • Web Scraping with Python - Ryan Mitchell

Data Visualizations

  • Fundamentals of Data Visualization - Claus Wilke
  • Data Visualization - Kieran Healy
  • The Visual Display of Quantitative Information - Edward Tufte
  • Storytelling with Data - Cole Knaflic
  • Show Me the Numbers - Stephen Few
  • Envisioning Information - Edward Tufte
  • The Truthful Art - Alberto Cairo
  • The Grammar of Graphics - Leland Wilkinson
  • Interactive Data Visualization for the Web - Scott Murray
  • Visualization Analysis & Design - Tamara Munzner

I have a need for Stats and Probability

  • Practical Statistics for Data Scientists - Peter Bruce, Andrew Bruce
  • Statistics - David Freedman, Robert Pisani, Roger Purves
  • Naked Statistics - Charles Wheelan
  • The Manga Guide to Regression Analysis - Shin Takahashi, Iroha Inoue
  • Computer Age Statistical Inference - Bradley Efron, Trevor Hastie
  • Linear Regression and Correlation - Scott Hartshorn
  • The Art of Statistics - David Spiegelhalter
  • Causal Inference in Statistics - Judea Pearl, Madelyn Glymour, Nicholas Jewell
  • Data Analysis Using Regression and Multilevel/Hierarchical Models - Andrew Gelman, Jennifer Hill
  • All of Statistics: A Concise Course in Statistical Inference - Larry Wasserman
  • Statistical Rethinking - Richard McElreath
  • Regression Modeling Strategies - Frank Harrell Jr.
  • An Introduction to Generalized Linear Models - Annette Dobson, Adrian Barnett

Machine Learning and Deep Learning

  • Python Machine Learning - Sebastian Raschka, Vahid Mirjalili
  • An Introduction to Statistical Learning - Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani
  • The Hundred-Page Machine Learning Book - Andriy Burkov
  • Deep Learning for Coders with fastai & PyTorch - Jeremy Howard, Sylvain Gugger
  • Deep Learning with Python - Francois Chollet
  • Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow - Aurelien Geron
  • Artificial Intelligence - Stuart Russell, Peter Norvig
  • Applied Predictive Modeling - Max Kuhn, Kjell Johnson
  • Pattern Recognition and Machine Learning - Christopher Bishop
  • The Elements of Statistical Learning - Trevor Hastie, Robert Tibshirani, Jerome Friedman
  • Mathematics for Machine Learning - Marc Deisenroth, A. Aldo Faisal, Cheng Soon Ong
  • Machine Learning - Kevin Murphy
  • Deep Learning - Ian Goodfellow, Yoshua Bengio, Aaron Courville

Production Systems in ML and building scalable systems

  • Designing Machine Learning Systems - Chip Huyen
  • Building Machine Learning Powered Applications - Emmanuel Ameisen
  • Building Intelligent Systems - Geoff Hulten
  • Web Scalability for Startup Engineers - Artur Ejsmont
  • Data Science on AWS - Chris Fregly, Antje Barth
  • Designing Data-Intensive Applications - Martin Kleppmann
  • System Design Interview - Alex Xu
  • Clean Architecture - Robert Martin

If you remember anything from this course…
Ethics should always be a priority in your work.
Data wrangling is a puzzle and a big part of the job. When done well, it’s not boring!
Data science is a competitive but rewarding field. You have a chance to make a big difference!
Your grade in this course is probably not predictive of future success.