Source: https://drive.google.com/file/d/1oAbv_BsYtSI1IoWkmEl_9pHEt9iEQ0kO/view

Machine Learning Continued…

Why is ML so hard?

  • Curse of dimensionality
  • No free lunch theorem
  • Debugging nightmares
  • Bias-variance tradeoff
  • Time & Money

Curse of Dimensionality

The curse of dimensionality is about how many samples you need to approximate an arbitrary function to a given level of accuracy: that number grows exponentially with the number of dimensions.
You have only a finite number of samples, yet you need to learn a function in n-dimensional space!
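
To make the exponential growth concrete, here is a minimal sketch. It assumes we cover the input space with a regular grid and want at least one sample per cell; the function names `grid_cells` and `max_coverage` are made up for this illustration.

```python
def grid_cells(bins_per_axis: int, n_dims: int) -> int:
    """Number of cells in a regular grid over an n_dims-dimensional cube."""
    return bins_per_axis ** n_dims


def max_coverage(n_samples: int, bins_per_axis: int, n_dims: int) -> float:
    """Upper bound on the fraction of cells that can hold at least one sample."""
    return min(1.0, n_samples / grid_cells(bins_per_axis, n_dims))


# Fixed budget of 1000 samples, 10 bins per axis (both values are arbitrary).
for n_dims in (1, 2, 3, 6, 10):
    cells = grid_cells(10, n_dims)
    print(f"{n_dims:>2} dims: {cells:>14,} cells, "
          f"coverage <= {max_coverage(1000, 10, n_dims):.2e}")
```

With the sample budget held fixed, each extra dimension multiplies the number of cells by 10, so the share of the space the samples can possibly cover shrinks by the same factor.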

No Free Lunch Theorem

All models are a simplification of reality
Simplifications are based on assumptions (our bias)
Assumptions fail under certain circumstances
No one model works best in all situations
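
The points above can be illustrated with a toy sketch (an illustration of the informal argument, not a proof of the theorem). It assumes two hypothetical one-dimensional worlds and two hypothetical models, each hard-coding one assumption about where the class boundary lies; all names and thresholds are invented for this example.

```python
# Evaluation points spread evenly over (0, 1), avoiding the boundaries.
XS = [(i + 0.5) / 100 for i in range(100)]


def threshold_world(x: float) -> int:
    """Reality A: the label is 1 above 0.5."""
    return int(x > 0.5)


def interval_world(x: float) -> int:
    """Reality B: the label is 1 only in the middle band."""
    return int(0.25 <= x <= 0.75)


# Each "model" simply bakes in one of the two assumptions.
threshold_model = threshold_world
interval_model = interval_world


def accuracy(model, world) -> float:
    """Fraction of evaluation points where the model matches reality."""
    return sum(model(x) == world(x) for x in XS) / len(XS)


for model in (threshold_model, interval_model):
    for world in (threshold_world, interval_world):
        print(model.__name__, "on", world.__name__, accuracy(model, world))
```

Each model is perfect where its assumption holds and no better than a coin flip where it does not, so neither one is best in both situations.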

Debugging madness

You must implement many things correctly to get working code, and it is extremely easy to end up with something that doesn’t work.

  • Enough Correct Data
  • Not Enough Data
  • Weak Labels

Machine Learning Complexity

  • Overfitting - high variance
  • Underfitting - high bias
  • Good balance - low bias, low variance
    Training vs. Testing Accuracy
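
One way to see the three regimes is to compare training and testing accuracy as model flexibility changes. The sketch below is only an illustration under stated assumptions: it uses a hand-rolled k-nearest-neighbours classifier on synthetic one-dimensional data with 20% label noise, and the data generator, noise rate, and values of k are all arbitrary choices for this example.

```python
import random


def make_data(xs, rng, noise=0.2):
    """True label is 1 when x > 0.5; each label is flipped with probability `noise`."""
    data = []
    for x in xs:
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k is odd)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)


def accuracy(train, data, k):
    """Fraction of points in `data` that the k-NN model labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


rng = random.Random(0)
train = make_data([i / 100 for i in range(101)], rng)   # 101 distinct points
test = make_data([rng.random() for _ in range(500)], rng)

for k in (1, 15, 101):
    print(f"k={k:>3}  train={accuracy(train, train, k):.2f}  "
          f"test={accuracy(train, test, k):.2f}")
```

With k=1 the model memorizes the training set, noise included, so training accuracy is 100% while testing accuracy is typically lower (high variance). With k=101 every prediction is the overall majority class, so the model ignores the input entirely (high bias). An intermediate k usually narrows the gap between the two accuracies.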

Time is Money


Fun ML Experiments


Recap of Worked Example 1

  1. Machine learning uses statistical learning and computers to identify patterns by unearthing boundaries in data sets. You can use it to make predictions.
  2. One method for making predictions is called a decision tree, which uses a series of if-then statements to identify boundaries and define patterns in the data.
  3. Overfitting happens when some boundaries are based on distinctions that don’t make a difference. You can check whether a model overfits by running test data it has not seen through the model and comparing the result with its training accuracy.
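
The recap can be sketched in a few lines. Everything here is hypothetical: the features (`elevation_m`, `price_per_sqft`), the thresholds, the city labels, and the handful of rows are made up purely to show a decision tree as if-then statements and how test data exposes an overfit boundary.

```python
# Made-up toy rows: (features, city). Not real housing data.
TRAIN = [
    ({"elevation_m": 70, "price_per_sqft": 900}, "SF"),
    ({"elevation_m": 45, "price_per_sqft": 1100}, "SF"),
    ({"elevation_m": 5, "price_per_sqft": 1300}, "NY"),
    ({"elevation_m": 10, "price_per_sqft": 2000}, "NY"),
    ({"elevation_m": 8, "price_per_sqft": 600}, "SF"),
]
TEST = [
    ({"elevation_m": 60, "price_per_sqft": 1000}, "SF"),
    ({"elevation_m": 4, "price_per_sqft": 650}, "NY"),
    ({"elevation_m": 12, "price_per_sqft": 1500}, "NY"),
    ({"elevation_m": 6, "price_per_sqft": 500}, "NY"),
    ({"elevation_m": 9, "price_per_sqft": 1200}, "SF"),
]


def shallow_tree(home):
    """One boundary: high elevation means SF."""
    if home["elevation_m"] > 30:
        return "SF"
    return "NY"


def deep_tree(home):
    """Adds a second boundary that exists only to fit one training row."""
    if home["elevation_m"] > 30:
        return "SF"
    if home["price_per_sqft"] < 700:
        return "SF"
    return "NY"


def accuracy(tree, rows):
    """Fraction of rows the tree labels correctly."""
    return sum(tree(home) == city for home, city in rows) / len(rows)


for tree in (shallow_tree, deep_tree):
    print(tree.__name__, "train:", accuracy(tree, TRAIN),
          "test:", accuracy(tree, TEST))
```

On these toy rows the deeper tree is perfect on the training data but does worse than the shallow tree on the test data, which is the signature of a boundary based on a distinction that does not make a difference.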

What to do about bias

  1. Anticipate and plan for potential biases before model generation. Check for bias after.
  2. Use machine learning to improve lives rather than for punitive purposes.
  3. Revisit your models. Update your algorithms.
  4. You are responsible for the models you put out into the world, unintended consequences and all.