Source: https://drive.google.com/file/d/1oAbv_BsYtSI1IoWkmEl_9pHEt9iEQ0kO/view
Machine Learning Continued…
Why is ML so hard?
- Curse of dimensionality
- No free lunch theorem
- Debugging nightmares
- Bias-variance tradeoff
- Time & Money
Curse of Dimensionality
The number of samples needed to approximate an arbitrary function to a given level of accuracy grows exponentially with the number of dimensions.
You have a finite number of samples… You need to learn some function in n-dimensional space!
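A minimal sketch of that exponential growth, assuming we cover each input dimension with a fixed number of evenly spaced sample points. The function name `samples_needed` and the default of 10 points per axis are illustrative choices, not something from the slides.

```python
def samples_needed(dimensions, points_per_axis=10):
    """Grid points needed to cover a unit hypercube at a fixed resolution.

    Assumption: one sample per grid cell, points_per_axis cells along each axis.
    """
    return points_per_axis ** dimensions


# Same resolution per axis, increasing number of dimensions.
for d in (1, 2, 3, 10):
    print(d, samples_needed(d))
```

With a fixed budget of samples, the only way to keep up is to lower the resolution per axis, which is why learning in n-dimensional space gets hard quickly.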
No Free Lunch Theorem
All models are a simplification of reality
Simplifications are based on assumptions (our bias)
Assumptions fail under certain circumstances
No one model works best in all situations
Debugging madness
Many pieces must be implemented correctly for the code to work, and it is extremely easy to end up with something that doesn’t.
- Enough Correct Data
- Not Enough Data
- Weak Labels
Machine Learning Complexity
- Overfitting - high variance
- Underfitting - high bias
- Good balance - low bias, low variance
Training vs. Testing Accuracy
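A minimal sketch of the gap between training and testing accuracy, assuming a made-up one-dimensional data set with two mislabeled training points. The `memorizer` (a 1-nearest-neighbor lookup) stands in for an overfit, high-variance model. The `threshold_rule` stands in for a simpler model, and its cutoff of 20 is hand-picked rather than learned.

```python
def true_label(x):
    return 1 if x >= 20 else 0


# Training data: even numbers 0..38, with two labels flipped to simulate noise.
train_x = list(range(0, 40, 2))
train_y = [true_label(x) for x in train_x]
for noisy_x in (4, 30):
    i = train_x.index(noisy_x)
    train_y[i] = 1 - train_y[i]

# Test data: unseen points between the training points, with clean labels.
test_x = [x + 0.5 for x in train_x]
test_y = [true_label(x) for x in test_x]


def memorizer(x):
    # Copies the label of the closest training point, noise included.
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def threshold_rule(x):
    # Simple model: a single boundary, ignores individual training points.
    return 1 if x >= 20 else 0


def accuracy(model, xs, ys):
    correct = sum(model(x) == y for x, y in zip(xs, ys))
    return correct / len(xs)


for name, model in (("memorizer", memorizer), ("threshold", threshold_rule)):
    print(name, accuracy(model, train_x, train_y), accuracy(model, test_x, test_y))
```

In this toy setup the memorizer scores 1.0 on training data but 0.9 on test data, while the threshold rule scores 0.9 on the noisy training data and 1.0 on the test data. A training score well above the test score is the usual sign of overfitting.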
Time is Money
Fun ML Experiments
Recap of Worked Example 1
- Machine learning uses statistical learning and computers to identify patterns by unearthing boundaries in data sets. You can use it to make predictions.
- One method for making predictions is called a decision tree, which uses a series of if-then statements to identify boundaries and define patterns in the data.
- Overfitting happens when some boundaries are based on distinctions that don’t make a difference. You can see if a model overfits by running test data it has not seen through the model and comparing the accuracy with its training accuracy.
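The "series of if-then statements" in the recap can be sketched as a plain function. The feature names, thresholds, and class labels below are invented for illustration and are not taken from the worked example.

```python
def predict_city(elevation_m, price_per_sqft):
    """A hand-written decision tree: each `if` is one boundary in the data."""
    if elevation_m > 70:
        # Boundary 1: high elevation is enough to decide.
        return "hilly_city"
    if price_per_sqft > 1500:
        # Boundary 2: low elevation but expensive.
        return "flat_city"
    # Low elevation and cheap.
    return "hilly_city"


print(predict_city(100, 500))
print(predict_city(10, 2000))
```

A learned tree has the same shape. The difference is that the training algorithm picks the features and thresholds from the data, and adding too many of these boundaries is how a tree starts to overfit.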
What to do about bias
- Anticipate and plan for potential biases before model generation. Check for bias after.
- Use machine learning to improve lives rather than for punitive purposes.
- Revisit your models. Update your algorithms.
- You are responsible for the models you put out into the world, unintended consequences and all.