Source: https://drive.google.com/file/d/14-xuR8p5mXivM52VjypE2J2AixCDuf02/view

p-value

The probability of getting the observed results (or results more extreme) by chance alone, i.e. assuming there is no real effect (the null hypothesis is true)
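One way to make "by chance alone" concrete is an exact permutation test. The sketch below is illustrative only: the two groups and the function name `permutation_p_value` are assumptions, not from the lecture. It counts how many ways of reshuffling the group labels give a result at least as extreme as the one observed.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test: is group_a's mean unusually high?"""
    pooled = group_a + group_b
    observed = sum(group_a)
    # Every possible way to pick which values get labeled "group A".
    splits = list(combinations(pooled, len(group_a)))
    # Group sizes are fixed, so a larger sum for A means a larger mean difference.
    as_extreme = sum(1 for split in splits if sum(split) >= observed)
    return as_extreme / len(splits)

# Hypothetical measurements for two groups of four
group_a = [12, 14, 15, 16]
group_b = [8, 9, 10, 11]
p_value = permutation_p_value(group_a, group_b)
print(f"p = {p_value:.4f}")  # only 1 of the 70 relabelings is this extreme
```

Here every value in group A exceeds every value in group B, so only the observed labeling is as extreme as itself, giving p = 1/70.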

Regression

Does a change in one variable mean a change in another?
e.g. simple regression, multiple regression

Linear Regression

best-fitting line is a model of the data

can be used to determine whether a change in one variable is related to the change in the other variable

the magnitude of the relationship is measured by the slope of the line (referred to as the model’s effect size)
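A minimal sketch of finding the best-fitting line by ordinary least squares. The data (hours studied vs. exam score) and the name `fit_line` are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the ordinary least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how much x varies on its own
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]
intercept, slope = fit_line(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
```

The slope (about 4.1 points per extra hour in this toy data) is the effect size the notes refer to.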

Models

Mathematical equations generated to represent the real life situation

“All models are wrong, but some are useful.” - George Box

Correlation

Measures the strength of the linear relationship between two variables
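A short sketch of the Pearson correlation coefficient, with made-up data. The second example is there to show that correlation only captures linear relationships: a perfect curve can still give r = 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear relationship, from -1 to 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Points on a straight line: perfectly correlated
straight = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
# Points on the curve y = x^2: strongly related, but not linearly
curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(straight, curved)
```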

Effect Size ( $β$ )

can be estimated using the slope of the line

Standard Error

The closer the points are to the regression line, the less uncertain we are in our estimate.

The p-value takes into account the effect size ($β$) and the standard error
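A sketch of how the two pieces combine, assuming simple linear regression with one predictor. The slope divided by its standard error gives a t statistic; the p-value is then read off a t distribution with n - 2 degrees of freedom (that last step needs a statistics library and is not shown). The data and function name are illustrative assumptions.

```python
import math

def slope_and_standard_error(xs, ys):
    """Return (slope, standard error of the slope) for a least squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: how far the points sit from the fitted line
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    std_error = math.sqrt(rss / (n - 2) / sxx)
    return slope, std_error

xs = [1, 2, 3, 4, 5]
tight = [52, 55, 61, 64, 68]  # hypothetical: points close to the line
noisy = [45, 62, 55, 70, 68]  # hypothetical: points scattered around the line

slope_tight, se_tight = slope_and_standard_error(xs, tight)
slope_noisy, se_noisy = slope_and_standard_error(xs, noisy)
t_tight = slope_tight / se_tight
t_noisy = slope_noisy / se_noisy
print(f"tight: slope={slope_tight:.2f} se={se_tight:.2f} t={t_tight:.1f}")
print(f"noisy: slope={slope_noisy:.2f} se={se_noisy:.2f} t={t_noisy:.1f}")
```

The tighter data has a much smaller standard error and so a much larger t statistic, which corresponds to a smaller p-value.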

Assumptions of linear regression

Linear relationship

Multivariate normality

No multicollinearity

No autocorrelation

Homoscedasticity

Multicollinearity

Linear regression assumes no multicollinearity. Multicollinearity occurs when the independent variables (in multiple linear regression) are too highly correlated with each other.
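A rough sketch of spotting multicollinearity between two predictors. With exactly two predictors, the variance inflation factor (VIF) is 1 / (1 - r^2), where r is their correlation. The data (the same heights in cm and in inches) is made up, and the "VIF above 10" cut-off is a common rule of thumb rather than something stated in the lecture.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictors that carry almost the same information
height_cm = [150, 160, 170, 180, 190]
height_in = [59.1, 63.0, 66.9, 70.9, 74.8]

r = pearson_r(height_cm, height_in)
vif = 1 / (1 - r ** 2)  # valid as written only for the two-predictor case
print(f"r = {r:.5f}, VIF = {vif:.0f}")
if vif > 10:  # rule-of-thumb threshold (assumption)
    print("Predictors are too highly correlated; consider dropping one.")
```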

Autocorrelation

Autocorrelation occurs when the observations are not independent of one another (e.g. stock prices, where today's price depends on yesterday's)
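A small sketch of measuring lag-1 autocorrelation, i.e. how strongly each observation is related to the one just before it. The price series is invented for illustration.

```python
def lag1_autocorrelation(series):
    """Correlation of a series with itself shifted by one step."""
    n = len(series)
    mean = sum(series) / n
    deviations = [value - mean for value in series]
    numerator = sum(deviations[t] * deviations[t - 1] for t in range(1, n))
    denominator = sum(d ** 2 for d in deviations)
    return numerator / denominator

# Hypothetical daily closing prices that drift upward
prices = [100, 101, 103, 102, 104, 106, 105, 107]
r1 = lag1_autocorrelation(prices)
print(f"lag-1 autocorrelation = {r1:.2f}")
```

A value well above zero suggests that neighbouring observations are not independent.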

Homoscedasticity

The spread of the points around the line of best fit is roughly the same along the whole line
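An informal sketch of checking this assumption: compare the size of the residuals for low x values against high x values. This is a quick eyeball-style check, not a formal test, and the residuals below are invented.

```python
def spread_ratio(xs, residuals):
    """Mean squared residual in the upper half of x divided by the lower half."""
    pairs = sorted(zip(xs, residuals))
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]

    def mean_square(values):
        return sum(v ** 2 for v in values) / len(values)

    return mean_square(high) / mean_square(low)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
even_spread = [1, -1, 1, -1, 1, -1, 1, -1]      # hypothetical: homoscedastic
fanning_out = [0.5, -0.5, 1, -1, 2, -2, 4, -4]  # hypothetical: heteroscedastic

print(spread_ratio(xs, even_spread))  # close to 1: spread is constant
print(spread_ratio(xs, fanning_out))  # far from 1: spread grows with x
```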

p-hacking

Many forms of p-hacking

Using a subset of data

Not adjusting for, or not reporting, multiple testing

Trying different tests with the same hypothesis

Experimenting with your data during model fitting

Inclusion/exclusion criteria for data, e.g. how outliers are handled, or how terms such as “college students” or “developing nations” are defined

Optional stopping of data collection based on results thus far

Changing your alpha values on the fly

Rounding your p-values arbitrarily, e.g. 0.0558 → 0.05
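A quick sketch of why unadjusted multiple testing is a problem. Assuming the tests are independent and there is no real effect anywhere, the chance of at least one "significant" result grows quickly with the number of tests. The Bonferroni correction shown is one common adjustment, named here as an example rather than as the lecture's method.

```python
alpha = 0.05
num_tests = 20

# Probability that at least one of the tests is a false positive
family_wise_error = 1 - (1 - alpha) ** num_tests
print(f"Chance of at least one false positive: {family_wise_error:.2f}")

# Bonferroni correction: require each test to pass a stricter threshold
bonferroni_alpha = alpha / num_tests
print(f"Per-test threshold after correction: {bonferroni_alpha}")
```

With 20 tests at alpha = 0.05, the chance of at least one false positive is about 64%.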

Confounding

Variable1 ← Confounder → Variable2

You can plan ahead to avoid confounding and/or include confounders in your models to account for their effect on the outcome variable.
Ignoring confounders can lead you to draw incorrect conclusions from your analyses.
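A simulated sketch of confounding. Two variables are both driven by a shared confounder and have no direct link, yet they look correlated until the confounder is accounted for. All data is randomly generated, and removing the confounder's fitted contribution from each variable is one simple way to adjust, used here as an assumption.

```python
import math
import random

def pearson_r(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(ys, zs):
    """What is left of ys after removing a least squares fit on zs."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for z, y in zip(zs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
confounder = [rng.gauss(0, 1) for _ in range(n)]
# Neither variable affects the other; both depend on the confounder
variable1 = [z + rng.gauss(0, 1) for z in confounder]
variable2 = [z + rng.gauss(0, 1) for z in confounder]

raw = pearson_r(variable1, variable2)
adjusted = pearson_r(residuals(variable1, confounder),
                     residuals(variable2, confounder))
print(f"ignoring the confounder: r = {raw:.2f}")
print(f"accounting for it:       r = {adjusted:.2f}")
```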

Machine Learning

Predictive Machine Learning

Apply machine learning techniques to data you have currently to generate a model that will be able to make a prediction on future data

What is machine learning?

“Machine Learning (ML) is a fascinating field of artificial intelligence (AI) research and practice where we investigate how computer agents can improve their perception, cognition, and action with experience. Machine learning is about machines improving from data, knowledge, experience, and interaction.”

Manuela Veloso, Head of ML at Carnegie Mellon

Software Engineering vs. ML Systems

Software Engineering: Data/Input + Program → Computation → Output/Result
ML Systems: Data/Input + Output/Result → Computation → Program
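A toy sketch of the contrast, using temperature conversion as a made-up example. In the first half a person writes the rule; in the second half the rule is recovered from example inputs and outputs.

```python
# Software engineering: data + hand-written program -> output
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

print(celsius_to_fahrenheit(25))

# ML system: data + known outputs -> learned "program"
def learn_linear_rule(inputs, outputs):
    """Fit output = intercept + slope * input by least squares."""
    n = len(inputs)
    mean_in = sum(inputs) / n
    mean_out = sum(outputs) / n
    slope = (sum((i - mean_in) * (o - mean_out) for i, o in zip(inputs, outputs))
             / sum((i - mean_in) ** 2 for i in inputs))
    intercept = mean_out - slope * mean_in
    return intercept, slope

celsius = [0, 10, 20, 30, 100]
fahrenheit = [32, 50, 68, 86, 212]
intercept, slope = learn_linear_rule(celsius, fahrenheit)
print(f"learned rule: F = {intercept:.1f} + {slope:.1f} * C")
```

The learned rule matches the hand-written one, but nobody typed 9/5 or 32 into the second half.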

“ML is the field of study that gives computers the ability to learn without being explicitly programmed.” - Arthur Samuel (1959)

Machine learning approaches use existing data to make predictions about future data

Three Main Machine Learning Generalizations

Supervised Learning

Labeled data

Make predictions

Classification or Regression!

Unsupervised Learning

Unlabeled data

Find structure

Reduce dimensions

Reinforcement Learning

Learn a set of actions

Reward feedback system

Agent explores a world
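A minimal reinforcement learning sketch, assuming a very simple "world": two slot-machine arms with hidden payout rates. The agent uses an epsilon-greedy strategy (one common approach, chosen here for illustration) to explore, collect rewards, and learn which action is better. All names and numbers are assumptions.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent learning which arm pays out most often."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)       # times each action was tried
    estimates = [0.0] * len(reward_probs)  # learned value of each action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore
        else:
            action = max(range(len(reward_probs)),
                         key=lambda a: estimates[a])   # exploit best guess
        # The world hands back a reward of 1 or 0
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Update the running average reward for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return counts, estimates

counts, estimates = run_bandit([0.2, 0.8])  # hidden payout rates (hypothetical)
print("times each arm was chosen:", counts)
print("learned value of each arm:", [round(e, 2) for e in estimates])
```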

Two Types of Supervised Learning

Predicting a Continuous Value
Predicting a Class
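A sketch of the two types side by side, using a k-nearest-neighbours approach as one illustrative technique. The same "look at similar examples" idea predicts a number (regression) by averaging, or a class (classification) by voting. The housing data is invented.

```python
from collections import Counter

def nearest_targets(train, x, k=3):
    """Targets of the k training examples whose input is closest to x."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x))
    return [target for _, target in ranked[:k]]

def predict_value(train, x, k=3):
    """Regression: average the neighbours' numeric targets."""
    neighbours = nearest_targets(train, x, k)
    return sum(neighbours) / len(neighbours)

def predict_class(train, x, k=3):
    """Classification: majority vote over the neighbours' labels."""
    return Counter(nearest_targets(train, x, k)).most_common(1)[0][0]

# Hypothetical labeled data: house size in square metres
prices = [(50, 150), (60, 180), (80, 240), (100, 300), (120, 360)]
sizes = [(50, "small"), (60, "small"), (80, "medium"),
         (100, "large"), (120, "large")]

print(predict_value(prices, 70))  # a continuous value
print(predict_class(sizes, 110))  # a class
```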

What is Labeled Data?

Labeled images vs. unlabeled images
Labeled column vs. unlabeled column

Why does the label matter?

Supervised Approach

Train on labels

Predict labels

Unsupervised Approach

Predict groups
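A sketch of the unsupervised case: no labels are given, and the algorithm proposes groups on its own. This uses a tiny one-dimensional version of k-means with two groups, started from the smallest and largest values; the technique and the data are illustrative assumptions.

```python
def two_means(points, iterations=10):
    """Split 1-D points into two groups by repeatedly updating group centres."""
    centers = [min(points), max(points)]  # simple deterministic start
    groups = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centre
        groups = [0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
                  for p in points]
        # Move each centre to the mean of its assigned points
        for g in (0, 1):
            members = [p for p, grp in zip(points, groups) if grp == g]
            if members:
                centers[g] = sum(members) / len(members)
    return groups, centers

# Hypothetical unlabeled measurements
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
groups, centers = two_means(points)
print(groups)
print(centers)
```

The output is a group number per point, not a meaningful label; deciding what each group represents is left to the analyst.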

Machine learning in the real world

Classification: Spam Filter

Classification: Image Recognition

Regression: Electricity Demand

Simple ML System

Historical data

Training Data

+ Code

Train a model

Win!
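The pipeline above can be sketched end to end. This is a toy version under stated assumptions: the historical data (temperature vs. electricity demand) is invented, the model is a plain least squares line, and the last two records are held back to stand in for "future" data.

```python
def train(data):
    """Fit demand = intercept + slope * temperature and return a predictor."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

# Historical data: (temperature in C, electricity demand), hypothetical
history = [(20, 30.5), (22, 33.0), (25, 37.9), (27, 40.2),
           (30, 45.1), (32, 48.0), (35, 52.6), (37, 55.3)]

# Training data + code -> trained model
training, held_out = history[:6], history[6:]
model = train(training)

# Check the model on data it has not seen
errors = [abs(model(x) - y) for x, y in held_out]
mean_abs_error = sum(errors) / len(errors)
print(f"mean absolute error on held-out data: {mean_abs_error:.2f}")
```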

Carter's Digital Garden

COGS 9 Lecture 18
