Source: https://drive.google.com/file/d/1G9UA2k7KRaprGxA79ZzbusKUYbH24SCz/view?usp=sharing

Inferential Analysis

Inference and Sampling

Problem: Does Sesame Street affect kids brain development?
Data science question: Is there a relationship between watching Sesame Street and test scores among children?
Type of analysis: Inferential analysis

Inferential analysis is done by taking a representative sample of the population and using the data collected to extrapolate the results to the rest of the population.

Approaches to Inference

Correlation

Association between variables
Positive and negative correlation
Stronger relationship = higher correlation
i.e. Pearson Correlation, Spearman Correlation

Pearson's

Linear correlation between two variables
Takes values [-1,1]

Correlation does not equal slope.
Correlation does not equal causation.
Correlation establishes a relationship.
It does NOT establish causation.

Comparison of Means

Difference in means between variables
i.e. t-test, ANOVA

t-test

tests for difference in means between groups
Assumptions:

  • Data are continuous
  • Normally distributed
  • Large enough sample size
  • Equal variance b/w groups

p-value

the probability of getting the observed results (or results more extreme) by chance alone

Regression

Does change in one variable mean change in another?
i.e. simple regression, multiple regression

Non-Parametric Tests

For when assumptions in these 3 other categories are not met
i.e. Wilcoxon rank-sum test, Wilcoxon sign-rank test, sign test