Source: https://drive.google.com/file/d/1dZnXorO3mHcLTs1tggE6NFs_t3xgRSmJ/view
Why Effective Data Communication Matters
- It’s often the only thing your coworkers/bosses see
- It can set your work apart from others’
- It helps show off the awesome stuff you’ve done
- Cognitive load is a thing
Less is More
effective, attractive, impactful
Descriptive and Exploratory Data Analyses
Analytical Approaches (listed starting with those that typically take less effort)
Descriptive Analysis
- The first thing you do with new data
- Summarize the data
- Univariate plots of variables
Exploratory Analysis
- Exploring relationships
- Asking/Defining Questions
- Univariate/Bivariate/Multivariate analysis and plotting
Inferential Analysis
- Estimating uncertainty
- Testing theories about the population (the world) by inferring from a sample
- Building inference models
Predictive Analysis
- Building predictive models
- Use historical knowledge to predict future events
- Finding patterns
Causal Analysis
- Determine the average change in one variable when you alter another
- Typically requires experiments (e.g. randomized studies)
- The “gold standard” in data analysis
Mechanistic Analysis
- Understand the exact change one variable produces in another
- Typically modeled using deterministic equations
Descriptive Analysis
- Size
- Missingness
- Shape
- Central tendency
- Variability
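As a rough illustration of the first two checks in this list (size and missingness), here is a minimal Python sketch using only the standard library. The dataset, the column names, and the helper functions `size` and `missingness` are hypothetical and chosen purely for illustration; `None` is assumed to mark a missing value.

```python
# Hypothetical toy dataset: each dict is one observation; None marks a missing value.
rows = [
    {"age": 34, "city": "Austin"},
    {"age": None, "city": "Boston"},
    {"age": 29, "city": None},
    {"age": 41, "city": "Austin"},
]

def size(rows):
    """Return (number of observations, number of variables)."""
    columns = {key for row in rows for key in row}
    return len(rows), len(columns)

def missingness(rows):
    """Return the count of missing (None) values for each variable."""
    columns = sorted({key for row in rows for key in row})
    return {col: sum(row.get(col) is None for row in rows) for col in columns}

print(size(rows))         # (4, 2)
print(missingness(rows))  # {'age': 1, 'city': 1}
```

Knowing how many values are missing per variable is useful before computing any summary, since a summary statistic describes only the values that are actually present.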
Outliers can occur due to...
- Data entry errors
- Poor sampling procedures
- Technical or mechanical error
- Unexpected changes in weather
- People providing inaccurate information
Observations should only be removed from your dataset if you have a valid reason to do so.
The mean and median summarize the central tendency of quantitative variables.
The mode is most helpful for describing the central tendency of categorical variables.
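A small Python sketch may make the distinction concrete. The values below are made up for illustration, and the example uses the standard-library `statistics` module. The quantitative list deliberately includes one extreme observation to show how differently the mean and median respond to it.

```python
import statistics

# Hypothetical quantitative variable; the last value is an extreme observation.
salaries = [42, 45, 47, 50, 52, 400]

mean_salary = statistics.mean(salaries)      # pulled upward by the extreme value
median_salary = statistics.median(salaries)  # middle of the sorted values

# Hypothetical categorical variable: the mode is the most frequent category.
colors = ["red", "blue", "blue", "green", "blue"]
mode_color = statistics.mode(colors)

print(mean_salary)    # 106
print(median_salary)  # 48.5
print(mode_color)     # blue
```

Here the single extreme value moves the mean well above every other observation, while the median stays near the bulk of the data. A mean or median of `colors` would be meaningless, which is why the mode is the natural summary for categories.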
The central tendency tells only part of the story. The variability of the values across your observations helps fill in the rest.
Variability tells how spread out the values are
Range: highest score - lowest score
Interquartile range (IQR): 75th percentile - 25th percentile
Variance: measures how close the values in the distribution are to the middle of the distribution
- average squared difference from the mean
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
$s^2$: sample variance; $x_i$: ith element of the sample; $\bar{x}$: mean of the sample; $n$: sample size
Standard Deviation (SD): square root of the variance
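The four measures of variability can be sketched in Python with the standard-library `statistics` module. The scores are hypothetical. The variance is computed with the usual sample formula (dividing by n - 1), which is assumed here to be the definition intended above. Quartiles can be computed by several conventions, so the IQR may differ slightly between tools; this sketch uses the default method of `statistics.quantiles`.

```python
import statistics

# Hypothetical sample of scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: highest value minus lowest value.
value_range = max(scores) - min(scores)

# IQR: 75th percentile minus 25th percentile.
# quantiles(n=4) returns the three quartile cut points (default "exclusive" method).
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Sample variance by hand: sum of squared differences from the mean, over n - 1.
mean = statistics.mean(scores)
manual_variance = sum((x - mean) ** 2 for x in scores) / (len(scores) - 1)

# The library versions: sample variance and its square root, the standard deviation.
variance = statistics.variance(scores)
sd = statistics.stdev(scores)

print(value_range, iqr, manual_variance, variance, sd)
```

The hand-computed variance and `statistics.variance` agree, and `statistics.stdev` is the square root of that value. For the population versions, which divide by n, the same module offers `statistics.pvariance` and `statistics.pstdev`.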