sample = np.random_choice([True, False], size=len(df))
df[sample] # randomly select rows from dfnumpy overview
fast computation involving arrays and matrices
numpy arrays
- homogeneous - all values are of the same type
 - (potentially) multi-dimensional
 
numpy computation is fast because
- much of it is implemented in 
C numpyarrays are stored more efficiently in memory than Python lists
array.argmax() gets the first index of the max element in the array
for loop vs. vectorized arithmetic in numpy
timeit
squares = np.arange(1_000_000) ** 22D arrays
Using axis to compress along axes for computations
Using square brackets to index using same slicing as Python or commas
a[0][1]
a[0, 1]Pictures as numpy arrays
Each image has pixels of RGB
Grayscale by averaging RGB values
mean_2d = img.mean(axis=2)
mean_3d = np.repeat(mean_2d[:, :, np.newaxis], 3, axis=2)Sepia filter
sepia_filter = np.array([
    [0.393, 0.769, 0.189],
    [0.349, 0.686, 0.168],
    [0.272, 0.534, 0.131]
])
filtered = (img @ sepia_filter.T).clip(0, 1)
filteredPandas
DataFrame: 2d tables
Series: 1d array-like object
Index: sequence of column or row labels
import pandas as pd
import numpy as np- The standard way to select a column in 
pandasis by using the[]operator - Specifying a column name returns a Series
 - Specifying a list of column names returns a DataFrame
 
Querying with multiple conditions: use & and |
Querying using .query() - can use and and ignore parentheses