sample = np.random_choice([True, False], size=len(df))
df[sample] # randomly select rows from df

`numpy` overview

fast computation involving arrays and matrices

numpy arrays

homogeneous - all values are of the same type
(potentially) multi-dimensional

numpy computation is fast because

much of it is implemented in C
numpy arrays are stored more efficiently in memory than Python lists

array.argmax() gets the first index of the max element in the array

for loop vs. vectorized arithmetic in numpy

timeit
squares = np.arange(1_000_000) ** 2

2D arrays

Using axis to compress along axes for computations

Using square brackets to index using same slicing as Python or commas

a[0][1]
a[0, 1]

Pictures as numpy arrays

Each image has pixels of RGB

Grayscale by averaging RGB values

mean_2d = img.mean(axis=2)
mean_3d = np.repeat(mean_2d[:, :, np.newaxis], 3, axis=2)

Sepia filter

sepia_filter = np.array([
    [0.393, 0.769, 0.189],
    [0.349, 0.686, 0.168],
    [0.272, 0.534, 0.131]
])
filtered = (img @ sepia_filter.T).clip(0, 1)
filtered

Pandas

DataFrame: 2d tables
Series: 1d array-like object
Index: sequence of column or row labels

import pandas as pd
import numpy as np

The standard way to select a column in pandas is by using the [] operator
Specifying a column name returns a Series
Specifying a list of column names returns a DataFrame

Querying with multiple conditions: use & and |
Querying using .query() - can use and and ignore parentheses

Carter's Digital Garden

Explorer

DSC 80 Lecture 2

`numpy` overview

2D arrays

Pictures as numpy arrays

Pandas

Graph View

Table of Contents

Backlinks

Carter's Digital Garden

Explorer

DSC 80 Lecture 2

numpy overview

2D arrays

Pictures as numpy arrays

Pandas

Graph View

Table of Contents

Backlinks

`numpy` overview