Numpy
Misc
Linear algebra resources
Optimization
- {{numba}} - JIT compiler that translates a subset of Python and NumPy code into fast machine code.
Terms
- Broadcasting is a mechanism that allows Numpy to handle (nd)arrays of different shapes during arithmetic operations.
- See article for details on how this works and when it fails (ValueErrors)
- The smaller (nd)array is "broadcast" to the same shape as the larger (nd)array before the element-wise operation is applied.
- Conceptually, the smaller (nd)array is repeated along its missing or size-1 dimensions until it matches the larger one, but NumPy does this without actually copying the data.
- Fast, since it vectorizes array operations so that looping occurs in optimized C code
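- Memory views: working with views is often desirable because it avoids unnecessary copies of arrays and so saves memory
np.may_share_memory(new_array, old_array)
- If the result is True, new_array may be a view of old_array (the check only compares memory bounds, so it can give false positives). False means they definitely don't share memory. np.shares_memory gives an exact answer.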
- ndarrays - multi-dimensional arrays of fixed-size items.
- Pandas will typically outperform NumPy ndarrays when working with significantly larger volumes of data (say >500K rows) (not sure if this is true)
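A small sketch of views versus copies, using np.may_share_memory from the note together with np.shares_memory (the exact check). The arrays are made up for illustration; the interleaved slices show why may_share_memory can only say "might share".

```python
import numpy as np

old_array = np.arange(10)

# Basic slicing returns a view onto the same buffer
view = old_array[2:5]
# Fancy indexing returns a new array (a copy)
fancy_copy = old_array[[2, 3, 4]]

# may_share_memory only compares memory bounds, so True means "might share"
print(np.may_share_memory(view, old_array))        # True
print(np.may_share_memory(fancy_copy, old_array))  # False

# shares_memory does the exact check
print(np.shares_memory(view, old_array))  # True

# Interleaved slices: their bounds overlap, but no element is shared
evens, odds = old_array[::2], old_array[1::2]
print(np.may_share_memory(evens, odds))  # True (bounds overlap)
print(np.shares_memory(evens, odds))     # False (exact answer)
```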
Info (these are attributes, so no parentheses; len() is a function)
- Number of dimensions:
ary.ndim
- Shape:
ary.shape
- Number of elements:
ary.size
- Number of rows (i.e. 1st dim):
len(ary)
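A quick check of these attributes on a made-up 3-D array. Note that len() only reports the size of the first dimension, while .size counts every element.

```python
import numpy as np

ary = np.zeros((2, 3, 4))

print(ary.ndim)   # 3
print(ary.shape)  # (2, 3, 4)
print(ary.size)   # 24, i.e. 2 * 3 * 4
print(len(ary))   # 2, the length of the first dimension only
```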
Random Number Generator
rng2 = np.random.default_rng(seed=123)
rng2.random(3)
array([0.68235186, 0.05382102, 0.22035987])
Sample w/replacement
np.random.seed(3)
# a=12: sample from np.arange(12), i.e. the integers 0 to 11
# size=12: how many samples we want
# replace=True: sample with replacement (values can repeat)
np.random.choice(a=12, size=12, replace=True)
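The snippet above uses the legacy np.random.seed / np.random.choice interface. A sketch of the same sampling with the Generator API from the previous section (Generator.choice); the seed value is arbitrary. Two generators seeded identically reproduce the same draw, and replace=False yields a permutation.

```python
import numpy as np

# Same seed -> same stream of random numbers
rng_a = np.random.default_rng(seed=3)
rng_b = np.random.default_rng(seed=3)

# a=12 means sample from np.arange(12), i.e. the integers 0..11
sample_a = rng_a.choice(12, size=12, replace=True)
sample_b = rng_b.choice(12, size=12, replace=True)

print(sample_a)
print(np.array_equal(sample_a, sample_b))  # True

# Without replacement every value appears exactly once (a permutation of 0..11)
perm = rng_a.choice(12, size=12, replace=False)
print(sorted(perm.tolist()) == list(range(12)))  # True
```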
Create a grid of values
grid_q_low = np.linspace(number1, number2, num_vals).reshape(-1, 1)
grid_q_high = np.linspace(number3, number4, num_vals).reshape(-1, 1)
grid_q = np.concatenate((grid_q_low, grid_q_high), 1)
- linspace returns evenly spaced numbers over a specified interval.
- number1 to number4 are numeric values for the args start and stop; num_vals is the number of points
- reshape(-1, 1) coerces the results into m x 1 column arrays (-1 is a placeholder: NumPy infers m)
- concatenate with axis = 1 stacks column-wise, so the result is an m x 2 array
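A runnable version with made-up values standing in for number1 to number4 and num_vals; the specific numbers are only illustrative.

```python
import numpy as np

num_vals = 5  # hypothetical number of grid points

# Each linspace gives num_vals evenly spaced values; reshape(-1, 1) makes a column
grid_q_low = np.linspace(0.05, 0.25, num_vals).reshape(-1, 1)
grid_q_high = np.linspace(0.75, 0.95, num_vals).reshape(-1, 1)

# Put the two columns side by side -> shape (num_vals, 2)
grid_q = np.concatenate((grid_q_low, grid_q_high), axis=1)

print(grid_q.shape)  # (5, 2)
print(grid_q)
```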
Create or Coerce
Comparison with R DataFrame
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
# R equivalent:
# X <- data.frame(x1 = c(0,2,4), x2 = c(1,3,5))
- Variables are columns in the array
Create column-wise array
# example 1
a = np.array((1,2,3))
b = np.array((2,3,4))
np.column_stack((a,b))
array([[1, 2],
       [2, 3],
       [3, 4]])
# example 2
np.column_stack([model.predict(X_cal, quantile=(alpha/2)*100),
                 model.predict(X_cal, quantile=(1-alpha/2)*100)])
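For 1-D inputs, column_stack amounts to stacking along a new last axis. A small sketch comparing it with np.stack(..., axis=1) and with the reshape + concatenate pattern, using the arrays from example 1:

```python
import numpy as np

a = np.array((1, 2, 3))
b = np.array((2, 3, 4))

via_column_stack = np.column_stack((a, b))
via_stack = np.stack((a, b), axis=1)
# Same thing by hand: turn each 1-D array into a column, then concatenate
via_concat = np.concatenate((a.reshape(-1, 1), b.reshape(-1, 1)), axis=1)

print(via_column_stack.shape)  # (3, 2)
print(np.array_equal(via_column_stack, via_stack))   # True
print(np.array_equal(via_column_stack, via_concat))  # True
```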
Create a constant array
constant_arr = np.full(other_array.shape, 5)
# ** Don't really need this, since other_array + 5 works through broadcasting **
- "other_array" is the array we want the constant array to do arithmetic with
- .shape is an attribute (not a method) that outputs other_array's dimensions
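A sketch of the broadcasting point above with a made-up other_array: adding a scalar gives the same result as adding a full constant array, a 1-D array whose length matches the last dimension is added to every row, and shapes that can't be aligned raise a ValueError. np.broadcast_to is used to show that the "stretched" row is a view with a zero stride, not a copy.

```python
import numpy as np

other_array = np.arange(6).reshape(2, 3)

# Scalar broadcasting: no need to build the constant array explicitly
constant_arr = np.full(other_array.shape, 5)
print(np.array_equal(other_array + 5, other_array + constant_arr))  # True

# Shape (3,) lines up with the trailing dimension of (2, 3), so it is added to each row
row = np.array([10, 20, 30])
print(other_array + row)  # rows: [10 21 32] and [13 24 35]

# The stretched array is a read-only view: its first (row) stride is 0
stretched = np.broadcast_to(row, other_array.shape)
print(stretched.strides[0])  # 0

# Trailing dimensions 3 and 2 differ and neither is 1 -> broadcasting fails
try:
    other_array + np.array([1, 2])
except ValueError as err:
    print("ValueError:", err)
```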
Coerce from list
a = [1, 2, 3]
np.array(a)
a = [[1,2,3], [4,5,6]]
np.array(a, dtype=np.float32)
- dtype is optional
Convert pandas df to ndarray
new_array = pandas_df.values
new_array = pandas_df.to_numpy()  # the pandas docs recommend this over .values
new_array = np.array(pandas_df)
Manipulation
Subsetting a row
ary = np.array([[1, 2, 3],
                [4, 5, 6]])
first_row = ary[0]
row_slice = ary[1:3]  # rows 1 up to (not incl.) 3; here just [[4, 5, 6]]
- Any changes to “first_row” also change “ary”
- Produces a “memory view” which conserves memory and increases speed
- Slices (start:stop:step) also give views, but they can only select evenly spaced indices; arbitrary index lists need fancy indexing
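A sketch confirming the view behaviour on the same made-up array: writing to the row changes ary, a stepped slice (evenly spaced but not contiguous) is still a view, and .copy() breaks the link.

```python
import numpy as np

ary = np.array([[1, 2, 3],
                [4, 5, 6]])

first_row = ary[0]      # basic indexing -> view
first_row[0] = 100      # writes through to ary
print(ary[0, 0])        # 100

every_other_col = ary[:, ::2]  # a slice with a step is still a view
print(np.shares_memory(every_other_col, ary))  # True

# If an independent row is needed, copy explicitly
row_copy = ary[1].copy()
row_copy[0] = -1
print(ary[1, 0])        # 4, unchanged
```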
Subsetting columns using Fancy Indexing
ary_copy = ary[:, [0, 2]]  # first and last column
- Uses a list (or array) of integer indices, which don't have to be contiguous or evenly spaced, to return the desired array elements
- ** Produces a copy of the array, so it takes up more memory **
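A small sketch of the copy behaviour with the same made-up array: changing ary_copy leaves ary alone, whereas assigning through a fancy index on the left-hand side does write into the original.

```python
import numpy as np

ary = np.array([[1, 2, 3],
                [4, 5, 6]])

ary_copy = ary[:, [0, 2]]   # fancy indexing -> new array
ary_copy[0, 0] = 100
print(ary[0, 0])            # 1, the original is untouched
print(np.shares_memory(ary_copy, ary))  # False

# Assigning *through* fancy indexing does modify the original
ary[:, [0, 2]] = 0
print(ary)  # rows: [0 2 0] and [0 5 0]
```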
Boolean masking
ary_bool1 = (ary > 3) & (ary % 2 == 0)
ary_bool2 = ary > 3
ary_bool2
array([[False, False, False],
       [ True,  True,  True]])
Subsetting the 1st element of the last dimension using ellipsis
# create an array with a random number of dimensions
dimensions = np.random.randint(1,10)
items_per_dimension = 2
max_items = items_per_dimension**dimensions
axes = np.repeat(items_per_dimension, dimensions)
arr = np.arange(max_items).reshape(axes)
arr[..., 0]
# output from a run where dimensions == 5
array([[[[ 0,  2],
         [ 4,  6]],
        [[ 8, 10],
         [12, 14]]],
       [[[16, 18],
         [20, 22]],
        [[24, 26],
         [28, 30]]]])
- The ellipsis (...) stands in for as many colons as needed, so with a large (or unknown) number of dimensions you don't have to write out every colon to subset the array
- In the run shown, "arr" has five dimensions
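A reproducible sketch of the ellipsis example, with the number of dimensions fixed at 5 instead of drawn at random, checking that arr[..., 0] is the same as writing out all the colons.

```python
import numpy as np

# Fix the number of dimensions at 5 so the result is reproducible
dimensions = 5
items_per_dimension = 2
shape = (items_per_dimension,) * dimensions
arr = np.arange(items_per_dimension**dimensions).reshape(shape)

print(arr.shape)  # (2, 2, 2, 2, 2)

# The ellipsis expands to as many ':' as needed
print(np.array_equal(arr[..., 0], arr[:, :, :, :, 0]))  # True
print(arr[..., 0].shape)  # (2, 2, 2, 2)

# It can also come last: first element along the first axis
print(np.array_equal(arr[0, ...], arr[0, :, :, :, :]))  # True
```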
Filter by boolean mask
ary[ary_bool2]
array([4, 5, 6])
Reshaping
1 dim to 2 dim
ary1d = np.array([1, 2, 3, 4, 5, 6])
ary2d_view = ary1d.reshape(2, 3)
ary2d_view
array([[1, 2, 3],
       [4, 5, 6]])
- 2 x 3 array
Need 2 columns
ary1d.reshape(-1, 2)
- -1 is a placeholder: NumPy infers the size of that dimension (here 3 rows)
- Useful if we don’t know the number of rows, but we know we want 2 columns
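A short sketch of how -1 is resolved, and what happens when the inferred size would not be a whole number:

```python
import numpy as np

ary1d = np.array([1, 2, 3, 4, 5, 6])

# -1 lets NumPy infer the missing size: 6 elements / 2 columns = 3 rows
print(ary1d.reshape(-1, 2).shape)  # (3, 2)
print(ary1d.reshape(3, -1).shape)  # (3, 2)

# 6 elements can't fill rows of 4 columns, so this raises
try:
    ary1d.reshape(-1, 4)
except ValueError as err:
    print("ValueError:", err)
```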
Flatten array
ary = np.array([[[1, 2, 3],
                 [4, 5, 6]]])
ary.reshape(-1)
ary.ravel()
ary.flatten()
# all three return:
array([1, 2, 3, 4, 5, 6])
- reshape and ravel return memory views when they can (a copy only if the memory layout requires it); flatten always produces a copy in memory
- -1 is a placeholder
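A sketch checking the view/copy behaviour with np.shares_memory. The transposed array is a made-up case where the data is not laid out in row order, so ravel has to copy.

```python
import numpy as np

ary = np.array([[[1, 2, 3],
                 [4, 5, 6]]])

print(np.shares_memory(ary.ravel(), ary))      # True, view of contiguous data
print(np.shares_memory(ary.reshape(-1), ary))  # True
print(np.shares_memory(ary.flatten(), ary))    # False, always a copy

# On a non-contiguous array (here a transpose) ravel has to copy
t = ary[0].T
print(np.shares_memory(t.ravel(), t))  # False
print(t.ravel())  # [1 4 2 5 3 6]
```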
Combine arrays
ary = np.array([[1, 2, 3]])
# stack along the first axis (here: rows)
np.concatenate((ary, ary), axis=0)
- axis=1 would be stack column-wise (i.e. side-by-side)
- Computationally inefficient (each call allocates a new array and copies all the data), so avoid it if possible, especially inside loops.
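A sketch of the usual workaround: collect the pieces in a Python list and concatenate once, instead of growing an array inside a loop. The chunks are made up; both approaches give the same result, but the loop re-copies everything accumulated so far on every iteration.

```python
import numpy as np

# Three 1 x 3 chunks: [[0 1 2]], [[3 4 5]], [[6 7 8]]
chunks = [np.array([[i, i + 1, i + 2]]) for i in range(0, 9, 3)]

# Slow pattern: every concatenate allocates a new array and copies the data so far
result_loop = chunks[0]
for chunk in chunks[1:]:
    result_loop = np.concatenate((result_loop, chunk), axis=0)

# Faster pattern: concatenate the whole list once
result_once = np.concatenate(chunks, axis=0)

print(result_once.shape)  # (3, 3)
print(np.array_equal(result_loop, result_once))  # True
```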
Sort vector (arrange)
# asc
>>> boris = np.maximum(moose, squirrel)  # see "Find pairwise maximum (pmax)"
>>> np.sort(boris)
array([-2, -1,  1,  4])
# desc
>>> np.sort(boris, 0)[::-1]
array([ 4,  1, -1, -2])
Sort array (arrange)
>>> squirrel = np.array([-2,-2,-2,-2])
>>> moose = np.array([-3,-1,4,1])
>>> natascha = np.vstack((moose, squirrel))
>>> natascha
array([[-3, -1,  4,  1],
       [-2, -2, -2, -2]])
# default (axis=-1): sort within each row
>>> np.sort(natascha)
array([[-3, -1,  1,  4],
       [-2, -2, -2, -2]])
# axis=0: sort within each column
>>> np.sort(natascha, 0)
array([[-3, -2, -2, -2],
       [-2, -1,  4,  1]])
# axis=0, desc: sort each column, then reverse the row order
>>> np.sort(natascha, 0)[::-1]
array([[-2, -1,  4,  1],
       [-3, -2, -2, -2]])
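A sketch that checks the axis semantics on the same arrays, plus axis=None (flatten, then sort) and a descending sort within each row:

```python
import numpy as np

moose = np.array([-3, -1, 4, 1])
squirrel = np.array([-2, -2, -2, -2])
natascha = np.vstack((moose, squirrel))

# axis=-1 (default): each row is sorted independently
print(np.array_equal(np.sort(natascha), np.sort(natascha, axis=1)))  # True
# axis=0: each column is sorted independently (column 2 holds 4 and -2)
print(np.sort(natascha, axis=0)[:, 2])  # [-2  4]
# axis=None: flatten first, then sort
print(np.sort(natascha, axis=None))  # [-3 -2 -2 -2 -2 -1  1  4]

# Descending order within each row: reverse along the last axis
print(np.sort(natascha)[:, ::-1][0])  # [ 4  1 -1 -3]
```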
Change values by condition
ary = np.array([1, 2, 3, 4, 5])
np.where(ary > 2, 1, 0)
- Returns a new array in which values > 2 become 1 and the rest become 0 (ary itself is unchanged)
Mathematics
Incrementing the values
ary_copy += 99
ary_copy
array([[100, 102],
       [103, 105]])
Matrix multiplication
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
column_vector = np.array([1, 2, 3]).reshape(-1, 1)
np.matmul(matrix, column_vector)
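A sketch with the same matrix and column vector: the @ operator is shorthand for np.matmul, and the element-wise * is a different operation that here fails to broadcast.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
column_vector = np.array([1, 2, 3]).reshape(-1, 1)

# (2, 3) @ (3, 1) -> (2, 1): 1*1 + 2*2 + 3*3 = 14 and 4*1 + 5*2 + 6*3 = 32
product = matrix @ column_vector
print(product.ravel())  # [14 32]
print(np.array_equal(product, np.matmul(matrix, column_vector)))  # True

# Element-wise * tries to broadcast (2, 3) with (3, 1) and fails
try:
    matrix * column_vector
except ValueError as err:
    print("ValueError:", err)
```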
Dot product
row_vector = np.array([1, 2, 3])
np.matmul(row_vector, row_vector)
np.dot(row_vector, row_vector)
- One or the other can be slightly faster on specific machines and versions of BLAS
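A sketch showing that for 1-D arrays np.dot, np.matmul and @ all give the same inner product, while an explicit 2-D (1, 3) row vector keeps the result 2-D:

```python
import numpy as np

row_vector = np.array([1, 2, 3])

# For 1-D arrays these are all the inner product 1*1 + 2*2 + 3*3 = 14
print(np.dot(row_vector, row_vector))     # 14
print(np.matmul(row_vector, row_vector))  # 14
print(row_vector @ row_vector)            # 14

# With an explicit (1, 3) row vector the result is a (1, 1) array
row_2d = row_vector.reshape(1, -1)
print(row_2d @ row_2d.T)  # [[14]]
```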
Transpose a matrix
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
matrix.transpose()
array([[1, 4],
       [2, 5],
       [3, 6]])
Find pairwise maximum (pmax)
>>> squirrel = np.array([-2,-2,-2,-2])
>>> moose = np.array([-3,-1,4,1])
>>> np.maximum(moose, squirrel)
array([-2, -1,  4,  1])
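The R counterparts pmin and pmax also accept more than two vectors. A sketch of both in NumPy: np.minimum for the pairwise minimum, and np.maximum.reduce over a list of equal-length arrays for a pmax of several arrays (rocky is a hypothetical third array).

```python
import numpy as np

squirrel = np.array([-2, -2, -2, -2])
moose = np.array([-3, -1, 4, 1])
rocky = np.array([0, -5, 2, 3])  # hypothetical third array

# Pairwise minimum (R's pmin)
print(np.minimum(moose, squirrel))  # [-3 -2 -2 -2]

# More than two arrays: reduce np.maximum over them, element by element
print(np.maximum.reduce([moose, squirrel, rocky]))  # [ 0 -1  4  3]
```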