Numpy
Misc
Linear algebra resources
Optimization
- {{numba}} - JIT compiler that translates a subset of Python and NumPy code into fast machine code.
Terms
- Broadcasting is a mechanism that allows Numpy to handle (nd)arrays of different shapes during arithmetic operations.
- See article for details on how this works and when it fails (ValueErrors)
- A smaller (nd)array is “broadcast” to the shape of the larger (nd)array before certain operations are carried out.
- Conceptually, the smaller (nd)array is repeated until it matches the shape of the larger (nd)array; NumPy does this without actually copying the data.
- Fast, since it vectorizes array operations so that looping occurs in optimized C code
- Memory Views: Working with views is often desirable, since it avoids unnecessary copies of arrays and so saves memory.
np.may_share_memory(new_array, old_array) - if the result is True, new_array may be a memory view of old_array. The check is quick but can give false positives; np.shares_memory(new_array, old_array) is exact.
- ndarrays - multi-dimensional arrays of fixed-size items.
- Pandas will typically outperform numpy ndarrays in cases that involve significantly larger volume of data (say >500K rows) (not sure if this is true)
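A minimal sketch of the broadcasting behavior described above, using made-up arrays (the names `matrix`, `row`, and `bad` are illustrative only). It shows a (3, 2) array combined with a (2,) array, and the ValueError raised when the trailing dimensions do not match.

```python
import numpy as np

matrix = np.arange(6).reshape(3, 2)   # shape (3, 2)
row = np.array([10, 20])              # shape (2,)

# Shapes are compared from the trailing dimension backwards:
# (3, 2) vs (2,) is compatible, so `row` is applied to every row of `matrix`.
summed = matrix + row

# np.broadcast_to exposes the stretched shape; no data is copied
# (the stride along the repeated axis is 0).
stretched = np.broadcast_to(row, matrix.shape)

# Incompatible trailing dimensions (2 vs 3) raise a ValueError.
bad = np.array([1, 2, 3])
try:
    matrix + bad
    failed = False
except ValueError:
    failed = True
```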
Info (attributes, so no parentheses)
- Number of dimensions: ary.ndim
- Shape: ary.shape
- Number of elements: ary.size
- Number of rows (i.e. 1st dim): len(ary)
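A quick sketch of these lookups on a made-up three-dimensional array, mainly to show that `len` only reports the size of the first dimension.

```python
import numpy as np

ary = np.arange(24).reshape(2, 3, 4)  # illustrative array, not from the notes

n_dims = ary.ndim     # number of dimensions
shape = ary.shape     # size of each dimension, as a tuple
n_items = ary.size    # total number of elements
n_first = len(ary)    # size of the first dimension only

print(n_dims, shape, n_items, n_first)
```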
Random Number Generator
    rng2 = np.random.default_rng(seed=123)
    rng2.random(3)
    array([0.68235186, 0.05382102, 0.22035987])
Sample w/replacement
    np.random.seed(3)
    # a: sample from the integers 0 to 11 (i.e. np.arange(12))
    # size: how many samples we want (12)
    # replace=True: sample with replacement
    np.random.choice(a=12, size=12, replace=True)
Create a grid of values
    grid_q_low = np.linspace(number1, number2, num_vals).reshape(-1, 1)
    grid_q_high = np.linspace(number3, number4, num_vals).reshape(-1, 1)
    grid_q = np.concatenate((grid_q_low, grid_q_high), 1)
- linspace returns evenly spaced numbers over a specified interval.
- “number1,2,3,4” are numeric values for args: start and stop
- reshape coerces the results into m x 1 column arrays (-1 is a placeholder)
- concatenate: axis = 1 says stack column-wise, so this results in an m x 2 array
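A sketch that repeats the sampling and grid steps with concrete inputs. It swaps in the Generator API (`rng.choice`) for the legacy `np.random.seed` / `np.random.choice` pair, and the endpoints and `num_vals` are made-up values.

```python
import numpy as np

rng = np.random.default_rng(seed=123)

# Sample 12 integers from 0..11 with replacement via the Generator API.
sample = rng.choice(12, size=12, replace=True)

# Two-column grid; the endpoints below are assumptions for illustration.
num_vals = 5
grid_q_low = np.linspace(0.05, 0.25, num_vals).reshape(-1, 1)   # 5 x 1
grid_q_high = np.linspace(0.75, 0.95, num_vals).reshape(-1, 1)  # 5 x 1
grid_q = np.concatenate((grid_q_low, grid_q_high), axis=1)      # 5 x 2

print(grid_q.shape)
```

Keeping the generator in a variable (rather than seeding the global state) makes it easier to pass a reproducible source of randomness into functions.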
dtypes
Create or Coerce
Comparison with R DataFrame
    >>> X = np.arange(6).reshape(3, 2)
    >>> X
    array([[0, 1],
           [2, 3],
           [4, 5]])

    # r
    X <- data.frame(x1 = c(0,2,4), x2 = c(1,3,5))
- Variables are columns in the array
Create column-wise array
    # example 1
    a = np.array((1,2,3))
    b = np.array((2,3,4))
    np.column_stack((a,b))
    array([[1, 2],
           [2, 3],
           [3, 4]])

    # example 2
    np.column_stack([
        model.predict(X_cal, quantile=(alpha/2)*100),
        model.predict(X_cal, quantile=(1-alpha/2)*100)])
Create an empty array
Using a generic object type
    results_array = np.empty((len(grid), 4), dtype=object)
Specifying columns and types
    # one record per row; the four named fields act as the columns,
    # so the shape is len(grid) rather than (len(grid), 4)
    results_array = np.empty(len(grid), dtype=[
        ('row_id', 'i4'),
        ('estimate', 'f8'),
        ('p_value', 'f8'),
        ('singular', 'bool')
    ])
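A sketch of how an array with named, typed fields might be filled and queried. It assumes a 1-D array with one record per row, so the four fields play the role of columns; the values are made up.

```python
import numpy as np

fields = [("row_id", "i4"), ("estimate", "f8"), ("p_value", "f8"), ("singular", "bool")]

# np.empty leaves the contents uninitialized, so every row is filled below.
results = np.empty(3, dtype=fields)
for i, (est, p) in enumerate([(1.25, 0.03), (0.4, 0.5), (2.0, 0.001)]):
    results[i] = (i, est, p, False)

# Fields are accessed by name, and boolean masks work on whole records.
estimates = results["estimate"]
significant = results[results["p_value"] < 0.05]

print(significant["row_id"])
```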
Create a constant array
    constant_arr = np.full(other_array.shape, 5)
    # ** Don't really need this, since other_array + 5 works through broadcasting **
- “other_array” is the array we want the constant array to do arithmetic with
- The .shape attribute returns other_array’s dimensions
Coerce from list
    a = [1, 2, 3]
    np.array(a)

    a = [[1,2,3], [4,5,6]]
    np.array(a, dtype = np.float32)
- dtype is optional
Convert pandas df to ndarray
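    # three alternatives; the pandas docs recommend to_numpy() over .values
    new_array = pandas_df.values
    new_array = pandas_df.to_numpy()
    new_array = np.array(pandas_df)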
Manipulation
Subsetting a row
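    ary = np.array([[1, 2, 3], [4, 5, 6]])
    first_row = ary[0]
    rows = ary[1:3]   # a slice of rows also returns a view (here only row 1 exists)
- Any changes to “first_row” also change “ary”
- Produces a “memory view”, which conserves memory and increases speed
- Only works with basic slicing: the indices must be contiguous or evenly spaced (start:stop:step)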
Subsetting columns using Fancy Indexing
    ary_copy = ary[:, [0, 2]] # first and last column
- Uses tuple or list objects of non-contiguous integer indices to return desired array elements
- ** Produces a copy of the array, so it takes up more memory **
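A small sketch contrasting the two indexing styles above: a basic index returns a view that writes through to the original, while fancy indexing returns an independent copy. The variable names mirror the notes; `np.shares_memory` is used as the check.

```python
import numpy as np

ary = np.array([[1, 2, 3], [4, 5, 6]])

first_row = ary[0]         # basic indexing: a view
ary_copy = ary[:, [0, 2]]  # fancy indexing: a copy

first_row[0] = 100         # writes through to `ary`
ary_copy[0, 0] = -1        # leaves `ary` untouched

row_is_view = np.shares_memory(first_row, ary)
cols_are_copy = not np.shares_memory(ary_copy, ary)

print(ary[0, 0], row_is_view, cols_are_copy)
```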
Boolean masking
    ary_bool1 = (ary > 3) & (ary % 2 == 0)
    ary_bool2 = ary > 3
    ary_bool2
    array([[False, False, False],
           [ True,  True,  True]])
Subsetting 1st elt along the last dimension using ellipsis
    # create an array with a random number of dimensions
    dimensions = np.random.randint(1,10)
    items_per_dimension = 2
    max_items = items_per_dimension**dimensions
    axes = np.repeat(items_per_dimension, dimensions)
    arr = np.arange(max_items).reshape(axes)
    arr[..., 0]
    array([[[[ 0,  2],
             [ 4,  6]],
            [[ 8, 10],
             [12, 14]]],
           [[[16, 18],
             [20, 22]],
            [[24, 26],
             [28, 30]]]])
- The ellipsis means that with a large (or unknown) number of dimensions, you don’t have to use a ton of colons to subset the array
- Here, “arr” has five dimensions
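A sketch of the same idea with the number of dimensions fixed at five (instead of random) so the result is reproducible. It checks that the ellipsis stands in for the full run of colons.

```python
import numpy as np

arr = np.arange(2 ** 5).reshape(2, 2, 2, 2, 2)  # five dimensions, two items each

last_first = arr[..., 0]       # index 0 along the last dimension
explicit = arr[:, :, :, :, 0]  # the same selection written out with colons
lead_first = arr[0, ...]       # the ellipsis can also fill the trailing dimensions

print(last_first.shape, np.array_equal(last_first, explicit))
```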
Filter by boolean mask
    ary[ary_bool2]
    array([4, 5, 6])
Reshaping
1 dim to 2 dim
    ary1d = np.array([1, 2, 3, 4, 5, 6])
    ary2d_view = ary1d.reshape(2, 3)
    ary2d_view
    array([[1, 2, 3],
           [4, 5, 6]])
- 2 x 3 array
Need 2 columns
    ary1d.reshape(-1, 2)
- -1 is a placeholder: NumPy infers that dimension from the number of elements
- Useful if we don’t know the number of rows, but we know we want 2 columns
Flatten array
    ary = np.array([[[1, 2, 3], [4, 5, 6]]])
    ary.reshape(-1)
    ary.ravel()
    ary.flatten()
    array([1, 2, 3, 4, 5, 6])
- reshape and ravel return memory views whenever possible (they copy only when the data cannot be viewed in the new shape); flatten always returns a copy
- -1 is a placeholder
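A sketch that checks the view-versus-copy behavior of the three flattening calls with `np.shares_memory`. The transposed case is added as an assumption-free illustration of when `ravel` has to copy: the data is no longer contiguous in the requested order.

```python
import numpy as np

ary = np.array([[1, 2, 3], [4, 5, 6]])

reshape_shares = np.shares_memory(ary.reshape(-1), ary)  # view
ravel_shares = np.shares_memory(ary.ravel(), ary)        # view
flatten_shares = np.shares_memory(ary.flatten(), ary)    # always a copy

# Flattening the transpose in row-major order cannot be done as a view,
# so ravel falls back to copying.
transposed_shares = np.shares_memory(ary.T.ravel(), ary)

print(reshape_shares, ravel_shares, flatten_shares, transposed_shares)
```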
Combine arrays
    ary = np.array([[1, 2, 3]])
    # stack along the first axis (here: rows)
    np.concatenate((ary, ary), axis=0)
- axis=1 would stack column-wise (i.e. side-by-side)
- Computationally inefficient, since each call allocates a new array and copies the inputs, so avoid it where possible (especially inside loops).
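A sketch of two common ways to sidestep repeated concatenation, assuming the pieces arrive one at a time (the `chunks` list is made up). Growing an array inside a loop recopies everything on each pass; concatenating once, or preallocating, avoids that.

```python
import numpy as np

chunks = [np.full((1, 3), i) for i in range(4)]  # illustrative 1 x 3 pieces

# Pattern to avoid: every iteration copies the whole array built so far.
grown = np.empty((0, 3), dtype=int)
for chunk in chunks:
    grown = np.concatenate((grown, chunk), axis=0)

# Alternative 1: collect the pieces and concatenate a single time.
once = np.concatenate(chunks, axis=0)

# Alternative 2: preallocate when the final shape is known up front.
prealloc = np.empty((len(chunks), 3), dtype=int)
for i, chunk in enumerate(chunks):
    prealloc[i] = chunk[0]

print(once.shape)
```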
Sort vector (arrange)
    # asc
    >>> boris = np.maximum(moose, squirrel) # see “Find pairwise maximum (pmax)” under Mathematics
    >>> np.sort(boris)
    array([-2, -1,  1,  4])

    # desc
    >>> np.sort(boris,0)[::-1]
    array([ 4,  1, -1, -2])
Sort array (arrange)
    >>> squirrel = np.array([-2,-2,-2,-2])
    >>> moose = np.array([-3,-1,4,1])
    >>> natascha = np.vstack((moose, squirrel))
    >>> natascha
    array([[-3, -1,  4,  1],
           [-2, -2, -2, -2]])

    # default (axis=-1): sorts the values within each row
    >>> np.sort(natascha)
    array([[-3, -1,  1,  4],
           [-2, -2, -2, -2]])

    # axis=0: sorts the values within each column
    >>> np.sort(natascha, 0)
    array([[-3, -2, -2, -2],
           [-2, -1,  4,  1]])

    # axis=0, descending
    >>> np.sort(natascha, 0)[::-1]
    array([[-2, -1,  4,  1],
           [-3, -2, -2, -2]])
Change values by condition
    ary = np.array([1, 2, 3, 4, 5])
    np.where(ary > 2, 1, 0)
- Any values > 2 are changed to 1 and the rest are changed to 0
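A sketch of a few variations on `np.where`, using the same array as above. The replacement values can themselves be arrays, and the one-argument form returns the positions where the condition holds.

```python
import numpy as np

ary = np.array([1, 2, 3, 4, 5])

flags = np.where(ary > 2, 1, 0)     # 1 where the condition holds, else 0
capped = np.where(ary > 2, 2, ary)  # cap at 2, keep the other values as they are
positions = np.where(ary > 2)[0]    # indices where the condition holds

print(flags, capped, positions)
```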
Mathematics
Incrementing the values
    # ary_copy comes from the fancy indexing example: [[1, 3], [4, 6]]
    ary_copy += 99
    ary_copy
    array([[100, 102],
           [103, 105]])
Matrix multiplication
    matrix = np.array([[1, 2, 3], [4, 5, 6]])
    column_vector = np.array([1, 2, 3]).reshape(-1, 1)
    np.matmul(matrix, column_vector)
Dot product
    row_vector = np.array([1, 2, 3])
    np.matmul(row_vector, row_vector)
    np.dot(row_vector, row_vector)
- One or the other can be slightly faster on specific machines and versions of BLAS
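A sketch that runs the two products above and shows the `@` operator, which is shorthand for `np.matmul`. The arrays are the ones from the notes.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])
column_vector = np.array([1, 2, 3]).reshape(-1, 1)
row_vector = np.array([1, 2, 3])

product = matrix @ column_vector   # (2, 3) @ (3, 1) gives a (2, 1) result
dot_matmul = row_vector @ row_vector
dot_np = np.dot(row_vector, row_vector)

print(product.shape, dot_matmul, dot_np)
```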
Transpose a matrix
    matrix = np.array([[1, 2, 3], [4, 5, 6]])
    matrix.transpose()
    array([[1, 4],
           [2, 5],
           [3, 6]])
Find pairwise maximum (pmax)
    >>> squirrel = np.array([-2,-2,-2,-2])
    >>> moose = np.array([-3,-1,4,1])
    >>> np.maximum(moose, squirrel)
    array([-2, -1,  4,  1])
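A sketch of how the element-wise `np.maximum` differs from the reduction `np.max`, using the arrays above. Because of broadcasting, the second argument can also be a scalar, which gives a simple way to floor values at zero.

```python
import numpy as np

squirrel = np.array([-2, -2, -2, -2])
moose = np.array([-3, -1, 4, 1])

pairwise_max = np.maximum(moose, squirrel)  # element by element
floored = np.maximum(moose, 0)              # scalar is broadcast against the array
pairwise_min = np.minimum(moose, squirrel)  # the pmin counterpart
overall_max = np.max(moose)                 # single largest value in one array

print(pairwise_max, floored, pairwise_min, overall_max)
```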