Numpy

Misc

Linear algebra resources
- numpy.linalg
- scipy.linalg documentation
Optimization
- {{numba}} - JIT compiler that translates a subset of Python and NumPy code into fast machine code.
Terms
- Broadcasting is a mechanism that allows Numpy to handle (nd)arrays of different shapes during arithmetic operations.
  - See article for details on how this works and when it fails (ValueErrors)
  - A smaller (nd)array being “broad-casted” into the same shape as the larger (nd)array, before doing certain operations.
  - The smaller (nd)array will be copied multiple times, until it reaches the same shape as the larger (nd)array.
  - Fast, since it vectorizes array operations so that looping occurs in optimized C code
- Memory Views: Working with views can be highly desirable since it avoids making unnecessary copies of arrays to save memory resources
  - np.may_share_memory(new_array, old_array) - if the result is TRUE, then new_array is a memory view
- ndarrays - multi-dimensional arrays of fixed-size items.
- Pandas will typically outperform numpy ndarrays in cases that involve significantly larger volume of data (say >500K rows) (not sure if this is true)
Info (no parentheses after method)
- Number of dimensions: ary.ndim
- Shape: ary.shape
- Number of elements: ary.size
- Number of rows (i.e. 1st dim): len(ary)

Random Number Generator

rng2 = np.random.default_rng(seed=123)
rng2.random(3)

array([0.68235186, 0.05382102, 0.22035987])

Sample w/replacement

np.random.seed(3)
# a parameter: generate a list of unique random numbers (from 0 to 11)
# size parameter: how many samples we want (12)
# replace = True: sample with replacement
np.random.choice(a=12, size=12, replace=True)

Create a grid of values
```
grid_q_low = np.linspace(number1,number2,num_vals).reshape(-1,1)
grid_q_high = np.linspace(number3,number4,num_vals).reshape(-1,1)
grid_q = np.concatenate((grid_q_low,grid_q_high),1)
```
- linspace returns evenly spaced numbers over a specified interval.
  - “number1,2,3,4” are numeric values for args: start and stop
  - reshape coerces the results into m x 1 column arrays (-1 is a placeholder)
- concantenate axis = 1 says stack column-wise, so this results in a m x 2 array

Create or Coerce

Comparison with R DataFrame

>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
      [2, 3],
      [4, 5]])
# r
X <- data.frame(x1 = c(0,2,4), x2 = c(1,3,5))

Variables are column in the array

Create column-wise array

# example 1
a = np.array((1,2,3))
b = np.array((2,3,4))
np.column_stack((a,b))
array([[1, 2],
      [2, 3],
      [3, 4]])

# example 2
np.column_stack([
    model.predict(X_cal, quantile=(alpha/2)*100), 
    model.predict(X_cal, quantile=(1-alpha/2)*100)])

Create a constant array
```
constant_arr = np.full((other_array.shape), 5)
# ** Don't really need this, since other_array + 5 works through broadcasting **
```
- “other_array” the array we want the constant array to do arithmetic with
- .shape method outputs other_array’s dimensions

Coerce from list

a = [1, 2, 3]
np.array(a)

a = [[1,2,3], [4,5,6]]
np.array(a, dtype = np.float32)

dtype is optional

Convert pandas df to ndarray
- new_array = pandas_df.values
- pandas_df.to_numpy()
- np.array(df)

Manipulation

Subsetting a row
```
ary = np.array([[1, 2, 3],
                [4, 5, 6]])

first_row = ary[0]
first_row = ary[1:3]
```
- Any changes to “first_row” also change “ary”
- Produces a “memory view” which conserves memory and increases speed
- Can only subset contiguous indices
Subsetting columns using Fancy Indexing
```
ary_copy = ary[:, [0, 2]] # first and and last column
```
- Uses tuple or list objects of non-contiguous integer indices to return desired array elements
- ** produces a copy of the array. So takes-up more memory**

Boolean masking

ary_bool1 = (ary > 3) & (ary % 2 == 0)
ary_bool2 = ary > 3
ary_bool2

array([[False, False, False],
      [ True,  True,  True]])

Subsetting 1st elt of all dimensions using ellipsis

# create an array with a random number of dimensions
dimensions = np.random.randint(1,10)
items_per_dimension = 2
max_items = items_per_dimension**dimensions
axes = np.repeat(items_per_dimension, dimensions)
arr = np.arange(max_items).reshape(axes)

arr[..., 0]
array([[[[ 0,  2],
        [ 4,  6]],

        [[ 8, 10],
        [12, 14]]],

      [[[16, 18],
        [20, 22]],

        [[24, 26],
        [28, 30]]]])

ellipsis makes it so if you have a large (or unknown) number of dimensions, you don’t have to use a ton of colons to subset the array
Here, “arr” has five dimensions

Filter by boolean mask
```
ary[ary_bool2]

array([4, 5, 6])
```

Reshaping

1 dim to 2 dim

ary1d = np.array([1, 2, 3, 4, 5, 6])
ary2d_view = ary1d.reshape(2, 3)
ary2d_view

array([[1, 2, 3],
      [4, 5, 6]])

2 x 3 array

Need 2 columns
```
ary1d.reshape(-1, 2)
```
- -1 is a placeholder
- Useful if we don’t know the number of rows, but we know we want 2 columns

Flatten array

ary = np.array([[[1, 2, 3],
                [4, 5, 6]]])
ary.reshape(-1)
ary.ravel()
ary.flatten()

array([1, 2, 3, 4, 5, 6])

reshape and ravel produce memory views; flatten produces a copy in memory
-1 is a placeholder

Combine arrays
```
ary = np.array([[1, 2, 3]])
# stack along the first axis (here: rows)
np.concatenate((ary, ary), axis=0)
```
- axis=1 would be stack column-wise (i.e. side-by-side)
- Computationally ineffiicient, so should avoid if possible.

Sort vector (arrange)

# asc
>>> boris = np.maximum(moose, squirrel) # see above
>>> np.sort(boris)
array([-2, -1,  1,  4])

# desc
>>> np.sort(boris,0)[::-1]
array([ 4,  1, -1, -2])

Sort array (arrange)

>>> squirrel = np.array([-2,-2,-2,-2])
>>> moose = np.array([-3,-1,4,1])
>>> natascha = np.vstack((moose, squirrel))
array([[-3, -1,  4,  1],
      [-2, -2, -2, -2]])

# column-wise (default)
>>> np.sort(natascha)
array([[-3, -1,  1,  4],
      [-2, -2, -2, -2]])
# row-wise
>>> np.sort(natascha, 0)
array([[-3, -2, -2, -2],
      [-2, -1,  4,  1]])
# row-wise desc
>>> np.sort(natascha, 0)[::-1]
array([[-2, -1,  4,  1],
      [-3, -2, -2, -2]])

Change values by condition
```
ary = np.array([1, 2, 3, 4, 5])
np.where(ary > 2, 1, 0)
```
- Any values > 2 get changed to a 1 and the rest are changed to 0

Mathematics

Incrementing the values

ary_copy += 99
array([[100, 102], 
      [103, 105]])

Matrix multiplication

matrix = np.array([[1, 2, 3], 
                  [4, 5, 6]])
column_vector = np.array([1, 2, 3]).reshape(-1, 1)
np.matmul(matrix, column_vector)

Dot product

row_vector = np.array([1, 2, 3])
np.matmul(row_vector, row_vector)
np.dot(row_vector, row_vector)

One or the other can be slightly faster on specific machines and versions of BLAS

Transpose a matrix

matrix = np.array([[1, 2, 3], 
                  [4, 5, 6]])
matrix.transpose()

array([[1, 4],
      [2, 5],
      [3, 6]])

Find pairwise maximum (pmax)

>>> squirrel = np.array([-2,-2,-2,-2])
>>> moose = np.array([-3,-1,4,1])
>>> np.maximum(moose, squirrel)
array([-2, -1,  4,  1])