# Advanced

## Advanced Array Indexing¶

Earlier we discussed basic array indexing techniques. Here we discuss advanced techniques and take a deeper dive into how array indexing works.

We start by setting up a 3x2x4 array of integers, `foo`

.

```
foo = np.arange(3*2*4).reshape((3,2,4))
print(foo)
# [[[ 0 1 2 3]
# [ 4 5 6 7]]
# [[ 8 9 10 11]
# [12 13 14 15]]
# [[16 17 18 19]
# [20 21 22 23]]]
```

Observe the result of `foo[:,:,0]`

```
print(foo[:,:,0])
# [[ 0 4]
# [ 8 12]
# [16 20]]
```

Recall the mental model for interpreting an N-dimensional array.

- 1-d array: imagine a row of values
- 2-d array: imagine a matrix of values
- 3-d array: imagine a row of matrices
- 4-d array: imagine a matrix of matrices

In this case, we can think of `foo`

as a row of matrices where element `(i,j,k)`

corresponds to the ith matrix,
the jth row, and the kth column. So, `foo[:,:,0]`

is requesting "every matrix, every row, column 0" resulting in
this sub-array.

```
# [[ 0 4]
# [ 8 12]
# [16 20]]
```

What's surprising is that we started with a *3-dimensional* array, but `foo[:,:,0]`

returned a *2-dimensional* array.
When we select more than one column, for example like this

```
print(foo[:,:,[0,1]])
# [[[ 0 1]
# [ 4 5]]
# [[ 8 9]
# [12 13]]
# [[16 17]
# [20 21]]]
```

the result is a 3-dimensional array as you might expect.

### Subsetting with same-shape index arrays¶

Let's see another example, this time using 3x2 index arrays for i, j, and k indexes of `foo`

.

```
i = [
[0,2],
[2,0],
[1,1]
]
j = [
[0,0],
[0,0],
[1,1]
]
k = [
[0,1],
[0,2],
[0,3]
]
print(foo[i,j,k])
# [[ 0 17]
# [16 2]
# [12 15]]
```

## Important

**When every dimension is indexed with an array (or list) and each of those arrays is the same shape, the output
array will be the same shape as the index arrays.**

In this case each of our index arrays is 3x2, so we know our result will also be a 3x2 array.

To understand the values in the result, you could imagine zipping the three index arrays into a matrix of 3-element tuples like this

```
[[(0,0,0), (2,0,1)],
[(2,0,0), (0,0,2)],
[(1,1,0), (1,1,3)]]
```

Here, each tuple gives the location of the corresponding output element.

Here's another example where we use 2x2x2x2 index arrays to extract values from `foo`

. Notice the result is also
2x2x2x2.

```
idx = np.zeros(shape=(2,2,2,2), dtype='int64')
result = foo[idx, idx, idx]
result.shape # (2, 2, 2, 2)
```

Again, **the shape of the output is dependent on the shape of the index arrays**, not the shape of the array you're
indexing into.

### Subsetting with varying-shape index arrays¶

We've considered the case where every dimension has an index array and each of the index arrays are the same shape. What if every dimension has an index array, but the index arrays have different shapes?

```
i = [0,1]
j = [0,1]
k = [[0],
[2],
[3]]
print(foo[i,j,k])
# [[ 0 12]
# [ 2 14]
# [ 3 15]]
```

In this case, NumPy broadcasts the index arrays, in essence forming three 2x3 index arrays.

```
i = [[0, 1],
[0, 1],
[0, 1]]
j = [[0, 1],
[0, 1],
[0, 1]]
k = [[0, 0],
[2, 2],
[3, 3]]
```

## Tip

You can do `np.broadcast_arrays(i,j,k)`

to see how the original index arrays broadcast into identically shaped
index arrays. See `broadcast_arrays()`

.

Subsetting `foo`

with equivalently shaped index arrays is straight-forward based on what we already covered.

### Subsetting with sice indexers¶

Lastly, we consider the case where our array contains slice indexers `:`

. For example,

```
i = [0,0,2,2]
k = [[0],
[1],
[2]]
print(foo[i,:,k])
# [[[ 0 4]
# [ 0 4]
# [16 20]
# [16 20]]
# [[ 1 5]
# [ 1 5]
# [17 21]
# [17 21]]
# [[ 2 6]
# [ 2 6]
# [18 22]
# [18 22]]]
```

What's going on here? The trick to understanding these complex scenarios is to build fully expanded, same-shaped index arrays for each dimension as follows:

```
broadcast all index arrays
for each slicer:
Copy each index array N times along a new last axis where N is the size of the current slicer’s dimension
Represent the current slicer with an index array
```

Let's walk through that last example. We start by broadcasting the index arrays into the following (3,4) arrays.

```
i = [[0, 0, 2, 2],
[0, 0, 2, 2],
[0, 0, 2, 2]]
```

```
k = [[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2]]
```

Now we create an index array for j. Start with a (3,4) array of `?`

s.

```
j = [[?, ?, ?, ?],
[?, ?, ?, ?],
[?, ?, ?, ?]]
```

Since j spans the second axis of `foo`

which is length-2, replace each `?`

with the indices `[0,1]`

.

```
j = [[[0,1], [0,1], [0,1], [0,1]],
[[0,1], [0,1], [0,1], [0,1]],
[[0,1], [0,1], [0,1], [0,1]]]
```

j is then promoted from a (3,4) array to a (3,4,2) array.

i and k are then each copied twice along a new last axis to match j's shape. The final (3,4,2) index arrays are

```
i = [[[0, 0],
[0, 0],
[2, 2],
[2, 2]],
[[0, 0],
[0, 0],
[2, 2],
[2, 2]],
[[0, 0],
[0, 0],
[2, 2],
[2, 2]]]
```

```
j = [[[0, 1],
[0, 1],
[0, 1],
[0, 1]],
[[0, 1],
[0, 1],
[0, 1],
[0, 1]],
[[0, 1],
[0, 1],
[0, 1],
[0, 1]]]
```

```
k = [[[0, 0],
[0, 0],
[0, 0],
[0, 0]],
[[1, 1],
[1, 1],
[1, 1],
[1, 1]],
[[2, 2],
[2, 2],
[2, 2],
[2, 2]]]
```

## Note

NumPy doesn't *actually* expand the index arrays in this way as it'd be innefficient, but NumPy's indexing
algorithm is logically equivalent to this type of index array expansion.

## Note

The slicer we used in this example `:`

is a full slice but we could use fancier slices like `::2`

which gets every
second element or `:3:-1`

which gets the first three elements in reverse order.

### Trailing Indexes¶

It's also worth mentioning that you can exclude trailing indexes in which case NumPy assumes you want all the values
from those excluded dimensions. For example, `foo[[0,1]]`

is equivalent to `foo[[0,1], :, :]`

.

### Indexing with a scalar¶

Recall `foo[:,:,0]`

returns a two-dimensional array even though `foo`

is a three-dimensional array. Why?

Recall the algorithm for determining the output of a subset with slicers and index arrays.

```
broadcast all index arrays
for each slicer:
Copy each index array N times along a new last axis where N is the size of the current slicer’s dimension
Represent the current slicer with an index array
```

We start by broadcasting all the index arrays. In this case there's only one index array - 0 - so we don't have to do any broadcasting, but what's the shape of a scalar like 0? It's actually empty or 0-dimensional. You can see this by calling

```
np.array(0).shape # ()
```

So we have an empty dimensional index, (). Then we concatenate the size of each dimension with a slice, so we end up with a (3,2) result.

If we had wrapped the 0 index in square brackets, the final result would be a (3,2,1) array.

```
foo[:,:,[0]].shape # (3, 2, 1)
```

## View vs Copy¶

It's important to understand when array indexing produces a *view* and when it produces a *copy*.

Suppose we have this 2d array, `spongebob`

```
spongebob = np.arange(12).reshape(3,-1)
print(spongebob)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
```

We can pick out the first two columns using array indexers or slicing.

```
squidward = spongebob[:, [0,1]]
patrick = spongebob[:, :2]
```

On the surface these techniques seem to produce the same result

```
np.all(patrick == squidward) # True
```

but under the hood, these techniques are subtly different. ** squidward is a copy of spongebob while patrick is a
view of spongebob**.

Observe that if we modify `squidward`

, `spongebob`

is unchanged.

```
squidward[0,0] = 100
print(spongebob)
# [[ 0 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
```

By contrast, if we modify `patrick`

, `spongebob`

changes too.

```
patrick[0,0] = 100
print(spongebob)
# [[100 1 2 3]
# [ 4 5 6 7]
# [ 8 9 10 11]]
```

### Array memory addresses¶

Another way to observe this is by looking up the memory address of the beginning of each array.

```
spongebob.__array_interface__ # {'data': (105553169745024, False), ...
squidward.__array_interface__ # {'data': (105553142497328, False), ...
patrick.__array_interface__ # {'data': (105553169745024, False), ...
```

It's clear that `spongebob`

and `patrick`

live at the same memory address, but the location of `squidward`

is elsewhere.

## Caution

The memory addresses listed here are the address of the *beginning* of each array. If two addresses are
different, the arrays don't start at the same place, but portions of the arrays may still overlap!

A better technique to check if arrays share memory is to use NumPy's `shares_memory()`

function.

```
np.shares_memory(spongebob, squidward) # False
np.shares_memory(spongebob, patrick) # True
```

### How to force copy an array¶

In, general, when you subset an array using nothing but slices, NumPy returns a *view* of the original array. You
can force numpy to copy the data using the array `copy()`

method like this.

```
spongebob[:, :2].copy()
```

## Note

When you subset an array using at least one index array, numpy will automatically copy the data.