NumPy is known for being fast, but could it go even faster? Here's how to use Cython to accelerate array iterations in NumPy.

NumPy gives Python users a wickedly fast library for working with data in matrices. If you want, for instance, to generate a matrix populated with random numbers, you can do that in a fraction of the time it would take in conventional Python.

Still, there are times when even NumPy by itself isn't fast enough. If you want to perform transformations on NumPy matrices that aren't available in NumPy's API, a typical approach is to just iterate over the matrix in Python ... and lose all the performance benefits of using NumPy in the first place.

Fortunately, there's a better way to work directly with NumPy data: Cython. By writing type-annotated Python code and compiling it to C, you can iterate over NumPy arrays and work directly with their data at the speed of C.

This article walks through some key concepts for how to use Cython with NumPy. If you're not already familiar with Cython, read up on the basics of Cython and check out this simple tutorial for writing Cython code.

## Write only core computation code in Cython for NumPy

The most common scenario for using Cython with NumPy is one where you want to take a NumPy array, iterate over it, and perform computations on each element that can't be done readily in NumPy.

Cython works by letting you write modules in a type-annotated version of Python, which are then compiled to C and imported into your Python script like any other module. In other words, you write something akin to a Python version of what you want to accomplish, then speed it up by adding annotations that allow it to be translated into C.

To that end, *you should only use Cython for the part of your program that does the actual computation*. Everything else that's not performance-sensitive—that is, everything that's not actually the loop that iterates over your data—should be written in regular Python.

Why do this? Cython modules have to be recompiled each time they're changed, which slows down the development process. You don't want to have to recompile your Cython modules every time you make changes that aren't actually about the part of your program you're trying to optimize.
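One common way to structure that split (the module name `fastloop.pyx` here is hypothetical): put the hot loop in a `.pyx` file and keep a minimal build script alongside it, so only changes to the `.pyx` file trigger a recompile. A sketch of such a build script, assuming Cython is installed:

```python
# setup.py -- minimal build script for a single Cython module
# (fastloop.pyx is a hypothetical file holding just the hot loop)
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("fastloop.pyx"))
```

Running `python setup.py build_ext --inplace` compiles the extension in place; the rest of your program stays plain Python and imports the compiled module like any other.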

## Iterate through NumPy arrays in Cython, not Python

The general method for working efficiently with NumPy in Cython can be summed up in three steps:

- Write functions in Cython that accept NumPy arrays as properly typed objects. When you call the Cython function in your Python code, send the *entire* NumPy array object as an argument for that function call.
- Perform all the iteration over the object *in Cython*.
- Return a NumPy array from your Cython module to your Python code.

So, don't do something like this:

```
for index in range(len(numpy_array)):
    numpy_array[index] = cython_function(numpy_array[index])
```

Rather, do something like this:

```
returned_numpy_array = cython_function(numpy_array)

# in Cython:
cdef cython_function(numpy_array):
    for item in numpy_array:
        ...
    return numpy_array
```

I omitted type information and other details from these samples, but the difference should be clear. The actual iteration over the NumPy array should be done entirely in Cython, not through repeated calls to Cython for each element in the array.
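The cost of the first pattern is visible even without Cython: per-element calls pay the function-call overhead once for every item. This plain-Python sketch (with `scale_one` and `scale_all` as hypothetical stand-ins for a compiled Cython function) contrasts the two calling patterns:

```python
import timeit

def scale_one(x):
    # stand-in for a Cython function called once per element
    return x * 2

def scale_all(values):
    # stand-in for a Cython function that does all the iteration internally
    return [v * 2 for v in values]

data = list(range(100_000))

# Pattern 1: cross the call boundary 100,000 times
per_element = [scale_one(v) for v in data]

# Pattern 2: cross the call boundary once
batched = scale_all(data)

assert per_element == batched  # same result; only the call pattern differs

t1 = timeit.timeit(lambda: [scale_one(v) for v in data], number=20)
t2 = timeit.timeit(lambda: scale_all(data), number=20)
print(f"per-element: {t1:.3f}s, batched: {t2:.3f}s")
```

With a real Cython function the gap is far larger, since each boundary crossing also converts between Python objects and C data.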

## Pass properly typed NumPy arrays to Cython functions

Any functions that accept a NumPy array as an argument should be properly typed, so that Cython knows how to interpret the argument as a NumPy array (fast) rather than a generic Python object (slow).

Here's an example of a Cython function declaration that takes in a two-dimensional NumPy array:

```
def compute(int[:, ::1] array_1):
```

In Cython's "pure Python" syntax, you'd use this annotation:

```
def compute(array_1: cython.int[:, ::1]):
```

The `int[:]` annotation indicates a one-dimensional array of integers, potentially a NumPy array. But to be as precise as possible, we need to indicate the number of dimensions in the array. For two dimensions, we'd use `int[:,:]`; for three, we'd use `int[:,:,:]`.

We also should indicate the memory layout for the array. By default in NumPy and Cython, arrays are laid out in a contiguous fashion compatible with C. The `::1` in the last dimension declares that layout, so we use `int[:,::1]` as our signature. (For details on other memory layout options, see Cython's documentation.)

These declarations inform Cython not just that these are NumPy arrays, but how to read from them in the most efficient way possible.
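The signature also constrains the caller: an `int[:, ::1]` parameter accepts only a C-contiguous array of C `int`s, which on the NumPy side means `dtype=np.intc` and the default row-major layout. A quick plain-NumPy check of both properties (no Cython needed):

```python
import numpy as np

# int[:, ::1] requires C ints in C-contiguous (row-major) order
arr = np.zeros((3, 4), dtype=np.intc)

assert arr.flags["C_CONTIGUOUS"]   # matches the ::1 layout requirement
assert arr.dtype == np.intc        # matches the C int element type

# A transposed view is no longer C-contiguous, so an int[:, ::1]
# signature would reject it (a plain int[:, :] would still accept it).
assert not arr.T.flags["C_CONTIGUOUS"]
```

Passing an array with the wrong dtype or layout raises an error at call time, so it's worth constructing your arrays with the matching `dtype` from the start.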

## Use Cython memoryviews for fast access to NumPy arrays

Cython has a feature named typed memoryviews that gives you direct read/write access to many types of objects that work like arrays. That includes—you guessed it—NumPy arrays.

To create a memoryview, you use a similar syntax to the array declarations shown above:

```
# conventional Cython
def compute(int[:, ::1] array_1):
    cdef int[:, :] view2d = array_1

# pure-Python mode
def compute(array_1: cython.int[:, ::1]):
    view2d: cython.int[:, :] = array_1
```

Note that you don't need to specify the memory layout in the declaration, as that's detected automatically.

From this point on in your code, you'd read from and write to `view2d` with the same indexing syntax you'd use on the `array_1` object (e.g., `view2d[x, y]`). Any reads and writes are done directly to the underlying region of memory that makes up the array (again: fast), rather than by using the object-accessor interfaces (again: slow).
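NumPy's own array views give a reasonable mental model for this shared-memory behavior (minus the C-speed access a Cython memoryview adds), and you can see it with plain NumPy:

```python
import numpy as np

array_1 = np.arange(6, dtype=np.intc).reshape(2, 3)
view2d = array_1[:, :]   # a view: a new object over the same buffer

view2d[0, 0] = 99            # write through the view...
assert array_1[0, 0] == 99   # ...and the original array sees it

# No data was copied; both names refer to the same region of memory
assert np.shares_memory(array_1, view2d)
```

A Cython memoryview behaves the same way: it's a window onto the array's buffer, not a copy, so writes made in Cython are visible to the NumPy array you return to Python.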

## Index, don't iterate, through NumPy arrays

Python users know by now that the preferred idiom for stepping through the elements of an object is `for item in object:`. You can use this idiom in Cython as well, but it doesn't yield the best possible speed when working with a NumPy array or memoryview. For that, you'll want to use C-style indexing.

Here's an example of how to use indexing for NumPy arrays:

```
# conventional Cython:
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
def compute(int[:, ::1] array_1):
    # get the maximum dimensions of the array
    cdef Py_ssize_t x_max = array_1.shape[0]
    cdef Py_ssize_t y_max = array_1.shape[1]

    # create a memoryview
    cdef int[:, :] view2d = array_1

    # access the memoryview by way of our constrained indexes
    for x in range(x_max):
        for y in range(y_max):
            view2d[x, y] = something()

# pure-Python mode:
import cython

@cython.boundscheck(False)
@cython.wraparound(False)
def compute(array_1: cython.int[:, ::1]):
    # get the maximum dimensions of the array
    x_max: cython.Py_ssize_t = array_1.shape[0]
    y_max: cython.Py_ssize_t = array_1.shape[1]

    # create a memoryview
    view2d: cython.int[:, :] = array_1

    # access the memoryview by way of our constrained indexes
    for x in range(x_max):
        for y in range(y_max):
            view2d[x, y] = something()
```

In this example, we use the NumPy array's `.shape` attribute to obtain its dimensions, then use `range()` to iterate through the memoryview with those dimensions as a constraint. We don't allow arbitrary access to some part of the array (for example, by way of a user-submitted variable), so there's no risk of going out of bounds.
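For reference, here's the same shape-bounded loop written in plain Python over a NumPy array. It's slow compared to the compiled version, but it makes the index-based access pattern concrete (`something()` is replaced here with a simple placeholder expression):

```python
import numpy as np

def compute_py(array_1):
    # plain-Python mirror of the Cython loop, using .shape as the bounds
    x_max = array_1.shape[0]
    y_max = array_1.shape[1]
    view2d = array_1[:, :]   # stand-in for the typed memoryview
    for x in range(x_max):
        for y in range(y_max):
            view2d[x, y] = x * y_max + y   # placeholder for something()
    return array_1

arr = np.zeros((2, 3), dtype=np.intc)
compute_py(arr)
assert arr.tolist() == [[0, 1, 2], [3, 4, 5]]
```

The Cython version has identical structure; the speed comes entirely from the typed memoryview and compiled loop, not from a different algorithm.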

You'll also notice the `@cython.boundscheck(False)` and `@cython.wraparound(False)` decorators on our functions. By default, Cython enables guards against making mistakes with array accessors, so you don't end up reading outside the bounds of an array by mistake. Those checks slow down access to the array, however, because every operation has to be bounds-checked. The decorators disable those guards, which our code has made unnecessary: we've already determined the bounds of the array, and we never go past them.
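To make those two guards concrete: `wraparound` implements Python's negative-index convention, and `boundscheck` implements the out-of-range `IndexError`. This plain-NumPy sketch shows exactly the behaviors the decorators remove:

```python
import numpy as np

arr = np.array([10, 20, 30], dtype=np.intc)

# Python/NumPy semantics: a negative index wraps around to the end
assert arr[-1] == 30

# Python/NumPy semantics: an out-of-range access raises IndexError
try:
    arr[5]
    raised = False
except IndexError:
    raised = True
assert raised

# @cython.wraparound(False) and @cython.boundscheck(False) remove exactly
# these two safety nets -- which is safe only when every index is known to
# be non-negative and in range, as with the range()-bounded loops above.
```

If compiled code with these guards disabled does use a negative or out-of-range index, the result is an invalid memory access rather than a Python exception, so only disable them for loops whose indexes you fully control.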