How to convert Python to Cython inside Jupyter Notebooks?

Let’s see how to cythonize Python code inside Jupyter notebooks step by step.

But, let’s first answer a basic question: What is the difference between CPython and Cython?

CPython is Python’s default interpreter.

What we commonly use as Python is written in the C language and is widely available. Did you know there are other implementations of Python as well?

  1. IronPython – Python written in C# (for .NET)

  2. Jython – Python written in Java

  3. RustPython – Written in Rust

  4. PyPy – Written in a subset of Python called RPython. Known for its speed enhancements.

  5. Brython – Written in JavaScript for client-side web programming.

The syntax for all these implementations is the same as the Python we use every day. Interesting, isn’t it?
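To see which of these implementations a notebook kernel is actually running, the standard library can report it. This is a minimal sketch; the variable name impl is just an illustration.

```python
import platform
import sys

# Name of the running implementation, e.g. "CPython", "PyPy", "Jython" or "IronPython"
impl = platform.python_implementation()

print("Implementation:", impl)
print("Version:", sys.version.split()[0])
```

On a standard Jupyter installation this would normally report CPython.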

Cython, unlike CPython, is not an interpreter. It translates Python code into C and compiles it into an extension module, so the compiled function runs as native code while still being callable from regular Python.

Let’s now see how to Cythonize Python code in a Jupyter Notebook environment.

To benchmark, let’s first write a simple for loop in plain Python and measure how long it takes to run.

1. Define and time a Python Function to benchmark

Let’s create a simple function and measure how long it takes to execute in Python.

import time
def somefunc(K):
    accum = 0
    for i in range(K):
        if i % 5:
            accum = accum + i
    return accum

Measure the time.

t1 = time.time()
somefunc(20000000)
t2 = time.time()
t = t2-t1
print("%.10f" % t, "seconds")
3.0396461487 seconds

So, it takes about 3.04 seconds.

Let’s now try to run the function using Cython and see if we gain some speed.
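A single time.time() measurement can be noisy. As a hedged alternative, the sketch below uses time.perf_counter() and keeps the fastest of a few runs. The helper name best_time and the repeat count are assumptions made for illustration; they are not part of the original benchmark.

```python
import time

def somefunc(K):
    # Same loop as above: add up every i below K that is not a multiple of 5
    accum = 0
    for i in range(K):
        if i % 5:
            accum = accum + i
    return accum

def best_time(func, arg, repeats=3):
    """Hypothetical helper: fastest wall-clock time (seconds) over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(arg)
        timings.append(time.perf_counter() - start)
    return min(timings)

# A smaller K keeps the sketch quick; use 20000000 to match the benchmark above
print("%.10f" % best_time(somefunc, 2_000_000), "seconds")
```

Inside a notebook, the built-in %timeit magic does a similar job with less code.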

2. How to run Python using Cython in Jupyter Notebook

We can do this in 4 simple steps:

Step 1: Install the cython package.

!pip install cython
Collecting cython
  Downloading Cython-3.0.3-cp311-cp311-win_amd64.whl (2.8 MB)
Installing collected packages: cython
Successfully installed cython-3.0.3

Step 2: Load the cython extension.

%load_ext cython

For Windows: Microsoft Visual C++ 14.0 or greater is required. Get it with “Microsoft C++ Build Tools”: https://visualstudio.microsoft.com/visual-cpp-build-tools/
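If %load_ext cython fails, a quick first check is whether the package is importable from the kernel the notebook is using. The sketch below assumes nothing beyond the standard library; the function name cython_version is hypothetical.

```python
import importlib.util

def cython_version():
    """Return the installed Cython version string, or None if Cython is missing."""
    if importlib.util.find_spec("Cython") is None:
        return None
    import Cython
    return Cython.__version__

print("Cython version:", cython_version())
```

A result of None suggests that pip installed the package into a different environment than the one running the kernel. This check does not verify that a C compiler is available.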

Step 3: Add the Cython magic at the beginning of the cell where you want Cython to compile the Python code.

%%cython -a

def somefunc_cy(K):
    accum = 0
    for i in range(K):
        if i % 5:
            accum = accum + i
    return accum
Content of stdout:
_cython_magic_8fbc48008287a01d5af77cea06745f284b6e6aa6.c
   Creating library C:\Users\Akash\.ipython\cython\Users\Akash\.ipython\cython\_cython_magic_8fbc48008287a01d5af77cea06745f284b6e6aa6.cp311-win_amd64.lib and object C:\Users\Akash\.ipython\cython\Users\Akash\.ipython\cython\_cython_magic_8fbc48008287a01d5af77cea06745f284b6e6aa6.cp311-win_amd64.exp
Generating code
Finished generating code

Generated by Cython 3.0.3

Yellow lines hint at Python interaction.
Click on a line that starts with a “+” to see the C code that Cython generated for it.

 1: 
+2: def somefunc_cy(K):
+3:     accum = 0
+4:     for i in range(K):
+5:         if i % 5:
+6:             accum = accum + i
+7:     return accum

The new function is now ready to run.

Step 4: Everything is set. We can now run the code.

Since Cython has already compiled the somefunc_cy function, we don’t have to add the %%cython -a magic to this cell. The somefunc_cy function can be called like any other Python function.

t1 = time.time()
somefunc_cy(20000000)
t2 = time.time()
t = t2-t1
print("%.10f" % t, "seconds")
1.7662577629 seconds

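Notice that we made absolutely no change to the body of the original function, yet the run time dropped by roughly 42% (from about 3.04 to 1.77 seconds) just by compiling it with the %%cython magic. The -a flag is optional; it only produces the annotated listing shown above.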

3. Let’s cythonize the function

You can bring in further improvements in the code run time by defining the data type of the variables used.

We declare the accum variable as unsigned long long int.

Why is this so?

It is an integer type because the sum of integers is an integer. It is unsigned because the sum is never negative.

And long long?

Because the sum of all numbers can be very large. long long is guaranteed to be at least 64 bits wide, whereas a plain long may be only 32 bits (as it is on Windows), which is too small for a sum of this size.
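To make the size argument concrete, the sum can be worked out in closed form and compared against the 32-bit and 64-bit unsigned limits. This is an illustrative sketch; expected_sum and the two constants are names introduced here, not part of the original notebook.

```python
def expected_sum(K):
    """Sum of every i in range(K) that is not a multiple of 5, in closed form."""
    total = K * (K - 1) // 2            # 0 + 1 + ... + (K - 1)
    m = (K + 4) // 5                    # how many multiples of 5 lie in range(K)
    multiples = 5 * (m * (m - 1) // 2)  # 0 + 5 + 10 + ...
    return total - multiples

K = 20_000_000
result = expected_sum(K)

UINT32_MAX = 2**32 - 1   # largest value of a 32-bit unsigned integer
UINT64_MAX = 2**64 - 1   # largest value of a 64-bit unsigned integer

print(result)                 # 160000000000000
print(result > UINT32_MAX)    # True: a 32-bit accumulator would overflow
print(result <= UINT64_MAX)   # True: a 64-bit accumulator is large enough
```

The loop counter i and the argument K only need to hold values up to 20,000,000, which fits comfortably in a 32-bit long.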

%%cython -a

cpdef unsigned long long int somefunc_cy2(long int K):    
    cdef unsigned long long int accum = 0
    cdef long int i

    for i in range(K):
        if i % 5:
            accum = accum + i
    return accum
Content of stdout:
_cython_magic_ce5f40fea156989a1abbf0f5aee20729e3b85c20.c
   Creating library C:\Users\Akash\.ipython\cython\Users\Akash\.ipython\cython\_cython_magic_ce5f40fea156989a1abbf0f5aee20729e3b85c20.cp311-win_amd64.lib and object C:\Users\Akash\.ipython\cython\Users\Akash\.ipython\cython\_cython_magic_ce5f40fea156989a1abbf0f5aee20729e3b85c20.cp311-win_amd64.exp
Generating code
Finished generating code

Generated by Cython 3.0.3

Yellow lines hint at Python interaction.
Click on a line that starts with a “+” to see the C code that Cython generated for it.

 1: 
+2: cpdef unsigned long long int somefunc_cy2(long int K):
+3:     cdef unsigned long long int accum = 0
 4:     cdef long int i
 5: 
+6:     for i in range(K):
+7:         if i % 5:
+8:             accum = accum + i
+9:     return accum

Cython has generated more C code for this. Let’s see if there is any improvement.

t1 = time.time()
somefunc_cy2(20000000)
t2 = time.time()
t = t2-t1
print("%.10f" % t, "seconds")
0.0721337795 seconds

It takes less than a 10th of a second now, roughly 42 times faster than the pure Python version.

That’s the difference Cython can bring just by adding the %%cython magic at the beginning of the cell and declaring the data types.
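The speedups can be read straight off the three timings printed above. The short sketch below just does that arithmetic; the variable names and the speedup helper are made up for illustration, and the timings will differ on other machines.

```python
# Timings (seconds) copied from the three runs above
t_python = 3.0396461487
t_cython_untyped = 1.7662577629
t_cython_typed = 0.0721337795

def speedup(baseline, improved):
    """How many times faster `improved` is compared with `baseline`."""
    return baseline / improved

print(f"%%cython only:       {speedup(t_python, t_cython_untyped):.1f}x")
print(f"%%cython with types: {speedup(t_python, t_cython_typed):.1f}x")
```

Compiling alone gives about a 1.7x speedup here, while adding the type declarations brings it to about 42x.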
