Python Code Optimization Using Cython vs. Numba
Cython
- Cython is easier to distribute than Numba because it compiles ahead of time into ordinary extension modules, and it is used by NumPy, SciPy, pandas, and scikit-learn.
- Cython is also a more stable and mature platform, while the features and performance of Numba are still evolving.
- Cython can compile almost any Python code and can even call C functions directly.
- Cython can “cythonize” an entire module, including one written with advanced Python features.
Numba
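- Numba is usually easier to write for the simple cases where it works. Historically the main issue was installation: Numba was difficult to install unless you used Conda, which is a great tool, but not one everyone wants to use. Pip wheels are now available for the major platforms, which makes this less of a problem.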
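- Numba works best on numerical code built from scalars and (N-dimensional) NumPy arrays.
- Built-in types like list and dict have only limited support in accelerated code; Numba offers typed containers (numba.typed.List, numba.typed.Dict) instead, and your own custom classes need the experimental @jitclass decorator.
- You can allocate new arrays in accelerated code with functions such as np.empty or np.zeros, but doing so inside a hot loop is costly, and recursion is supported only in restricted forms.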
- Numba makes it easy to accelerate functions with broadcasting by simply adding the @vectorize decorator.
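As a rough sketch of the @vectorize point above: the function below is written for two scalars, and the decorator turns it into a NumPy ufunc that broadcasts over arrays. The name clipped_diff and the sample data are made up for illustration. The try/except fallback to np.vectorize is an assumption added only so the snippet still runs where Numba is not installed; it gives the same results without the speedup.

```python
import numpy as np

try:
    from numba import vectorize
except ImportError:
    # Fallback for environments without Numba: same behavior, no speedup.
    def vectorize(signatures):
        return np.vectorize


# The body handles one pair of scalars; broadcasting is added by the decorator.
@vectorize(["float64(float64, float64)"])
def clipped_diff(a, b):
    d = a - b
    return d if d > 0.0 else 0.0


col = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1)
row = np.array([0.5, 2.5])             # shape (2,)

# Shapes (3, 1) and (2,) broadcast to a (3, 2) result.
result = clipped_diff(col, row)
print(result)
```

Passing an explicit signature makes Numba compile the function when it is defined rather than on the first call, which keeps the first call fast at the cost of fixing the input types up front.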
Final Note
If you don’t need to distribute your code beyond your computer or your team (especially if you use Conda), then Numba can be a great choice. Otherwise, you should lean toward Cython.
Cython and Numba are not drop-in replacements for pandas or NumPy, so we can’t simply swap one for the other. We first need to find the bottlenecks in our code with a profiler such as IPython’s %prun magic, and then optimize the slow pure Python code with Cython or Numba.
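%prun is an IPython magic, so it only works in IPython or a Jupyter notebook. In a plain script, the standard library's cProfile module gives comparable output. The following is a minimal sketch; slow_sum_of_squares and pipeline are hypothetical stand-ins for your own code.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    # A deliberately slow pure Python loop: a typical Cython/Numba candidate.
    total = 0
    for i in range(n):
        total += i * i
    return total


def pipeline():
    # Stand-in for the program being profiled.
    return [slow_sum_of_squares(20_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

# Sort by cumulative time and keep the top entries to spot the bottleneck.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

The functions near the top of the report with large cumulative times are the ones worth rewriting; anything already spending its time inside NumPy or pandas internals is unlikely to benefit.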