What is PyPy? Faster Python without pain
Python has earned a reputation for being powerful, flexible, and easy to work with. These virtues have led to its use in a huge and growing variety of applications, workflows, and fields. But the design of the language—its interpreted nature and runtime dynamism—means that Python has always been an order of magnitude slower than machine-native languages like C or C++.
Over the years, developers have come up with a variety of workarounds for Python’s speed limitations. For instance, you could write performance-intensive tasks in C and wrap the C code with Python; many machine learning libraries do exactly this. Or you could use Cython, a project that lets you annotate Python code with static type declarations so that it can be compiled to C.
But workarounds are never ideal. Wouldn’t it be great if we could just take an existing Python program as is and run it dramatically faster? That’s exactly what you can do with PyPy.
PyPy vs. CPython
PyPy is a drop-in replacement for the stock Python interpreter, CPython. Whereas CPython compiles Python to intermediate bytecode that is then interpreted by a virtual machine, PyPy uses just-in-time (JIT) compilation to translate Python code into machine-native assembly language.
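To make the "intermediate bytecode" step concrete, here is a minimal sketch that uses the standard library's dis module to list the bytecode instructions an interpreter generates for a small function. The function add is a hypothetical example, and the exact instruction names vary between Python versions and implementations, so no particular output is shown.

```python
import dis


def add(a, b):
    """Tiny example function (hypothetical) whose bytecode we inspect."""
    return a + b


# dis.get_instructions yields one Instruction object per bytecode operation.
# We keep only the operation names so the listing stays short.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]

for name in opnames:
    print(name)
```

Under CPython, these instructions are what the virtual machine interprets one at a time, which is the step PyPy's JIT is designed to bypass for frequently executed code.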
Depending on the task being performed, the performance gains can be dramatic. On (geometric) average, PyPy runs Python code about 4.7 times faster than CPython 3.7, with some tasks accelerated 50 times or more. While JIT optimizations of certain kinds are being added to new versions of the CPython interpreter, they aren’t of the same scope and power as what PyPy does right now. (This doesn’t rule out the chance they might be in the future, but for now, they aren’t.)
The best part is that little to no effort is required on the part of the developer to unlock the gains PyPy provides. Simply swap out CPython for PyPy, and for the most part you’re done. There are a few exceptions, discussed below, but PyPy’s stated goal is to run existing, unmodified Python code and provide it with an automatic speed boost.
PyPy currently supports both Python 2 and Python 3, by way of different incarnations of the project. In other words, you need to download different versions of PyPy depending on the version of Python you will be running. The Python 2 branch of PyPy has been around much longer, but the Python 3 version has been brought up to speed as of late. It currently supports versions of Python up to 3.9, with Python 3.10 supported experimentally.
In addition to supporting all of the core Python language, PyPy works with the vast majority of tools in the Python ecosystem, such as pip for packaging or virtualenv for virtual environments. Most Python packages, even those with C modules, should work as-is. There are limitations, however, which we’ll discuss shortly.
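Because the same script can run under either interpreter, it is sometimes useful to check which one is actually executing it. The sketch below uses the standard platform module; the helper name describe_interpreter is our own, not part of any library.

```python
import platform
import sys


def describe_interpreter():
    """Return a short label such as 'CPython implementing Python 3.x.y'."""
    implementation = platform.python_implementation()  # e.g. "CPython" or "PyPy"
    language_version = platform.python_version()
    return f"{implementation} implementing Python {language_version}"


print(describe_interpreter())

# PyPy additionally exposes its own release number as sys.pypy_version_info.
# The attribute is absent on CPython, so we look it up defensively.
pypy_release = getattr(sys, "pypy_version_info", None)
if pypy_release is not None:
    print("PyPy release:", ".".join(str(part) for part in pypy_release[:3]))
```

Running the same file with each interpreter is a quick way to confirm that a virtual environment or launcher points at the runtime you expect.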
How PyPy works
PyPy uses optimization techniques found in other just-in-time compilers for dynamic languages. It analyzes running Python programs to determine the type information of objects as they’re created and used, then uses that type information as a guide to speed things up. For instance, if a Python function works with only one or two different object types, PyPy generates machine code to handle those specific cases.
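As an illustration of what "works with only one or two different object types" can look like in practice, consider the hypothetical function below. What the JIT actually specializes is internal to PyPy, so treat this as a sketch of the kind of code that tends to suit it rather than a description of its internals.

```python
def sum_of_squares(values):
    """Add up the squares of the items in an iterable of numbers."""
    total = 0
    for value in values:
        total += value * value
    return total


# Type-stable call: every item is an int, so the loop body always performs
# the same kind of multiplication and addition.
ints_only = sum_of_squares(range(1000))

# Mixed call: ints and floats share the same loop. The result is still
# correct, but the runtime has more than one case to handle.
mixed = sum_of_squares([1, 2.5, 3])

print(ints_only)  # 332833500
print(mixed)      # 16.25
```

Both calls are valid Python. The point is only that loops which see consistent types give a tracing JIT less to account for.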
PyPy’s optimizations are handled automatically at runtime, so you generally don’t need to tweak its performance. An advanced user might experiment with PyPy’s command-line options to generate faster code for special cases, but only rarely is this necessary.
PyPy also departs from the way CPython handles some internal functions, but tries to preserve compatible behaviors. For instance, PyPy handles garbage collection differently than CPython. Not all objects are immediately collected once they go out of scope, so a Python program running under PyPy may show a larger memory footprint than when running under CPython. But you can still use Python’s high-level garbage collection controls exposed through the gc module, such as gc.enable(), gc.disable(), and gc.collect().
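One practical consequence of objects not being collected the moment they go out of scope is that code should not rely on that moment to release resources such as open files. The sketch below closes files explicitly with context managers and uses the gc controls named above; the helper write_then_read is a hypothetical example.

```python
import gc
import os
import tempfile


def write_then_read(text):
    """Write text to a temporary file and read it back.

    Each file is closed by its `with` block, so nothing depends on when
    the garbage collector gets around to the file objects.
    """
    descriptor, path = tempfile.mkstemp(suffix=".txt")
    os.close(descriptor)
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    finally:
        os.remove(path)


gc.disable()          # pause automatic collection around a sensitive section
try:
    result = write_then_read("hello")
finally:
    gc.enable()       # always switch automatic collection back on

gc.collect()          # request a full collection at a moment of our choosing
print(result)
```

The same code behaves identically under CPython, which is why explicit cleanup is a reasonable habit for programs meant to run on both interpreters.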
If you want information about PyPy’s JIT behavior at runtime, PyPy includes a module, pypyjit, that exposes many JIT hooks to your Python application. If you have a function or module that seems to be performing poorly with the JIT, pypyjit allows you to obtain detailed statistics about it.
Another PyPy-specific module, __pypy__, exposes other features specific to PyPy, which can be useful for writing applications that leverage those features. Because of Python’s runtime dynamism, it is possible to construct Python applications that use these features when PyPy is present and ignore them when it is not.
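A common way to use PyPy-only features when they are available is to attempt the import and fall back when it fails. The sketch below assumes that __pypy__ provides a newlist_hint function for preallocating list storage; that name is an assumption to check against the PyPy documentation for your version, so it is looked up with getattr rather than called unconditionally. The helper make_list is our own.

```python
try:
    import __pypy__          # only importable when running under PyPy
    ON_PYPY = True
except ImportError:
    __pypy__ = None
    ON_PYPY = False


def make_list(expected_size):
    """Return an empty list, preallocated on PyPy if the hint is available.

    newlist_hint is assumed to exist in the __pypy__ module; if it does
    not (or we are on CPython), we fall back to a plain empty list.
    """
    hint = getattr(__pypy__, "newlist_hint", None)
    if hint is not None:
        return hint(expected_size)
    return []


items = make_list(1000)
for number in range(1000):
    items.append(number)

print("Running on PyPy:", ON_PYPY)
print("Items stored:", len(items))
```

Either way the caller gets an ordinary list, so the rest of the program does not need to know which interpreter is in use.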
PyPy’s limitations
Magical as PyPy might seem, it isn’t magic. PyPy is not a completely universal replacement for the stock CPython runtime. Some of its limitations reduce or obviate its effectiveness for certain kinds of programs. Let’s consider the most important ones.
PyPy works best with pure Python apps
PyPy has always performed best with “pure” Python applications—that is, applications written in Python and nothing else. Python packages that interface with C libraries, such as NumPy, have not fared as well due to the way PyPy emulates CPython’s native binary interfaces.
PyPy’s developers have whittled away at this issue and made PyPy more compatible with the majority of Python packages that depend on C extensions. NumPy, for instance, works very well with PyPy now. But if you want maximum compatibility with C extensions, use CPython.
PyPy works best with longer-running programs
A side-effect of how PyPy optimizes Python programs is that longer-running programs benefit most from its optimizations. The longer the program runs, the more runtime type information PyPy can gather, and the more optimizations it can make. One-and-done Python scripts won’t benefit from this sort of thing. The applications that do benefit typically have loops that run for long periods of time, or run continuously in the background—web frameworks, for instance.
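A rough way to observe this warm-up effect is to time the same function over several rounds within one process. The sketch below is a minimal illustration with hypothetical names (work, time_rounds) and an arbitrary workload. It prints timings rather than asserting any particular pattern, since results depend on the interpreter and the machine.

```python
import time


def work(n):
    """A small CPU-bound loop used as the workload."""
    total = 0
    for i in range(n):
        total += i % 7
    return total


def time_rounds(rounds=5, n=200_000):
    """Run work(n) repeatedly and return the elapsed seconds per round."""
    timings = []
    for _ in range(rounds):
        start = time.perf_counter()
        work(n)
        timings.append(time.perf_counter() - start)
    return timings


timings = time_rounds()
for round_number, seconds in enumerate(timings, start=1):
    print(f"round {round_number}: {seconds:.6f} s")
```

Under a JIT-compiling runtime, the later rounds would be expected to run faster than the first once the loop has been compiled, whereas a short script that exits after a single round never reaches that point.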
PyPy doesn’t do ahead-of-time compilation
PyPy compiles Python code, but it isn’t a compiler for Python code. Because of the way PyPy performs its optimizations and the inherent dynamism of Python, there’s no way to emit the resulting JITted code as a standalone binary and re-use it. Each program has to be compiled for each run, as explained in the documentation.
If you want to compile Python into faster code that can run as a standalone application, use Cython, Numba, or the currently experimental Nuitka project.