How Mypy could simplify compiling Python

The Mypy static type-checking project for Python is exploring ways to help compile Python into C or machine language with little extra effort.

It’s the dream of every professional Python programmer: Take an existing Python application, run it through a compiler, and generate high-speed, platform-native code that respects Python’s dynamic nature.

In theory, that’s possible to do right now—sort of. The problem is, every available path is fraught with limitations. You have to modify the source in nonstandard ways (Cython), use a replacement runtime that’s many times larger than the regular one and has its own limitations (PyPy), or work with tools that are still enormously unstable and experimental (Nuitka).

Now the development team for Mypy, the optional static type checker for Python whose annotation syntax has become a standard part of the language, is mulling the possibility of using Mypy to generate code that can be compiled to a binary.

The discussion that has sprung up around the idea has underscored one issue: Tools for compiling Python need to make Python code faster without also complicating matters for developers.

Exactly my type

The idea of using Mypy as a step toward a statically compiled version of Python has been kicking around ever since the syntax arose, and more so since Mypy-style type hints were accepted as a Python standard.

Python creator Guido van Rossum has previously stated that introducing Mypy into Python wasn’t intended as a prelude to making Python statically typed. One of Python’s biggest boons has always been its dynamism; anything that makes Python less dynamic by default isn’t much of a win.

“The basic philosophy of the language is not going to change,” said van Rossum. Mypy allows code linters and checkers to better guarantee the quality of Python code, while also preserving the dynamism.
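
A minimal sketch of what that preserved dynamism looks like in practice, assuming a standard CPython interpreter. Annotations are stored as metadata but are not enforced at run time, so only a separate checker such as Mypy would complain about the second call. The function name `double` is purely illustrative.

```python
def double(x: int) -> int:
    """Annotated as int -> int, but the interpreter does not enforce it."""
    return x * 2


# The annotations are kept as plain metadata on the function object.
hints = double.__annotations__

as_declared = double(21)    # matches the annotation
as_dynamic = double("ab")   # violates it, yet still runs: str * int repeats the string
```

A static checker would be expected to reject the `double("ab")` call, while the interpreter happily runs it. That gap is what lets the type hints stay optional.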

That said, van Rossum isn’t opposed to the idea of implementations of Python that are statically compiled. To that end, Mypy’s syntax could be used as the basis for an intermediate representation or transpiled version of Python that’s fed to a compiler. Python as a language would remain dynamic; it would simply gain a new route to becoming static. (van Rossum has also given the thumbs-up to exploring strategies for Mypy to aid with static compilation.)

What comes out the other end

The discussion within the Mypy team is still in its early stages, but a few key points have emerged in the debate. If this were done, Mypy’s developers ask, what kind of code would be produced?

One method involves what has been described as a PyIR: an intermediate representation for Python code, or IR, that takes inspiration from the WebAssembly project. Another possibility, which potentially involves far less reinventing from scratch, would be to leverage the Cython project.

Cython allows existing Python code to be compiled to C, often with dramatic speed-ups, by annotating Python code with markup to indicate static variable typing. But Cython’s markup doesn’t look anything like Mypy or C; it’s an idiosyncratic subdialect of Python. Still, it isn’t much of a stretch to imagine how Mypy-annotated Python code could be converted to Cython (and thus to C).
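
To make the contrast concrete, here is a small sketch of a function written with Mypy-style annotations, with a roughly equivalent Cython declaration shown in comments. The function `scale` is a made-up example, and reading the hints with `typing.get_type_hints` is only one plausible first step for a converter, not a description of how the Mypy team would do it.

```python
from typing import get_type_hints


def scale(x: float, n: int) -> float:
    """Plain Python with Mypy-style annotations; runs unchanged on CPython."""
    result: float = x * n
    return result


# A roughly equivalent Cython declaration (not valid Python, shown for contrast):
#
#     cpdef double scale(double x, int n):
#         cdef double result = x * n
#         return result

# A hypothetical converter could begin by reading the declared types.
signature = get_type_hints(scale)
```

Here `signature` maps each parameter name, plus `"return"`, to the annotated Python type, which is the information a translator would need in order to emit the typed Cython form.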

As elegant as that solution sounds, it’s also fraught with drawbacks. For one, some high-level type annotations in Mypy don’t map precisely to some close-to-the-machine Cython types. Either developers would have to add more type annotations, or the compiler would have to make contextual guesses about typing.
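
The mapping problem can be illustrated with a short sketch. Everything here is hypothetical: the table `DIRECT_C_TYPES` and the helpers `guess_c_type` and `describe` are invented for illustration, and the choice of which annotations count as "direct" is an assumption. The point is only that Python's `int` is arbitrary-precision and `List[int]` is a container of objects, so neither has a single obvious machine-level counterpart.

```python
from typing import List, Optional, get_type_hints

# Hypothetical table of annotations with an obvious low-level counterpart.
# "bint" is the name Cython uses for a C-level boolean.
DIRECT_C_TYPES = {float: "double", bool: "bint"}


def guess_c_type(hint) -> Optional[str]:
    """Return a C-level type name, or None when the annotation is ambiguous."""
    return DIRECT_C_TYPES.get(hint)


def describe(func):
    """Map each parameter of func to a guessed C type (None means 'unclear')."""
    hints = get_type_hints(func)
    hints.pop("return", None)  # only look at parameters
    return {name: guess_c_type(hint) for name, hint in hints.items()}


def total(values: List[int], weight: float, count: int) -> float:
    return sum(values) * weight / count


mapping = describe(total)
```

In this sketch, only `weight` gets a direct answer. For `values` and `count`, either the developer would have to say more (for example, that `count` fits in a fixed-width integer) or the compiler would have to guess.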

The GitHub discussion also exposed the problem of being forced to choose between types that aid programmer productivity (Python’s forte) and types that are useful to the compiler (C’s specialty). Cython is mainly used to optimize individual modules and functions where speed is critical (the “hot spots”) rather than entire programs, so developers only need to provide type annotations for the parts of the app that need them. By contrast, the Mypy discussion takes the more holistic approach of feeding the compiler the entire Python source tree and getting back a single, unified product.

Delivering the goods

If that’s the larger goal with a static compiler for Python, the real problem may not be with static compilation, but rather with how static compilation can be part of an end-to-end toolchain for the language. Toolchains for languages like C and C++, as well as newer compiled languages like Rust, Go, and Nim, produce a binary without the need for an external runtime. Python has long been an interpreted language, so it has no native tradition of this, apart from third-party packaging (not compilation) tools like PyInstaller.

If Python gains such tools, they likely won’t come from the core language development process, since the core language is first and foremost dynamic and interpreted. But there’s clearly interest in having more toolchains for Python that reliably produce platform-native executables, or at least lower-level code that can be compiled into one.

As commenter wtpayne said on the GitHub discussion, “I’m interested in building a *smooth* and *continuous* conveyor belt that takes ideas (algorithms) from inception to production.”