What is Python? Everything you need to know

Why the Python programming language shines for data science, machine learning, systems automation, web and API development, and beyond

Page 2 of 2

What is Python used for?

The most basic use case for Python is as a scripting and automation language. Python isn’t just a replacement for shell scripts or batch files; it is also used to automate interactions with web browsers and application GUIs, and to handle system provisioning and configuration in tools such as Ansible and Salt. But scripting and automation represent only the tip of the iceberg with Python.

Python is used for general application programming. Both CLI and cross-platform GUI applications can be created with Python and deployed as self-contained executables. Python doesn’t have the native ability to generate a standalone binary from a script, but third-party packages like cx_Freeze or PyInstaller can be used to accomplish that.
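
As a rough illustration of a command-line application, here is a minimal sketch built on the standard library’s argparse module. The “greet” tool, its --shout flag, and the function names are hypothetical, invented only for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the parser for a hypothetical 'greet' command-line tool."""
    parser = argparse.ArgumentParser(prog="greet", description="Print a greeting.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument(
        "--shout", action="store_true", help="print the greeting in upper case"
    )
    return parser


def make_greeting(name: str, shout: bool = False) -> str:
    """Return the greeting text for the given name."""
    greeting = f"Hello, {name}!"
    return greeting.upper() if shout else greeting


def main(argv=None) -> int:
    """Parse argv (sys.argv[1:] when argv is None) and print the greeting."""
    args = build_parser().parse_args(argv)
    print(make_greeting(args.name, args.shout))
    return 0


# Simulate running: greet Ada --shout
main(["Ada", "--shout"])  # prints HELLO, ADA!
```

Passing the argument list to main() explicitly keeps the example easy to try from an interactive session; a real tool would normally let argparse read sys.argv instead.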

Python is used for data science and machine learning. Sophisticated data analysis has become one of the fastest-moving areas of IT and one of Python’s star use cases. The vast majority of the libraries used for data science or machine learning have Python interfaces, making the language the most popular high-level command interface for machine learning libraries and other numerical algorithms.

Python is used for web services and RESTful APIs. Python’s native libraries and third-party web frameworks provide fast and convenient ways to create everything from simple REST APIs in a few lines of code, to full-blown, data-driven sites. Python’s latest versions have powerful support for asynchronous operations, allowing sites to handle up to tens of thousands of requests per second with the right libraries.
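
To give a feel for the asynchronous support mentioned above, here is a small sketch using the standard library’s asyncio module. It is not a web server: handle_request and serve_all are hypothetical names, and asyncio.sleep() stands in for the database or network wait that a real request handler would perform.

```python
import asyncio
import time


async def handle_request(request_id: int, delay: float = 0.1) -> str:
    """Simulate one request that spends its time waiting on I/O."""
    await asyncio.sleep(delay)  # stand-in for a database or network call
    return f"response {request_id}"


async def serve_all(count: int) -> list:
    """Run `count` simulated requests concurrently on a single thread."""
    pending = [handle_request(i) for i in range(count)]
    return await asyncio.gather(*pending)


start = time.perf_counter()
responses = asyncio.run(serve_all(100))
elapsed = time.perf_counter() - start
print(f"{len(responses)} responses in {elapsed:.2f}s")
```

Handled one after another, 100 waits of 0.1 second each would take about ten seconds. Because the waits overlap, the whole batch should finish in roughly the time of a single wait, though exact timings depend on the machine.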

Python is used for metaprogramming. In Python, everything in the language is an object, including Python modules and libraries themselves. This allows Python to work as a highly efficient code generator, making it possible to write applications that manipulate their own functions and have the kind of extensibility that would be difficult or impossible to pull off in other languages.
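
A short sketch of what this can look like in practice: a decorator that rewraps a function, and a class assembled at run time with the built-in type(). The names count_calls, make_record_class, and Point are hypothetical, chosen only for illustration.

```python
import functools


def count_calls(func):
    """Decorator: wrap func so that it records how often it is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0  # functions are objects, so they can carry attributes
    return wrapper


@count_calls
def add(a, b):
    return a + b


def make_record_class(name, fields):
    """Build a new class at run time with the three-argument form of type()."""
    def __init__(self, **values):
        for field in fields:
            setattr(self, field, values.get(field))
    return type(name, (object,), {"__init__": __init__, "fields": tuple(fields)})


Point = make_record_class("Point", ["x", "y"])
p = Point(x=1, y=2)
add(2, 3)
add(4, 5)
print(add.calls, Point.__name__, p.x, p.y)  # prints 2 Point 1 2
```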

Python is used for glue code. Python is often described as a “glue language,” meaning it can allow disparate code (typically libraries with C language interfaces) to interoperate. Its use in data science and machine learning is in this vein, but that’s just one incarnation of the general idea.

Also worth noting are the sorts of tasks Python is not well-suited for. Python is a high-level language, so it’s not suitable for system-level programming—device drivers or OS kernels are straight out. It’s also not ideal for situations that call for cross-platform standalone binaries. You could build a standalone Python app for Windows, Mac, and Linux, but not elegantly or simply. Finally, Python is not the best choice when speed is an absolute priority in every aspect of the application. For that you’re better off with C/C++ or another language of that caliber.

The Python language’s pros and cons

Python syntax is meant to be readable and clean, with little pretense. A standard “hello world” in Python 3.x is nothing more than:

print("Hello world!")

Python provides many syntactical elements that make it possible to concisely express many common program flows. Consider a sample program for reading lines from a text file into a list object, stripping each line of its terminating newline character along the way:

with open('myfile.txt') as my_file:
    file_lines = [x.strip('\n') for x in my_file]

The with/as construction uses a “context manager,” which provides an efficient way to instantiate a given object for a block of code and then dispose of it outside of that block. In this case, the object in question is my_file, instantiated with the open() function. The with block takes the place of the boilerplate needed to open the file and close it again, even if an error occurs partway through; reading the individual lines is handled by the second line.

The [x … for x in my_file] construction is another Python idiosyncrasy, the “list comprehension.” It allows a given item that contains other items (here, my_file and the lines it contains) to be iterated through, and allows each iterated element (that is, each x) to be processed and automatically appended to a list.

You could write such a thing as a formal for… loop in Python, much as you would in another language. The point is that Python has a way to economically express things like loops that iterate over multiple objects and perform some simple operation on each element in the loop, or work with things that require explicit instantiation and disposal. Constructions like this allow Python developers to balance terseness and readability.
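
For comparison, here is a sketch that places the comprehension next to its explicit for-loop equivalent. To keep it runnable anywhere, io.StringIO stands in for a real file on disk, and the sample text is made up.

```python
import io

# io.StringIO stands in for an open text file so the example runs anywhere.
sample = "first\nsecond\nthird\n"

# List comprehension, as in the example above.
with io.StringIO(sample) as my_file:
    comprehension_lines = [x.strip("\n") for x in my_file]

# The same work written as an explicit for loop.
loop_lines = []
with io.StringIO(sample) as my_file:
    for x in my_file:
        loop_lines.append(x.strip("\n"))

print(comprehension_lines)                # prints ['first', 'second', 'third']
print(comprehension_lines == loop_lines)  # prints True
```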

Python’s other language features are meant to complement common use cases. Most modern object types—Unicode strings, for instance—are built directly into the language. Data structures—like lists, dictionaries (i.e., hashmaps), tuples (for storing immutable collections of objects), and sets (for storing collections of unique objects)—are available as standard-issue items.
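
The following sketch touches each of the built-in types just mentioned. The variable names and sample values are invented for illustration.

```python
# A list: an ordered, mutable sequence.
languages = ["Python", "C", "Go"]
languages.append("Rust")

# A dictionary: maps keys to values (a hash map).
word_counts = {"spam": 2, "eggs": 1}
word_counts["spam"] += 1

# A tuple: an immutable sequence; trying to change it raises TypeError.
point = (3, 4)
tuple_is_immutable = False
try:
    point[0] = 10
except TypeError:
    tuple_is_immutable = True

# A set: keeps only unique items.
unique_letters = set("mississippi")

# Strings are Unicode by default in Python 3; \u00e9 is an accented "e".
greeting = "h\u00e9llo"

print(languages, word_counts, tuple_is_immutable)
print(sorted(unique_letters), len(greeting))
```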

Like C#, Java, and Go, Python has garbage-collected memory management, meaning the programmer doesn’t have to implement code to track and release objects. Normally garbage collection happens automatically in the background, but if that poses a performance problem, it can be triggered manually or disabled entirely.
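
As a sketch of the manual controls, the standard library’s gc module can switch automatic collection off, run a collection on demand, and switch it back on. The Node class and make_cycle function are hypothetical. Note that in CPython most objects are freed immediately by reference counting, so the collector mainly matters for objects that refer to each other in a cycle, as in this example.

```python
import gc


class Node:
    """A tiny object that can point at another Node."""
    def __init__(self):
        self.partner = None


def make_cycle():
    """Create two objects that refer to each other, then drop them."""
    a, b = Node(), Node()
    a.partner, b.partner = b, a


gc.disable()                  # turn automatic collection off
was_enabled = gc.isenabled()  # False while collection is disabled
make_cycle()                  # the cycle is now unreachable garbage
found = gc.collect()          # trigger a collection by hand
gc.enable()                   # restore the default behavior
print(was_enabled, found)
```

gc.collect() returns the number of unreachable objects it found; the exact figure can vary between Python versions.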

An important aspect of Python is its dynamism. Everything in the language, including functions and modules themselves, is handled as an object. This comes at the expense of speed (more on that below), but makes it far easier to write high-level code. Developers can perform complex object manipulations with only a few instructions, and even treat parts of an application as abstractions that can be altered if needed.

Python’s use of significant whitespace has been cited as both one of Python’s best and worst attributes. The indentation on the second line shown above isn’t just for readability; it is part of Python’s syntax. Python interpreters will reject programs that don’t use proper indentation to indicate control flow.
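
One way to see this for yourself is to ask the interpreter to compile two small snippets without running them. This is a sketch: check() is a hypothetical helper, and the exact wording of the error message differs between Python versions.

```python
good_source = (
    "for i in range(3):\n"
    "    print(i)\n"
)
bad_source = (
    "for i in range(3):\n"
    "print(i)\n"  # the loop body is not indented
)


def check(source: str) -> str:
    """Compile source without running it and report the outcome."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:
        return f"rejected: {exc.msg}"
    return "accepted"


print(check(good_source))  # prints accepted
print(check(bad_source))   # prints a line starting with "rejected:"
```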

Syntactical whitespace might cause noses to wrinkle, and some people do reject Python out of hand for this reason. But strict indentation rules are far less obtrusive in practice than they might seem in theory, even with the most minimal of code editors, and the end result is code that is cleaner and more readable.

Another potential turnoff, especially for those coming from languages like C or Java, is the way Python handles variable typing. By default, Python uses dynamic or “duck” typing, which is great for quick coding but potentially problematic in large code bases. That said, Python has recently added support for optional type hinting. The interpreter does not enforce the hints at run time, but external static checkers such as mypy can use them, so projects that might benefit from static typing can make use of it.
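
Here is a brief sketch of what type hints look like, using a hypothetical total_length function. The annotations are recorded on the function, but nothing stops a caller from passing a different type at run time.

```python
def total_length(words: list, separator: str = " ") -> int:
    """Return the length of the words once joined by separator."""
    return len(separator.join(words))


# The hints are stored as metadata on the function...
print(total_length.__annotations__)

# ...but the interpreter does not enforce them. This call passes a tuple
# where a list is hinted, and it still runs thanks to duck typing.
print(total_length(("duck", "typing")))  # prints 11
```

A static checker run over this file would be expected to flag the second call, which is how the hints can earn their keep in a large code base.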

Python 2 versus Python 3

Python is available in two versions, which are different enough to trip up many new users. Python 2.x, the older “legacy” branch, will continue to be supported (i.e. receive official updates) through 2020, and it might even persist unofficially after that. Python 3.x, the current and future incarnation of the language, has many useful and important features not found in 2.x, such as better concurrency controls and a more efficient interpreter.

Python 3 adoption was slowed for the longest time by the relative lack of third-party library support. Many Python libraries supported only Python 2, making it difficult to switch. But over the last couple of years, the number of libraries supporting only Python 2 has dwindled; most are now compatible with both versions. Today, there are few reasons against using Python 3.

Python: A “batteries included” experience

The success of Python rests on a rich ecosystem of first- and third-party software. Python benefits from both a robust standard library and a generous assortment of easily obtained and readily used libraries from third-party developers. Python has been enriched by decades of expansion and contribution.

Python’s standard library provides modules for common programming tasks: math, string handling, file and directory access, networking, asynchronous operations, threading, multiprocess management, and so on. But it also includes modules that manage common, high-level programming tasks needed by modern applications: reading and writing structured file formats like JSON and XML, manipulating compressed files, working with internet protocols and data formats (web pages, URLs, email). Almost any external code that exposes a C-compatible foreign function interface can be accessed with Python’s ctypes module. The default Python distribution also provides a rudimentary, but useful, cross-platform GUI library by way of Tkinter, and an embedded copy of the SQLite 3 database.
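
As a small taste of the “batteries,” this sketch parses some JSON and loads it into an in-memory SQLite database using only standard library modules. The sample data and the stock table are hypothetical, invented for the example.

```python
import json
import sqlite3

# Hypothetical sample data, invented for this example.
raw = '[{"name": "spam", "qty": 3}, {"name": "eggs", "qty": 12}]'
items = json.loads(raw)

# ":memory:" asks SQLite for a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO stock (name, qty) VALUES (:name, :qty)", items)
total = conn.execute("SELECT SUM(qty) FROM stock").fetchone()[0]
conn.close()

print(json.dumps({"total_qty": total}))  # prints {"total_qty": 15}
```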

The thousands of third-party libraries, available through the Python Package Index (PyPI), constitute the strongest showcase for Python’s popularity and versatility. The BeautifulSoup library provides an all-in-one toolbox for scraping HTML—even tricky, broken HTML—and extracting data from it. Frameworks like Flask and Django allow rapid development of web services that encompass both simple and advanced use cases. Multiple cloud services can be managed through Python’s object model by way of Apache Libcloud. NumPy, Pandas, and Matplotlib accelerate math and statistical operations, and make it easy to create visualizations of data.

Python distributions for all

The most straightforward way to get Python is by downloading a release for your platform from the Python Software Foundation, the organization that stewards the language. CPython, as this edition is called, is used as the stock Python runtime in every major Linux distribution as well as macOS. That said, a wealth of other Python distributions exist to serve specific audiences.

Python for enterprise developers. ActiveState markets its own Python distribution, ActivePython, to enterprise users who want support and a rich set of development tools, such as ActiveState’s own Komodo IDE.

Python for data scientists. The Anaconda distribution, created by Continuum Analytics (now Anaconda, Inc.), includes a slew of common libraries for machine learning and data wrangling. Installing those libraries by hand can be tricky, especially on Windows. Anaconda saves you that trouble, and provides mechanisms for keeping them up to date and installing other libraries in the same vein. See also the Intel Distribution for Python, a repackaging of Anaconda using Intel’s custom math acceleration extensions.

Python for developers with a need for speed. PyPy accelerates Python applications by way of just-in-time compilation, a handy way to amp up an existing Python app without having to rewrite it. The biggest limitation with PyPy is that it works best with Python apps that don’t use external C libraries, but its development team has been addressing that problem.

Python for .Net and Java developers. Editions of Python exist that run on the .Net and Java Virtual Machine runtimes: IronPython and Jython, respectively. Both of them allow Python to interoperate with other languages on their respective runtimes; for example, an IronPython app can interoperate with .Net classes. Jython development hasn’t budged much in the last couple of years, but work on IronPython has been rejuvenated with a new development team.

Is Python too slow?

One common caveat about Python is that it’s slow. Objectively, it’s true. Python programs generally run much more slowly than corresponding programs in C/C++ or Java. Some Python programs will be slower by an order of magnitude or more.

Why so slow? It isn’t just because most Python runtimes are interpreters rather than compilers. It is also due to the fact that the inherent dynamism and the malleability of objects in Python make it difficult to optimize the language for speed, even when it is compiled. That said, Python’s speed may not be as much of an issue as it might seem, and there are ways to alleviate it.

Python has many routes for speed optimization. It isn’t always the fate of a slow Python program to be forever slow. Many Python programs are slow because they don’t properly leverage the functionality present in Python or its standard library. Math and statistics operations can be boosted tremendously by way of libraries such as NumPy and Pandas, and the PyPy runtime can provide orders-of-magnitude speedups for many Python apps.
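
To illustrate the point about leaning on built-in functionality, this sketch times a hand-written summing loop against the built-in sum(). The manual_sum function is hypothetical, and the numbers will differ from machine to machine; on CPython the built-in is typically faster because its loop runs in C rather than in interpreted Python code.

```python
import timeit

numbers = list(range(100_000))


def manual_sum(values):
    """Add the values one at a time in interpreted Python code."""
    total = 0
    for value in values:
        total += value
    return total


# Both approaches must agree before the timings mean anything.
assert manual_sum(numbers) == sum(numbers)

loop_time = timeit.timeit(lambda: manual_sum(numbers), number=20)
builtin_time = timeit.timeit(lambda: sum(numbers), number=20)
print(f"loop: {loop_time:.4f}s  built-in sum(): {builtin_time:.4f}s")
```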

A common adage of software development is that 90 percent of the activity for a program tends to be in 10 percent of the code, so optimizing that 10 percent can yield major improvements. With Python, you can selectively convert that 10 percent to C or even machine code, by way of projects like Cython or Numba. The end result is often a program that runs within striking distance of a counterpart written entirely in C, but without being cluttered with C’s memory micromanagement details.

In Python, developer time is usually far more valuable than machine time. Or to put it another way: For many tasks, speed of development beats speed of execution.

A given Python program might take six seconds to execute versus a fraction of a second in another language. But it might take only ten minutes for a developer to put that Python program together, versus an hour or more of development time in another language. The amount of time lost in the execution of the Python program is more than gained back by the time saved in the development process.

Obviously, this is less true when you’re writing software that has high-throughput, high-concurrency demands, such as a trading application. But for many real-world applications, in domains ranging from systems management to machine learning, Python will prove to be rich enough and fast enough for the job. And the flexibility and pace of development that Python enables may allow for innovation that would be more difficult and time-consuming to achieve in other languages.

When speed of development and programmer comfort are more important than shaving a few seconds off the machine clock, Python may well be the best tool for the job.
