Python Hands-on

What is Python? Everything you need to know

Why the Python programming language shines for data science, machine learning, systems automation, web and API development, and beyond

Dating from 1991, Python is a relatively new programming language. From the start, Python was considered a gap-filler, a way to write scripts that “automate the boring stuff” (as one popular book on learning Python put it) or to rapidly prototype applications that will be implemented in one or more other languages.

However, over the past few years, Python has emerged as a first-class citizen in modern software development, infrastructure management, and data analysis. It is no longer a back-room utility language, but a major force in web application development and systems management and a key driver behind the explosion in big data analytics and machine intelligence.

Python’s success revolves around several advantages it provides for beginners and experts alike:

Python is easy to learn. The number of features in the language itself is modest, requiring relatively little investment of time or effort to produce one’s first programs. Python syntax is designed to be readable and straightforward. This simplicity makes Python an ideal teaching language, and allows newcomers to pick it up quickly. Developers spend more time thinking about the problem they’re trying to solve, and less time thinking about language complexities or deciphering code left by others.

Python is broadly used and supported. Python is both popular and widely used, as the high rankings in surveys like the Tiobe Index and the large number of GitHub projects using Python attest. Python runs on every major operating system and platform, and most minor ones too. Many major libraries and API-powered services have Python bindings or wrappers, allowing Python to interface freely with those services or make direct use of those libraries. Python may not be the fastest language, but what it lacks in speed, it makes up for in versatility.

Python is not a “toy” language. Even though scripting and automation cover a large chunk of Python’s use cases (more on that below), Python is also used to build robust, professional-quality software, both as standalone applications and as web services.

What is Python used for?

The most basic use case for Python is as a scripting and automation language. Python isn’t just a replacement for shell scripts or batch files; it is also used to automate interactions with web browsers and application GUIs, and to handle system provisioning and configuration in tools such as Ansible and Salt. But scripting and automation represent only the tip of the iceberg with Python.

Python is used for general application programming. Both CLI and cross-platform GUI applications can be created with Python and deployed as self-contained executables. Python doesn’t have the native ability to generate a standalone binary from a script, but third-party packages like cx_Freeze or PyInstaller can be used to accomplish that.
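As a rough illustration of the CLI side, here is a minimal sketch built on the standard library's argparse module. The "greet" tool, its arguments, and the function names are all hypothetical; a script like this is the kind of entry point a packager such as PyInstaller or cx_Freeze would bundle.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the parser for a hypothetical 'greet' command-line tool."""
    parser = argparse.ArgumentParser(prog="greet", description="Print a greeting.")
    parser.add_argument("name", help="who to greet")
    parser.add_argument("--shout", action="store_true", help="print in upper case")
    return parser


def main(argv=None) -> str:
    """Parse argv (sys.argv[1:] when argv is None), print and return the greeting."""
    args = build_parser().parse_args(argv)
    message = f"Hello, {args.name}!"
    if args.shout:
        message = message.upper()
    print(message)
    return message


# Passing the arguments explicitly keeps the example independent of the real command line.
main(["World", "--shout"])
```

In a real script, calling main() with no arguments would read the options from the command line instead.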

Python is used for data science and machine learning. Sophisticated data analysis has become one of the fastest-moving areas of IT and one of Python’s star use cases. The vast majority of the libraries used for data science or machine learning have Python interfaces, making the language the most popular high-level command interface for machine learning libraries and other numerical algorithms.

Python is used for web services and RESTful APIs. Python’s native libraries and third-party web frameworks provide fast and convenient ways to create everything from simple REST APIs in a few lines of code, to full-blown, data-driven sites. Python’s latest versions have powerful support for asynchronous operations, allowing sites to handle up to tens of thousands of requests per second with the right libraries.
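To give a sense of how little code a simple JSON endpoint can take, here is a sketch that uses only the standard library's wsgiref module. The LANGUAGES data, the /languages/ URL prefix, and the function names are assumptions made for illustration; a production service would more likely use a third-party framework and a proper server.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical in-memory data exposed by the API.
LANGUAGES = {"python": {"first_release": 1991}}
PREFIX = "/languages/"


def app(environ, start_response):
    """A tiny WSGI app: /languages/<name> returns a JSON record or a 404."""
    path = environ.get("PATH_INFO", "")
    record = LANGUAGES.get(path[len(PREFIX):]) if path.startswith(PREFIX) else None
    status = "200 OK" if record else "404 Not Found"
    body = json.dumps(record or {"error": "not found"}).encode("utf-8")
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def serve(port=8000):
    """Serve the app until interrupted (development use only)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling serve() starts a local development server; the app function itself can also be handed to any WSGI-compatible server.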

Python is used for metaprogramming. In Python, everything in the language is an object, including Python modules and libraries themselves. This allows Python to work as a highly efficient code generator, making it possible to write applications that manipulate their own functions and have the kind of extensibility that would be difficult or impossible to pull off in other languages.
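One small sketch of what this can look like in practice: because classes are objects too, a program can build one at runtime with the built-in type() function and then extend it afterward. The make_record_class helper and the Point class below are hypothetical names chosen for the example.

```python
def make_record_class(name, fields):
    """Build a new class at runtime that accepts one keyword argument per field."""

    def __init__(self, **kwargs):
        for field in fields:
            setattr(self, field, kwargs.get(field))

    def __repr__(self):
        values = ", ".join(f"{f}={getattr(self, f)!r}" for f in fields)
        return f"{name}({values})"

    # type(name, bases, namespace) creates a class object dynamically.
    return type(name, (object,), {"__init__": __init__, "__repr__": __repr__})


Point = make_record_class("Point", ["x", "y"])
p = Point(x=1, y=2)

# Existing classes can be extended after the fact; instances pick up the change.
Point.manhattan = lambda self: abs(self.x) + abs(self.y)
distance = p.manhattan()
```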

Python is used for glue code. Python is often described as a “glue language,” meaning it can allow disparate code (typically libraries with C language interfaces) to interoperate. Its use in data science and machine learning is in this vein, but that’s just one incarnation of the general idea.
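As a minimal sketch of the glue idea, the standard library's ctypes module can call a function in an existing C library directly. The example assumes a POSIX system (Linux or macOS) where the C runtime can be loaded this way; the c_strlen wrapper is a hypothetical name.

```python
import ctypes
import ctypes.util

# Assumption: a POSIX system. If find_library returns None, CDLL(None) exposes
# the symbols already loaded into the process, which include the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a byte string as computed by C's strlen."""
    return libc.strlen(data)


length = c_strlen(b"hello")
```

Declaring argtypes and restype up front lets ctypes convert arguments and results correctly instead of guessing.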

Also worth noting are the sorts of tasks Python is not well-suited for. Python is a high-level language, so it’s not suitable for system-level programming—device drivers or OS kernels are straight out. It’s also not ideal for situations that call for cross-platform standalone binaries. You could build a standalone Python app for Windows, Mac, and Linux, but not elegantly or simply. Finally, Python is not the best choice when speed is an absolute priority in every aspect of the application. For that you’re better off with C/C++ or another language of that caliber.

The Python language’s pros and cons

Python syntax is meant to be readable and clean, with little pretense. A standard “hello world” in Python 3.x is nothing more than:

print("Hello world!")

Python provides many syntactical elements that make it possible to concisely express many common program flows. Consider a sample program for reading lines from a text file into a list object, stripping each line of its terminating newline character along the way:

with open('myfile.txt') as my_file:
    file_lines = [x.strip('\n') for x in my_file]

The with/as construction is a “context manager,” which provides an efficient way to instantiate a given object for a block of code and then dispose of it outside of that block. In this case, the object in question is my_file, instantiated with the open() function. This takes the place of several lines of boilerplate to open the file, read individual lines from it, then close it up.
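The same mechanism is available for your own objects. The following sketch uses contextlib.contextmanager from the standard library to show the order in which setup, use, and cleanup happen; managed_resource and the events list are hypothetical names used only for illustration.

```python
from contextlib import contextmanager

events = []  # records the order of operations, for illustration only


@contextmanager
def managed_resource(name):
    """A hypothetical resource that logs when it is set up and torn down."""
    events.append(f"open {name}")
    try:
        yield name.upper()  # the value bound by "as" in the with statement
    finally:
        # Runs when the with block exits, even if the block raised an exception.
        events.append(f"close {name}")


with managed_resource("db") as resource:
    events.append(f"use {resource}")

print(events)
```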

The [x … for x in my_file] construction is another Python idiosyncrasy, the “list comprehension.” It lets an item that contains other items (here, my_file and the lines it contains) be iterated through, with each iterated element (that is, each x) processed and automatically appended to a list.

You could write such a thing as a formal for… loop in Python, much as you would in another language. The point is that Python has a way to economically express things like loops that iterate over multiple objects and perform some simple operation on each element in the loop, or work with things that require explicit instantiation and disposal. Constructions like this allow Python developers to balance terseness and readability.
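For comparison, here is a sketch of the explicit loop next to the comprehension. An io.StringIO object stands in for the open file so the example runs without a file on disk; that substitution is an assumption made for convenience.

```python
import io

# io.StringIO behaves like an open text file for the purposes of iteration.
my_file = io.StringIO("first\nsecond\nthird\n")

# Explicit loop: create the list, then append each processed line.
file_lines = []
for x in my_file:
    file_lines.append(x.strip("\n"))

# The comprehension builds the same list in a single expression.
my_file.seek(0)  # rewind so the lines can be read a second time
same_lines = [x.strip("\n") for x in my_file]
```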

Python’s other language features are meant to complement common use cases. Most modern object types—Unicode strings, for instance—are built directly into the language. Data structures—like lists, dictionaries (i.e., hashmaps), tuples (for storing immutable collections of objects), and sets (for storing collections of unique objects)—are available as standard-issue items.
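A brief sketch of those built-in types at work, using made-up sample data:

```python
words = ["spam", "eggs", "spam", "ham"]  # list: ordered and mutable

counts = {}  # dictionary: maps keys to values
for word in words:
    counts[word] = counts.get(word, 0) + 1

unique_words = set(words)  # set: duplicates are dropped

point = (3, 4)  # tuple: immutable once created
x, y = point    # tuples unpack directly into variables

try:
    point[0] = 10
except TypeError:
    tuple_is_immutable = True  # item assignment on a tuple raises TypeError
```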

Like C#, Java, and Go, Python has garbage-collected memory management, meaning the programmer doesn’t have to implement code to track and release objects. Normally garbage collection happens automatically in the background, but if that poses a performance problem, it can be triggered manually or disabled entirely.
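In CPython, the standard gc module is the interface to the collector that cleans up reference cycles. The sketch below disables automatic collection, creates a cycle, and then triggers a collection by hand; the Node class and make_cycle function are hypothetical.

```python
import gc


class Node:
    """A hypothetical object that can take part in a reference cycle."""

    def __init__(self):
        self.partner = None


def make_cycle():
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # a and b now reference each other


gc.collect()              # start from a clean slate
gc.disable()              # switch off automatic cycle collection
make_cycle()              # the cycle is now unreachable but not yet freed
collected = gc.collect()  # run a full collection manually
gc.enable()               # restore the default behavior

print(f"unreachable objects found: {collected}")
```

The exact count reported by gc.collect() can vary between Python versions, so the example prints it rather than assuming a specific number.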

An important aspect of Python is its dynamism. Everything in the language, including functions and modules themselves, is handled as an object. This comes at the expense of speed (more on that below), but makes it far easier to write high-level code. Developers can perform complex object manipulations with only a few instructions, and even treat parts of an application as abstractions that can be altered if needed.
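A short sketch of what treating functions and modules as objects means in practice; apply_twice and the operations dictionary are hypothetical names.

```python
import math


def apply_twice(func, value):
    """Call func on value, then call it again on the result."""
    return func(func(value))


# Functions can be stored in data structures like any other object.
operations = {"sqrt": math.sqrt, "double": lambda n: n * 2}
result = apply_twice(operations["double"], 5)

# Modules are objects too: their attributes can be looked up by name at runtime.
floor = getattr(math, "floor")

# Objects can be altered at runtime, here by attaching an attribute to a function.
apply_twice.description = "applies a function two times"
```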

Python’s use of significant whitespace has been cited as both one of Python’s best and worst attributes. The indentation on the second line of the file-reading example above isn’t just for readability; it is part of Python’s syntax. Python interpreters will reject programs that don’t use proper indentation to indicate control flow.
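This is easy to observe with the built-in compile() function, which parses source code without running it. In this sketch, the two source strings and the is_valid helper are made up for the demonstration.

```python
good_source = "for i in range(3):\n    total = i\n"
bad_source = "for i in range(3):\ntotal = i\n"  # loop body is not indented


def is_valid(source: str) -> bool:
    """Return True if source compiles; False on a syntax or indentation error."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


print(is_valid(good_source), is_valid(bad_source))
```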

Syntactical whitespace might cause noses to wrinkle, and some people do reject Python out of hand for this reason. But strict indentation rules are far less obtrusive in practice than they might seem in theory, even with the most minimal of code editors, and the end result is code that is cleaner and more readable.

Another potential turnoff, especially for those coming from languages like C or Java, is the way Python handles variable typing. By default, Python uses dynamic or “duck” typing, which is great for quick coding but potentially problematic in large code bases. That said, Python has added support for optional type hints. The interpreter does not enforce them at run time, but external static checkers such as mypy can verify them ahead of time, so projects that might benefit from static typing can make use of it.
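Here is a minimal sketch of what type hints look like and how they coexist with duck typing. The average function is a hypothetical example; the hints are recorded on the function but have no effect when the code runs.

```python
from typing import Sequence


def average(values: Sequence[float]) -> float:
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# Hints are stored on the function object, where tools can inspect them.
hints = average.__annotations__

# Duck typing still applies: any object that supports sum() and len() works.
mean_of_tuple = average((1, 2, 3))
mean_of_list = average([1.5, 2.5])
```

A static checker run over this file would flag a call such as average("abc") before the program ever executes, whereas the interpreter alone would only fail once that line is reached.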

Python 2 versus Python 3

Python is available in two versions, which are different enough to trip up many new users. Python 2.x, the older “legacy” branch, will continue to be supported (i.e. receive official updates) through 2020, and it might even persist unofficially after that. Python 3.x, the current and future incarnation of the language, has many useful and important features not found in 2.x, such as better concurrency controls and a more efficient interpreter.

Python 3 adoption was slowed for the longest time by the relative lack of third-party library support. Many Python libraries supported only Python 2, making it difficult to switch. But over the last couple of years, the number of libraries supporting only Python 2 has dwindled; most are now compatible with both versions. Today, there are few reasons against using Python 3.

Python: A “batteries included” experience

The success of Python rests on a rich ecosystem of first- and third-party software. Python benefits from both a robust standard library and a generous assortment of easily obtained and readily used libraries from third-party developers. Python has been enriched by decades of expansion and contribution.

Python’s standard library provides modules for common programming tasks—math, string handling, file and directory access, networking, asynchronous operations, threading, multiprocess management, and so on. But it also includes modules that manage common, high-level programming tasks needed by modern applications: reading and writing structured file formats like JSON and XML, manipulating compressed files, working with internet protocols and data formats (web pages, URLs, email). Most any external code that exposes a C-compatible foreign function interface can be accessed with Python’s ctypes module. The default Python distribution also provides a rudimentary, but useful, cross-platform GUI library by way of Tkinter, and an embedded copy of the SQLite 3 database.
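As a small taste of the “batteries,” the following sketch parses JSON and queries an embedded SQLite database using nothing outside the standard library. The payload, table name, and column names are invented for the example.

```python
import json
import sqlite3

# Hypothetical JSON payload, as might arrive from a file or a web service.
payload = '[{"name": "Ada", "score": 91}, {"name": "Grace", "score": 88}]'
records = json.loads(payload)

# ":memory:" creates a throwaway database that lives only in RAM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (:name, :score)", records)

top_name, top_score = conn.execute(
    "SELECT name, score FROM scores ORDER BY score DESC LIMIT 1"
).fetchone()
conn.close()

summary = json.dumps({"top": top_name, "score": top_score})
print(summary)
```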

The thousands of third-party libraries, available through the Python Package Index (PyPI), constitute the strongest showcase for Python’s popularity and versatility. The BeautifulSoup library provides an all-in-one toolbox for scraping HTML—even tricky, broken HTML—and extracting data from it. Frameworks like Flask and Django allow rapid development of web services that encompass both simple and advanced use cases. Multiple cloud services can be managed through Python’s object model by way of Apache Libcloud. NumPy, Pandas, and Matplotlib accelerate math and statistical operations, and make it easy to create visualizations of data.

Python distributions for all

The most straightforward way to get Python is by downloading a release for your platform from the Python Software Foundation, the organization that stewards the language. CPython, as this edition is called, is used as the stock Python runtime in every major Linux distribution as well as macOS. That said, a wealth of other Python distributions exist to serve specific audiences.

Python for enterprise developers. ActiveState markets its own Python distribution, ActivePython, to enterprise users who want support and a rich set of development tools, such as ActiveState’s own Komodo IDE.

Python for data scientists. The Anaconda distribution, created by Continuum Analytics, includes a slew of common libraries for machine learning and data wrangling. Installing those libraries by hand can be tricky, especially on Windows. Anaconda saves you that trouble, and provides mechanisms for keeping them up to date and installing other libraries in the same vein. See also the Intel Distribution for Python, a repackaging of Anaconda using Intel’s custom math acceleration extensions.

Python for developers with a need for speed. PyPy accelerates Python applications by way of just-in-time compilation, a handy way to amp up an existing Python app without having to rewrite it. The biggest limitation with PyPy is that it works best with Python apps that don’t use external C libraries, but its development team has been addressing that problem.
