Microsoft machine learning program tackles coding drudgery

The DeepCoder experimental machine learning project from Microsoft and the University of Cambridge can learn how to write simple software from provided example code

Could machines, in time, write software themselves, and take programmers' jobs?

At the very least, they might well provide the same boon automation has brought to many other fields: removing some of the drudgery and leaving developers to do more creative work.

A recently released research paper co-authored by researchers at Microsoft Research and the University of Cambridge discusses how a machine learning system called DeepCoder could learn to write small programs by using routines from other programs as raw material.

Go deeper

DeepCoder starts small. It uses small snippets of code, only a few lines each, written in a custom DSL (domain-specific language) to make it easier to analyze the input and output of each snippet. The better a match each snippet is to solving a particular problem, the more likely it'll end up as part of the solution. Over time, and with training on new code snippets, DeepCoder's speed and accuracy improve.
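To make the idea concrete, here is a minimal Python sketch of that kind of search. It is not DeepCoder's code or its DSL: the four list primitives, the `synthesize` function, and the hand-written `scores` table are all hypothetical stand-ins. In particular, `scores` takes the place of whatever a trained model would predict about which snippets are likely to be relevant. The sketch tries short chains of primitives, most promising first, and returns the first chain that reproduces every input-output example.

```python
from itertools import product

# Hypothetical toy DSL (an assumption for illustration, not DeepCoder's):
# every primitive takes a list of integers and returns a list of integers.
PRIMITIVES = {
    "sort": lambda xs: sorted(xs),
    "reverse": lambda xs: xs[::-1],
    "double": lambda xs: [2 * x for x in xs],
    "drop_negatives": lambda xs: [x for x in xs if x >= 0],
}


def run(program, xs):
    """Apply each named primitive in order to the input list."""
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs


def synthesize(examples, scores, max_len=3):
    """Return the first chain of primitives consistent with all examples.

    `scores` maps primitive names to a relevance guess. Here it is written
    by hand; it stands in for a learned model's predictions.
    """
    names = list(PRIMITIVES)
    for length in range(1, max_len + 1):
        # Shorter programs first; within a length, highest total score first.
        candidates = sorted(
            product(names, repeat=length),
            key=lambda p: sum(scores.get(n, 0.0) for n in p),
            reverse=True,
        )
        for program in candidates:
            if all(run(program, inp) == out for inp, out in examples):
                return list(program)
    return None  # nothing within max_len matches the examples


# Input-output examples describing the desired behavior.
examples = [([3, -1, 2], [6, 4]), ([0, 5, -2], [0, 10])]
# Hand-written relevance guesses (assumed values, not model output).
scores = {"drop_negatives": 0.9, "double": 0.8, "sort": 0.2, "reverse": 0.1}

program = synthesize(examples, scores)
print(program)
```

The scores only change the order in which candidates are tried, not which programs are considered valid. A better guess means the search reaches a working program sooner, which is the sense in which prediction speeds things up.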

The paper describes this approach as "recasting the problem [of building systems that can write computer programs] as a big data problem." It's reminiscent of how machine learning experts have tackled problems like language translation: Give a computer a big enough body of the same text in two languages, and you can generate statistical models for how each language maps to the other in order to translate between the two. Likewise, if you provide a large enough body of data about how software behaves, it's theoretically possible to make inferences about what pieces would be needed to make data behave a certain way.

Promising as all this sounds, it's currently very limited. For one, DeepCoder's DSL is its own creation, and due to its deliberately minimal feature set, it's easier to deduce the behavior of code written with it. As such, it's currently limited to basic puzzle-solving, which is a far cry from feeding DeepCoder functions written in C or Java and having software assembled from scratch in those languages.

Still a moonshot

DeepCoder's researchers are confident the basic approach can be extended in time to do more sophisticated things. This doesn't merely include working with more complex languages, but also "incorporat[ing] natural language problem descriptions to lessen the information burden required from input-output examples" -- in other words, being able to ask for a specific kind of software solution in plain English.

Developers have long dreamed of software that can automatically create other software. There already is automation that provides scaffolding to be filled out by an expert developer later ("low-code" tools). But the holiest of grails has been an AI that could turn out applications on demand based on a few basic parameters, automating the more thankless and tiresome parts of programming.

There's precedent for using machine learning to teach computers to build software. An earlier MIT project called Prophet used both bugs and bug fixes to teach a system how other, unrelated software bugs could also be fixed. But like Prophet, DeepCoder is a long way from being a tool that developers could have riding shotgun with them in their IDE, let alone a software tool that could eliminate their jobs entirely.