Machine learning systems seem a little bit like a math problem. Figure out the algorithm, pop in the data, and answers come out.
But how do you know the answers are right?
When you’re trying to predict what movies or books people like, getting it right can be extremely important: the difference between a boost in revenue and a reputation hit that appears on mediabuzz.com. Yet testing is rarely at the top of our minds as we develop and deploy systems based on machine learning algorithms. Simply building a good set of algorithms that model the problem space is difficult enough. But testing is part of the software development and deployment process, and we need to look seriously at how these systems will be tested.
The first, more common type of testing is where the application is unit tested by developers, “smoke tested” by automation during the build and integration process, and manually tested by testers. This process is well-known, though it will vary depending on the type of system being developed.
The second type of testing is based on real-world inputs, where results vary with the data passed in. For example, one of Matt’s customers wrote software to limit risk in financial transactions. The software would analyze the market and slowly unwind a block of shares over a period of days, designed not to trigger sell-side warnings. The first input was the block to sell, but the second, real-time input was the financial markets, which vary over time, so the sales in test will not match the sales in production. This is where testing becomes more problematic. How do we test systems that may return a different result for the same data over time? Traditional testing techniques have no way of taking such a result into account. So what are testers supposed to do?
Testing machine learning systems is qualitatively different from testing any other type of software. In most testing situations, you seek to make sure that the actual output matches the expected one. With machine learning systems, looking for exactly the right output is exactly the wrong approach. You likely can’t even calculate the “right output” without writing the software twice. Even then, it might not be possible.
What testers need to focus on for machine learning applications:
1. Have objective and measurable acceptance criteria. Know the standard deviation you can accept in your problem space. This requires quantitative measurements, and the ability to understand and interpret those measurements correctly.
2. Test with new data, rather than the original training data. If necessary, split your data into two groups: one for training and one for testing. Better yet, obtain and use fresh data if you can.
3. Don’t count on all results being exact; think of them as the best guess based on the available data. If that's not good enough, the problem could be the algorithm or, more likely, the data set. In some cases, "tweaking" the data set to get clean input can be the fastest fix for this problem.
4. Understand the architecture of the network as a part of the testing process. Testers won’t necessarily understand how the neural network was constructed, but they need to understand whether it meets requirements. And based on the measurements they are testing against, they may have to recommend a radically different approach, or admit that the software simply cannot do what it was asked to do with acceptable confidence.
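The first two points above can be sketched as a small test harness. Everything here is illustrative: the `predict` function stands in for a hypothetical trained model, and the data and tolerance are made-up values you would replace with ones agreed on for your own problem space. The key ideas are holding out data the model never trained on, and asserting on a statistical measure (mean absolute error) rather than an exact match.

```python
import random
import statistics

# Hypothetical stand-in for a trained ML model. In a real system this
# would be your deployed model's prediction function.
def predict(x):
    return 2.0 * x + random.gauss(0, 0.1)  # noisy, so outputs vary per run

# Labeled examples: (input, expected output) pairs. Illustrative only.
random.seed(42)
data = [(x, 2.0 * x) for x in range(100)]

# Point 2: hold out a test split that the model was never trained on.
random.shuffle(data)
split = int(len(data) * 0.8)
train_set, test_set = data[:split], data[split:]

# Point 1: an objective, measurable acceptance criterion --
# mean absolute error on held-out data, not exact output matching.
errors = [abs(predict(x) - y) for x, y in test_set]
mae = statistics.mean(errors)
spread = statistics.stdev(errors)

TOLERANCE = 0.5  # acceptable error, agreed on with stakeholders in advance
assert mae < TOLERANCE, f"model drifted: MAE {mae:.3f} exceeds {TOLERANCE}"
print(f"MAE {mae:.3f} (stdev {spread:.3f}) within tolerance {TOLERANCE}")
```

Note that the assertion passes even though no single prediction exactly equals its expected value; that is precisely the shift in mindset points 1 and 3 describe.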
The Bottom Line
The key to testing the system is to understand both the requirements for the production results and the limitations of the algorithms. The requirements need to translate into objective measurements; ideally, the standard deviation of the mean result, assuming that the mean result closely tracks the actual results found in the training data. You need to be able to assess your results from a statistical standpoint, rather than a yes-no standpoint.
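As a minimal sketch of that statistical standpoint, the check below accepts a batch of results when the mean error sits within one standard deviation of zero, instead of comparing any single result against a single expected value. The error figures are invented for illustration; the threshold (one standard deviation) is an assumption you would tune to your own acceptance criteria.

```python
import statistics

# Hypothetical per-case errors from one test run of the model
# (predicted minus expected). These numbers are illustrative only.
errors = [0.12, -0.05, 0.30, -0.22, 0.08, 0.15, -0.10, 0.02]

mean_err = statistics.mean(errors)
stdev_err = statistics.stdev(errors)

# Statistical acceptance: the mean error should be near zero, within
# one standard deviation of the error distribution -- a pass/fail
# decision about the distribution, not a yes-no check on each output.
accept = abs(mean_err) <= stdev_err
print(f"mean={mean_err:.3f} stdev={stdev_err:.3f} accept={accept}")
```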
Don’t count on an exact right answer all of the time, or even most of the time. How you test, and how you evaluate, depends entirely on the goals of the system. For the nuts and bolts of testing, it is invaluable to have a platform such as the Intel Parallel Studio XE to both develop and test code and algorithms.