Is that code really yours?

Black Duck protexIP helps protect against license violations, but detection errors limit its usefulness

As open source software pushes further into the enterprise, a new set of IP (intellectual property) risks has arisen. Developers happily borrow code from various projects to save themselves from having to reinvent it. This borrowing is all well and good as long as the resulting software complies with the licenses of the donor projects. The difficulty for managers is that they cannot know which parts of their code base come from open source projects. A code snippet reused from a newsgroup posting could actually have come from a copyrighted open source project, and its use could legally require the company to open source its entire product. If the company is an ISV, it might even be required to offer its product at no cost.

Until recently, managers had to rely on their developers to avoid this problem. Now they can automate the process of checking their code with protexIP 4.0 from Black Duck Software. The product compares in-house code with many code sources, such as open source projects, and reports on matches it finds. A supporting component enables managers and legal counsel to approve the use of specific open source licenses for borrowed code. The solution manages the license agreements and provides a bill of materials that shows the company’s obligations for the open source it uses. The license management portion of the product is robust and well-designed. The code analysis and identification, however, leave much to be desired.

Prepping for flight

Black Duck offers protexIP in two basic flavors. One version, protexIP/development, resides locally at the customer site and comes in two editions (Enterprise and Professional) that differ in functionality. I reviewed Enterprise, the higher-end edition. The second version, called protexIP/on-demand, is a hosted edition of the same software that is typically used by IP attorneys and acquisition specialists who need to verify the provenance of software they’re examining. Due to the size of the open source database, the on-site installation tends to require its own server, which runs only on Linux.

In all cases, the client software is Java-based, so it runs on many platforms. To evaluate a code base, you marshal the code into a directory and point protexIP there. Unfortunately, protexIP does not integrate with source-code management systems, although an SDK enables developers to write interfaces should they wish to.

The software can analyze code in many programming languages and even compare binaries with known open source products. It does this analysis by creating fingerprints of the source code and comparing them to the database of fingerprints the company has developed over the years. It then returns a summary of its findings (see screen image) in which it identifies files as being either green (no problem), yellow (awaiting identification), blue (pending approval), or red (definite problem). These colors refer to protexIP’s view of how tolerable the applicable licensing terms are to a given site. For example, the Apache license might be acceptable to many sites, whereas the viral provisions of the GPL (General Public License) might lead some companies to preclude its use. A screen used by managers or legal counsel enables approvals to be set for every kind of open source license requirement, so that protexIP raises a warning if, for example, it finds a match on GPL code.
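Black Duck does not publish how protexIP computes its fingerprints, so the following is only a minimal sketch of the general idea under a simple assumed scheme: hash every window of a few normalized lines, look the hashes up in an index of known open source code, and color the file according to a site policy. All names here (fingerprints, build_index, classify, the policy values) are hypothetical, and the yellow "awaiting identification" state is left out for brevity.

```python
import hashlib


def normalize(line):
    # Collapse whitespace so that reformatting does not change the hash.
    return " ".join(line.split())


def fingerprints(source, k=3):
    """Return SHA-1 hashes of every window of k non-blank, normalized lines."""
    lines = [normalize(l) for l in source.splitlines() if l.strip()]
    if not lines:
        return set()
    if len(lines) < k:
        windows = ["\n".join(lines)]
    else:
        windows = ["\n".join(lines[i:i + k]) for i in range(len(lines) - k + 1)]
    return {hashlib.sha1(w.encode("utf-8")).hexdigest() for w in windows}


def build_index(known_sources, k=3):
    """Map each fingerprint to the license of the project it came from."""
    index = {}
    for license_name, source in known_sources:
        for fp in fingerprints(source, k):
            index[fp] = license_name
    return index


def classify(source, index, policy, k=3):
    """Color a file: green (no problem), blue (pending approval), red (problem)."""
    matched = {index[fp] for fp in fingerprints(source, k) if fp in index}
    if not matched:
        return "green"  # nothing in the index matches this file
    decisions = {policy.get(name) for name in matched}
    if "denied" in decisions:
        return "red"    # matches code under a license the site has ruled out
    if None in decisions:
        return "blue"   # matches a license nobody has approved or denied yet
    return "green"      # every matched license is approved
```

A scheme like this only finds code that is already in the index, which is why a stale or incomplete database translates directly into false negatives.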

[Screen image: protexIP summary of scan findings]

The solution also flags situations in which licenses require conflicting actions from the user. To do this, it relies on a database of more than 650 open source licenses in which it has logged all requirements of the terms of use. This license management works well and will certainly help managers who rely on elements from open source projects know what their responsibilities are.
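How protexIP represents license terms internally is not documented in this review. As a rough illustration of what flagging conflicting requirements could look like, the sketch below assumes each license is reduced to a table of obligations, where True means "must" and False means "must not", and reports any pair of licenses that disagree on the same obligation. The license names and obligation keys are invented placeholders, not real license terms.

```python
from itertools import combinations


def find_conflicts(licenses):
    """Return (license_a, license_b, obligation) for every disagreement."""
    conflicts = []
    for (name_a, terms_a), (name_b, terms_b) in combinations(sorted(licenses.items()), 2):
        # Only obligations that both licenses address can conflict.
        for obligation in sorted(terms_a.keys() & terms_b.keys()):
            if terms_a[obligation] != terms_b[obligation]:
                conflicts.append((name_a, name_b, obligation))
    return conflicts


# Hypothetical licenses used purely for illustration.
example = {
    "License A": {"disclose_source": True, "keep_notice": True},
    "License B": {"disclose_source": False, "keep_notice": True},
}
print(find_conflicts(example))
```

The same obligation table could also feed the bill of materials, since it lists what each license in use requires.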

Duck and cover

I found many aspects of the code detection and reporting to be problematic. When I gave the program some code from Jetty, a high-profile open source Web server, protexIP recognized only a few of the files. A few misses would arguably be acceptable. Unfortunately, protexIP found that 64 percent of the files I submitted from the project were clean and did not match anything in its code base. I asked Black Duck Software to run these files for me, which it did, returning the same results. I presume this is because the database is terribly out of date. With such a high rate of false negatives, managers could easily ship open source code unknowingly. Code from other projects I scanned produced higher match rates, but my overall impression was that protexIP does not deliver sufficiently on its central promise of vetting code pedigree.

I then examined one of the Jetty files that protexIP did match to see what it found. The results (shown in the screen image, but hard to make out) reported a match with a Jetty file. Two enigmatic figures appeared: 9% and line 1087. Normally, I would assume these numbers referred to the percentage of matching code and the starting line number of the match. But given that my file contained fewer than 800 lines and was a known 100 percent match, I assumed these columns had another meaning. Unfortunately, the online help system and the manual were of no use: nowhere are these figures explained. To find out, I was told to open an incident with tech support. (My guess at their meaning was correct. The data was simply wrong.)

Taken as a whole, protexIP represents an original solution that could be useful if it were implemented well. The one satisfactory component, the license manager, is badly undercut by the software’s inability to detect the presence of external source code correctly. Between these detection miscues, the insufficient documentation, and the rough edges of the interface, it is clear the product needs to progress a fair bit before it can be recommended for adoption.

InfoWorld Scorecard: Black Duck protexIP/development 4.0
Value (10.0%): 6.0
Documentation (15.0%): 6.0
Accuracy (40.0%): 5.0
Management (15.0%): 8.0
Ease of use (20.0%): 6.0
Overall Score (100%): 5.9
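
The overall score appears to be the weighted average of the five category scores. That is an inference from the numbers rather than something the scorecard states, but the arithmetic can be checked with a few lines:

```python
# Category weights and scores copied from the scorecard above.
weights = {
    "Value": 0.10,
    "Documentation": 0.15,
    "Accuracy": 0.40,
    "Management": 0.15,
    "Ease of use": 0.20,
}
scores = {
    "Value": 6.0,
    "Documentation": 6.0,
    "Accuracy": 5.0,
    "Management": 8.0,
    "Ease of use": 6.0,
}

# Weighted average: 0.6 + 0.9 + 2.0 + 1.2 + 1.2
overall = sum(weights[c] * scores[c] for c in weights)
print(round(overall, 1))  # 5.9
```

Because Accuracy carries 40 percent of the weight, the low detection score dominates the result despite the strong Management rating.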