Plotting the grid

The University of Pennsylvania's National Digital Mammogram Archive taps emerging open grid standards to speed breast cancer diagnosis and treatment

Hospitals and medical facilities everywhere continue to struggle to treat victims of breast cancer, hindered by the difficulty and expense of tracking and sharing case data, particularly traditional film-based X-rays. A group working out of the University of Pennsylvania in Philadelphia is combining high-powered computing with radiology to make treating breast cancer faster and more effective.

The University of Pennsylvania, IBM, and Advanced Computing Technologies (ACT) at BWXT Y-12, a group in Oak Ridge, Tennessee, have laid the foundation of an intricate, heterogeneous grid-computing system called the National Digital Mammogram Archive (NDMA). The NDMA allows North American hospitals to quickly and securely upload, download, and analyze enormous digital mammogram X-ray files, regardless of where the mammograms were performed.

Built around the Globus Toolkit, an open grid-computing standard, the NDMA runs on IBM eServers with DB2 Universal Database and exploits the blazing content-delivery capabilities of Internet2. It went live this year and now connects four facilities: hospitals at the University of Pennsylvania, the University of Chicago, and the University of North Carolina at Chapel Hill, plus Sunnybrook and Women's College Hospital in Toronto. In the not-too-distant future, the highly scalable system could connect some 2,000 medical facilities to an enormous collection of breast cancer records, giving physicians far more case data to draw on when diagnosing and treating the disease.

Grid computing addresses "the trick of making use of digital images, indexing them, and delivering them to hospital locations on demand," says Dr. Robert Hollebeek, director of the National Scalable Cluster Lab at the University of Pennsylvania and chief architect of the NDMA. "That's more of an on-demand content-delivery problem than an on-demand computing problem."

Devising a means of storing enormous quantities of data was one of the most significant challenges. An average digital mammogram case ranges from 100MB to 200MB, and thus far, 500 cases are in the archive. In the long run, the NDMA anticipates having to collect 28 petabytes (one petabyte equals 1,024 terabytes) of data per year. To that end, the grid is built on a three-tier architecture. Each participating medical facility has a portal, or "wall plug," made up of two IBM eServer xSeries systems: one serves as a temporary repository for the digital data, and the other links the facility to the grid.
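The storage figures above are easier to weigh with a little arithmetic. The short Python sketch below uses only the numbers quoted in this article (100MB to 200MB per case, 500 cases, 28 petabytes a year) and the article's binary definition of a petabyte; the function names are illustrative and not part of the NDMA.

```python
# Back-of-the-envelope storage arithmetic using figures quoted in the article.
# Assumption: binary units, matching the article's "1 PB = 1,024 TB".

MB_PER_GB = 1024
MB_PER_PB = 1024 ** 3  # PB -> TB -> GB -> MB, a factor of 1,024 at each step

CASE_MB_RANGE = (100, 200)  # typical size of one digital mammogram case
CASES_SO_FAR = 500
TARGET_PB_PER_YEAR = 28


def archive_size_gb(cases: int, case_mb: float) -> float:
    """Total archive size in GB for a given number of cases."""
    return cases * case_mb / MB_PER_GB


def cases_per_year(pb_per_year: float, case_mb: float) -> float:
    """How many cases of a given size fit into a yearly data volume."""
    return pb_per_year * MB_PER_PB / case_mb


if __name__ == "__main__":
    for case_mb in CASE_MB_RANGE:
        size = archive_size_gb(CASES_SO_FAR, case_mb)
        yearly = cases_per_year(TARGET_PB_PER_YEAR, case_mb)
        print(f"{case_mb} MB/case: archive today {size:.1f} GB, "
              f"28 PB/year is about {yearly / 1e6:.0f} million cases")
```

Run as-is, it puts today's archive at roughly 49GB to 98GB, and shows that a 28-petabyte year would correspond to something like 150 million to 300 million cases at current case sizes.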

Data is collected via the mammogram-recording instruments with the push of a button. Once the data is loaded into the portal, it is transmitted to a metropolitan hub, an IBM eServer Cluster 1600 Unix system, for storage. When the NDMA is fully deployed, data from several metropolitan hubs will be funneled to a high-capacity regional hub.
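The article describes how a case flows through the three tiers but not the software that moves it. As a rough illustration only, the Python sketch below models that path with hypothetical classes (`Portal`, `MetroHub`, `RegionalHub`): the portal stages a captured case locally, then forwards it to its metropolitan hub, which stores it and, once a regional hub exists, passes it upstream. Real transfers would travel over the grid rather than through in-memory method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Case:
    case_id: str
    size_mb: float


@dataclass
class RegionalHub:
    """Top tier (planned): aggregates data from several metropolitan hubs."""
    name: str
    archive: Dict[str, Tuple[Case, str]] = field(default_factory=dict)

    def ingest(self, case: Case, from_hub: str) -> None:
        self.archive[case.case_id] = (case, from_hub)


@dataclass
class MetroHub:
    """Middle tier: long-term storage for the portals in one metro area."""
    name: str
    regional: Optional[RegionalHub] = None
    storage: Dict[str, Case] = field(default_factory=dict)

    def store(self, case: Case) -> None:
        self.storage[case.case_id] = case
        if self.regional is not None:  # forward upstream once a regional tier exists
            self.regional.ingest(case, from_hub=self.name)


@dataclass
class Portal:
    """Bottom tier: the hospital's "wall plug"; staging is the temporary repository."""
    site: str
    hub: MetroHub
    staging: List[Case] = field(default_factory=list)

    def capture(self, case: Case) -> None:
        """Called when an instrument uploads a case at the push of a button."""
        self.staging.append(case)

    def flush(self) -> int:
        """Send every staged case to the metro hub; returns how many were sent."""
        sent = 0
        while self.staging:
            self.hub.store(self.staging.pop(0))
            sent += 1
        return sent
```

Keeping the portal's staging area separate from the hub's storage mirrors the article's split between a temporary repository at each hospital and long-term storage further up the hierarchy.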

With the grid in place, doctors can retrieve any record in the archive within seconds via a secure Web site. This allows a doctor to view a patient's past and current X-rays in moments, expediting diagnosis and, if necessary, treatment.

Without digital technology and this type of system, a doctor has to track down the X-rays he or she seeks, contact the facility at which they're stored, secure permission to view them, and arrange for them to be mailed. "All that can be greatly facilitated by a digital mechanism, especially as we transition to digital mammogram machines," according to NDMA's Hollebeek.

The project has cost $6 million so far, and the group is preparing a grant request to the National Library of Medicine while exploring other funding opportunities.

Although the concept may sound simple, project members have faced a bevy of obstacles since starting work three years ago. The real challenge is "propagating it out to a bunch of hospitals while making it simple enough that they could use it," Hollebeek says.

Buy-in from medical facilities is integral to the project's success, so the NDMA has made the portals relatively inexpensive ($10,000 or less) and simple to install and maintain. Participating facilities must also use digital mammogram equipment, which is the direction the medical industry is heading. The potential ROI is high: The average hospital spends $4 million a year just to develop X-ray films.

Thus far, the four installed portals have required no on-site intervention; they are all remotely managed from the University of Pennsylvania. "The end points have got to make their own assessments and report problems somehow. That is the only way to manage an end-point system," Hollebeek says.
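Hollebeek's point is that each end point should check itself and report upward, rather than relying on someone on site to notice a fault. The article does not describe the NDMA's actual monitoring; the Python sketch below is one hypothetical way a portal could run local checks and package the results as a JSON report for a central manager. The check names and thresholds are assumptions.

```python
import json
import shutil
import time
from typing import Callable, Dict, Tuple

# A check returns (ok, human-readable detail).
Check = Callable[[], Tuple[bool, str]]


def disk_check(path: str = ".", min_free_fraction: float = 0.10) -> Check:
    """Hypothetical check: is there enough free space for staged cases?"""
    def run() -> Tuple[bool, str]:
        usage = shutil.disk_usage(path)
        free = usage.free / usage.total
        return free >= min_free_fraction, f"{free:.0%} free on {path}"
    return run


def assess(site: str, checks: Dict[str, Check]) -> dict:
    """Run every check locally and summarize the problems found."""
    results = {}
    for name, check in checks.items():
        try:
            ok, detail = check()
        except Exception as exc:  # a check that crashes is itself a problem
            ok, detail = False, f"check raised {exc!r}"
        results[name] = {"ok": ok, "detail": detail}
    problems = sorted(n for n, r in results.items() if not r["ok"])
    return {
        "site": site,
        "timestamp": time.time(),
        "healthy": not problems,
        "problems": problems,
        "results": results,
    }


if __name__ == "__main__":
    report = assess("example-site", {"disk": disk_check(".")})
    # In a real deployment this JSON would be sent to the central manager;
    # here it is just printed.
    print(json.dumps(report, indent=2))
```

Treating a crashing check as a failed check keeps the report honest: the central site learns that something is wrong even when the diagnostic itself breaks.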

Also easing adoption is the fact that Globus, as an open standard, is compatible with various operating systems, which "shifted the notion of management away from trying to ram a particular technology down someone's throat," according to Dave Turek, IBM's vice president of Linux cluster and grid solutions, in Poughkeepsie, N.Y.

Security and compliance with the strict requirements of HIPAA (Health Insurance Portability and Accountability Act) were also central to the project. ACT developed the secure Web front end, with access controlled through secure devices and smart cards. "Instruments within the hospital can interact with the wall plug, provided they have the right kind of digital certificate, as well," Hollebeek says.
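The article does not say how the wall plug validates an instrument's certificate. As a stand-in, the Python sketch below uses mutual TLS from the standard `ssl` module: the server context refuses any client that cannot present a certificate signed by a trusted authority, and a helper pulls the client's common name out of the peer certificate for logging or authorization. The function names and certificate files are hypothetical.

```python
import ssl
from typing import Optional


def make_wall_plug_context(ca_file: Optional[str] = None,
                           cert_file: Optional[str] = None,
                           key_file: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that requires a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Reject any client (instrument, workstation) without a valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file:
        # CA that signs certificates for approved instruments (assumed file).
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file and key_file:
        # The wall plug's own certificate and private key (assumed files).
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def client_common_name(peer_cert: dict) -> Optional[str]:
    """Extract the commonName from the dict returned by SSLSocket.getpeercert()."""
    for rdn in peer_cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None
```

A server would wrap each accepted connection with `ctx.wrap_socket(conn, server_side=True)`; the handshake fails for clients without an acceptable certificate, and `getpeercert()` on the wrapped socket supplies the dict that `client_common_name` reads.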

Another interesting challenge has been figuring out how to move large volumes of data over long-distance network links. "What you would like to do is deliver a case to a doctor in a fraction of a second. We've managed to optimize network transmission protocols over these long lines," Hollebeek explains. "What you need to do is configure the system with large buffers, especially to transmit large volumes of information."
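The "large buffers" Hollebeek mentions are usually sized from the bandwidth-delay product: the link's bandwidth multiplied by its round-trip time, which is how much data must be in flight to keep a long path busy. The Python sketch below computes that figure and asks the operating system for matching socket buffers. The bandwidth and round-trip values are illustrative assumptions, not NDMA measurements, and the kernel may cap the request (on Linux, for example, at `net.core.wmem_max` and `net.core.rmem_max`).

```python
import socket


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes in flight needed to keep the path full."""
    return int(bandwidth_bps * rtt_s / 8)


def tune_socket(sock: socket.socket, bandwidth_bps: float, rtt_s: float) -> int:
    """Request send/receive buffers of one bandwidth-delay product each."""
    size = bdp_bytes(bandwidth_bps, rtt_s)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return size


if __name__ == "__main__":
    # Assumed path: 1 Gbit/s with a 50 ms round trip.
    print(bdp_bytes(1e9, 0.050))  # 6250000 bytes, about 6 MB
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        tune_socket(s, 1e9, 0.050)
        # The kernel may round or cap the value; read back what was granted.
        print(s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```

Buffers must be set before a connection is established for the receive window to take full effect, which is why the sketch tunes a fresh socket rather than one already in use.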

One technique was to change the way acknowledgements pass back and forth between systems. "Every time you acknowledge a transmission, there's a stall," Hollebeek says. The greater the distance, the more noticeable that stall becomes.
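The article does not spell out how the acknowledgement scheme was changed, so the Python sketch below only illustrates the general effect Hollebeek describes. If a sender stops after every chunk to wait for an acknowledgement, each round trip adds a stall, and throughput falls as distance (round-trip time) grows; letting many chunks travel unacknowledged at once hides that stall until the link itself becomes the limit. The chunk size, window, bandwidth, and round-trip times are assumptions chosen for illustration.

```python
def stop_and_wait_bps(chunk_bytes: int, bandwidth_bps: float, rtt_s: float) -> float:
    """Throughput if the sender waits one round trip for an ACK after every chunk."""
    bits = chunk_bytes * 8
    return bits / (bits / bandwidth_bps + rtt_s)


def windowed_bps(window_bytes: int, bandwidth_bps: float, rtt_s: float) -> float:
    """Throughput if up to window_bytes may be unacknowledged at once."""
    return min(bandwidth_bps, window_bytes * 8 / rtt_s)


def transfer_seconds(case_mb: float, throughput_bps: float) -> float:
    """Time to move one case (binary MB) at a given throughput."""
    return case_mb * 1024 * 1024 * 8 / throughput_bps


if __name__ == "__main__":
    link = 1e9     # assumed 1 Gbit/s path
    case_mb = 150  # midpoint of the article's 100MB-200MB case size
    for rtt_ms in (1, 10, 50):
        rtt = rtt_ms / 1000
        slow = stop_and_wait_bps(64 * 1024, link, rtt)
        fast = windowed_bps(8 * 1024 * 1024, link, rtt)
        print(f"RTT {rtt_ms:>2} ms: stop-and-wait {transfer_seconds(case_mb, slow):6.1f} s, "
              f"8 MiB window {transfer_seconds(case_mb, fast):5.2f} s")
```

With these assumed numbers, a 150MB case over a 50 ms round trip takes about two minutes when the sender waits after every 64KB chunk, versus a little over a second when the window covers the bandwidth-delay product.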

The NDMA is still exploring ways to widen participation to thousands of medical facilities. "The systems we're running right now could easily handle a hundred hospitals. The question remains as to what it takes to handle thousands of hospitals. That is a study we're undertaking with IBM," Hollebeek says.

For the time being, the grid system is geared toward sharing storage resources, but processing resources could also be shared for crunching data. Over time, project leaders envision adding algorithmic functionality to the system, perhaps delivered much like a Web service, for analyzing trends in individual patients or across wide geographic areas.

Judging by the success of the NDMA, Turek envisions similar grid-computing opportunities in the e-business world. "Think about it: What this is saying ... is that you can get what you need, when you want it, without a tremendous amount of know-how, without regard to proximity of a data source or anything else. You can just attach it to the grid and get what you need."