March 30, 2006

Web-scouring robots anchor Kapow

Kapow's screen scraper integrates and assembles your portal information with data from other online sites

For the past few years, all the talk about Web 2.0 has centered on helping Web sites knit themselves together in sophisticated remixes. Although most of the hype concerns the use of JavaScript to add intelligence to the browser client, a quieter group has been adding this power to the central server.

Kapow’s Web Integration Platform version 6.0 is one of the best examples of these central-server solutions. The suite is a big, automatic screen scraper that aggregates information from many different sites and assembles it into a portal in a way that makes it easy for users to absorb.

Although the hype about doing the work on the client with JavaScript is exciting, there will always be advantages to a central service. Kapow’s solution doesn’t need to be debugged across a wide variety of browsers, and it can also integrate with databases to store past information and give pages some historical content.

Robot results
The Web Integration Platform could be a hit with big IT shops that build information portals for employees and clients. I’ve seen a number of cases where portal projects bog down because one division doesn’t want to open up its databases and systems. One simple, easy-to-use connection system would be wonderful, but that means getting all parts of a company to support this central vision.

Kapow’s solution avoids the politics by offering a system of code-capturing robots that operate at the lowest common denominator: text marked up in HTML. These robots are experts at extracting information from internal and external Web pages, and usually do not require much cooperation from the source.

The central server schedules the robots and aggregates their results. If someone goes to a portal page, the server will fire up the right robots to clip the correct information before bundling it together. This information can be cached temporarily or stored in a database for a long-term view.
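The fetch-on-request and temporary-caching behavior described above can be pictured with a small Python sketch. It is not Kapow's code or API: the `PortalServer` class, its method names, and the time-to-live setting are all hypothetical, and each "robot" is simply a function that returns the text it clipped.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical stand-in for a robot: a callable that returns clipped text.
Robot = Callable[[], str]


class PortalServer:
    """Illustrative only: runs robots on demand and caches results briefly."""

    def __init__(self, robots: Dict[str, Robot], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._robots = dict(robots)
        self._ttl = ttl_seconds          # assumed cache lifetime, in seconds
        self._clock = clock              # injectable so the cache can be tested
        self._cache: Dict[str, Tuple[float, str]] = {}

    def _clip(self, name: str) -> str:
        """Return a fresh cached result, or run the robot and cache its output."""
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        result = self._robots[name]()
        self._cache[name] = (now, result)
        return result

    def render_page(self, robot_names: Iterable[str]) -> Dict[str, str]:
        """Bundle the clipped results for one portal page."""
        return {name: self._clip(name) for name in robot_names}
```

A request for a page that needs the same robot twice within the time-to-live runs that robot only once. Long-term storage in a database, which the platform also offers, is left out of this sketch.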

The robots are blessed with a sophisticated language for understanding HTML. If you’ve ever done any screen scraping, you’ve probably said things like, “I’m looking for the second row of the table nested inside the second row of the main table.” Kapow’s internal nomenclature takes care of that by imitating the JavaScript DOM; in this case, the answer is: html.body.table.tr[2].td.table.tr[2].
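To make the path concrete, here is a rough Python approximation that walks the same route through a tiny sample page with the standard library's `xml.etree.ElementTree`. This is not Kapow's engine: the sample markup and the `extract` helper are invented for illustration, and ElementTree accepts only well-formed markup, whereas real Web pages usually need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# Invented sample page: a main table whose second row holds a nested table.
SAMPLE_PAGE = """<html><body>
<table>
  <tr><td>main row 1</td></tr>
  <tr><td>
    <table>
      <tr><td>inner row 1</td></tr>
      <tr><td>inner row 2</td></tr>
    </table>
  </td></tr>
</table>
</body></html>"""


def extract(page: str, dotted_path: str) -> ET.Element:
    """Follow a dotted path such as html.body.table.tr[2] through the page.

    Hypothetical helper: it rewrites the dotted form into ElementTree's
    slash-separated path syntax, where tr[2] means the second tr child.
    """
    root = ET.fromstring(page)
    steps = dotted_path.split(".")
    if root.tag != steps[0]:
        raise ValueError(f"path starts at {steps[0]!r}, page root is {root.tag!r}")
    found = root.find("/".join(steps[1:]))
    if found is None:
        raise LookupError(f"nothing matches {dotted_path!r}")
    return found


row = extract(SAMPLE_PAGE, "html.body.table.tr[2].td.table.tr[2]")
print("".join(row.itertext()).strip())  # prints: inner row 2
```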

Most users won’t need to worry about this language because Kapow includes a sophisticated workstation for taking Web sites apart. After you provide the URL, the Kapow suite loads the Web site and displays it in a section of the RoboMaker UI. You can then start snipping and cutting from the site by pointing and clicking on the parts you want. The HTML and the language for extracting the HTML appear in a window alongside the Web site.

The robot instructions are at the top of the UI; they’re built with a fairly traditional visual language, and you can add loops and branches. The result looks like a standard flowchart, although there are many special features tuned to the nature of HTML -- one loop command, for instance, will extract all but the top row of a table.
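The "all but the top row" loop is easy to picture in ordinary code. The following Python sketch is an analogy only, not RoboMaker's visual language: the sample table and the `rows_after_header` function are made up for illustration, and the markup is assumed to be well formed.

```python
import xml.etree.ElementTree as ET
from typing import List

# Invented sample table: one header row followed by two data rows.
SAMPLE_TABLE = """<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget</td><td>19.50</td></tr>
</table>"""


def rows_after_header(table_markup: str) -> List[List[str]]:
    """Return the cell text of every row except the first (the header)."""
    table = ET.fromstring(table_markup)
    rows = table.findall("tr")
    return [
        [(cell.text or "").strip() for cell in row.findall("td")]
        for row in rows[1:]  # skip the top row, as the loop command does
    ]


for product, price in rows_after_header(SAMPLE_TABLE):
    print(product, price)
```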

Test Center Scorecard
Kapow RoboSuite 6.0
Criterion weights: 30% | 30% | 15% | 15% | 10%
Criterion scores: 8 | 8 | 8 | 9 | 9
Overall score: 8.3 (Very Good)


©1994-2009 Infoworld, Inc.