Reeling in the tiers
By David L. Margulius
May 16, 2002 1:01 pm PT
CAN WEB APPLICATION monitoring technology keep pace with the evolution of Web applications? At first glance, the answer would seem to be no. The infrastructure behind Web applications keeps getting more complex, distributed, and asynchronous, making it tougher to pinpoint problems. And the proliferation of Web services and remote processes will only accelerate this trend.
Web applications are proliferating both inside and outside the firewall. From self-service portals to commerce applications, many critical business processes are now linked to some kind of Web application. Monitoring solutions have also evolved, with the emergence of load testing services based on distributed synthetic transactions -- from companies such as Mercury Interactive, Keynote Systems, BMC Software, and Empirix -- that augment traditional infrastructure component monitoring software from Hewlett-Packard OpenView, IBM Tivoli, and Computer Associates.
But synthetic load testing, although good for measuring latency and availability, has fallen short on monitoring the functional integrity of enterprise Web applications. Without application and infrastructure-specific knowledge, these services can tell you how fast a page loaded, but not necessarily whether the page contained the right data or what caused a performance problem. Vendors in the space have therefore turned to a variety of new approaches to tackle key challenges and move performance monitoring into the era of n-tier and distributed computing.
Challenge one: complexity
Complexity is the first of these challenges. Today's Web applications are built on multiple tiers of resources and components (app servers, Web servers, databases, hardware, and network components), and failure often results from interactions and dependencies between these tiers. A J2EE (Java 2 Enterprise Edition) application, for example, can include Web pages, servlets, EJBs (Enterprise JavaBeans), and database calls, potentially running across hundreds of JVMs (Java virtual machines) on dozens of machines. In fact, Web apps have gotten so complex that the essence of QA -- functional and load testing as well as code profiling -- has moved into production environments.
"The only place you can test an n-tier application is production, where you have all the data, all the services," explains Robert Wenig, CTO of San Francisco-based TeaLeaf Technology, one of a new generation of Web app monitoring companies.
The upshot is that performance monitoring vendors are racing to do three things well. The first is to capture rich, drill-down performance data from each system component or tier (app server, database, or network). The second is to correlate all that data across those tiers to generate actionable insights into root causes of performance problems. And the third is to correlate that data with the user session as viewed through the browser, to make sure what the user experienced in an actual session is reflected in the diagnosis.
At each tier, a different set of vendors claims to provide best-of-breed visibility into performance. At the application server tier, for example, companies like Wily and Precise Software on the J2EE side and Mutek Solutions on the .Net side are tackling instrumentation -- monitoring the application components and their interactions -- at the byte code and JVM level, taking advantage of late binding and native APIs to do dynamic code profiling and performance troubleshooting.
When it comes to correlating data across all the tiers, companies such as Altaworks, TeaLeaf, and Mercury Interactive are trying to take this component-level data and use various statistical techniques to "sessionize," or map, it to user experiences and transactions. They use terms such as "Bayesian probabilistic analysis" (to establish probable root causes), "dynamic baselining" (figuring out what's normal, what's not), and "time-shifted behavior pattern analysis" (to see noninstantaneous patterns) to describe what goes on inside their black boxes.
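The vendors name "dynamic baselining" without explaining it, so the following is only a minimal sketch of one plausible reading of the term: keep a rolling window of recent response times and flag a sample that strays too far from the window's mean. The class name, window size, threshold, and sample latencies are all assumptions made for illustration, not any vendor's algorithm.

```python
from collections import deque
from statistics import mean, stdev


class RollingBaseline:
    """Hypothetical baseline: 'normal' is defined by recent samples only."""

    def __init__(self, window=20, threshold=3.0, min_samples=5):
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically
        self.threshold = threshold           # allowed distance, in standard deviations
        self.min_samples = min_samples       # no verdicts until this much history exists

    def observe(self, value):
        """Return True if value looks abnormal against the current baseline."""
        abnormal = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            # max() guards against a perfectly flat history (sigma == 0).
            abnormal = abs(value - mu) > self.threshold * max(sigma, 1e-9)
        # Only normal samples update the baseline, so one spike
        # does not redefine what "normal" means.
        if not abnormal:
            self.samples.append(value)
        return abnormal


# Illustrative page response times in seconds (made-up numbers).
latencies = [3.1, 2.9, 3.0, 3.2, 3.1, 2.8, 3.0, 25.0, 3.1]
baseline = RollingBaseline()
flags = [baseline.observe(x) for x in latencies]
```

Because the baseline moves with the data, a threshold like this adapts to gradual drift in a way a fixed alert limit cannot; the trade-off is that a slow, steady degradation can be absorbed into "normal" without ever raising a flag.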
"The challenge is closely connecting something that started as HTTP with a Java method invocation on the application server, through JDBC which opens a connection to the database, then becomes a SQL call ... associating them all as a single logical transaction," says Ido Sarig, CTO of Mercury Interactive. "Why did it take 25 seconds vs. the three seconds I expected?"
Correlation vendors also note the difficulties involved in recording complete user sessions through the browser (for later joining with component-side data), which requires a complete understanding of everything that's going on in a Web page, such as client-side scripting, digital certificates, and heavy encryption. And they note the importance of assembling and visually interpreting the correlated data for different kinds of users, including developers and front-line operations people who are not necessarily statisticians.
Yet even after all this, correlation is not guaranteed to work. "Doing the true correlation is where the industry is trying to go," says Richard Nikula, product architect at BMC Software. "The industry is taking baby steps now with assumptions." Nikula questions whether companies purporting to correlate front-end and back-end data on specific transactions are mistakenly correlating separate instances of the various components of the transaction. The only way to do it properly, he claims, is by "super" time-stamping or tagging all elements of a session so they can be followed across the tiers, which requires lots of very low-level traces that are not yet in place.
Companies must try to do it anyway, counters TeaLeaf's Wenig, pointing out that although application support has become mission critical, it is often fragmented across departments including operations, customer support, and development, where one group's logs often mean nothing to another. "Today support says, 'I can't reproduce your problem, go away, you're an idiot,' " Wenig says. "The Holy Grail is putting it all together."
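The tagging approach Nikula describes can be pictured with a small sketch: if every tier stamps its trace record with the same transaction tag, records from the HTTP, Java, JDBC, and SQL layers can be joined into one logical transaction and the time spent in each layer separated out. The record layout, the `txn_id` field, the tier names, and the timings below are hypothetical, and the sketch assumes each tier's call strictly nests the next one; it is not a description of any vendor's product.

```python
from collections import defaultdict

# Assumed call chain, outermost first: each tier's call wraps the next tier's call.
TIER_ORDER = ["http", "java", "jdbc", "sql"]

# Hypothetical trace records from separate tiers, arriving interleaved.
# Times are milliseconds since an arbitrary shared clock origin.
events = [
    {"txn_id": "t1", "tier": "http", "start": 0, "end": 25000},
    {"txn_id": "t2", "tier": "http", "start": 1000, "end": 4000},
    {"txn_id": "t1", "tier": "java", "start": 200, "end": 24800},
    {"txn_id": "t2", "tier": "java", "start": 1100, "end": 3900},
    {"txn_id": "t2", "tier": "jdbc", "start": 1200, "end": 1500},
    {"txn_id": "t1", "tier": "jdbc", "start": 500, "end": 24500},
    {"txn_id": "t1", "tier": "sql", "start": 600, "end": 24400},
    {"txn_id": "t2", "tier": "sql", "start": 1250, "end": 1450},
]


def group_by_transaction(records):
    """Join records from all tiers on the shared transaction tag."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["txn_id"]].append(record)
    return dict(grouped)


def exclusive_ms(txn_records):
    """Time spent in each tier itself, excluding the nested call to the next tier."""
    total = {r["tier"]: r["end"] - r["start"] for r in txn_records}
    result = {}
    for i, tier in enumerate(TIER_ORDER):
        if tier not in total:
            continue
        # Duration of the nearest inner tier that reported, or 0 for the innermost.
        inner = next((total[t] for t in TIER_ORDER[i + 1:] if t in total), 0)
        result[tier] = total[tier] - inner
    return result


def slowest_tier(txn_records):
    """Name the tier with the largest exclusive time."""
    times = exclusive_ms(txn_records)
    return max(times, key=times.get)


transactions = group_by_transaction(events)
```

With these made-up records, `slowest_tier(transactions["t1"])` points at the SQL call and `slowest_tier(transactions["t2"])` at the Java tier. Without the shared tag, the join would have to fall back on timestamps and guesswork, which is the mistaken-instance risk Nikula raises.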
Challenge two: distributed systems
The second hurdle for Web app performance monitoring is the distributed nature of application architectures, a situation that will only be exacerbated by Web services. Enterprise Web apps increasingly call on processes that live outside the IT department's domain, making it difficult or impossible to get baseline expected performance or QoS (quality of service) data. Web services or other remote processes may also be nested, calling other unmonitorable processes and compounding the problem.
"You don't even get to control the tiers, much less the connection between the tiers," explains TeaLeaf's Wenig. "How many of those requests are going to respond in time, and how many of those responses will be valid?"
And finally, the asynchronous or even stateless nature of these (often long-running) remote processes makes it all the more challenging to automatically discover what's supposed to be happening on the back end of an application, and consequently to monitor that application's performance.
"Asynchronous creates more of a problem.... people get worried if things aren't moving," explains BMC's Nikula. "If there's a queue with 1,000 messages in it, for example, is that good or bad?" The answer depends on whether or not the monitoring application has a map of the business process and understands how it is supposed to flow.
Several monitoring products include basic Web services diagnostic capabilities: validating the WSDL (Web Services Description Language) file and auto-generating a SOAP (Simple Object Access Protocol) test client. But there's currently no metadata protocol to enable them to acquire QoS, SLA (service-level agreement), or process-flow information. "Unfortunately I don't think monitoring has been addressed," says Flamenco Networks' CTO Dave Spicer, referring to current Web services standards efforts. "There's a layer missing."
Flamenco, along with other Web services networks such as Grand Central and Actional, hopes to make up for this, managing the services and passing along QoS and process information to the monitoring apps. Another solution would be to build metadata into developing protocols such as WSFL (Web Services Flow Language), and give monitoring apps more visibility into expected performance (at least in terms of process flow) of long-running or asynchronous Web services processes.
Will standards solve these problems?
Tackling the complexity issue will require a much better exchange of performance monitoring data among monitoring apps (or services) in the different tiers of the Web app delivery architecture. Tackling the distribution issue will require a much better exchange of service-level information among providers of external processes and the monitoring applications (or services).
In fact, to reach the nirvana of an autonomously self-diagnosing and self-healing ecosystem of Web applications, existing monitoring software and services will need a major makeover at all layers. And this is true regardless of whether you believe in a centralized or a peer-based model of monitoring in the future. Someday, self-monitoring may become a set of services, which consume best-of-breed instrumentation and deliver best-of-breed analytics, interpretation, alerts, and management capabilities.
As a start, many monitoring companies today share their data via SNMP or the proprietary APIs of the entrenched management vendors (HP OpenView, Tivoli, CA). Some are also starting to deliver raw feeds of their data as a Web service.
But this must be taken much further. Applications, components, and resources must be able to report trouble in a more structured and standardized manner than plain text in log files. The external (user) view of a complete session or transaction must be easier to join with the internal (component) view of that same process.
Remote processes must be able to give one another a map containing expected performance and process-flow information, via a publish-and-subscribe or similar type of model.
Relatively few proposals are on the drawing board to enable pieces of the puzzle. Relevant proposals and protocols include WSFL; JSR 163 (Java Platform Profiling Architecture) and JSR 174 (Monitoring and Management Specification for the Java Virtual Machine); and OMI (Open Management Interface), for exchange of business process information. That will change as the stresses of complexity and distributed systems put more pressure on the existing monitoring approaches.