Troubleshooting Java enterprise application performance

Tuning web service applications for performance and memory utilization

Roughly 40 percent of all IT projects surveyed in a September 2000 Gartner Group study failed to meet business requirements. The average cancelled IT project was scheduled to last 27 weeks and was terminated in week 14, resulting in an estimated $1 million loss per year. While much has changed in the last decade, we have seen the need for code reviews of business applications increase significantly. Industry competition has compressed the average development cycle, especially for web applications. As a consequence, code issues often go undetected, or they are ignored or deferred until after the application has gone into production.

A typical code review covers about 150 lines of code per hour for web-based projects. Inspecting and reviewing more than a few hundred lines of code per hour risks overlooking critical errors. Industry data indicates that code reviews can achieve at most an 85 percent defect removal rate, with an average rate of about 65 percent. The rest is up to you.

In this article we address frequently occurring issues that can snowball into performance bottlenecks in enterprise Java applications. Our observations are based on developing, inspecting, and reviewing hundreds of enterprise applications. For the purpose of this article, we developed and studied a sample Java web services application, which allowed us to assess the impact of common code-based issues on application performance. In our recommendations we focus on techniques to catch potential problems early and performance-tune code during the development cycle, before the application is live.

About the test application

Our test application retrieves medical data stored in an RDBMS and presents it to health services users. The application is based on an n-tier Java EE architecture in which the business logic is implemented across mediators and managers in the business layer. Application components are deployed in a Java EE application server as a WAR file. The diagram in Figure 1 depicts a layered view of the application.

Figure 1. Layered view of the example application (diagram of its three-layer architecture)

Application tiers

The web services tier exposes SOAP-based web services. Clients access the web services via HTTPS. The interface is exposed as a WSDL URL; the client retrieves the WSDL from that URL and uses JAX-WS to create a client stub for accessing the web service. The persistence tier is an RDBMS used as the persistence store.

The middle tier validates and applies business rules. It consists of the following components:

  • The service manager acts as a controller and communicates with the various business manager classes, such as the authentication and authorization manager, the user manager, the vital data manager, and so on.
  • The business logic layer validates and applies business rules to process user data. This layer interacts with a data access layer interface for further data transactions.
  • The data access layer uses an ORM framework (JPA) to interact with the persistence tier and store application data in a database. The data access layer also exports stored data to other systems.

Initial observation and performance review

We documented the following observations during our initial review of the test application:

  • Based on a J2EE model using an MVC framework.
  • Timeouts during web service calls ensure quick response. If a timeout occurs, the user receives an error message.
  • Domain data is cached at the session level, except for reference-domain data, which is cached at the global level.
  • Memory constraints and CPU constraints exist in the production environment.
  • The application fulfiller peaks at 95 percent CPU utilization, which is very high. The query size and data marshaling size need to be examined to keep CPU utilization at around 80 percent.
  • Load balancing is not done on the web server.
  • No session data is maintained.
  • No cache management framework is implemented, and caching is not incorporated in the database or the Flex code.
  • Database is load balanced only in the production environment.
  • No build or profiling tools are used.

Initial performance review

An initial performance review of the application resulted in the following observations:

  • Many Small Web Format (SWF) Adobe Flash files are loaded, all without caching.
  • Evidence that web service calls might be marshaling/un-marshaling significant XML data.
  • GZIP compression of HTTP content does not appear to be enabled.
  • The Windows server is at 100 percent CPU utilization, which is too high a load for only 25 users.
  • No failed transactions: application functionality appears to be working albeit with bad response times.
  • Check memory utilization and heap allocation. Smaller heap sizes can lead to frequent garbage collection, resulting in bad response times.
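To act on the last point, heap usage and garbage-collection activity can be sampled from inside the JVM. The following is a minimal sketch using the standard java.lang.management API; the class name HeapReport is our own, and what counts as "too frequent" collection is a judgment call for your own load tests, not a threshold from our study.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapReport {
    /** Returns a one-line summary of current heap usage and cumulative GC activity. */
    static String report() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long gcCount = 0;
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when a collector does not define the value
            gcCount += Math.max(0, gc.getCollectionCount());
            gcMillis += Math.max(0, gc.getCollectionTime());
        }
        long mb = 1024 * 1024;
        // getMax() returns -1 when no maximum heap size is defined
        String max = heap.getMax() < 0 ? "undefined" : (heap.getMax() / mb) + " MB";
        return String.format("heap used=%d MB, committed=%d MB, max=%s, GCs=%d, GC time=%d ms",
                heap.getUsed() / mb, heap.getCommitted() / mb, max, gcCount, gcMillis);
    }

    public static void main(String[] args) {
        System.out.println(report());
    }
}
```

Sampling report() periodically under load and comparing GC counts between samples gives a rough picture: if used heap stays close to the maximum and GC time keeps climbing, the heap ceiling set with -Xmx may be too small for the workload.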

In the next sections we summarize our findings and recommend steps to performance-tune Java enterprise applications for common conditions.

Thread safety of JAX-WS proxy instances

In our test application, the JAX-WS proxy instances shared by the application's web services were not synchronized. Every Service object was instantiated on server startup, and each service accessed a new JAX-WS proxy instance; JAX-WS proxy instances are not guaranteed to be thread safe (see Resources).

The performance impact was a 30-minute delay in the application's response time, because the default proxy instances could not support concurrent user requests. We also noted delayed responses during application startup and initialization, on server startup, and when the application had been idle for a few minutes.

JAX-WS proxy clients are usually instantiated and used in three separate phases, as shown here (note that it is a best practice to perform the first phase as rarely as possible, because it is time consuming):

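MyService svc = new MyService();       // 1: create the Service (expensive; may fetch the WSDL)
MyPort port = svc.getMyPort();         // 2: create the proxy (port) instance
MyResult result = port.doSomething();  // 3: invoke the web service operation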


  • As long as no two threads use the same port instance (proxy instance) at the same time, there is no issue.
  • The getPort() method instantiates the proxy object, which is not guaranteed to be thread safe. In the absence of synchronization, create a separate proxy instance (the output of phase 2) for each running thread.
  • If ports are too expensive to create, they can be pooled (by application code) for reuse across threads.
  • Another way to improve performance is to keep the WSDL locally with the client, so that it is not retrieved across the network every time a new Service object is created (phase 1).
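The pooling option in the third bullet can be sketched with a bounded BlockingQueue. This is a minimal, hypothetical illustration rather than a JAX-WS API: PortPool and FakePort are names we made up, and the Supplier passed to the constructor stands in for a call such as svc.getMyPort().

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical fixed-size pool for non-thread-safe proxies; not part of JAX-WS. */
public class PortPool<P> {
    private final BlockingQueue<P> idle;

    public PortPool(int size, Supplier<P> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // e.g. svc.getMyPort(): cost paid once per pooled port
        }
    }

    /** Borrows a port, runs the call on it, and always returns the port to the pool. */
    public <R> R withPort(Function<P, R> call) throws InterruptedException {
        P port = idle.take(); // blocks while every port is in use
        try {
            return call.apply(port);
        } finally {
            idle.add(port);
        }
    }

    public int available() {
        return idle.size();
    }

    /** Hypothetical non-thread-safe stand-in for a generated port such as MyPort. */
    static final class FakePort {
        private int calls; // unsynchronized state: safe only while one thread holds the port

        String doSomething(String input) {
            calls++;
            return input.toUpperCase() + " (call " + calls + ")";
        }
    }

    public static void main(String[] args) throws InterruptedException {
        PortPool<FakePort> pool = new PortPool<>(2, FakePort::new);
        String result = pool.withPort(port -> port.doSomething("hello"));
        System.out.println(result + ", idle ports: " + pool.available());
        // prints "HELLO (call 1), idle ports: 2"
    }
}
```

The design choice is that a port is only ever used by the thread that borrowed it, so the non-thread-safe proxy never sees concurrent calls; callers block in take() when all ports are busy, which also caps concurrency against the back end.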


In this case we recommend modifying the code to create a separate port instance for each thread (Service), thus ensuring JAX-WS (port/proxy instance) thread safety.
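One way to give each thread its own port is a ThreadLocal. The sketch below assumes a hypothetical MyPort interface in place of a generated JAX-WS port type, and a Supplier that stands in for svc.getMyPort(); the Service object itself is still created only once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class PerThreadPortDemo {
    /** Hypothetical stand-in for a generated JAX-WS port type such as MyPort. */
    interface MyPort {
        String doSomething(String input);
    }

    /** Gives each thread its own port, so no port is ever used by two threads. */
    static final class PerThreadPorts {
        private final ThreadLocal<MyPort> ports;

        PerThreadPorts(Supplier<MyPort> factory) {
            // factory stands in for svc.getMyPort(); it runs once per thread, on first use
            this.ports = ThreadLocal.withInitial(factory);
        }

        MyPort get() {
            return ports.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger created = new AtomicInteger();
        PerThreadPorts ports = new PerThreadPorts(() -> {
            int id = created.incrementAndGet();
            return input -> "port " + id + " handled " + input;
        });

        ExecutorService workers = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 100; i++) {
            String request = "request-" + i;
            workers.submit(() -> ports.get().doSomething(request));
        }
        workers.shutdown();
        workers.awaitTermination(10, TimeUnit.SECONDS);
        // At most one port per worker thread, however many requests were served
        System.out.println("ports created: " + created.get());
    }
}
```

Because servlet containers reuse worker threads, each worker keeps its port for the life of the thread. In environments that redeploy applications, values held in ThreadLocals of long-lived threads can keep old classes in memory, so the pooled approach may be easier to manage there.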

XML web services performance in .NET

We noted a delay in response time on the first invocation of our test application's web service on .NET. This was because the .NET infrastructure provides XML serialization to read from and write to XML documents, and by default it generates a serializer class on the fly at runtime, which incurs a performance penalty. If a compiler is not available in production (not installed, or disabled by policy), the application will simply fail.


To address the performance penalty, we recommend explicitly generating serialization assemblies and shipping them with the application to improve cold-start performance. For example, you could use the XML Serializer Generator tool (Sgen.exe) to create an XML serialization assembly for the types in a specified assembly. This improves the startup performance of an XmlSerializer when it serializes or deserializes objects of those types. Listing 2 shows sample code used to pre-generate XML serialization.

Listing 2. Pre-generating XML serialization

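using System;
using System.IO;
using System.Xml.Serialization;

namespace XmlSerializerTest
{
    public class TestXmlRoot
    {
    }

    class Program
    {
        static void Main(string[] args)
        {
            // create strongly typed content
            TestXmlRoot root = new TestXmlRoot();
            // create serializer; if a pre-generated serialization assembly (for example,
            // built with sgen.exe) ships with the application, it can be loaded instead
            // of generating serializer code at runtime
            var serializer = new XmlSerializer(typeof(TestXmlRoot));
            // serialize
            var ms = new MemoryStream();
            serializer.Serialize(ms, root);
            // verify serialized XML
            ms.Position = 0;
            Console.WriteLine(new StreamReader(ms).ReadToEnd());
        }
    }
}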

Bulk web service response handling

As part of our test application's data transfer, bulk data was sent to each user as a web service response. During our code review we noted that multiple users sending simultaneous web service requests would generate huge response payloads and heavy network bandwidth consumption, leaving users waiting for a response.


Measure the payload of your application's web service responses. If it is greater than 10 KB, consider compressing the responses. If the response data is not already in a binary format, compress it with GZIP. This decreases the size of the data returned and increases throughput.
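As a minimal sketch of the 10 KB rule, assuming the response body is already available as a byte array, the standard java.util.zip.GZIPOutputStream can do the compression. ResponseCompression and maybeCompress are hypothetical names. In a real deployment the compressed body must be sent with a Content-Encoding: gzip header, and only to clients that send Accept-Encoding: gzip; it is often simpler to enable compression in the web server or application server configuration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class ResponseCompression {
    // Threshold taken from the 10 KB rule of thumb; tune it against your own measurements
    static final int GZIP_THRESHOLD_BYTES = 10 * 1024;

    /** Returns the GZIP-compressed payload, or the original bytes if it is at or below the threshold. */
    static byte[] maybeCompress(byte[] payload) throws IOException {
        if (payload.length <= GZIP_THRESHOLD_BYTES) {
            return payload;
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(payload);
        } // closing the stream writes the GZIP trailer
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Repetitive XML, similar in shape to a bulk SOAP response
        StringBuilder xml = new StringBuilder("<records>");
        for (int i = 0; i < 1000; i++) {
            xml.append("<record><id>").append(i).append("</id><status>ACTIVE</status></record>");
        }
        xml.append("</records>");
        byte[] raw = xml.toString().getBytes(StandardCharsets.UTF_8);
        byte[] sent = maybeCompress(raw);
        System.out.println("raw=" + raw.length + " bytes, sent=" + sent.length + " bytes");
    }
}
```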

Element and attribute name lengths in XML

Node names in SOAP XML responses are descriptive, which sometimes makes them lengthy. Processing large XML documents can cause high CPU, memory, and bandwidth utilization. These issues are magnified if users access the documents over low-bandwidth networks, or if many users connect to the application at once.


First, it is essential that the web service provider and consumer agree on the XSD/XML naming conventions and standards. Next, consider shortening element and attribute names. Both are included as metadata in every XML document, so the length of each element or attribute name directly affects document size. Balance size against ease of human interpretation and future maintenance: use attribute and element names that are short but meaningful.
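To see how much names contribute, the following hypothetical sketch builds the same records twice, once with verbose element names and once with short ones, and compares their UTF-8 sizes. The element names are invented for illustration and are not taken from our test application's schema.

```java
import java.nio.charset.StandardCharsets;

public class ElementNameSize {
    /** Builds n identical records using the given (hypothetical) element names. */
    static String build(int n, String root, String record, String id, String name) {
        StringBuilder sb = new StringBuilder("<").append(root).append('>');
        for (int i = 0; i < n; i++) {
            sb.append('<').append(record).append('>')
              .append('<').append(id).append('>').append(i).append("</").append(id).append('>')
              .append('<').append(name).append(">Jane Doe</").append(name).append('>')
              .append("</").append(record).append('>');
        }
        return sb.append("</").append(root).append('>').toString();
    }

    public static void main(String[] args) {
        String verbose = build(1000, "patientVitalSignsResponse", "patientVitalSignRecord",
                "patientIdentifier", "patientFullName");
        String concise = build(1000, "vitalsResp", "vital", "pid", "name");
        int v = verbose.getBytes(StandardCharsets.UTF_8).length;
        int c = concise.getBytes(StandardCharsets.UTF_8).length;
        // Same data in both documents; only the element names differ
        System.out.printf("verbose=%d bytes, concise=%d bytes (%.0f%% smaller)%n",
                v, c, 100.0 * (v - c) / v);
    }
}
```

Because every name appears twice per element (opening and closing tag), shortening names shrinks the document in proportion to the number of elements, which is why the effect is largest for bulk responses.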

Marshaling and un-marshaling web service response

We noted that our test application was reading SOAP responses, converting them into POJOs, and then converting those POJOs to JSON. Marshaling and un-marshaling web service responses into different formats delayed the application's overall response time.


Set the web service response format to JSON, and include security attributes in the response. Using JSON results in a smaller payload. You can also manipulate the JSON objects with parsers and/or JavaScript at the web service client, bypassing the additional step of un-marshaling the SOAP response into schema objects.

JPA sequence pre-allocation optimizes the application

If sequence pre-allocation is not configured in the code (for instance, if the JPA allocationSize is set to 1), then you must query the database to retrieve the sequence value using sequence.nextvalue, and then run another query to insert a new row to persist the object. Repeated select queries for the sequence ID, each followed by an insert statement, can degrade application performance.


Use sequence pre-allocation to avoid multiple query executions (one to retrieve sequence.nextvalue and another to insert the new row) and optimize performance. Setting allocationSize=N means: fetch the next value from the database once every N persistence calls, and increment the value locally by 1 in between.

Instead of writing

@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "idSeq")
@SequenceGenerator(name = "idSeq", sequenceName = "GIVER_SEQ")

We recommend

@GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "idSeq")
@SequenceGenerator(name = "idSeq", sequenceName = "GIVER_SEQ", allocationSize = 100)

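Setting allocationSize to 100 means that sequence values 1 to 100 are obtained with a single database fetch and then incremented locally; the next sequence value is fetched from the database on the 101st persistence call. For this to work correctly, the database sequence should increment by the same amount (INCREMENT BY 100 in this example).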
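The pre-allocation behavior described above can be simulated without a database. In this sketch, which is ours and not any JPA provider's implementation, a LongSupplier stands in for the sequence query and is assumed to return values that advance by allocationSize (1, 101, 201, ...), the way a sequence defined with INCREMENT BY 100 would.

```java
import java.util.function.LongSupplier;

/** Hypothetical simulation of JPA-style sequence pre-allocation. */
public class PreallocatingIdGenerator {
    private final LongSupplier sequence; // stands in for a "nextvalue" database query
    private final int allocationSize;
    private long next = 1;     // next id to hand out
    private long blockEnd = 0; // last id in the current block; next > blockEnd forces a fetch
    private int roundTrips;

    public PreallocatingIdGenerator(LongSupplier sequence, int allocationSize) {
        this.sequence = sequence;
        this.allocationSize = allocationSize;
    }

    public synchronized long nextId() {
        if (next > blockEnd) {
            long start = sequence.getAsLong(); // one database round trip per block of ids
            roundTrips++;
            next = start;
            blockEnd = start + allocationSize - 1;
        }
        return next++; // local increment, no database access
    }

    public synchronized int roundTrips() {
        return roundTrips;
    }

    public static void main(String[] args) {
        long[] db = {1}; // simulated sequence: START WITH 1 INCREMENT BY 100
        LongSupplier fakeSequence = () -> { long v = db[0]; db[0] += 100; return v; };
        PreallocatingIdGenerator gen = new PreallocatingIdGenerator(fakeSequence, 100);
        long last = 0;
        for (int i = 0; i < 250; i++) {
            last = gen.nextId();
        }
        System.out.println("last id = " + last + ", round trips = " + gen.roundTrips());
        // prints "last id = 250, round trips = 3"
    }
}
```

With allocationSize=1 the same 250 inserts would need 250 sequence queries; here they need 3.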

Duplicate sequence key

Our code review found a duplicate sequence generator, idSequence. The database table definition already assigns a sequence value to the SubjectID column, and the Java code assigns another one through a @SequenceGenerator for SubjectID.


Avoid the duplicate sequence key by restricting sequence key generation to either the code or the database table definition, not both.

Inefficient use of keySet Iterator

The following code snippet shows the value of a Map entry being accessed using a key retrieved from a keySet iterator. At each iteration the map must look up the value for the given key, which is inefficient.

for (Iterator<?> it = myMap.keySet().iterator(); it.hasNext();) {
   Object key = it.next();
   Object value = myMap.get(key);  // a second lookup for every key
   // do something with the key and the value
}

Getting a value from a key (the third line in the sample above) can be time-consuming, which would negatively impact application performance.


A more efficient approach is to use an iterator on the entrySet of the map, as shown here:

for (Iterator<?> it = myMap.entrySet().iterator(); it.hasNext();) {
   Map.Entry<?, ?> entry = (Map.Entry<?, ?>) it.next();
   Object key = entry.getKey();
   Object value = entry.getValue();  // no extra lookup: the entry already holds the value
   // do something with the key and the value
}
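On Java 5 and later the same entrySet traversal is usually written with an enhanced for loop, and Java 8 adds Map.forEach. The following is a minimal sketch; the map contents and the describe method are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EntrySetIteration {
    /** Walks entrySet() once; key and value come from the same entry, with no get() call. */
    static String describe(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            sb.append(entry.getKey()).append('=').append(entry.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> vitals = new LinkedHashMap<>(); // keeps insertion order
        vitals.put("pulse", 72);
        vitals.put("systolic", 120);
        vitals.put("diastolic", 80);

        System.out.println(describe(vitals)); // prints "pulse=72;systolic=120;diastolic=80;"

        // Java 8+ alternative: Map.forEach also visits each entry without extra lookups
        vitals.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

For a HashMap the extra get() is an average constant-time hash lookup, so the saving is a constant factor per entry; for a TreeMap each get() costs O(log n), so the difference grows with map size.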

Nested entity objects

Too many database table queries are executed to populate entity objects that each map only a single column. In some cases these single-attribute entities are included as child attributes of other entity objects, which increases the size of the nested entity object. Here is an example:
