A pitfall is Java code that compiles fine but leads to erroneous, and sometimes disastrous, results. Avoiding pitfalls can save you hours of frustration. In this article, I will present a pitfall you might encounter when posting to a URL, and another that plagues Java beginners.
Pitfall 5: The hidden complexity of posting to a URL
As the Simple Object Access Protocol (SOAP) and other XML remote procedure calls (RPCs) continue to grow in popularity, posting to a URL will become a more common and more important operation -- it is the method for sending the SOAP or RPC request to the respective server.
While implementing a standalone SOAP server, I stumbled upon multiple pitfalls associated with posting to a URL, starting with the nonintuitive design of the URL-related classes and ending with specific usability pitfalls in the URLConnection class.
A simple HttpClient class would be the most direct way to perform an HTTP post operation on a URL, but if you scan the java.net package for one, you'll come up empty. Some open source HTTP clients are available, but I have not tested them. (If you have tested those clients, drop me an email regarding their utility and stability.) Interestingly, there is an HttpClient in the sun.net.www.http package that is shipped with the JDK (and used by HttpURLConnection), but it is not part of the public API. Instead, the java.net URL classes were designed to be extremely generic and to take advantage of dynamic class-loading of both protocols and content handlers. Before we jump into the specific problems with posting, let's examine the overall structure of the classes we will use (either directly or indirectly).
This UML diagram of the URL-related classes in the java.net package illustrates the classes' interrelatedness. (The diagram was created with ArgoUML -- see Resources for a link.) For brevity's sake, the diagram shows only key methods and no data members.
Pitfall 5 centers on the main class: URLConnection. However, you cannot instantiate that class directly -- it is abstract. Instead, you will receive a reference to a specific subclass of URLConnection via the URL class.
Admittedly, the figure above is complex. The general sequence of events works like this: A static URL commonly specifies the location of some content and the protocol needed to access it. The first time the URL class is used, a URLStreamHandlerFactory singleton is created. That factory generates a URLStreamHandler that understands the access protocol specified in the URL. The URLStreamHandler instantiates the appropriate URLConnection class, which opens a connection to the URL and instantiates the appropriate ContentHandler to handle the content at the URL.
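The first half of that sequence can be observed without touching the network, because openConnection() only builds the connection object. The following is a minimal sketch, not part of the article's source code: the class name ConnectionTypeDemo and the example.com address are placeholders, and the concrete class name printed depends on the JDK in use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

class ConnectionTypeDemo {

    // Asks the URL class for a connection object. The protocol handler only
    // instantiates a URLConnection subclass here; nothing is sent yet.
    static URLConnection open(String address) {
        try {
            return URI.create(address).toURL().openConnection();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        URLConnection con = open("http://example.com/");
        // The declared type is the abstract URLConnection; the runtime type
        // is whatever the protocol's stream handler chose to create.
        System.out.println("Received a : " + con.getClass().getName());
        System.out.println("HTTP connection? " + (con instanceof HttpURLConnection));
    }
}
```

For an http URL, the object returned is a subclass of HttpURLConnection, which is why the listings that follow end up dealing with HTTP-specific behavior even though they only declare a URLConnection.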
So what is the problem? Because of the classes' overly generic design, they lack a clear conceptual model. In his book, The Design of Everyday Things (Doubleday, 1990), Donald Norman states that one of the primary principles of good design is a sound conceptual model that allows us to "predict the effects of our actions." Some problems with the URL classes' conceptual model include:
- The URL class is conceptually overloaded. A URL is merely an abstraction for an address or an endpoint. In fact, a better design would feature URL subclasses that differentiate static resources from dynamic services. Missing conceptually is a URLClient class that uses the URL as the endpoint to read from or write to.
- The URL class is biased toward retrieving data from a URL. There are three methods that retrieve content from a URL, but only one that writes data to a URL. That disparity would be better served with a URL subclass for static resources that only has a read operation; the URL subclass for dynamic services would have both read and write methods. That design would provide a clean conceptual model for use.
- Calling the protocol handlers "stream" handlers is confusing because their primary purpose is to generate (or build) a connection. A better model would emulate the Java API for XML Processing (JAXP), where a DocumentBuilderFactory produces a DocumentBuilder, which produces a Document. Applying that model to the URL classes would yield a URLConnectorFactory that generates a URLConnector that produces a URLConnection.
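For readers who have not used JAXP, the factory-to-builder-to-product chain mentioned in the last point looks roughly like this. It is a small sketch using only the standard javax.xml.parsers API; the class name JaxpChainDemo and the root element name are made up for illustration.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class JaxpChainDemo {

    // Factory -> builder -> product: each step has one clearly named job.
    static Document buildEmptyDocument(String rootName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.newDocument();
            Element root = document.createElement(rootName);
            document.appendChild(root);
            return document;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser configured", e);
        }
    }

    public static void main(String[] args) {
        Document document = buildEmptyDocument("request");
        System.out.println(document.getDocumentElement().getTagName());
    }
}
```

The names tell you what each object does, which is the property the URL classes lack: nothing in the name URLStreamHandler suggests that its job is to build connections.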
Now you are ready to tackle the URLConnection class and attempt to post to a URL. The goal is to create a simple Java program that posts some text to a common gateway interface (CGI) program. To test the programs, I created a simple CGI program in C that echoes (in an HTML wrapper) whatever passes into it. (See Resources to download the source code for any program in this article, including the CGI program.)
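If you do not have a Web server with CGI support at hand, a local stand-in for the echo program can be sketched in Java itself. The code below is not the article's C program; it is an assumed equivalent built on the com.sun.net.httpserver package that ships with the JDK. The class name EchoServer, the /echo path, port 8080, and the exact HTML wrapper are all illustrative choices, and readAllBytes() requires Java 9 or later.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class EchoServer {

    // Wraps the posted text in HTML, or reports an error when nothing arrived.
    // The body is echoed unescaped, so use this for local experiments only.
    static String render(String body) {
        String content = body.isEmpty() ? "No content! ERROR!" : body;
        return "<HTML><HEAD><TITLE> Echo CGI program </TITLE></HEAD>\n"
             + "<BODY><H2> Echo </H2>\n" + content + "\n</BODY></HTML>\n";
    }

    // Starts an echo endpoint at http://localhost:<port>/echo
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", port), 0);
        server.createContext("/echo", exchange -> {
            // Read whatever the client posted (empty for a GET or empty POST).
            String body = new String(exchange.getRequestBody().readAllBytes(),
                                     StandardCharsets.UTF_8);
            byte[] page = render(body).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, page.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(page);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080);
        System.out.println("Echo endpoint listening on http://localhost:8080/echo");
    }
}
```

With this running, the listings in this article can be pointed at http://localhost:8080/echo instead of the CGI URL shown in the sample runs.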
The URLConnection class has getOutputStream() and getInputStream() methods, just like the Socket class. Based on that similarity, you would expect that sending data to a URL would be as easy as writing data to a Socket. Armed with that information and an understanding of the HTTP protocol, we write the program in Listing 5.1, BadURLPost.java.
Listing 5.1 BadURLPost.java
package com.javaworld.jpitfalls.article3;

import java.net.*;
import java.io.*;

public class BadURLPost {
    public static void main(String args[]) {
        // get an HTTP connection to POST to
        if (args.length < 1) {
            System.out.println("USAGE: java com.javaworld.jpitfalls.article3.BadURLPost url");
            System.exit(1);
        }

        try {
            // get the url as a string
            String surl = args[0];
            URL url = new URL(surl);

            URLConnection con = url.openConnection();
            System.out.println("Received a : " + con.getClass().getName());

            con.setDoInput(true);
            con.setDoOutput(true);
            con.setUseCaches(false);

            String msg = "Hi HTTP SERVER! Just a quick hello!";
            con.setRequestProperty("CONTENT_LENGTH", "5"); // Not checked
            con.setRequestProperty("Stupid", "Nonsense");

            System.out.println("Getting an input stream...");
            InputStream is = con.getInputStream();

            System.out.println("Getting an output stream...");
            OutputStream os = con.getOutputStream();

            /*
            con.setRequestProperty("CONTENT_LENGTH", "" + msg.length());
            Illegal access error - can't reset method.
            */

            OutputStreamWriter osw = new OutputStreamWriter(os);
            osw.write(msg);
            osw.flush();
            osw.close();

            System.out.println("After flushing output stream. ");

            // any response?
            InputStreamReader isr = new InputStreamReader(is);
            BufferedReader br = new BufferedReader(isr);
            String line = null;

            while ( (line = br.readLine()) != null) {
                System.out.println("line: " + line);
            }
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}
A run of Listing 5.1 produces:
E:\classes\com\javaworld\jpitfalls\article3>java -Djava.compiler=NONE com.javaworld.jpitfalls.article3.BadURLPost http://localhost/cgi-bin/echocgi.exe
Received a : sun.net.www.protocol.http.HttpURLConnection
Getting an input stream...
Getting an output stream...
java.net.ProtocolException: Can't reset method: already connected
        at java.net.HttpURLConnection.setRequestMethod(HttpURLConnection.java:102)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:349)
        at com.javaworld.jpitfalls.article2.BadURLPost.main(BadURLPost.java:38)
When we try to obtain the HttpURLConnection class's output stream, the program informs us that we cannot reset the method because we are already connected. The Javadoc for the HttpURLConnection class contains no reference to setting a method. The program is referring to the HTTP method, which should be POST when we send data to the URL and GET when we retrieve data from the URL.
The getOutputStream() method causes the program to throw a ProtocolException with the error message "Can't reset method: already connected." The JDK source code reveals that the error occurs because the getInputStream() method has the side effect of sending the request (whose default request method is GET) to the Web server. This is similar to a side effect in the ObjectInputStream and ObjectOutputStream constructors, detailed in my book, Java Pitfalls: Time Saving Solutions and Workarounds to Improve Programs (John Wiley & Sons, 2000).
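The default request method is easy to confirm through the public API, with no request being sent. Here is a minimal sketch; the class name RequestMethodDemo and the example.com address are placeholders, not part of the article's code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URI;

class RequestMethodDemo {

    // Builds the connection object only; no request is sent here.
    static HttpURLConnection open(String address) {
        try {
            return (HttpURLConnection) URI.create(address).toURL().openConnection();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws ProtocolException {
        HttpURLConnection con = open("http://example.com/");
        System.out.println(con.getRequestMethod());   // GET is the default

        con.setDoOutput(true);
        con.setRequestMethod("POST");                 // legal only before connecting
        System.out.println(con.getRequestMethod());   // now POST
    }
}
```

Because getInputStream() sends the request, any change to the method has to happen before that call; once the GET has gone out, the attempt to switch to POST is what raises the ProtocolException shown above.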
The pitfall is the assumption that the getInputStream() and getOutputStream() methods behave just as they do for a Socket connection. Since the underlying mechanism for communicating with the Web server actually is a Socket, it is not an unreasonable assumption. A better implementation of HttpURLConnection would postpone the side effects until the initial read or write to the respective input or output stream. That could be done by creating an HttpInputStream and an HttpOutputStream, which would keep the Socket model intact. You could argue that HTTP is a request/response stateless protocol, and the Socket model does not fit. Nevertheless, the API should fit the conceptual model; if the current model is identical to a Socket connection, it should behave as such. If it does not, you have stretched the bounds of abstraction too far.
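To make the idea of postponing side effects concrete, here is one way such a deferred stream could be sketched. This is a hypothetical illustration, not code from the JDK or from this article's downloads: LazyInputStream is an invented name, and it simply delays the real (side-effecting) open until the first read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.Callable;

class LazyInputStream extends InputStream {

    private final Callable<InputStream> opener; // performs the real, side-effecting open
    private InputStream delegate;               // stays null until the first read

    LazyInputStream(Callable<InputStream> opener) {
        this.opener = opener;
    }

    boolean isOpened() {
        return delegate != null;
    }

    // Opens the underlying stream on first use and reuses it afterwards.
    private InputStream open() throws IOException {
        if (delegate == null) {
            try {
                delegate = opener.call();
            } catch (IOException e) {
                throw e;
            } catch (Exception e) {
                throw new IOException("Could not open underlying stream", e);
            }
        }
        return delegate;
    }

    @Override
    public int read() throws IOException {
        return open().read();
    }

    @Override
    public int read(byte[] buffer, int offset, int length) throws IOException {
        return open().read(buffer, offset, length);
    }

    @Override
    public void close() throws IOException {
        if (delegate != null) {
            delegate.close();
        }
    }

    public static void main(String[] args) throws IOException {
        LazyInputStream in = new LazyInputStream(
                () -> new ByteArrayInputStream("hello".getBytes()));
        System.out.println(in.isOpened()); // false: nothing has happened yet
        int first = in.read();
        System.out.println(in.isOpened()); // true: the first read triggered the open
        System.out.println((char) first);  // h
    }
}
```

Wrapping a connection as new LazyInputStream(con::getInputStream) would let a caller hold both streams, Socket-style, and still have the request go out only when the response is first read.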
In addition to the error message, there are two problems with the above code:
- The setRequestProperty() method parameters are not checked, which we demonstrate by setting a property called Stupid with a value of Nonsense. Since those properties actually go into the HTTP request and are not validated by the method (as they should be), you must take extra care to ensure that the parameter names and values are correct.
- Although the code is commented out, it is also illegal to attempt to set a request property after obtaining an input or output stream. The documentation for URLConnection indicates the sequence to set up a connection, although it does not state that it is a mandatory sequence.
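The first point can be checked without sending anything. In the sketch below, which assumes a placeholder address and an invented class name (RequestPropertyDemo), an arbitrary header name is accepted as-is and can be read straight back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

class RequestPropertyDemo {

    // Sets a request property on a fresh, unconnected connection and reads it back.
    static String roundTrip(String name, String value) {
        try {
            URLConnection con = URI.create("http://example.com/").toURL().openConnection();
            con.setRequestProperty(name, value); // accepted without complaint
            return con.getRequestProperty(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A meaningless header is stored just like a real one.
        System.out.println(roundTrip("Stupid", "Nonsense"));
    }
}
```

As for the second point, the Javadoc for URLConnection.setRequestProperty() in current JDKs does list an IllegalStateException for the already-connected case, so the safe habit is to finish all setRequestProperty() calls before asking for either stream.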
If we did not have the luxury of examining the source code -- which should definitely not be a requirement to use an API -- we would be reduced to trial and error, the absolute worst way to program. Neither the documentation nor the API of the HttpURLConnection class affords us any understanding of how the protocol is implemented, so we feebly attempt to reverse the order of calls to getInputStream() and getOutputStream(). Listing 5.2, BadURLPost1.java, is an abbreviated version of that program.
Listing 5.2 BadURLPost1.java
package com.javaworld.jpitfalls.article3;

import java.net.*;
import java.io.*;

public class BadURLPost1 {
    public static void main(String args[]) {
        // ...
        try {
            // ...
            System.out.println("Getting an output stream...");
            OutputStream os = con.getOutputStream();

            System.out.println("Getting an input stream...");
            InputStream is = con.getInputStream();
            // ...
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}
A run of Listing 5.2 produces:
E:\classes\com\javaworld\jpitfalls\article3>java -Djava.compiler=NONE com.javaworld.jpitfalls.article3.BadURLPost1 http://localhost/cgi-bin/echocgi.exe
Received a : sun.net.www.protocol.http.HttpURLConnection
Getting an output stream...
Getting an input stream...
After flushing output stream.
line: <HEAD>
line: <TITLE> Echo CGI program </TITLE>
line: </HEAD>
line: <BODY BGCOLOR='#ebebeb'><CENTER>
line: <H2> Echo </H2>
line: </CENTER>
line: No content! ERROR!
line: </BODY>
line: </HTML>
Although the program compiles and runs, the CGI program reports that no data was sent! Why? The side effects of getInputStream() bite us again, causing the POST request to be sent before anything is placed in the post's output buffer, thus sending an empty POST request.
After failing twice, we understand that getInputStream() is the key method that actually writes the request to the server. Therefore we must perform the operations serially (open output, write, open input, read), as we do in Listing 5.3, GoodURLPost.java.
Listing 5.3 GoodURLPost.java