SIP programming for the Java developer

Deliver SIP-based services to Java applications with SIP Servlet

Session Initiation Protocol (SIP) is a control (signaling) protocol developed by the Internet Engineering Task Force (IETF) to manage interactive multimedia IP sessions including IP telephony, presence, and instant messaging. The SIP Servlet Specification (Java Specification Request 116), developed through the Java Community Process, provides a standard Java API programming model for delivering SIP-based services. Derived from the popular Java servlet architecture of Java Platform, Enterprise Edition (Java EE is Sun's new name for J2EE), SIP Servlet brings Internet application development capabilities to SIP solutions.

IT and telecom are converging. Network-IT applications, typically data oriented, are merging with communication applications. The increasing number of Call Me buttons appearing on Webpages is an example of this integration. The SIP Servlet Specification brings a familiar programming model to Java developers for building converged applications. This article gives a step-by-step introduction on how to use SIP Servlet to build a simple echo chat service.

Session Initiation Protocol

Defined in Request for Comments 3261, SIP is a protocol for establishing, modifying, and terminating multimedia IP communication sessions. Figure 1 is a simple example of using SIP to establish a VoIP (voice-over Internet Protocol) call:

Figure 1. Typical SIP message flow in VoIP calls

All the white lines in Figure 1 represent the SIP communications. Caller sends a SIP INVITE request to invite the "callee" to establish a voice session. Callee first responds with a message that has a 180 status code to indicate the phone is ringing. As soon as phone is picked up, a response with a 200 status code is sent to the caller to accept the invitation. Caller confirms with an ACK message, and session is established. Once the session is established, the actual digitized voice conversation typically transmits via Realtime Transmission Protocol (RTP) with the session, as the red line in Figure 1 indicates. When the conversation ends, a SIP BYE request is sent, followed by a response with a 200 status code to confirm the session termination.

Here is an example of a SIP INVITE request and a response with a 200 OK status code:

 

SIP INVITE request: INVITE sip:callee@callee.com SIP/2.0 Via: SIP/2.0/UDP pc.caller.com;branch=z9hG4bK776asdhds Max-Forwards: 70 To: Callee <sip:callee@callee.com> From: Caller <sip:caler@caller.com>;tag=1928301774 Call-ID: a84b4c76e66710 CSeq: 314159 INVITE Contact: <sip:caller@pc.caller.com> Content-Type: application/sdp Content-Length: 142

(content (SDP) is not shown)

SIP 200 OK response:

 

SIP/2.0 200 OK Via: SIP/2.0/UDP pc.caller.com;branch=z9hG4bK776asdhds;received=192.0.2.1 To: Callee <sip:callee@callee.com>;tag=a6c85cf From: Caller <sip:caller@caller.com>;tag=1928301774 Call-ID: a84b4c76e66710 CSeq: 314159 INVITE Contact: <sip:callee@workstation.callee.com> Content-Type: application/sdp Content-Length: 131

(content (SDP) is not shown)

As you can see, the format of SIP resembles HTTP. However, when compared to HTTP, SIP is:

  • Responsible for session management. The actual multimedia content, such as instant messages, voice, and video, may or may not be transmitted via SIP.
  • Asynchronous and stateful. For each SIP request, there could be multiple responses. This means the application has to process each SIP message within a proper state context.
  • An application protocol that can run on both reliable and unreliable transport. Thus, the application must guarantee the message delivery with message retransmission and acknowledgement.
  • A peer-to-peer protocol where there is no clear distinction between client and server. Either side must be able to send and receive requests and responses.

SIP-based services

SIP-based services are SIP servers that offer services, such as message routing, to SIP endpoints, such as IP phones. For example, in Figure 2, the SIP registrar server and proxy server offer SIP registration and proxy services to help the SIP endpoints locate and communicate with each other.

Figure 2. SIP message routing with SIP registrar and proxy server. Click on thumbnail to view full-sized image.

Figure 2 illustrates the following:

  1. Callee registers itself to the registrar server by sending a REGISTER request.
  2. The registrar server accepts the registration, which contains the callee's name address, by responding with a 200 OK status code.
  3. Caller requests to establish a communication session with callee by sending an INVITE request to the proxy server. The INVITE message's content typically contains the description of the communication session the caller wants to establish, such as media type, security, or IP address. The description is typically in Session Description Protocol (SDP) format.
  4. The proxy server looks up the registrar server to find out the callee's current address. Note that lookup is an implementation issue not part of SIP.
  5. The proxy server forwards the INVITE request from caller to callee based on its current address.
  6. Callee accepts the invitation by responding with a 200 OK status code. The 200 OK response to an INVITE request typically contains the description of the communication session that callee can establish with the caller.
  7. The proxy server forwards a 200 OK response from callee to caller.
  8. Caller confirms the session establishment by sending an ACK message to the proxy server. The ACK message may contain the final agreement on the session.
  9. In turn, the proxy server forwards the ACK to the callee. Thus, the three-way handshake is completed via the proxy server, and a session is established.
  10. Now the communication between caller and callee happens. The protocol used for communication may or may not be SIP. For example, instant messages can be transmitted over SIP. Voice conversations are typically transmitted over RTP.
  11. Now, callee finishes the conversation and wishes to terminate the session by sending a BYE request.
  12. Caller responds with a 200 OK status code to accept session termination.

In the above scenario, the SIP proxy server simply routes the messages to the callee's current address. As you can imagine, more interesting and smart routing services can happen. For example, the proxy server can "follow a user" by routing the messages to where he can be reached, such as a cell phone, even if someone is calling on his office phone.

SIP Servlet

Defined in Java Specification Request 116, the SIP Servlet Specification provides a container-servlet programming model for SIP applications. Since it is derived from the Java servlet architecture in Java EE, JSR 116 brings a familiar approach to building SIP services to Java EE developers.

The table below summarizes the similarity between HTTPServlet and SIPServlet.

Comparison between an HTTP and SIP servlet

 HTTP SIP
Servlet classHttpServlet SipServlet
SessionHttpSession SipSession
Application packageWARSAR
Deployment descriptorweb.xmlsip.xml

Much like HTTP servlets, SIP servlets extend the javax.servlet.sip.SipServlet class, which in turn extends the javax.servlet.GenericServlet class. As you might have guessed, SipServlet overrides the service(ServletRequest request, ServletResponse response) method to handle different types of SIP messages.

Since SIP is asynchronous, only one of the request and response arguments in the service() method is valid; the other one is null. For example, if the incoming SIP message is a request, only the request is valid and the response is null, and vice versa. The default implementation of the SipServlet class dispatches requests to doXXX() methods and responses to doXXXResponse() methods with a single argument. For example, doInvite(SipServletRequest request) for a SIP invite request and doSuccessResponse(SipServletResponse response) for SIP 2xx class responses. Typically SIP servlets override doXXX() methods and/or doXXXResponse() methods to provide application logic.

How do you send SIP responses if there is no response object in the doXXX() methods? In SIP servlets, you must call one of the createResponse() methods in the javax.servlet.sip.SipServletRequest class to create a response object. Then, call the send() method on the response object to send the response.

How about creating a SIP request in a SIP servlet? There are two ways to create SIP requests: Call either one of the createRequest() methods on the SipSession class to create a SIP request within the session, or one of the createRequest() methods on javax.servlet.sip.SipFactory to create a SIP request within a new SipSession. To get an instance of SipFactory, you must call getAttribute("javax.servlet.sip.SipFactory") on the ServletContext class.

The SipFactory is a factory interface in the SIP Servlet API for creating various API abstractions, such as requests, address objects, and application sessions. One interesting object created by SipFactory is javax.servlet.sip.SipApplicationSession. The intention of JSR 116 is to create a unified servlet container that can run both an HTTP and a SIP servlet. SipApplicationSession provides a protocol-agnostic session object to store application data and correlate protocol-specific sessions, such as SipSession and HttpSession. Hopefully this concept will be adopted by future versions of the Servlet API to make it javax.servlet.ApplicationSession instead of javax.servlet.sip.SipApplicationSession.

The SipApplicationSession manages protocol-specific sessions like SipSession. The SipSession interface represents the point-to-point relationship between two SIP endpoints and roughly corresponds to a SIP dialog defined in Request for Comments 3261. SipSession is inherently more complicated than its HTTP counterpart due to SIP's asynchronous and unreliable nature mentioned above. For example, Figure 3 shows the SipSession state transitions defined in JSR 116:

Figure 3. State transitions in SipSession

Typically, an HttpSession is created when a user logs in and destroyed after logout. A SipSession typically represents one logical conversation, even if you have multiple conversations between the same endpoints. So SipSession is more dynamic and has a shorter lifespan.

More advanced discussions of the SipSession lifecycle and its relationship with SIP dialog reaches beyond this article's scope. Fortunately, the container handles most of the complexity, such as lifecycle and state transitions, and SipSession can simply be used as storage for session data.

A complete example: EchoServlet

The EchoServlet is a SIP servlet that can echo the instant messages you type in Windows Messenger:

1 2 Page 1
Page 1 of 2