The PathProxy pattern: Persisting complex associations

PathProxy offers an easier way to persist complex relationships without so many lookup tables

PathProxy is a design pattern for persisting complex relationships without cluttering up your database. In this article JavaWorld contributor Matthew Tyson introduces his PathProxy pattern and walks you through an example application implementation based on Spring, JSF, and JPA/Hibernate.

When to use PathProxy

Use PathProxy if: You have many entities whose interrelationships are complex and require knowledge of other relationships. Creating explicit objects to represent these types of relationships becomes burdensome. This is especially true if the objects must be persisted, creating a proliferation of database tables.

Consider PathProxy if: Your system design calls for a number of classes whose sole or primary function is to model the relationships between other objects.

Using PathProxy is more complicated than using simple objects to represent relationships, so consider your particular situation. If you have few relationships to store and they are not too complicated, PathProxy may not be the right choice. On the other hand, once you reach a certain level of relationship complexity, using PathProxy will greatly simplify your overall system design. Being able to reuse the same mechanism over and over again is also a huge time-saver.

It is easy to persist simple object relationships in a database using the Java Persistence API, but what's a good way to store information about complex relationships in your system? When your class relationships require pathing knowledge, that is, knowledge about a number of related objects, then the standard "many-to" associations won't cut it. The PathProxy class is an abstraction of such relationships, allowing you to manage, persist, and extend them without complicating the classes themselves, and without a proliferation of lookup tables.

The fundamental idea is this: create a class that can point to any entity in the system, and that also can reference its own class as a parent. With this class, you create a tree structure that maintains the interrelationships outside of the referenced objects. Building a JPA mapping around this class requires some thought, but is quite powerful.
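To make the idea concrete before any persistence is involved, here is a minimal plain-Java sketch of such a self-referencing node. It is not the PathProxy class itself: the class name PathNode, its fields, and the describePath method are hypothetical names chosen for illustration, and the entity is identified by an assumed type name plus numeric ID rather than by a JPA mapping.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch only: names and fields are assumptions, not the article's PathProxy class.
final class PathNode {
    final PathNode parent;    // the same class as parent; null marks the root of a path
    final String entityType;  // which kind of entity this node points to (assumed representation)
    final long entityId;      // identifier of the referenced entity (assumed representation)

    PathNode(PathNode parent, String entityType, long entityId) {
        this.parent = parent;
        this.entityType = entityType;
        this.entityId = entityId;
    }

    // Walks the parent links up to the root and renders the path root-first.
    String describePath() {
        Deque<String> parts = new ArrayDeque<>();
        for (PathNode n = this; n != null; n = n.parent) {
            parts.addFirst(n.entityType + "#" + n.entityId);
        }
        return String.join("-->", parts);
    }

    public static void main(String[] args) {
        // The IDs below are made up for the example.
        PathNode project = new PathNode(null, "Project", 1L);
        PathNode manager = new PathNode(project, "Manager", 7L);
        PathNode developer = new PathNode(manager, "Developer", 42L);
        System.out.println(developer.describePath()); // Project#1-->Manager#7-->Developer#42
    }
}
```

Note that the project, manager, and developer objects themselves are never touched here: the tree of nodes carries the relationship, which is the property the pattern relies on.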

In this article I'll introduce the beginnings of a simple application -- a project tracking system -- built using JPA/Hibernate, Spring, and JSF. I'll then show you how the PathProxy class handles the application's data structure, specifically mapping relationships between project managers, developers, projects, and tasks.

Path-specific relationships

The PathProxy solution applies in any situation where an entity can appear as an association of another entity type, but only for a specific path. I refer to such relationships as path specific. E is a child of D, but only for the path A-->B-->C-->D-->E. On another path, D might have no children (A-->B-->Q-->D) or might have a different child or children (A-->B-->X-->D-->Z).
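The three paths above can be sketched as a small tree in which D appears three times, once per path, each time with its own children. This is only an illustration: the Node class and its members are hypothetical, and the entity is reduced to a single-letter label.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of path-specific children; not part of the article's application.
final class PathSpecificDemo {
    // A node in a path tree; 'entity' stands in for a reference to a real entity.
    static final class Node {
        final String entity;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String entity, Node parent) {
            this.entity = entity;
            this.parent = parent;
            if (parent != null) {
                parent.children.add(this); // register this node under its parent
            }
        }

        // Labels of the entities that are children of this node on this path only.
        List<String> childEntities() {
            List<String> names = new ArrayList<>();
            for (Node child : children) {
                names.add(child.entity);
            }
            return names;
        }
    }

    public static void main(String[] args) {
        Node a = new Node("A", null);
        Node b = new Node("B", a);
        Node dViaC = new Node("D", new Node("C", b)); // A-->B-->C-->D
        Node dViaQ = new Node("D", new Node("Q", b)); // A-->B-->Q-->D
        Node dViaX = new Node("D", new Node("X", b)); // A-->B-->X-->D
        new Node("E", dViaC);                         // E is a child of D only via C
        new Node("Z", dViaX);                         // Z is a child of D only via X

        System.out.println(dViaC.childEntities()); // [E]
        System.out.println(dViaQ.childEntities()); // []
        System.out.println(dViaX.childEntities()); // [Z]
    }
}
```

The same entity D answers the question "what are your children?" differently depending on which node, and therefore which path, is asked.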

As an example, imagine a development team consisting of a project manager named Johnie and two developers, Robert and Mukunda. On Project A, Johnie leads Robert, and on Project B, Johnie leads Mukunda. This is a somewhat contrived example, but not an uncommon scenario in the world of corporate structures. In the real world, you might need to track the efficiency of the same process in different business locations, or the actions taken in response to the same event by different teams.

For the purpose of this article, we'll stick with the simple example. The development team structure is shown in Figure 1.

Figure 1. Conceptual relation of project, manager, and developer

Mapping the relationships

Before diving into how the PathProxy class would handle this structure, let's consider how we might approach the data modeling requirements for this scenario without it. We know that a project can have many managers, and vice versa, so we have a many-to-many relationship there. The same is true of the manager-to-developer relationship. So there are two many-to-many relationships that imply two cross-reference tables in the database, a project_manager table and a manager_developer table.

In the case of the manager_developer table, however, having just two columns (manager_id and developer_id) isn't enough: we need to know which project the manager and developer are working on. If we add a project_id column, we end up with a project_manager_developer table.
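A rough in-memory sketch shows why the third column matters. The Row class and the developersFor method below are hypothetical stand-ins for the project_manager_developer table and a query against it, and names are used in place of IDs for readability.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an in-memory stand-in for a three-column cross-reference table.
final class CrossReferenceSketch {
    // One row of the assumed project_manager_developer table.
    static final class Row {
        final String project;
        final String manager;
        final String developer;

        Row(String project, String manager, String developer) {
            this.project = project;
            this.manager = manager;
            this.developer = developer;
        }
    }

    // Developers a manager leads on one project; the answer depends on all three columns.
    static List<String> developersFor(List<Row> rows, String project, String manager) {
        List<String> result = new ArrayList<>();
        for (Row row : rows) {
            if (row.project.equals(project) && row.manager.equals(manager)) {
                result.add(row.developer);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("A", "Johnie", "Robert"),
                new Row("B", "Johnie", "Mukunda"));

        System.out.println(developersFor(rows, "A", "Johnie")); // [Robert]
        System.out.println(developersFor(rows, "B", "Johnie")); // [Mukunda]
    }
}
```

With only the manager and developer columns, both rows would collapse to "Johnie leads Robert and Mukunda", and the per-project distinction would be lost.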

That might be fine. Or it might not.

Figure 2 -- an illegal hybrid of ERD and UML Object Diagrams, by the way -- shows the project_manager_developer table in action. What happens if we need to extend the hierarchy in either direction? That is, what if at some point we have to track projects as children of locations; or what if we need to track the tasks a developer is working on? We could easily end up with a project_manager_developer_task table.

Figure 2. Development structure in a lookup (cross-reference) table

The trouble with this sort of mapping is that it only gets more complex, requiring more tables and classes, as the project hierarchy evolves. Imagine what happens if we also need to assign tasks to managers. Basically, every time we want to support having the task type show up under a specific path, we have to add a table that contains all the required data for that path. We could end up with a project_manager_ba_task table, and a project_manager_qa_task table, and -- you get the picture.

To summarize: if you use simple cross-reference tables (also called lookup tables or crosswalk tables), every time you add another possible path through your objects, you have to add another table, an object to map that table, and all the supporting code to deal with those additions.

Wouldn't it be nice to be able to handle all this pathing in a consistent way -- and all in one table? That's what the PathProxy pattern does.
