What Is a Service?
A service is a well-defined unit of work performed by a component and packaged for easy access. The idea is to implement the functionality exactly once, do it well, and then make it widely accessible. The extra effort involved in initially creating the service (and there is always extra effort) is paid for by avoiding future development costs either through the direct reuse of the service or by isolating the service user from ongoing internal implementation changes within the service.
The tasks performed by a service are its operations. You can think of a service as a bundle of closely related operations (Figure 3-1). The simplest operations are, in the mathematical sense, pure functions—they take inputs and produce outputs that are solely based on the inputs. For example, they take two numbers and add or subtract them.
Figure 3-1. Basic Service
However, addition and subtraction are so simple that they are actually not good candidates for service operations—at least when the service user is a computer. This becomes apparent when you realize that the operation is actually being performed by another component with communications involved (Figure 3-2). In the case of addition and subtraction, the overhead of communicating with the service provider to invoke these operations is so much greater than the service user's cost of executing the operations that it just doesn't make sense. So one of the tradeoffs in deciding what ought to be a service begins to emerge: the granularity of the work versus the communications costs.
Figure 3-2. Simple Service Utilization Example
Service operations that are pure mathematical functions do exist, but they are not common. Two examples that often begin their lives as purely functional concepts are data transformations and rule evaluations. A data transformation takes one data structure and reorganizes its data elements into another data structure. A purely functional business rule takes in one or more data structures and generates output structures containing conclusions about the input data.
In reality, data transformations and rule evaluations commonly require some form of reference data. Data transformations frequently perform lookups and substitutions on the data. One system's identifiers in the input data structure need to be replaced with another system's identifiers in the output data structure. To support this, a cross-reference table is needed that maps one system's identifiers to the other system's identifiers. Data transformations often convert monetary values given in one currency to values in another currency. To support this, a currency exchange table is needed.
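As a rough illustration of a transformation that depends on reference data, the sketch below maps one system's customer identifier to another's and converts a monetary amount. The record shapes, the identifiers, and the exchange rate are all hypothetical values chosen for the example; none of them come from a real system.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical reference data: identifiers and the rate are made up for illustration.
ID_CROSS_REFERENCE = {"CRM-1001": "ERP-77", "CRM-1002": "ERP-78"}
EXCHANGE_RATES = {("USD", "EUR"): Decimal("0.90")}


@dataclass(frozen=True)
class SourceInvoice:
    customer_id: str  # identifier in the source system
    amount: Decimal
    currency: str


@dataclass(frozen=True)
class TargetInvoice:
    customer_id: str  # identifier in the target system
    amount: Decimal
    currency: str


def transform(invoice: SourceInvoice, target_currency: str) -> TargetInvoice:
    """Reorganize a source record into the target structure using reference data."""
    # Cross-reference lookup: raises KeyError if the identifier is unmapped.
    target_id = ID_CROSS_REFERENCE[invoice.customer_id]
    if invoice.currency == target_currency:
        rate = Decimal("1")
    else:
        rate = EXCHANGE_RATES[(invoice.currency, target_currency)]
    converted = (invoice.amount * rate).quantize(Decimal("0.01"))
    return TargetInvoice(target_id, converted, target_currency)
```

The function is no longer a pure function of its arguments: its result also depends on the contents of the two tables, which is exactly the shift this section describes.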
Rules often require reference data as well. Consider the rules governing the access to business banking accounts. A business rule might state, "Permission to perform operations on this account is granted if the person requesting the operation is either the person owning the account or a superior of that person." Executing this rule obviously requires some information about the placement of individuals within an organization structure. This is reference data.
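The account-access rule quoted above might be sketched as follows. The reporting table and the names in it are hypothetical, and the sketch assumes that "superior" means anyone above the owner in the reporting chain, not only the direct manager; the text does not settle that point.

```python
from typing import Dict, Optional

# Hypothetical reference data: each person mapped to their direct superior.
REPORTS_TO: Dict[str, Optional[str]] = {
    "alice": "bob",
    "bob": "carol",
    "carol": None,
}


def is_superior_of(candidate: str, person: str) -> bool:
    """Return True if candidate appears anywhere above person in the chain."""
    seen = set()  # guards against accidental cycles in the reference data
    current = REPORTS_TO.get(person)
    while current is not None and current not in seen:
        if current == candidate:
            return True
        seen.add(current)
        current = REPORTS_TO.get(current)
    return False


def may_operate_on_account(requester: str, account_owner: str) -> bool:
    """Grant permission to the owner or to any superior of the owner."""
    return requester == account_owner or is_superior_of(requester, account_owner)
```

Without the organization table the rule cannot be evaluated at all, which is why the lookup logically belongs inside the operation.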
To keep an operation in its purely functional form, the user of the operation would have to access the reference data prior to invoking the operation. The user would have to look up the other system's identifiers, map dollars to euros, and explore the organizational structure of the people involved in the transaction. Logically, however, these additional tasks belong with the operation. The operation now evolves beyond being a simple mathematical function to include the use of reference data.
The idea that operations can reference things goes beyond simply referencing data. Some service operations may reference physical objects as well: an individual person, an automobile, an aircraft, a package, or a building. In general, many service operations reference objects while doing their work (Figure 3-3). But these objects are not part of the service. The service does not manage these objects; it only references them. Thus, these objects lie outside the scope of the service.
Figure 3-3. A Service with Referenced Objects
Some services exist specifically to manage objects. In fact, the core functionality of most business processes revolves around the management of objects such as sales orders, shipments, and warehouse inventory. If you want to assemble such business processes from services, then you need to organize services that manage objects. Management implies complete control, and the only way to guarantee complete control over an object is for the service to own it. Now you have services that own some objects and merely reference others. You need to clearly distinguish between the objects that the service owns and manages, and those objects that it merely references (Figure 3-4). And herein lies one of the major challenges in service design—clearly establishing the ownership of objects.
Figure 3-4. Structure of a Service with Owned Objects and Referenced Object
Consider an SOA approach to building an order-to-cash business process. Figure 3-5 shows some initial concepts for some of the services you might want. In this conceptualization, the Sales Order Service manages the information specifically related to individual sales orders, the Product Information Service manages the information about the products being offered, and the Customer Information Service keeps track of information about the individual customers. So far, so good.
Now take a look at the inherent structure of the data (Figure 3-6). Some of the classes clearly belong to a specific service. Sales Order and Sales Order Line Item obviously belong to the Sales Order Service. Likewise, Product belongs to the Product Information Service and Customer belongs to the Customer Information Service. But what about Address and Phone? Even more challenging, what about the relationship between Sales Order and Customer, and between Sales Order Line Item and Product? Which services own these?
Figure 3-6. Information Related to Sales Orders
Such questions are the crux of service design and comprise a major challenge in evolving from a design centered around a single monolithic database to a design involving distributed services, each managing a subset of the data. The ownership of Address and Phone is fairly straightforward. As a class (i.e., a concept), neither is uniquely owned by any specific service. Instead, each service owns specific instances of these classes that play particular roles with respect to the other objects that are owned by the service. For example, the Sales Order Service owns the shippingAddress, billingAddress, and contactPhone, while the Customer Information Service owns the homeAddress, workAddress, defaultShippingAddress, defaultBillingAddress, homePhone, workPhone, and cellPhone. In fact, each service may have different representations for these objects, although there are benefits to maintaining a uniform representation of these objects across the services (i.e., a common data model), particularly in their interfaces.
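One way to picture instance-level ownership is a shared representation for Address and Phone, with each service holding its own instances in named roles. In the sketch below the role names follow the text, while the fields inside Address and Phone, the record class names, and the open_order helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Address:
    # Field names are assumptions; the text does not specify them.
    street: str
    city: str
    postal_code: str


@dataclass(frozen=True)
class Phone:
    number: str


@dataclass
class CustomerRecord:
    """Instances owned by the Customer Information Service."""
    customer_id: str
    default_shipping_address: Address
    default_billing_address: Address
    home_address: Optional[Address] = None
    work_address: Optional[Address] = None
    home_phone: Optional[Phone] = None
    work_phone: Optional[Phone] = None
    cell_phone: Optional[Phone] = None


@dataclass
class SalesOrderRecord:
    """Instances owned by the Sales Order Service."""
    order_id: str
    customer_id: str
    shipping_address: Address
    billing_address: Address
    contact_phone: Phone


def open_order(order_id: str, customer: CustomerRecord, contact: Phone) -> SalesOrderRecord:
    # The order takes its own address values from the customer's defaults.
    # Later changes to the customer's defaults do not reach existing orders.
    return SalesOrderRecord(
        order_id=order_id,
        customer_id=customer.customer_id,
        shipping_address=customer.default_shipping_address,
        billing_address=customer.default_billing_address,
        contact_phone=contact,
    )
```

Because Address is immutable here, changing a customer's default means assigning a new Address to the customer record, and an order opened earlier keeps the value it was created with.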
Determining the ownership of relationships presents more of a challenge. Defining a relationship requires referencing the objects at both ends of the relationship. How do you represent a relationship when one of these objects is owned by one service and the other by another service? The Sales Order Service needs to know who the customer is for the order, but it is not the owner of the Customer object. How exactly do you do that?
One approach is to add some of the needed customer data to the Sales Order Service and have this service manage this customer data. However, now you've lost the advantage of having a Customer Information Service that manages all of the customer data. How do you update customer data and keep it consistent when there is customer data present in more than one service? This is exactly the type of problem you are seeking to avoid with an SOA.
Another approach is to keep a bare-bones reference to the customer—just the customerID—in the Sales Order Service. This is simpler and less risky, for identifiers are generally not edited after they have been created. However, this approach presents its own issues. First, if you want to print out or display the order with the customer name or other customer information, the Sales Order Service has to go to the Customer Information Service to obtain the information. From a performance perspective, the delays in such retrieval may not be acceptable, and it may not be practical for the Customer Information Service to support this type of query load. Second, you won't be able to delete a customer from the Customer Information Service without first determining whether there are any remaining references to that customer (i.e., instances of the customerID) in any other service. Otherwise, deleting the Customer would leave invalid references in the other services. Even worse, recycling the customerID would actually leave old Sales Orders referencing the wrong customer.
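The reference-only approach and its two costs can be sketched in a few lines. Here both services are plain in-process Python classes, and the method names, the CustomerInUseError exception, and the callback registration mechanism are all hypothetical; in a real deployment each call between the services would cross a network.

```python
from typing import Callable, Dict, List


class CustomerInUseError(Exception):
    """Raised when a customer is still referenced by another service."""


class CustomerInformationService:
    def __init__(self) -> None:
        self._names: Dict[str, str] = {}
        # Callbacks answering "does anything still reference this customerID?"
        self._reference_checks: List[Callable[[str], bool]] = []

    def register_reference_check(self, check: Callable[[str], bool]) -> None:
        self._reference_checks.append(check)

    def add_customer(self, customer_id: str, name: str) -> None:
        self._names[customer_id] = name

    def get_customer_name(self, customer_id: str) -> str:
        return self._names[customer_id]

    def delete_customer(self, customer_id: str) -> None:
        # Refuse the delete while any other service holds the identifier.
        if any(check(customer_id) for check in self._reference_checks):
            raise CustomerInUseError(customer_id)
        del self._names[customer_id]


class SalesOrderService:
    def __init__(self, customers: CustomerInformationService) -> None:
        self._customers = customers
        self._order_customer: Dict[str, str] = {}  # order_id -> customerID only
        customers.register_reference_check(self.references_customer)

    def create_order(self, order_id: str, customer_id: str) -> None:
        self._order_customer[order_id] = customer_id

    def references_customer(self, customer_id: str) -> bool:
        return customer_id in self._order_customer.values()

    def display_order(self, order_id: str) -> str:
        # Every display needs a call to the customer service for the name.
        customer_id = self._order_customer[order_id]
        name = self._customers.get_customer_name(customer_id)
        return f"Order {order_id} for {name}"
```

The display_order call shows the query load placed on the Customer Information Service, and delete_customer shows the check that has to happen before a customer can safely disappear.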
Don't get the wrong impression here. These issues do have eminently practical resolutions. The point is that you have to dig down into the structure and ownership of the data (and other managed objects) in order to clearly define the scope of each service. As you dig and define ownership, your concept for each service is likely to evolve. Continuing this example will illustrate the evolution.
Assume that, for performance reasons, the Sales Order Service must have the customer's name readily available. Also for performance reasons, it must have the product name available. Figure 3-7 shows the resulting logical data model for the Sales Order Service. Note that you need an understanding of the service's intended utilization to conclude that this information needs to be present in the Sales Order Service!
Figure 3-7. Logical Data Model for Sales Order Service
This data model clearly resolves the issue of owning the relationships: The relationship between the Sales Order and the Customer is owned by the Sales Order Service, as is the relationship between Sales Order Line Item and Product. But now you are left with copies of this data in the Sales Order Service—data for which the service is not the owner. How do you manage this data?
Managing Cached Information
The solution for maintaining the accuracy of replicated data is to treat it as a cached copy of the original. This involves establishing a single service as the system-of-record for the original data and adding subscription operations to notify others of changes to the original data. Services that maintain copies of the information subscribe to this service and update their local copies when the data changes.
Figure 3-8 shows the modified conceptualization of the services considered earlier. The Sales Order Service (the service containing the cached information) subscribes to the change notification service of each of the other services for which it holds cached data. The Product Information Service provides a subscribeToProductChangeNotification subscription operation and a corresponding productInformationHasChanged notification operation. The Customer Information Service provides the subscribeToCustomerChangeNotification and customerInformationHasChanged operations.
Figure 3-8. Extended Services
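A minimal in-process sketch of the subscription arrangement for customer data follows. The Python method names mirror the subscribeToCustomerChangeNotification and customerInformationHasChanged operations named above; the notification payload (customer ID and new name), the update_customer_name operation, and the use of direct callbacks in place of real messaging are assumptions made to keep the example short.

```python
from typing import Callable, Dict, List

# Assumed notification payload: (customer_id, new_name).
CustomerChangeListener = Callable[[str, str], None]


class CustomerInformationService:
    """System-of-record for customer data."""

    def __init__(self) -> None:
        self._names: Dict[str, str] = {}
        self._listeners: List[CustomerChangeListener] = []

    def subscribe_to_customer_change_notification(
        self, listener: CustomerChangeListener
    ) -> None:
        self._listeners.append(listener)

    def update_customer_name(self, customer_id: str, name: str) -> None:
        self._names[customer_id] = name
        self._customer_information_has_changed(customer_id, name)

    def _customer_information_has_changed(self, customer_id: str, name: str) -> None:
        # Defined by this service, but executed as a message to each subscriber.
        for listener in self._listeners:
            listener(customer_id, name)


class SalesOrderService:
    """Holds a cached copy of the customer name on each order."""

    def __init__(self, customers: CustomerInformationService) -> None:
        self._orders: Dict[str, Dict[str, str]] = {}
        customers.subscribe_to_customer_change_notification(self.on_customer_changed)

    def create_order(self, order_id: str, customer_id: str, customer_name: str) -> None:
        self._orders[order_id] = {
            "customer_id": customer_id,
            "customer_name": customer_name,
        }

    def on_customer_changed(self, customer_id: str, name: str) -> None:
        # Refresh the cached name on every order for this customer.
        for order in self._orders.values():
            if order["customer_id"] == customer_id:
                order["customer_name"] = name

    def customer_name_for(self, order_id: str) -> str:
        return self._orders[order_id]["customer_name"]
```

The Customer Information Service never learns what a subscriber does with the notification; it only assumes that each subscriber accepts the agreed payload.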
The placement of the productInformationHasChanged and customerInformationHasChanged operations may seem a bit strange at first. These are the operations that are called by the subscription service when changes occur. Intuitively, you would expect these operations to be on the service that receives the change notification (i.e., the Sales Order Service), since it is the one responding to the notifications. However, it is the subscription service itself that actually defines the operation interface, including the data that is passed when it is called. For this reason, the operation is shown as part of the subscription service. These notification operations are examples of the proposed WSDL Out-Only message exchange pattern.
W3C (World Wide Web Consortium). March 2004. "Web Services Description Language (WSDL) Version 2.0 Part 2: Message Exchange Patterns." W3C Working Draft 26. www.w3.org/TR/2004/WD-wsdl20-patterns-20040326.
It may take a bit of thought to get comfortable with the Out-Only message exchange pattern, but it is an important one. The subscription service defines the interface (the notification interface), but this interface is not called by another component. Instead, the interface is called by the subscription service itself and causes a message to be sent to the subscribing component. It is as if the notification operation were part of the subscribing component's interface.
The subscription paradigm provides a means for the subscription service to send notifications to other components without having to know anything specific about the notification recipients. The only thing the subscription service knows (or assumes, at least) is that all recipients have implemented the notification interface. This leads to a degree of independence—decoupling—between the subscription service and the components being notified, but does not necessarily lead to complete independence.
When a service contains cached information from another service that cannot be arbitrarily deleted, it has a dependency on the system-of-record service that provides the information. In this example, the order would not make a lot of sense if the customer or the product information were deleted. Thus the Sales Order Service depends upon the Customer Information Service and the Product Information Service not to delete information that it is currently using.
The existence of such dependencies usually becomes clear when you consider the impact of purging—permanently deleting—information. In a practical system, the removal of customer and product information requires coordination with the Sales Order Service so that orders don't end up missing customers or products. You may find yourself with a purge strategy that starts with removing old orders and then proceeds to remove old customers and old products. To ensure that you are not deleting customers and products for which there are still active orders, you may want to add operations on the Sales Order Service to find orders related to a particular customer or product. Alternatively, you may not want to delete old customers or products at all; instead, simply mark them as inactive in the system-of-record. Of course, eventually you are going to need to do a purge anyway as the accumulating information will degrade performance and stress physical storage limits.
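The purge ordering described above can be sketched as a single function over in-memory data. The dictionary layout, the closed_on and active fields, and the policy of removing only customers that are both inactive and unreferenced are assumptions for illustration; a real purge would run as coordinated operations across the services.

```python
from datetime import date
from typing import Dict, Tuple


def purge(
    orders: Dict[str, dict], customers: Dict[str, dict], cutoff: date
) -> Tuple[Dict[str, dict], Dict[str, dict]]:
    """Remove old orders first, then customers that nothing references."""
    # Step 1: keep open orders and orders closed on or after the cutoff.
    remaining_orders = {
        order_id: order
        for order_id, order in orders.items()
        if order["closed_on"] is None or order["closed_on"] >= cutoff
    }
    # Step 2: find which customers the surviving orders still reference.
    referenced = {order["customer_id"] for order in remaining_orders.values()}
    # Step 3: drop only customers that are inactive and unreferenced.
    remaining_customers = {
        customer_id: customer
        for customer_id, customer in customers.items()
        if customer["active"] or customer_id in referenced
    }
    return remaining_orders, remaining_customers
```

Running the steps in the opposite order would risk removing a customer that a surviving order still needs, which is the dependency the text warns about.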
As this example illustrates, defining services requires that you rationalize the structure and ownership of data as well as functionality. You can't just wrap a poorly organized database with services and expect to have well-defined and easy-to-manage services. Put lipstick on a pig and you still have a pig! Designing services requires teasing data apart for manageability and establishing clear ownership for both objects and relationships.
When one service references objects owned by another service, you will likely have to add operations to manage replicated data. The full lifecycle of referenced objects, from creation through modification and destruction, needs to be considered. If arbitrary purges of referenced objects cannot be allowed, then you must document this dependency, describe the required object management, and add the required operations to the services.