All input and output in software involves resource management. A resource is an entity that represents storage or devices reserved for use by a specific application instance. A file handle is a simple example. A file handle represents an open file and a channel that enables input and output to that file. The file handle also acts as a software interface to the contents of the file, hiding the hardware-level details. In general, resources are abstractions that simplify the low-level complexity of dealing with input and output.
In addition to encapsulating input and output details, resources also serve semantic purposes. Resources store contextual information and enable controlled, concurrent access to the underlying storage or devices. This part of the book describes the semantic characteristics of resources, gives an overview of common resources that are involved in data access, and introduces patterns that are useful for managing resources.
Resources and Context
Resources store context. Consider how an application writes to a file:
FileWriter out
= new FileWriter("/home/user1/output.txt", false);
out.write("This is a test.");
out.write("This is another test.");
out.write("This is yet another test.");
The FileWriter object is the file handle. It stores contextual information about how the file was opened. This context includes the file's name and whether it was opened for reading, writing, or appending. The file handle also stores state information, specifically a pointer to the position where the last operation ended. Saving this state ensures that each write continues where the previous one stopped instead of overwriting earlier data.
All file input and output operations require this contextual information. Without the notion of open files and file handles, the same code might look like this:
File.write("/home/user1/output.txt", 0,
"This is a test.");
File.write("/home/user1/output.txt", 15,
"This is another test.");
File.write("/home/user1/output.txt", 36,
"This is yet another test.");
Without the benefit of a file handle's contextual information, the code has to pass the file name on every operation and maintain its own file pointer. Listing the file name repeatedly clutters the code, and file pointer calculations are extremely error-prone. In general, resources that store contextual information save programmers from copying and pasting a lot of redundant code and from managing state information from one operation to the next.
In addition, a resource's context helps to eliminate redundant initialization operations. A file handle might reference hardware-level details about the physical location of a file's data or a record from a volume's file allocation table. A file handle resolves this information when an application initializes it, and then references it on every operation. If there is no file handle in which to store these low-level details, the file system needs to resolve them for every operation. Resource initialization tends to be expensive, and storing the details for repeated use significantly reduces performance overhead.
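The handle-free File.write calls shown earlier are pseudocode. As a rough illustration of what they imply, the following sketch uses a hypothetical StatelessFile helper (the class and method names are assumptions, not part of any library) that reopens the file, seeks, writes, and closes on every call. The caller has to carry the offset from one call to the next, and the open and close work repeats for each write.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a standard API: nothing is remembered between
// calls, so every call must supply the file name and the offset.
final class StatelessFile {
    private StatelessFile() {}

    // Opens the file, seeks to the offset, writes the text, and closes again.
    // Returns the offset at which the next write should start.
    static long write(String fileName, long offset, String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile file = new RandomAccessFile(fileName, "rw")) {
            file.seek(offset);
            file.write(bytes);
        }
        return offset + bytes.length;
    }
}
```

Returning the next offset removes the hand-computed constants, but the application still owns the file pointer, which is exactly the state that a file handle would otherwise keep for it.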
Resources and Concurrency
Resources often represent data or objects that are available to multiple applications distributed across a network. For example, a shared file system's contents are accessible to multiple users and serve as a cornerstone for team development and project management. Similarly, a single table's contents are available to a variety of applications, as well as reporting, data entry, and management tools. When multiple users access the same files or tables in incompatible ways, unpredictable results occur.
Consider a set of servers that logs requests and responses to a single file. This log is useful for gathering operational statistics and debugging. If two server instances write entries to the file at the same time, there is a good chance that the messages will blend together into an unreadable mess. Clearly, this file requires concurrent writes to be synchronized.
Resources play an important role in robust concurrency solutions. Resources often manage some level of synchronization that restricts concurrent access. For example, a file handle associates multiple writes with a single server instance so that the file system can ensure that no other instances write to the same file until the current write operation is complete. It can further restrict access so that only one server instance can open a given file at a time, if required. In this case, it blocks or rejects any other instance's requests for similar file handles.
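One way to approximate this behavior in Java is to take an exclusive file lock around each log write. The sketch below is only an illustration: SharedLog and append are hypothetical names, and it assumes that every server instance runs in its own process and cooperates by taking the same lock.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical logger; the class and method names are illustrative only.
final class SharedLog {
    private SharedLog() {}

    // Appends one entry while holding an exclusive lock on the log file.
    static void append(Path logFile, String entry) throws IOException {
        byte[] line = (entry + "\n").getBytes(StandardCharsets.UTF_8);
        try (FileChannel channel = FileChannel.open(logFile,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.APPEND);
             // Blocks until no other process holds a conflicting lock.
             FileLock lock = channel.lock()) {
            ByteBuffer buffer = ByteBuffer.wrap(line);
            while (buffer.hasRemaining()) {
                channel.write(buffer);
            }
        } // The lock is released first, then the channel is closed.
    }
}
```

Java file locks are held on behalf of the whole virtual machine, so this approach does not coordinate threads inside a single process, and whether a lock is enforced against programs that ignore it depends on the operating system.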
Data Access Resources
The following general-purpose, low-level resources provide unstructured data access and are useful in a wide range of applications:
A buffer is an unstructured block of memory allocated from the heap for a specific purpose. The system's heap manager prevents other applications from allocating the buffer's memory until the original requester explicitly frees it.
A file handle controls access to a physical file. File systems implement concurrency to prevent multiple users from making inconsistent updates to the file.
A socket is a network connection handle to another application or server.
A substantial amount of relational database access code today employs a standard call-level interface like ODBC, OLE DB, or JDBC. Interface standards have matured in recent years, but most retain the same well-known resource concepts. The following list describes the most common resources that applications use when calling data access interfaces. While these are not universal, most call-level interfaces define analogous entities.
A connection is a handle to a database manager or server that exposes fundamental database operations. Connections often enforce user authentication and authorization and encapsulate one or more sockets. Because of its considerable initialization overhead, creating a new database connection tends to be among the slowest database operations in an application.
A statement handle stores contextual and state information for one thread or module's database operations. A prepared statement is an optimized form of statement handle that stores compiled database instructions and parameter values so that an application can quickly execute them multiple times.
A result set provides access to query results stored within a database. Result sets usually allow applications to fetch results as either single rows or blocks to avoid requiring massive amounts of bandwidth and client memory. A result set maintains a pointer to the current position using an open cursor. In some cases, particularly when updating result data, open cursors lock individual rows or complete tables to prevent other processes from updating them at the same time.
Some databases allow you to read and manipulate disconnected result sets. A disconnected result set contains an entire set of query results in memory and does not require an active connection. Later on, when your application re-establishes a connection, it can synchronize changes made to the result set data. Disconnected result sets impose fewer concurrency constraints than connected result sets, but require additional attention when synchronizing potentially inconsistent changes.
A transaction groups a set of database operations together so that either all or none of the operations get committed to the database. Database drivers often implement transactions using extensive row and table locking.
This list is not comprehensive. You will run across other resources, but the management techniques and patterns that this section describes often still apply.
Resource Management
Resources play an important role in simplifying input and output code, improving the performance of database operations, and enforcing consistent, concurrent data access. On the other hand, resources also impose some important responsibilities on applications.
Resources usually consume significant amounts of storage as long as they remain open. For example, a database connection requires client memory to store its contextual information and it allocates server memory to maintain server-side context. In addition, it usually keeps a socket open on both sides to enable fast communication between the client and the server.
Resources often implement synchronization by restricting concurrent access to one or more objects. A file handle opened for writing often instructs the file system to lock out any other requests for the same file. The file system blocks other applications or users from accessing the file until the writing application closes its handle. This characteristic usually manifests to end users as a slow or unresponsive system since processes spend a significant amount of time waiting for locked resources.
Since open resources consume storage and reduce concurrency, it is important to understand how applications use and manage resources. There are no easy solutions for all cases, but this list describes a few ideas that apply to most:
Always release resources—
Any time your code opens or allocates a resource, it is also responsible for closing or freeing it. While you might not intentionally neglect to close a resource, it is common to mistakenly introduce a code path where the resource close operation gets skipped. This condition is commonly known as a resource leak. As your program gradually consumes more resources, the supply available for other applications shrinks accordingly. Resource leaks are exceptionally difficult to debug because the ultimate symptom is either an error indicating that no more resources are available or a slow system whose performance continually degrades. In either case, the error provides no indication of the resource leak's source.
Environments that provide automatic garbage collection theoretically solve this problem. Once your program finishes using a resource, the garbage collector closes it for you. This strategy works effectively for resources like buffers that are managed within the garbage-collected environment, but it does not work well for database resources. There is an unpredictable window between the time your program finishes using a resource and the time the environment collects it. As this interval increases, concurrency issues become more pronounced. Even worse, many database drivers keep internal references to resources for their own management. Since the garbage collector does not differentiate between your application's references and the driver's, it assumes that all referenced resources are still in use and does not close them.
A better strategy for avoiding resource leaks is to open and close resources with rigid, symmetric structure. This means that any isolated component that opens a resource is responsible for closing it as well. It also helps to confine resource management within a single component so that it is easier to write and maintain, and requires fewer accommodations for special cases.
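In Java, the try-with-resources statement is one way to get this symmetric structure. The following sketch assumes a hypothetical LineFileWriter class. The method that opens the FileWriter is also the one that closes it, and the close runs on every exit path, including exceptions.

```java
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical component that confines the open and the close to one method.
final class LineFileWriter {
    private LineFileWriter() {}

    static void writeLines(String fileName, String... lines) throws IOException {
        // The resource is declared in the try header, so the language
        // guarantees that close() is called when the block exits.
        try (FileWriter out = new FileWriter(fileName, false)) {
            for (String line : lines) {
                out.write(line);
            }
        } // out.close() runs here, whether or not a write failed.
    }
}
```

Callers never see the FileWriter, so no other code path can forget to close it.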
Minimize the interval that a resource is left open—
Since it is common for resources to lock objects for the purpose of synchronization, it is best to minimize the interval between the time when an application opens and closes each resource.
Consider an application that performs extensive calculations while writing a report to a file. It is a good idea to calculate the report data before opening the file. That way, the file handle is open just long enough to write the report. The file system locks the file for a smaller interval, making its previous contents available to other applications until just before it writes the new report data.
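A minimal sketch of this ordering follows, using a hypothetical SalesReport class with a trivial stand-in for the expensive calculation. The report text is built completely in memory, and only then is the file opened.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.List;

// Hypothetical report generator; the names and report format are illustrative.
final class SalesReport {
    private SalesReport() {}

    // Stand-in for the extensive calculations: runs with no file open.
    static String render(List<Integer> amounts) {
        StringBuilder text = new StringBuilder();
        long total = 0;
        for (int amount : amounts) {
            total += amount;
            text.append("amount=").append(amount).append('\n');
        }
        text.append("total=").append(total).append('\n');
        return text.toString();
    }

    static void write(String fileName, List<Integer> amounts) throws IOException {
        String text = render(amounts); // Calculate first.
        // The file handle is open only for the duration of a single write.
        try (FileWriter out = new FileWriter(fileName, false)) {
            out.write(text);
        }
    }
}
```

The trade-off is that the whole report has to fit in memory before it is written.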
Pool resources to conserve initialization expense—
Some resources, like database connections and sockets, are quite expensive to initialize. These resources consume client and server storage, but they do not necessarily restrict concurrency. In these cases, consider pooling resources. Pooling resources means keeping a set of resources open at all times so that your application can use them quickly without incurring initialization overhead on each operation.
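The idea can be sketched with a small fixed-size pool. SimplePool is a hypothetical class written for this illustration, not a library type. It creates every resource up front and hands the same instances out repeatedly.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical minimal pool; it omits validation, timeouts, and shutdown.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // Pay the initialization cost once, up front.
        }
    }

    // Blocks until an idle resource is available.
    T acquire() throws InterruptedException {
        return idle.take();
    }

    // Returns a resource to the pool so that later callers can reuse it.
    void release(T resource) {
        idle.offer(resource);
    }
}
```

An application would call acquire before an operation and release in a finally block afterward. A production pool also has to deal with broken resources and with callers that never return what they borrowed.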
Resource Patterns
Resource patterns define common design strategies for managing resources at the application or middleware level. This part of the book contains chapters for each of the following patterns:
Resource Decorator (103)—
Dynamically attaches additional behavior to an existing resource with minimal disruption to application code. A resource decorator enables the extension of a resource's functionality without subclassing or changing its implementation.
Resource Pool (117)—
Recycles resources to minimize resource initialization overhead. A resource pool manages resources efficiently while allowing application code to freely allocate them.
Resource Timer (137)—
Automatically releases inactive resources. A resource timer alleviates the effect of applications or users that allocate resources indefinitely.
Resource Descriptor (159)—
Isolates platform- and data source-dependent behavior within a single component. A resource descriptor exposes specific platform idiosyncrasies that relate to particular database resources as generic, logical operations and enables the majority of data access code to remain independent of its physical environment.
Retryer (171)—
Automatically retries operations whose failure is expected under certain defined conditions. This pattern enables fault tolerance for data access operations.
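As a small taste of what these patterns look like in code, the following sketch shows the core idea behind a retryer. The Retry class, its call method, and its parameters are assumptions made for this illustration and are not the design that the Retryer chapter presents. The caller supplies the operation, a test that identifies expected failures, and an attempt limit.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper that retries an operation on expected failures only.
final class Retry {
    private Retry() {}

    static <T> T call(Callable<T> operation,
                      Predicate<Exception> retryable,
                      int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                // Give up on unexpected failures or when attempts run out.
                if (attempt >= maxAttempts || !retryable.test(e)) {
                    throw e;
                }
            }
        }
    }
}
```

A fuller version would usually wait between attempts and log each failure.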
If you isolate data access code within a single component, as the decoupling patterns in Part 1 of this book describe, then you are in a better position to confine the use of resource patterns to that component as well. This enables you to add new database features like customized logging, connection pooling, statement caching, automated resource cleanup, and retry functionality without negatively affecting application code.