High Performance != Client/Server: The Case for Embedded Databases

Margo Seltzer, Harvard University and Sleepycat Software

Mention the problem of "high performance transaction processing" and most people's minds turn toward SQL, client-server architectures, and middleware. This is not right. These solutions carry inherent overhead, and their architecture throws obstacles in the way of application designers striving for truly high performance.

One of the most widely deployed transaction applications today is the Lightweight Directory Access Protocol (LDAP). According to Joel Snyder, "In the past five years, LDAP has been elevated to the de facto standard for how users and applications access information stored in a directory server." [1] The top three performing LDAP servers in that survey all ignore the "conventional wisdom" of client-server relational databases for their core data store, instead using a blazingly fast embedded transactional data store.

How can this be? Benchmark after published benchmark clearly demonstrates that high-performance transaction processing is dominated by the big relational vendors. So, what's going on with LDAP? Quite simply, the LDAP designers questioned the conventional wisdom and arrived at a superior architecture. LDAP is essentially yet another database front end: a query language. Thus, putting a conventional relational database underneath it means translating one database language and representation into another. The high performers in this field instead implement their own data model on top of a robust, reliable, but largely unstructured data store. The result is systems that outperform the big relational vendors.
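
To make this concrete, here is a minimal sketch of the idea, assuming Berkeley DB (the Sleepycat embedded store) and its C API in the 4.x style; exact signatures vary by release, and the file name, distinguished name, and flattened attribute string below are purely illustrative. A directory entry is stored as an opaque value keyed by its distinguished name:

    /* Minimal sketch: store a directory entry, keyed by its DN, in an
     * embedded Berkeley DB database (4.x-style C API). */
    #include <string.h>
    #include <db.h>

    int store_entry(void)
    {
        DB *dbp;
        DBT key, data;
        char dn[]    = "cn=Jane Doe,dc=example,dc=com";          /* the application's key */
        char attrs[] = "cn: Jane Doe\nmail: jane@example.com\n"; /* flattened entry */
        int ret;

        /* Create a handle and open (or create) the backing file. */
        if ((ret = db_create(&dbp, NULL, 0)) != 0)
            return ret;
        if ((ret = dbp->open(dbp, NULL, "directory.db", NULL,
                             DB_BTREE, DB_CREATE, 0664)) != 0)
            goto done;

        /* The store is unstructured: key and data are opaque byte strings,
         * so the application imposes its own (LDAP) data model. */
        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data  = dn;
        key.size  = (u_int32_t)strlen(dn) + 1;
        data.data = attrs;
        data.size = (u_int32_t)strlen(attrs) + 1;

        ret = dbp->put(dbp, NULL, &key, &data, 0);
    done:
        dbp->close(dbp, 0);
        return ret;
    }

The store imposes no schema of its own; the server is free to lay out entries, and any secondary indices, in whatever form its data model and lookup patterns favor.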

LDAP is not an isolated case. The push toward XML-encoded business logic has created a burgeoning market for XML-to-SQL translators. Once again, this misses the point. If you want high performance, design for the data model your application uses rather than wasting valuable time translating between data models.

Data translation is not the only place where conventional application architectures lose performance. The plethora of client-server architectures is an abuse of an otherwise good idea. The beauty of client-server is that it enables clients to communicate with remote servers in a consistent fashion. The three-tier architecture, however, is simply misguided adherence to this model: it takes the conventional client-server model and decomposes the server into multiple layers. The traditional argument is that this decomposition allows the tiers to reside on different machines, but in practice they rarely do. Instead, two of the tiers reside on the same machine, wasting valuable cycles communicating via IPC rather than within a single address space.

Consider a web server with a back-end database application. The most common arrangement is for the web server and database to reside on the same machine, yet communicate through IPC. It is far more efficient to let them share a common address space, as embedded, API-based database systems allow. Admittedly, some large installations replicate the web server and put the database server on a separate machine. However, there are two driving forces for this separation. In some cases, the application and system demands are simply too large, and the division of labor makes sense. In other, more common cases, it is simply that the database requires all the cycles available from a large server. This is a somewhat circular argument: if the database were streamlined, designed for the application, and embedded with the application or the web server, the need for a dedicated database machine would vanish.
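
As a rough illustration of what sharing an address space looks like, the sketch below shows a hypothetical web-server request handler performing a directory lookup as a plain library call. The handler name and arguments are invented for the example; the get() call again assumes the Berkeley DB 4.x-style C API, with directory_db opened once at server start-up as in the earlier sketch.

    /* Hypothetical request handler: the lookup is an ordinary function call
     * into the embedded library, within the web server's address space. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <db.h>

    static DB *directory_db;    /* opened once at server start-up */

    int handle_lookup_request(const char *dn, char *reply, size_t reply_len)
    {
        DBT key, data;
        int ret;

        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = (void *)dn;
        key.size = (u_int32_t)strlen(dn) + 1;

        /* Have the library allocate the result buffer. */
        data.flags = DB_DBT_MALLOC;

        if ((ret = directory_db->get(directory_db, NULL, &key, &data, 0)) != 0)
            return ret;                          /* e.g., DB_NOTFOUND */

        snprintf(reply, reply_len, "%.*s", (int)data.size, (char *)data.data);
        free(data.data);
        return 0;
    }

There is no socket, no marshalling, and no switch to another process; the cost of the lookup is essentially the cost of the underlying tree search.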

Deployment is another disadvantage of the client-server architecture. Selling an application that requires database services from a client-server system means selling not only the application but a database system as well. Application salespeople must now convince a customer to purchase a third-party software product, and undoubtedly to hire a DBA as well. Contrast this with the embedded approach. The database is seamlessly bundled with the application. No third-party licensing is required; no DBA need be hired; in fact, the customer never needs to know that a database is being purchased. Application installation subsumes database installation. Upgrading is similarly unified.

The ability to deploy without the need for a DBA should not be underestimated. Installations requiring high-performance transaction throughput and hands-off administration (e.g., network switches) have no other choice. It is simply not an option to require human intervention for any aspect of system administration or maintenance.

In addition to technical deployment issues, there are strong business advantages. The customer has a single vendor with which to communicate for support, licensing, etc. From a financial perspective, if the customer isn't knowingly buying a database, an application vendor can sell multiple copies of the same database to the same customer (bundled with different applications). Finally, if application companies aren't trying to explicitly sell a database, they needn't face the obstacle of introducing a new database into a company that has a corporate database policy (e.g., selling an Informix-based application to an Oracle shop).

In summary, the client-server database architecture is a powerful tool when it enables distributed computing. However, when it is used simply as an artifact of implementation, it introduces a number of obstacles to high performance that are easily overcome with simpler, embedded database designs.

References

[1] Joel Snyder, http://www.nwfusion.com/reviews/2000/0515rev2.html