Database Caching
Andreas Bühmann
2003-12-01
Introduction
Web applications are facing an ever-increasing number of users
as well as the demand to provide each of them with more and more
customized, i.e., dynamically generated, content. This places a
high workload on every link of the typical processing chain, that
is, on the web servers, the application servers, and the database
server. While the former can often be replicated, the more or
less central backend database hinders scaling of the overall
application.
The idea of database caching is to disburden the backend by
using a number of frontend or cache database servers (caches).
These caches, placed close to the application servers, hold
frequently used subsets of the backend database, which enables
them to answer queries without accessing the backend.
Questions
This seemingly simple scenario – caches for other kinds of
objects are, after all, well known – poses a number of interesting
questions:
Specification
- How can the cached portions of the backend database be
specified?
- Predicates and their extensions seem to provide a very
abstract view of the ‘objects’ being cached. Which
kinds of such predicates can be handled (more easily than others),
and how? How are their possibly overlapping extensions stored and
maintained efficiently?
- How can approaches such as DBCache or
DBProxy be improved, possibly be combined,
or be embedded into an overall database caching model?
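As a concrete illustration of predicates and their extensions, the following sketch materializes the rows of a backend table satisfying a simple equality predicate; the class, table, and column names are invented for illustration and are not taken from DBCache or DBProxy:

```python
# Hypothetical sketch: a cached portion of a backend table, described by a
# simple equality predicate `column = value`. All names are illustrative.

class PredicateExtension:
    """Rows of `table` satisfying `column = value`, held in the cache."""

    def __init__(self, table, column, value):
        self.table = table
        self.column = column
        self.value = value
        self.rows = []          # materialized extension, filled from the backend

    def load(self, backend_rows):
        # Materialize the extension by filtering rows fetched from the backend.
        self.rows = [r for r in backend_rows if r[self.column] == self.value]

    def covers(self, table, column, value):
        # Containment test: does this extension completely answer the given
        # equality predicate?
        return (table, column, value) == (self.table, self.column, self.value)

backend = [{"id": 1, "region": "EU"}, {"id": 2, "region": "US"}]
ext = PredicateExtension("customers", "region", "EU")
ext.load(backend)
assert ext.rows == [{"id": 1, "region": "EU"}]
```

Note that two such extensions may overlap (e.g., `region = 'EU'` and `status = 'gold'`); storing or reference-counting shared rows only once is exactly the maintenance problem raised above.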
Query Processing
- How can we decide which queries or which parts of a query we
are able to answer by using the cache contents?
- How can queries be rewritten such that they can be evaluated
in a distributed manner between backend and frontend?
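One very small instance of the first question is predicate containment for conjunctive range queries: a query can be answered from the cache if its predicate implies the cached predicate. The following sketch (an illustrative simplification, not an algorithm from the literature cited here) tests this for intervals per column:

```python
# Illustrative sketch: decide whether a conjunctive range query is implied by
# a cached range predicate, so it can be answered locally; otherwise it must
# be forwarded to (or split between cache and) the backend.

def implies(cached, query):
    """cached, query: dicts mapping column -> (low, high) closed intervals.
    The cache suffices iff, on every column the cached predicate restricts,
    the query's interval lies inside the cached interval. Extra restrictions
    in the query only shrink the result and are therefore harmless."""
    for col, (lo, hi) in cached.items():
        if col not in query:
            return False        # query is wider than the cache on this column
        qlo, qhi = query[col]
        if qlo < lo or qhi > hi:
            return False        # query reaches outside the cached extension
    return True

cached = {"price": (0, 100)}
print(implies(cached, {"price": (10, 50)}))    # True: cache answers the query
print(implies(cached, {"price": (50, 200)}))   # False: backend must be asked
```

Real predicates (joins, disjunctions, functions) make this containment test far harder, which is precisely what the question above asks.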
Adaptation
- How do we choose predicates whose extensions are worth
caching?
- If the cache contents are to be dynamically adapted to
changing workloads, which strategy is appropriate? Which
underlying principle of locality can be exploited?
- How do we know that the maintenance of a given predicate
extension in the cache is no longer useful? (There is a cost
associated with every cached predicate extension if we require its
freshness.)
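The cost mentioned in the last question suggests one possible (and here purely hypothetical) adaptation policy: keep an extension only while its observed benefit outweighs its maintenance cost. A minimal sketch, with invented names and weights:

```python
# Hedged sketch of one conceivable adaptation policy (all names and weights
# are invented): a predicate extension stays in the cache only while the
# backend work saved by cache hits exceeds the refresh work caused by
# backend updates touching the extension.

def worth_keeping(hits_per_min, updates_per_min, hit_saving=1.0, update_cost=2.0):
    # Both rates would in practice be measured over a sliding time window;
    # the weights reflect the relative expense of a refresh vs. a saved query.
    return hits_per_min * hit_saving > updates_per_min * update_cost

print(worth_keeping(hits_per_min=30, updates_per_min=5))   # True: keep it
print(worth_keeping(hits_per_min=4,  updates_per_min=5))   # False: drop it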
Updates
- What about updates occurring in the backend or being
initiated by the application through the frontend?
-
Updates to the backend have to be propagated to all frontends.
When should this happen? When must this have happened in order
to provide reasonable modes of consistency and freshness (to be
defined)?
- Is it possible to specify some time interval δ that
limits the age of a database view exposed at the
frontend?
- Is it possible for different ‘freshness
spheres’ to exist side by side?
-
Having in mind the goal of disburdening the backend: Can
updates be applied within a transaction only to the cache and
be propagated to the backend after transaction commit?
- Which levels of isolation can be achieved, which ones can
be tolerated?
- Similarly, what about transactional semantics in general?
Is caching worthwhile or even possible if ACID must strictly
be ensured? (What update/read ratio is acceptable before
caching does not pay anymore?)
Distribution
- Is it possible to exchange data directly between multiple
frontends in a P2P-like fashion?
- Might database caching play a role in a forthcoming data
grid?
Related ideas
-
Caching for web applications on other levels/with other
objects:
- HTML/XML pages: mostly meaningful for static pages
only.
- Fragments of pages: Divide pages in fragments with
different frequencies of update; cache some fragments,
generate others, and use templates to assemble them.
- application objects (e.g., EJBs)
- Replication
- Materialized views: Use them for answering queries against
base tables. (IBM: Materialized query tables, MQTs)
-
Semantic Caching
- DBCache at
IBM Almaden Research Center,
cache tables
-
DBProxy
|