data.cache - Cache

A cache works similarly to a dictionary, associating keys with values, but its entries may disappear according to the policy of the cache algorithm. This module defines a common protocol for cache datatypes, and also provides several typical cache implementations.
Let’s start with simple examples to get the idea. Suppose you want to read given files and cache the frequently read ones. The following code defines a cached version of file->string:
(use data.cache)
(use file.util)

(define file->string/cached
  (let1 file-cache (make-lru-cache 10 :comparator string-comparator)
    (^[path] (cache-through! file-cache path file->string))))
The procedure closes over a variable file-cache, which is an LRU (least recently used) cache that associates string pathnames with file contents. The actual logic is in cache-through!, which first consults the cache to see if it has an entry for the path. If the cache has the entry, its value (the file content) is returned. If not, it calls file->string with the path to fetch the file content, registers it in the cache, and returns it. The capacity of the cache is set to 10 (the first argument of make-lru-cache), so when the 11th file is read, the least recently used file is evicted from the cache.
The effect of the cache isn’t very visible in the above example. You can insert some print stubs to see that the cache is actually in action, as in the following example. Try reading various files using file->string/cached.
(define file->string/cached
  (let1 file-cache (make-lru-cache 10 :comparator string-comparator)
    (^[path]
      (print #"file->string/cached called on ~path")
      (cache-through! file-cache path
                      (^[path]
                        (print #"cache miss. fetching ~path")
                        (file->string path))))))
Caveat: A cache itself isn’t MT-safe. If you are using it in multithreaded programs, you have to wrap it with an atom (see Synchronization primitives):
(use data.cache)
(use file.util)
(use gauche.threads)

(define file->string/cached
  (let1 file-cache (atom (make-lru-cache 10 :comparator string-comparator))
    (^[path]
      (atomic file-cache (cut cache-through! <> path file->string)))))
A cache of any kind has a comparator and a storage. The comparator is used to compare keys; in the example above, we use string-comparator to compare string pathnames (see Basic comparators, for more about comparators). The storage is a dictionary that maps keys to internal structures of the cache. It needs to implement the dictionary protocol (see gauche.dictionary - Dictionary framework). By default, a hashtable is created automatically using the given comparator (or, if a comparator is omitted, using default-comparator). The comparator must have a hash function.
Alternatively, you can give a pre-filled dictionary (copied from another instance of the same kind of cache) to start the cache with some data already in it. Note that what the cache keeps in the dictionary totally depends on the cache algorithm, so you can’t just pass a random dictionary; it has to be created by the same kind of cache. If you pass in the storage, the comparator is taken from it.
Thus, the cache constructors uniformly take keyword arguments comparator and storage; you can specify either one, or omit both to use the defaults.
For the storage and comparator keyword arguments, see above.
{data.cache} make-fifo-cache capacity :key storage comparator
Creates and returns a FIFO (first-in, first-out) cache that can
hold up to capacity entries.
If the number of entries exceeds capacity, the oldest entry
is removed.
{data.cache} make-lru-cache capacity :key storage comparator
Creates and returns an LRU (least recently used) cache that can
hold up to capacity entries.
If the number of entries exceeds capacity, the least recently used
entry is removed.
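The difference between the two eviction policies can be seen in a small sketch (behavior as described above; the keys and values here are made up for illustration):

```scheme
(use data.cache)

;; Capacity 2, so the third distinct key forces an eviction.
(define fifo (make-fifo-cache 2))
(define lru  (make-lru-cache 2))

(define (fill! c)
  (cache-through! c 'a (^_ 1))   ; insert a
  (cache-through! c 'b (^_ 2))   ; insert b
  (cache-through! c 'a (^_ 1))   ; read a (refreshes recency in LRU only)
  (cache-through! c 'c (^_ 3)))  ; insert c; one entry is evicted

(fill! fifo)   ; FIFO evicts a, the oldest insertion
(fill! lru)    ; LRU evicts b, the least recently used entry
```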
{data.cache} make-ttl-cache timeout :key storage comparator timestamper
Creates and returns a TTL (time to live) cache with the timeout value timeout. Each entry is timestamped when it’s inserted, and it is removed once more than timeout units have passed since the timestamp. The actual entry removal is done when the cache is accessed.
By default, the Unix system time (seconds from the Epoch) is used as a timestamp, and timeout is in seconds. It may not be fine-grained enough if you add multiple entries at intervals shorter than a second. You can customize it by giving a thunk to timestamper; the thunk is called to obtain a timestamp, which can be any monotonically increasing real number (it doesn’t need to be associated with physical time). If you give timestamper, the unit of the timeout value should be the same as whatever timestamper returns.
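For instance, a sketch using a logical clock as the timestamper; tick is a hypothetical helper that returns 1, 2, 3, … on successive calls, so timeout is measured in ticks rather than seconds:

```scheme
(use data.cache)

;; A monotonically increasing counter, detached from physical time.
(define tick
  (let1 n 0
    (^[] (inc! n) n)))

;; Entries expire 5 ticks (not seconds) after insertion.
(define c (make-ttl-cache 5 :timestamper tick))
```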
{data.cache} make-ttlr-cache timeout :key storage comparator timestamper
A variation of the TTL cache, in which an entry’s timestamp is updated (refreshed) whenever the entry is read. Hence we call it TTL with refresh (TTLR). You can also think of it as a variation of the LRU cache with timeout. The unit of timeout, and the role of the timestamper argument, are the same as in make-ttl-cache.
The following APIs are for the users of a cache.
{data.cache} cache-lookup! cache key :optional default
Looks for an entry with key in cache, and returns its value if it exists. If there’s no entry, the procedure returns default if it is provided, or throws an error otherwise.
Some types of cache algorithms update the cache by this operation (e.g. the access timestamp), hence the bang in the name.
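A small sketch of both cases (the keys here are strings, so the default comparator suffices):

```scheme
(use data.cache)

(define c (make-lru-cache 4))
(cache-write! c "answer" 42)

(cache-lookup! c "answer")      ; ⇒ 42
(cache-lookup! c "missing" #f)  ; ⇒ #f, since a default is given
```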
{data.cache} cache-through! cache key value-fn
Looks for an entry with key in cache, and returns its value if it exists. If there’s no entry, a procedure value-fn is called with key as the argument, and its return value is inserted into cache and also returned. The insertion may cause eviction of existing entries.
{data.cache} cache-write! (cache <cache>) key value
Inserts an association of key and value into cache. If there’s already an entry with key, it is overwritten; otherwise a new entry is created. The same effect can be achieved by calling cache-evict! then cache-through!, but cache algorithms may provide a more efficient way through this method.
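A minimal sketch of the overwrite behavior:

```scheme
(use data.cache)

(define c (make-fifo-cache 3))
(cache-write! c 'k 1)   ; creates a new entry
(cache-write! c 'k 2)   ; overwrites the existing entry
(cache-lookup! c 'k)    ; ⇒ 2
```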
{data.cache} cache-evict! (cache <cache>) key
Removes an entry with key from cache, if it exists.
{data.cache} cache-clear! (cache <cache>)
Removes all entries from cache.
Each cache algorithm must define a class inheriting <cache>, and implement the following two essential methods. The higher-level APIs call them.
{data.cache} cache-check! (cache <cache>) key
Looks for an entry with key in cache. If it exists, returns a pair of key and the associated value; otherwise, returns #f. It may update the cache, for example by updating the entry’s timestamp to record the read.
{data.cache} cache-register! (cache <cache>) key value
Adds an entry with key and the associated value into cache. This is called after key is confirmed not to be in cache.
Additionally, the implementation should consider the following points.

The initialize method must call next-method first, which sets up the comparator and storage slots. You should check whether storage has pre-filled entries, and if so, set up other internal structures appropriately.

cache-evict! and cache-clear! only take care of the storage of the cache. You should implement them if your auxiliary structures need to be taken care of as well.

cache-write! is just cache-evict! followed by cache-register!. You may provide an alternative method if you can do it more efficiently, which is often the case.
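As an illustration, here is a minimal, hypothetical cache subclass with no eviction policy at all; the class name is made up for this sketch, and it implements only the two essential methods, storing values directly in the storage dictionary:

```scheme
(use data.cache)
(use gauche.dictionary)

;; A cache that never evicts: only the two essential methods are
;; implemented, and the storage maps keys directly to values.
(define-class <keep-all-cache> (<cache>) ())

;; Unique sentinel to distinguish "no entry" from a stored #f.
(define *none* (list 'none))

(define-method cache-check! ((c <keep-all-cache>) key)
  (let1 v (dict-get (cache-storage c) key *none*)
    (and (not (eq? v *none*)) (cons key v))))

(define-method cache-register! ((c <keep-all-cache>) key value)
  (dict-put! (cache-storage c) key value))

;; Usage: the default storage (a hashtable) is created from the
;; given comparator by <cache>'s initialize method.
(define c (make <keep-all-cache> :comparator default-comparator))
```

Since this cache needs no auxiliary structure, the default cache-evict! and cache-clear!, which operate on the storage alone, are sufficient.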
There are several procedures that help in implementing cache subclasses:
{data.cache} cache-comparator cache
{data.cache} cache-storage cache
Returns the comparator and the storage of the cache, respectively.
Typical caches may be constructed with a storage (dictionary) and a queue, where the storage maps keys to (<n> . <value>) pairs, and the queue holds (<key> . <n>) pairs, <n> being a number (timestamp, counter, etc.). Here are some common operations that work on this queue-and-dictionary scheme:
{data.cache} cache-populate-queue! queue storage
You can call this in the initialize method to set up the queue. This procedure walks storage (a dictionary that maps keys to (<n> . <value>)) to construct (<key> . <n>) pairs, sorts them in increasing order of <n>, and pushes them into the queue.
{data.cache} cache-compact-queue! queue storage
The queue may contain multiple pairs with the same key. Sometimes the queue accumulates too many duplicate entries (e.g. when the same entry is read repeatedly, and you push the read timestamp to the queue for every read). This procedure scans the queue and removes all duplicate entries except the most up-to-date one. After this operation, the length of the queue and the number of entries in the storage should match.
{data.cache} cache-renumber-entries! queue storage
This procedure renumbers the <n>s in the queue and the storage starting from 0, without changing their order, and returns the maximum <n>. Duplicate entries in the queue are removed as in cache-compact-queue!. When you’re using a monotonically increasing counter for <n> and you don’t want <n> to get too big (i.e. bignums), you can call this procedure occasionally to keep the <n>s in a reasonable range.