When you code a dynamic application, you will soon face its trade-off: it is dynamic.
Each time a user makes a request, your server performs all sorts of calculations – database queries, template rendering and so on – to create the final response. For most web applications this is not a big deal, but when your application becomes big and highly visited, you will want to limit the overhead on your machines.
That's where caching comes in.
The main idea behind caching is simple: we store the result of an expensive calculation somewhere, to avoid repeating the calculation when we can. But, honestly speaking, designing a good caching scheme is mainly a PITA, since it involves many complex decisions about what you should store, where to store it, and so on.
So how can Emmett help you with this? It provides some tools out of the box that let you focus your development energy on what to cache and not on how you should do that.
The caching system in Emmett consists of a single class named `Cache`. Consequently, the first step in configuring caching in your application is to create an instance of this class:
```python
from emmett.cache import Cache

cache = Cache()
```
By default, Emmett stores your cached content in the RAM of your machine, but you can also use the disk or Redis as your storage system. Let's see these three handlers in detail.
As we just saw, this is the default cache mechanism of Emmett. Initializing a `Cache` instance without arguments is the same as using the `RamCache` handler:
```python
from emmett.cache import Cache, RamCache

cache = Cache(ram=RamCache())
```
The `RamCache` handler also accepts some parameters you might take advantage of:
parameter | default value | description |
---|---|---|
prefix | | allows to specify a common prefix for caching keys |
threshold | 500 | set a maximum number of objects stored in the cache |
default_expire | 300 | set a default expiration (in seconds) for stored objects |
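For instance, a RAM cache tuned for a larger working set might look like this (the values here are purely illustrative):

```python
from emmett.cache import Cache, RamCache

# a sketch: bigger object budget and a longer default lifetime for entries
cache = Cache(ram=RamCache(prefix='myapp:', threshold=1000, default_expire=600))
```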
Note on multi-processing: when you store data in the RAM cache, you are actually using the Python process's memory. If you're running your web application with multiple processes/workers, every process will have its own cache, and the data you store won't be available to the other ones.
If you need a cache shared between processes, you should use the disk or Redis handlers.
The disk cache is actually slower than the RAM or Redis ones, but if you need to cache large amounts of data, it fits the role perfectly. Here is how to use it:
```python
from emmett.cache import Cache, DiskCache

cache = Cache(disk=DiskCache())
```
The `DiskCache` class accepts some parameters too:
parameter | default value | description |
---|---|---|
cache_dir | 'cache' | allows to specify the directory in which data will be stored |
threshold | 500 | set a maximum number of objects stored in the cache |
default_expire | 300 | set a default expiration (in seconds) for stored objects |
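For example, you might point the handler at a dedicated directory (the path and values below are illustrative):

```python
from emmett.cache import Cache, DiskCache

# a sketch: a dedicated directory and a larger object budget
cache = Cache(disk=DiskCache(cache_dir='cache_data', threshold=2000))
```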
Redis is quite a good system for caching: it is really fast – really – and if you're running your application with several workers, your data will be shared between your processes. To use it, you just initialize the `Cache` class with the `RedisCache` handler:
```python
from emmett.cache import Cache, RedisCache

cache = Cache(redis=RedisCache(host='localhost', port=6379))
```
As we saw with the other handlers, the `RedisCache` class accepts some parameters too:
parameter | default value | description |
---|---|---|
host | 'localhost' | the host of the redis backend |
port | 6379 | the port of the redis backend |
db | 0 | the database number to use on the redis backend |
prefix | 'cache:' | allows to specify a common prefix for caching keys |
default_expire | 300 | set a default expiration (in seconds) for stored objects |
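For instance, to point the handler at a non-default database with a custom key prefix (the values below are illustrative):

```python
from emmett.cache import Cache, RedisCache

# a sketch: a dedicated redis database and a custom key prefix
cache = Cache(redis=RedisCache(host='localhost', port=6379, db=2, prefix='myapp:'))
```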
As you probably guessed, you can use multiple caching systems together. Let's say you want to use the three systems we just described. You can do it simply:
```python
from emmett.cache import Cache, RamCache, DiskCache, RedisCache

cache = Cache(
    ram=RamCache(),
    disk=DiskCache(),
    redis=RedisCache()
)
```
You can also tell Emmett which handler should be used when none is specified, thanks to the `default` parameter:
```python
cache = Cache(m=RamCache(), r=RedisCache(), default='r')
```
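With a configuration like the one above, plain calls on the `cache` instance should go through the handler registered as `'r'`, while each handler should also be reachable as an attribute named after its key; a minimal sketch:

```python
# both calls should hit the redis handler: the first through the default,
# the second explicitly via the attribute named after the handler key
v1 = cache('some_key', lambda: 'value', 60)
v2 = cache.r('some_key', lambda: 'value', 60)
```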
Changed in version 2.0
The quickest usage of cache is to just apply it to a simple action, such as a select on the database or a computation. Let's say, for example, that you have a blog and a certain function that exposes the last ten posts:
@app.route("/last")
async def last():
rows = Post.all().select(orderby=~Post.date, limitby=(0, 10))
return dict(posts=rows)
Now, since the performance bottleneck here is the call to the database, you can limit the overhead by caching the select result for 30 seconds, decreasing the number of calls to your database:
@app.route("/last")
async def last():
def _get():
return Post.all().select(orderby=~Post.date, limitby=(0, 10))
return dict(posts=cache('last_posts', _get, 30))
Here's how it works: you encapsulate the action you want to cache into a function, and then call your `cache` instance with a key, the function, and the amount of time (in seconds) you want the result to be stored. Emmett will take care of the rest.
You can also cache results coming from `async` operations; you just need to be sure to pass a coroutine function to the cache call. In this case the syntax is preserved, and you need to `await` the cache call:
```python
async def _get():
    return 'value'

@app.route()
async def data():
    return dict(data=await cache('last_data', _get, 30))
```
– OK, dude. What if I have multiple handlers? Where does Emmett store the result?
– You can choose that.
As we saw before, by default Emmett stores your cached content using the handler chosen as default. But you can also choose which handler to store data with:
```python
cache = Cache(
    ram=RamCache(),
    disk=DiskCache(),
    redis=RedisCache(),
    default='ram'
)

v_ram = cache('my_key', my_f, my_time)          # stored with the default handler (ram)
v_ram = cache.ram('my_key', my_f, my_time)      # explicitly stored in ram
v_disk = cache.disk('my_key', my_f, my_time)    # stored on disk
v_redis = cache.redis('my_key', my_f, my_time)  # stored in redis
```
Changed in version 2.0
Emmett's cache can also be used as a decorator. For example, we can rewrite the above example as follows:
```python
@cache(duration=30)
def last_posts():
    return Post.all().select(orderby=~Post.date, limitby=(0, 10))

@app.route("/last")
async def last():
    return dict(posts=last_posts())
```
and the result will be the same. The notation is the same when you want to specify which handler to use:
```python
# use redis handler
@cache.redis()
# use ram handler
@cache.ram()
```
When using the decorator notation, Emmett will use the arguments you pass to the decorated method to build different cache entries. This means that if we decorate a method that accepts arguments, like:
```python
@cache()
def cached_method(a, b, c='foo', d='bar'):
    # some code
    ...
```
then Emmett will cache different contents in case you call `cached_method(1, 2, c='a')` and `cached_method(1, 3, c='b')`.
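To make this behaviour concrete, here is a small self-contained sketch (the `slow_square` function and its timing are invented for illustration):

```python
import time
from emmett.cache import Cache, RamCache

cache = Cache(ram=RamCache())

@cache(duration=60)
def slow_square(n):
    time.sleep(1)    # simulate an expensive computation
    return n * n

slow_square(3)   # computed and stored under a key built from the arguments
slow_square(3)   # same arguments: served from cache, no recomputation
slow_square(4)   # different arguments: computed and stored as a new entry
```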
The cache decorator also supports `async` methods:
```python
@cache()
async def get_data(page=1):
    return await some_data(page)

@app.route()
async def data():
    return dict(data=await get_data(request.query_params.page or 1))
```
New in version 1.2
Sometimes you might need to cache an entire response from your application. Emmett provides the `Cache.response` decorator for that. Let's rewrite the example we used above: this time, instead of caching just the database selection, we will cache the entire page that Emmett produces from our route:
@app.route("/last")
@cache.response()
async def last():
posts = Post.all().select(orderby=~Post.date, limitby=(0, 10))
return dict(posts=posts)
The main difference from the above examples is that, in case of available cached content, everything that would happen inside your route and template code won't be executed; instead, Emmett will build the final response body and its headers from the ones available in the cache.
Note: this means that nothing contained in the `pipe`, `on_pipe_success` and `on_pipe_failure` methods of the pipes in your route pipeline will be executed either. In case you need code to run on cached routes, you should use the `open` and `close` methods of the pipes.
Mind that Emmett will only cache contents of GET and HEAD requests that return a 200 response code. This is intended to avoid unwanted caching behaviour in your application.
The `Cache.response` method also accepts some parameters you might want to use:
parameter | default value | description |
---|---|---|
duration | 'default' | the duration (in seconds) the cached content should be considered valid |
query_params | True | tells Emmett to consider the request's query parameters to generate different cached contents |
language | True | tells Emmett to consider the client's language to generate different cached contents |
hostname | False | tells Emmett to consider the path hostname to generate different cached contents |
headers | [] | an additional list of headers Emmett should use to generate different cached contents |
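As an example, a route that keeps its cached copy for a minute while ignoring the client's language could be configured like this (the route, the `Item` model and the values are illustrative):

```python
# a sketch: one cached copy per query string, shared across client languages
@app.route("/catalog")
@cache.response(duration=60, language=False)
async def catalog():
    items = Item.all().select()
    return dict(items=items)
```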
In some cases, you might need to cache all the routes contained in an application module. In order to achieve this, you can use the `cache` parameter when you define your module:
```python
mod = app.module(__name__, 'mymodule', cache=cache.response())
```
Changed in version 1.2
As we saw in the sections above, the common usage of cache is to call the `Cache` instance with a callable object that will produce the cached contents in case they are not available in the cache.
In all the cases where you need to operate on the cache directly, you can use the exposed methods of the `Cache` instance and its handlers. Let's see them in detail.
Every time you need to access contents from the cache, you can use the `get` method:
```python
value = cache.get('key')
```
If no contents are available, this method will return `None`.
When you need to manually set contents in the cache, you can use the `set` method:
```python
cache.set('key', 'value', duration=300)
```
Note: if you want to store the result of a callable object, you should invoke it yourself.
You can implement a manual check-and-set policy using the `get` and `set` methods:
```python
value = cache.get('key')
if not value:
    value = 'somevalue'
    cache.set('key', value, duration=300)
```
The last example can be written in a more compact way using the `get_or_set` method:
```python
value = cache.get_or_set('key', 'somevalue', duration=300)
```
Note: as we saw for the `set` method, if you want to store the result of a callable object, you should invoke it yourself.
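For instance, if the value comes from a function, call it before passing the result (`compute_value` is a hypothetical helper):

```python
def compute_value():
    return 'somevalue'

# invoke the callable yourself: get_or_set stores its result, not the function
value = cache.get_or_set('key', compute_value(), duration=300)
```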
New in version 2.0
The `get_or_set` behaviour can also be used with awaitable functions and objects; just use the provided `get_or_set_loop` method for that:
```python
async def somefunction():
    return 'value'

value = await cache.get_or_set_loop('key', somefunction, duration=300)
```
Whenever you need to manually delete contents from the cache, you can use the `clear` method:
```python
cache.clear('key')
```
And if you need to clear the entire cache, you can invoke the `clear` method without arguments.
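A minimal sketch of both usages:

```python
cache.clear('key')   # removes a single key
cache.clear()        # no arguments: clears the entire cache
```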
Note: on Redis, a key containing `*` means clearing all the existing keys matching that pattern. So calling `cache.clear('user*')` will delete the contents for all keys starting with `user`.