Brett Hoerner's blog
Using the Nginx memcached module with Django
written on Monday, October 27, 2008
I decided to play around with the Nginx memcached module recently. It's a very interesting and simple module that checks memcached for a page before falling back to your backend (or to whatever else you tell it to do).
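Stripped of nginx specifics, the module's read path is a cache lookup with a fallback. The sketch below models it in Python, assuming a plain dict as a stand-in for memcached and a hypothetical `backend` callable as a stand-in for the proxied Django app; it illustrates the idea and is not how nginx is implemented.

```python
from typing import Callable, Dict, Tuple


def serve(uri: str,
          cache: Dict[str, bytes],
          backend: Callable[[str], bytes],
          prefix: str = "NG") -> Tuple[str, bytes]:
    """Return (source, body) for a GET, preferring the cached copy."""
    key = "%s:%s" % (prefix, uri)
    body = cache.get(key)
    if body is not None:
        # hit: the application is never touched
        return "memcached", body
    # miss (memcached reports "not found", nginx sees a 404): ask the backend
    return "backend", backend(uri)
```

Note that this only ever reads from the cache; filling it is the application's job.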
The simplicity does have a couple of drawbacks:
Cached bits are pulled out of memcached and nginx doesn't magically know their content-type. You can (and should) provide a default content-type that will be used for your typical dynamic requests such as /foo and /bar/. If you want to cache non-[X]HTML content, you have to do some redirection hacking so that nginx thinks it knows what it's serving; I don't go into that here, as I'm only caching my dynamic HTML requests.
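Since every hit gets labelled with the single `default_type`, one defensive option is to have the application refuse to cache anything that isn't HTML. The helper below is a hypothetical illustration of that check, not part of the setup described here; it assumes you can read the response's Content-Type header as a string (in Django, `response['Content-Type']`).

```python
def is_html(content_type: str) -> bool:
    """True if a Content-Type header value denotes (X)HTML."""
    # drop parameters such as "; charset=utf-8" and normalise case
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in ("text/html", "application/xhtml+xml")
```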
The second issue is that you have to handle caching (and invalidation) inside your app. This brings with it all of the usual cache invalidation problems. Since this is a very simple blog I was able to get away with a quick hack, and I haven't figured out how best to handle it on larger sites. On the other hand, you control the memcached timeouts, so even complex sites could use this with only time-based invalidation if they decided X seconds of stale cache were acceptable for Y URLs.
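If you lean on time-based expiry, the "X seconds for Y URLs" policy could be a small lookup table consulted whenever the app writes to memcached. A minimal sketch, where the prefixes and timeouts are made up for illustration and the longest matching prefix wins:

```python
from typing import Dict

# Hypothetical policy: URL prefix -> seconds a cached page may be stale.
TIMEOUTS: Dict[str, int] = {
    "/": 60,
    "/blog/": 300,
    "/about/": 3600,
}


def timeout_for(path: str, table: Dict[str, int] = TIMEOUTS,
                default: int = 60) -> int:
    """Timeout for the longest prefix in `table` that `path` starts with."""
    matches = [prefix for prefix in table if path.startswith(prefix)]
    if not matches:
        return default
    return table[max(matches, key=len)]
```

The result would be passed as the timeout argument of the cache write, e.g. `cache.set(key, response.content, timeout_for(path))` in Django.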
nginx.conf
I think this part is pretty self-explanatory. I only check memcached on GET requests, and I chose the key prefix NG: so that these entries don't collide with other keys in memcached (the rest of the key is the request URI).
    user www-data www-data;
    worker_processes 4;
    pid /var/run/nginx.pid;
    error_log /var/log/nginx/error.log;

    events {
        worker_connections 1024;
    }

    http {
        upstream backend {
            server 127.0.0.1:8000 weight=1;
        }

        # pass Host header on to the backend
        proxy_set_header Host $host;
        # set X-Forwarded-For header so the backend knows who
        # actually made the request
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        # don't send version in response headers
        server_tokens off;

        include /usr/local/conf/mime.types;
        # fallback if none matches
        default_type application/octet-stream;

        # about: http://www.baus.net/on-tcp_cork
        tcp_nopush on;
        sendfile on;

        gzip on;
        gzip_min_length 1000;
        gzip_proxied any;
        gzip_types text/css text/plain application/atom+xml application/x-javascript;
        gzip_vary on;

        server {
            listen 80;
            server_name bretthoerner.com;
            root /a/bretthoerner.com/root;

            location ~ [^/]$ {
                # catches paths that don't end in / (typically media)
                # if there is no matching static file in the root
                # proxy the request to the backend
                if (!-f $request_filename) {
                    proxy_pass http://backend;
                    break;
                }
            }

            location / {
                # don't check memcached on POST, etc
                if ($request_method != GET) {
                    proxy_pass http://backend;
                    break;
                }
                # pages fetched from memcached don't know their
                # own mime type
                default_type "text/html; charset=utf-8";
                # use same prefix as backend does
                set $memcached_key "NG:$uri";
                # memcached location
                memcached_pass localhost:11211;
                # 404 for cache miss
                # 502 for memcached down
                error_page 404 502 = @cache_miss;
            }

            location @cache_miss {
                proxy_pass http://backend;
            }
        }

        server {
            listen 80;
            server_name media.bretthoerner.com;
            root /a/bretthoerner.com/media;
        }
    }
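To make the routing in this config easier to follow, here is the same decision modelled in Python. It is only a model under stated assumptions: `file_exists` stands in for nginx's `-f $request_filename` test, the string labels are made up, and the query-string split approximates `$uri` while ignoring the other normalisation nginx applies to it.

```python
def memcached_key(uri: str, prefix: str = "NG") -> str:
    """Model of `set $memcached_key "NG:$uri";`."""
    # $uri carries no query string, so "/blog/?page=2" maps to "NG:/blog/"
    return "%s:%s" % (prefix, uri.split("?", 1)[0])


def route(method: str, uri: str, file_exists: bool) -> str:
    """Which handler the bretthoerner.com server block would pick."""
    path = uri.split("?", 1)[0]  # locations match against the path only
    if not path.endswith("/"):
        # location ~ [^/]$ : serve the static file, else proxy to the backend
        return "static" if file_exists else "backend"
    # location / : everything but GET skips memcached
    if method != "GET":
        return "backend"
    # a 404 (miss) or 502 (memcached down) is then sent to @cache_miss
    return "memcached"
```

One consequence the model makes visible: a GET for `/blog/?page=2` would be answered from whatever is cached under `NG:/blog/`, so pages that vary by query string would need separate handling.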
Django middleware for setting cache
This is where the cache is set. On each Django response I check that:
- The request is a GET
- The request is not to /admin
- The response is a success (status 200)
Then you simply set the cache where nginx will look:
    from django.conf import settings
    from django.core.cache import cache

    class NginxMemcacheMiddleWare(object):
        def process_response(self, request, response):
            path = request.get_full_path()
            # skip non-GETs, failed responses and all of /admin (otherwise the
            # login form shown to anonymous users could be cached and then
            # replayed by nginx to everyone)
            if (request.method != "GET"
                    or path.startswith('/admin')
                    or response.status_code != 200):
                return response
            # settings.NGINX_CACHE_PREFIX == 'NG', just like nginx.conf
            key = "%s:%s" % (settings.NGINX_CACHE_PREFIX, path)
            cache.set(key, response.content)
            return response
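The three checks can also be pulled out into plain functions, which makes them easy to test without a running Django. This is a sketch: the function names are hypothetical, and a dict stands in for memcached.

```python
from typing import Dict


def should_cache(method: str, path: str, status_code: int) -> bool:
    """GET only, never anything under /admin, successful responses only."""
    return (method == "GET"
            and not path.startswith("/admin")
            and status_code == 200)


def store(cache: Dict[str, bytes], prefix: str,
          method: str, path: str, status_code: int, body: bytes) -> bool:
    """Write `body` where nginx will look; return True if it was cached."""
    if not should_cache(method, path, status_code):
        return False
    cache["%s:%s" % (prefix, path)] = body
    return True
```

One caveat if you try this on a newer Django: since 1.3 the cache framework rewrites keys before they reach memcached (by default to `:1:NG:/...`, adding `KEY_PREFIX` and a version number), so the raw key would have to be lined up with nginx's, for example through a custom `KEY_FUNCTION`.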
Django signal for invalidating cache
Cache invalidation is freakin' hard, especially on a more complicated site. As I said previously, this is mostly a hack; it isn't the sort of thing that could scale (not in performance, but in code and development). I'm trying to think of a better way to handle it, but describing and maintaining dependencies in a code base is tough.
Anyway, the following should also be pretty self-explanatory. When an object is saved, the URL for the object itself is invalidated (that's the easy part). The tough question is: what else needs to change? For my simple blog I've just hardcoded these values, because I only have one model I really care about (Entry). I invalidate the homepage, /blog/ and the year, month and day archive pages for the entry's date, so that the next request to any of them will have the latest title and body.
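    from django.conf import settings
    from django.core.cache import cache
    from django.db.models import signals
    from django.utils.dateformat import format as date_format

    from blog.models import Entry  # assumed app name; adjust to your project

    def delete_memcache_keys(sender, instance, created, **kwargs):
        keys = []
        # the saved object's own page
        if hasattr(instance, "get_absolute_url"):
            keys.append(instance.get_absolute_url())
        # pages that list entries: the homepage and the date archives
        if isinstance(instance, Entry):
            date = instance.pub_date
            keys.extend(["/",
                         "/blog/",
                         "/blog/%s/" % date_format(date, "Y"),
                         "/blog/%s/" % date_format(date, "Y/b"),
                         "/blog/%s/" % date_format(date, "Y/b/d"),
                         ])
        for key in keys:
            cache.delete("%s:%s" % (settings.NGINX_CACHE_PREFIX, key))

    signals.post_save.connect(delete_memcache_keys)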
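The list of paths an Entry touches can be computed without Django, which is handy for checking it against your URLconf. This sketch swaps Django's date formatting for `datetime.strftime` and assumes the default C/English locale: `%b` gives `Oct`, which is lowercased to match Django's `b` format code (`oct`). The function name is hypothetical.

```python
from datetime import date
from typing import List, Optional


def stale_paths(pub_date: date,
                absolute_url: Optional[str] = None) -> List[str]:
    """Paths to purge after an Entry published on `pub_date` is saved."""
    month = pub_date.strftime("%b").lower()  # like Django's "b", e.g. "oct"
    day = pub_date.strftime("%d")            # like Django's "d", zero-padded
    paths = [] if absolute_url is None else [absolute_url]
    paths += [
        "/",
        "/blog/",
        "/blog/%d/" % pub_date.year,
        "/blog/%d/%s/" % (pub_date.year, month),
        "/blog/%d/%s/%s/" % (pub_date.year, month, day),
    ]
    return paths
```

Each path would then be prefixed with `NG:` before deletion, exactly as the signal handler does.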
Benchmarks and conclusion
So how much faster is it? Well, there's a little hitch when you use a basic blog to test this stuff out: it's already fast as hell. I'm using SQLite and am pretty confident the entire 150 KB database is in memory. A typical request does maybe one simple query and returns the pre-rendered entry field from the database. All of that means the largest cost of requesting a page (on this site) is traffic over the wire, i.e. waiting for the bits.
A more complex app would be a much different story, although I don't have one to try it on. I will say that when I added a simple time.sleep(.1) (100 ms) to my index view, to simulate the extra work a real site would have to do, I went from ~13 req/s without the nginx cache to ~27 req/s with it. (Keep in mind I'm on a small VPS and was only making 3 concurrent requests at a time, because I don't have many pre-forked Apache children and wanted to keep both tests 1:1.) Still, a pretty big deal, no?
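One way to read these numbers, assuming the benchmark kept exactly three requests in flight the whole time, is Little's law: average time per request = concurrency / throughput. The arithmetic below is an interpretation added on top of the reported rates, not an extra measurement.

```python
def latency_ms(concurrency: int, req_per_s: float) -> float:
    """Average time per request implied by Little's law: W = L / lambda."""
    return 1000.0 * concurrency / req_per_s


# 3 requests in flight, each sleeping >= 100 ms: the uncached view
# can serve at most 3 / 0.1 = 30 req/s
print(round(latency_ms(3, 13)))  # prints 231 (without the nginx cache)
print(round(latency_ms(3, 27)))  # prints 111 (served from memcached)
```

If the assumption holds, cached requests still took roughly 111 ms each with no view code running at all, which would fit the earlier point that time on the wire dominates on this site.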