For Drupal sites of any size, even a relatively small amount of traffic can lead to unfortunate page load times. Given the number of database calls each page requires, you're looking at poor performance and hungry visitors. A quick jaunt into Memcached and Varnish and your site hums happily along, provided a second visitor hits a given page within your cache expiration window. So what happens when you have a significant body of content that doesn't naturally funnel visitors toward your most recent post (a recipe site, for instance)? Let's also say you don't want a fantastic, breaking strudel recipe to languish for a day behind an extremely long cache timeout: your visitors need strudel!
wget -q http://www.example.com/sitemap.xml -O - | egrep -o "http://www\.example\.com[^<]+" | wget -q --wait=1 -i - -O /dev/null
So, what’s in this script?
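The middle stage does the heavy lifting: egrep -o prints only the matched text, and [^<]+ stops each match at the closing XML tag. You can sanity-check that pattern offline with a made-up sitemap fragment (the recipe paths here are invented for the demo):

```shell
# Simulate a couple of sitemap <loc> entries and extract the bare URLs
printf '<loc>http://www.example.com/recipes/strudel</loc>\n<loc>http://www.example.com/recipes/avocado-cookies</loc>\n' |
  egrep -o "http://www\.example\.com[^<]+"
# http://www.example.com/recipes/strudel
# http://www.example.com/recipes/avocado-cookies
```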
Because our aim is specifically to cache the content that people aren't already hitting regularly, we actually want to use Google Analytics to grab the *least* visited nodes on the site. That one hidden-gem recipe for chocolate chip cookies with avocados might only be seen once a month, but when it is, we want it served immediately!
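One low-tech way to get that list is a pageviews export from Google Analytics. The sketch below assumes a hypothetical CSV file, ga-pageviews.csv, with one "path,pageviews" pair per line; it sorts ascending by view count, keeps the bottom 1000, and prepends the domain:

```shell
# Sketch: turn a GA pageviews export into a warm list of least-visited URLs.
# ga-pageviews.csv (path,pageviews per line) is an assumed export, not a real GA artifact.
sort -t, -k2,2n ga-pageviews.csv |   # ascending by pageview count
  head -n 1000 |                     # keep the 1000 least-visited entries
  cut -d, -f1 |                      # drop the count, keep the path
  sed 's|^|http://www.example.com|'  # prepend the domain
```

Save that output to a file and it slots straight into the warming step below in place of the sitemap.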
Once we have our content list, it's a snap to iterate through it and hit each entry with wget. Send the downloads to /dev/null, since we don't actually want the files, only the side effect of the pages landing in the cache. You don't want the warmer to slam your server with a pile of simultaneous requests, so the --wait flag spaces that load out. At one request per second, 1000 nodes are taken care of in about 16 and a half minutes.
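If you'd rather not re-parse the sitemap on every run, the same throttled fetch can be written as a tiny loop over a saved URL list. This is a sketch: the warm function name is made up, and the echo stands in for the real wget call so you can dry-run it safely:

```shell
# warm: read one URL per line from stdin, hit each with a pause between requests
warm() {
  while read -r url; do
    # For a real run, swap the echo for: wget -q -O /dev/null "$url"
    echo "warming: $url"
    sleep 1   # same throttle as --wait=1 in the one-liner
  done
}
```

Usage: `warm < urls.txt`, typically from a nightly cron job.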