Monday, October 18, 2010

Adventures with Squid caching - overview

I've been knee-deep in the REST services world for most of the last 12 months. While that domain may be simpler to model and deal with than RPC or SOAP, one of its most compelling advantages is the ability to use HTTP caching. Instead of baking your own caching layer or application, using stateful server sessions, or, worst of all, forcing clients to use a fat-client jar that does its own caching (yes, I've committed that sin in the distant past), you can put one of several open-source HTTP cache proxies in front of your service; they support the HTTP caching spec to differing degrees. The main cache proxies are:
For our needs, we just went with Squid, since a team member had lots of experience with it and had used it successfully at a previous gig.

Of course, we could have used a look-aside cache (something like memcached), but we wanted a write-through cache sitting in front of the service to reduce hops and latency (I'll explain below).
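To make the hop difference concrete, here is a minimal sketch of a look-aside (cache-aside) read path. It assumes a dict in place of memcached and a plain function in place of the remote service; `LookAsideClient` and `fetch_from_service` are hypothetical names for illustration, not part of any library.

```python
from typing import Callable, Dict, Optional


class LookAsideClient:
    """Hypothetical application-side look-aside (cache-aside) read path.

    A dict stands in for memcached; `fetch_from_service` stands in for the
    (possibly cross-country) call to the origin service.
    """

    def __init__(self, fetch_from_service: Callable[[str], str]) -> None:
        self.cache: Dict[str, str] = {}
        self.fetch_from_service = fetch_from_service
        self.round_trips = 0  # network calls the application itself makes

    def get(self, key: str) -> str:
        self.round_trips += 1  # 1: ask the cache
        value: Optional[str] = self.cache.get(key)
        if value is None:
            self.round_trips += 1  # 2: miss, so call the service
            value = self.fetch_from_service(key)
            self.round_trips += 1  # 3: write the value back into the cache
            self.cache[key] = value
        return value
```

On a miss the application makes three round trips and has to carry the caching logic (and its invalidation rules) itself. With a cache proxy in front of the service, the client makes a single HTTP request and the proxy deals with the miss.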


Basic premise and early decisions
We were breaking up a massive www application into smaller standalone components/services for, ya know, the usual reasons: scalability, availability, no single points of failure, and so on. In addition to pulling apart legacy code and rethinking our data persistence options, we also opened up another data center, on the other side of the country. Our data itself wasn't going to be directly replicated to the new data center (I won't go into details on this), as there were many hands in the pie that could not easily be pulled out, slapped, or cut off. As a result, we had to deal with applications deployed only in the new data center that needed the existing customer data. Enter cross-country latency. Umm, yummy.
So now that you know some of the ground rules and constraints of the work: we decided early on to use REST as the premise/framework for building out our services. We wanted to stick as close to the HTTP RFC as possible, within reason, rather than inventing a bunch of weird-ass rules for using our stuff. And, true to the title of this series, we wanted to use a reverse cache proxy not only to improve the basic performance of the service, but also to help relieve the cross-colo latency.

So, why should you care? I'm going to record my experiences with REST, Squid, and the rest, and point out what worked well, what didn't, and what was freaking painful to figure out, get working, or just plain abandon. I'll cover the following topics:
  • Vary (HTTP header)
  • ETag (HTTP header)
  • cache invalidation via PUT/POST/DELETE
  • tying them all together
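To give the topics above some shape before the detailed posts, here is a toy, in-memory model of what a reverse cache proxy does with them. It is a hedged sketch, not Squid's code or configuration: `ToyReverseCache`, `demo_origin`, and the `x-cache` value it returns are hypothetical names, the origin is a plain Python function rather than an HTTP server, header names are assumed lowercase, and the proxy revalidates on every hit instead of honoring Cache-Control freshness.

```python
import hashlib
from typing import Callable, Dict, List, Mapping, Tuple

# (status, headers, body): a deliberately tiny stand-in for an HTTP response.
Response = Tuple[int, Dict[str, str], bytes]
Origin = Callable[[str, str, Mapping[str, str]], Response]

UNSAFE_METHODS = {"PUT", "POST", "DELETE"}


def etag_for(body: bytes) -> str:
    """A strong ETag derived from the representation bytes (one common choice)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


class ToyReverseCache:
    """Hypothetical in-memory model of a reverse cache proxy (not Squid's code)."""

    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.vary: Dict[str, Tuple[str, ...]] = {}  # path -> header names from Vary
        self.store: Dict[Tuple[str, Tuple[str, ...]], Tuple[str, bytes]] = {}  # key -> (etag, body)

    def _key(self, path: str, headers: Mapping[str, str]) -> Tuple[str, Tuple[str, ...]]:
        # Vary: one cache entry per combination of the listed request header values.
        names = self.vary.get(path, ())
        return path, tuple(headers.get(name, "") for name in names)

    def handle(self, method: str, path: str, headers: Mapping[str, str]) -> Response:
        headers = {k.lower(): v for k, v in headers.items()}
        if method in UNSAFE_METHODS:
            # Invalidation: drop every cached variant of the target path, then forward.
            self.store = {k: v for k, v in self.store.items() if k[0] != path}
            return self.origin(method, path, headers)

        cached = self.store.get(self._key(path, headers))
        if cached is not None:
            # ETag: revalidate with If-None-Match. This toy always revalidates;
            # a real proxy skips the trip while Cache-Control says the entry is fresh.
            etag, body = cached
            status, resp_headers, new_body = self.origin(
                method, path, {**headers, "if-none-match": etag})
            if status == 304:
                return 200, {"etag": etag, "x-cache": "HIT"}, body
        else:
            status, resp_headers, new_body = self.origin(method, path, headers)

        if status == 200 and "etag" in resp_headers:
            vary = resp_headers.get("vary", "")
            self.vary[path] = tuple(n.strip().lower() for n in vary.split(",") if n.strip())
            self.store[self._key(path, headers)] = (resp_headers["etag"], new_body)
        return status, {**resp_headers, "x-cache": "MISS"}, new_body


def demo_origin(log: List[str]) -> Origin:
    """Hypothetical origin: one customer resource served as JSON or XML, varying on Accept."""
    version = {"n": 1}

    def origin(method: str, path: str, headers: Mapping[str, str]) -> Response:
        log.append(method + (" (conditional)" if "if-none-match" in headers else ""))
        if method != "GET":
            version["n"] += 1  # any write changes the representation
            return 204, {}, b""
        if headers.get("accept") == "application/xml":
            body = b"<customer v='%d'/>" % version["n"]
        else:
            body = b'{"customer": %d}' % version["n"]
        tag = etag_for(body)
        if headers.get("if-none-match") == tag:
            return 304, {"etag": tag, "vary": "Accept"}, b""
        return 200, {"etag": tag, "vary": "Accept"}, body

    return origin
```

A JSON GET misses, a repeat JSON GET is answered from the cache after a 304 revalidation, an XML GET misses because `Vary: Accept` gives it its own entry, and a PUT wipes both variants so the next GET fetches the new representation.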

Of course, as we built out the service, it didn't all fall into neat, even buckets of functionality like the bullet points above. Like everything else when building out new functionality, it's a little bit of coding here, a little bit of configuration there, and some ITOps work to bundle it all up.

Oh, yeah, beer was involved, as well. Lots.
