Showing posts with label services. Show all posts
Showing posts with label services. Show all posts

Monday, October 18, 2010

Adventures with Squid caching - overview

I've been knees-deep in the REST services world for most of the last 12 months, and while that domain may be simpler to model and deal with than RPC or SOAP, one of the most compelling advantages is making use of HTTP caching. Instead baking your own caching layer or application, using stateful server sessions, or, worst of all, forcing clients to use a fat-client jar that does it's own caching (yes, I've committed that sin in the distant past), there are several open-source HTTP cache proxy applications that support the HTTP caching spec to differing degrees. The main cache proxies are:
For our needs, we just went with squid as a team member had lots of experience and successful with it at a previous gig.

Of course, we could have used a look-aside cache (something like memcached), but we were interested in having a write-through cache in front of the service to reduce hops and latency (I'll explain below).


Basic premise and early decisions
We were breaking up a massive www application into smaller standalone components/services for, ya know, the usual sorts of reasons: scalability, availability, no single points of failure, and so on. In addition to pulling apart legacy code and rethinking our data persistence options, we also opened up another data center - on the other side of the country. Our data itself wasn't going to be directly replicated to the new data center (not going into any details on this) as there many hands in the pie that could not be easily pulled out, slapped, or cut off. To that end, we had to deal with applications deployed only in the new data center that needed the existing customer data. Enter cross-country latency - umm, yummy.
So now that you know some of the ground rules and constraints of the work, we decided early on to use REST as the premise/framework for building out our services. We wanted to stick as close to the HTTP RFC as possible, within reason, without just inventing a bunch weird-ass rules around using our stuff. And, true to the title of this series, we wanted to use a reverse cache proxy to not only improve the basic performance of the service, but also to help relieve the cross-colo latency.

So, why is this interesting to you? I'm going to record my experiences with REST, squid, and the rest, and I'll point out what worked well, what didn't, and what was freaking painful to figure out, get working, or just plain abandon. I'll cover the following topics:
  • Vary (HTTP header)
  • ETag (HTTP header)
  • cache invalidation via PUT/POST/DELETE
  • tying them all together

Of course, as we went through building out everything in the service, it didn't all fall into neat, even bucket of functionality like the bullet points above. Like everything else when building out new buckets of functionality, it's a little of coding here, a little bit of configuration there, and some ITOps work to bundle it all up.

Oh, yeah, beer was involved, as well. Lots.

Thursday, February 14, 2008

Right-sizing the architecture

I work at a medium-sized technology company as an e-commerce architect. The company started out in the dot-com days supporting a single web site (primarily content driven). When I joined about two and a half years ago, the company decided to expand out into new sites hosting different content for different partners. As technologists, we decided break apart what was a monolithic application into separate services that could be reused amongst the sundry sites we would now be hosting. With SOA being a more accepted idiom at the time, we decided to build out a "platform" to handle our needs.

As for my role specifically in e-commerce, we have two aspects of the business: digital subscriptions and merchandise. The digital subscription e-commerce is handled in-house via home-brewed systems. The merchandise e-commerce engine and site was hosted and managed by third company. Early on in my requirements gathering for the e-commerce part of the new platform, there was talk of ditching off the external company and handling the merchandise ourselves. As the talk "seemed" rather serious, and the thought of running an e-commerce engine that handled both both merch and digital subscriptions sounded really fun, I decided to incorporate many of the merchandising aspects into my design up front. This problem number one.

As for the platform itself, it is a collection of synchronous web services and asynchronous message publishers/listeners. The web services are most SOAP-style, but there are a few REST services, as well. (I don't want to discuss the advantages of one web service technology over the other here, just the costs/implications involved with choosing one of them.) On the web services, we have a fully defined WSDL/XSD contract. For the messaging components, it's more-or-less a collection of name/value pairs (expressed as XML) going across the JMS, although some of the value elements are full XML documents conforming to a defined XSD. We found that specifying specifying the service interface and data entities via WSDL and XSD was helpful for defining the XML documents that would be exchanged between client and service, but that it's application was limited outside of that scope. This is problem number two, and the reason why it is a problem is largely dependent on problems number one.

The assumptions (and aspirations!) I started with unfortunately did not bear as much fruit as hoped. Let's start with problem number one. In my desire to design and build out a bad-ass e-commerce system, I jumped in full force. Given the requirements I had, I created what I call the "Big Vision Up Front". It's sort of a mix of Big Design Up Front (a/k/a/ the waterfall model) and the agilists' notion of YAGNI. Basically, I aggregated all the requirements and derived a vision of the future platform. This did not include hardcore UML diagrams or extensive documentation, but lots of collected thoughts about the various models and interactions amongst them. Then, as a given set of features that we knew were needed for the next project, we would build them out.

OK, this all seems cool so far, build components we need when we need them based on the original vision. Unfortunately, in the initial build out (call it phase 1), I built in more functionality, more of that big vision, than was necessary at the time. In other words, we overbuilt expecting that the unused components would be "needed soon". For some components that has turned out to be true (it may have taken a year or two, but they are finally being used). For others, it feels like random appendages hanging off in semi-sensibile directions. Of course much of this was due to my enthusiasm to create something big and great, but what I failed to recognize and understand was the nature of the business side of our, and really any, company. The business folks are, on the whole, nice people and are trying to do their jobs well. Unfortuantely, I didn't apply enough discretion to understand the business vision and goals in both the short and long terms and how I could apply my technical skills to best serve that. Hence, we overbuilt some things. Nothing awful or completely atrocious, but some design and code are growing a funky odor...

So, the lesson for problem one is to understand your business. Figure out what is really being asked for, both in the short and long term, and learn how to marry the two. Of course, this kind of knowledge only comes with experience, so I think I've begun down that path of enlightenment.

As for problem two, the purely technical matter: as part of taking over the merchandising site, we would have needed to integrate in with about 75 external partners (manufacturers, shippers, and such) and several other entities for user management, e-commerce, and miscellaneous services. With this in mind, we created full-blown WSDLs and XSDs (by hand - it's not that hard) as our service interfaces. Well, we built it, but nobody came. In fact, our only service consumers are internal. Now, having services even just for our internal clients (web-apps we host and so on) has turned out to be a great boon to seperation of concerns and being able to upgrade some of our sites without a big-bang release, there has been a development cost. We need to create the WSDL (basically define a few operations - we follow the SOAP Messaging style, not SOAP XML-RPC disaster), define the data structure in XSD, then create XML to object mapping files (we use JiBX) so everything gets populated into our domain models objects. This style isn't so bad once you've learned WSDL, XSD, JiBX, proper domain modeling, ...., but, there is a cost whenever we want to add a new operation to a service or create a new service. Everytime this happens, we keep looking at each other saying, "Something's just nor right here...". I feel that once again, we didn't understand our business fully and ended up satisfying their initial requirements but perhaps added in more infrastructure costs than what we really needed to incur.

So, the lesson for problem number two is to understand your business and know how to apply how the most cost-effective technology you can. By cost-effective, I don't necessarily mean cheapest in terms of licensing and deployment costs (although these factors cannot be totally discounted), I'm really talking about knowing your development team - the poor engineers who will have to implement and live with the architectural decisions. Ruby on Rails might (might?) be the best solution for a project in isolation, but if all your developers are Java cats with no Ruby (let alone Rails) experience, the cost of the project just went up. Similarly, the WS-* stack contains many inherent costs just to get something up and running.

To sum up, architects needs to keep eye on the business folks and the engineering resources who will be responsible for the building and ownership of the software we help them manifest. Otherwise it's a gonna be a long time in refactoring land....

Monday, January 28, 2008

Simplifying service interfaces - avoid database ids

Let's assume you have a data-centric (i.e. CRUD) service for retrieving product information. For the sake of simplicity, let's assume an XSD that looks like this:

<complexType name="product">
<sequence>
<element name="sku" type="string"/>
<element name="name" type="string"/>
<element name="description" type="string" minOccurs="0"/>
<element name="product-type" type="product-type"/>
</sequence>
</complexType>

<simpleType name="product-type">
<restriction base="string">
<enumeration value="apparel"/>
<enumeration value="subscriptions"/>
<enumeration value="digital_video"/>
</restriction>
</simpleType>


I'm defining a simple product that includes a product-type enum value, so a sample document would look like:

<product>
<sku>SHG876SHFW</sku>
<name>Navy Blue T-Shirt</name>
<description>A nice, blue shirt for your enjoyment!</description>
<product-type>apparel</product-type>
</product>

We explicitly want to keep the service interface simple and free of how things are represented within the service - meaning, not exposing database ids where they have no apparent meaning to a client of the system. Exposing the id of an entity like product seems reasonable, but is unconvincing for items such as lookup values, which in my designs have a unique integer id as well as a unique, human-readable name. The names should be immediately understandable to clients rather than needing a secondary lookup on their own end. For example, using "apparel" is much easier to understand than "4".

These lookup values I tend to treat as enum values in XML/XSD, and in the database they get their own row within a lookup table.

By asking to clients to pass in both an id and a unique name is bound for disaster - this typically happens with auto-generated code/service interfaces. So, for the simplest and easiest way to integrate, and to hide more of the internal nuts and bolts, I just use enum values in the interface.


Thursday, December 27, 2007

Simplifying service interfaces

When I started building out services, I was lucky enough to miss out on the whole notion of distributed objects - the idea of treating a remote object as if it were local. I did, however, start off with thick clients and (XML) RPC-style style calls, using JAX-RPC (<shudder>). In this style, of course, all the operations are very fine-grained, even when loading a single entity. After knocking my head on this for almost a year, and as REST and asynchronous messaging was coming into vogue, I decided to rethink my approach to service interfaces/contracts.

In my new dircetion, I choose to create services based around an "entity"; for example, a product. Essentially, the Product Service and similar services are data-centric. I don't quite want to call these CRUD-style services per se, but in the long run, perhaps they are. In any case, the manner in which I've designed the interface uses the notion of the entity, as a whole, for both input and output parameters. Let's look at the operations I've defined on the Product Service:
  • Product store(Product product)
  • Product find(Product product)
  • Product[] search(Product product)
(Note: Just because I've designed my service interface like this, it does not necessarily mean there is a one-to-one correspondence with the gateway, a/k/a facade, into my domain logic.)

As you can see, there are only three operation declared on this service contract: store, find, and search.

The store operation does what is implies; it stores the product being passed to it. Where this gets interesting is that this operation will do either a SQL insert or update, yet that detail is invisible to clients. In this way, clients do not need to be concerned if this is a new entity or an update, and thus is relieved from worrying about any corresponding semantics between an insert or update (or PUT or POST in REST lingo). Further, we have an idempotent operation.

The find and search operations are quite similar. find is intended to locate a given entity by unique property of the entity. In this example, a SKU field or ProductId could be used as a unique field; this does, of course, require clients to know which fields are unique. Hence, find should only return one, distinct entity. search, on the other hand, can return zero to many entities. The output parameters to these operations should be clear enough, but let's talk about the inputs. You'll notice the interface requires an entity instance to be passed to it. I choose to employ a "query by example" style of interface for the service. This reduces the size and complexity of the interface because I do not need to create separate operations like:
  • Product findById(Long id)
  • Product findBySku(String sku)
  • Product findByName(String name)
  • ... and so on
In this paradigm, the client only populates the fields it is interested in using in a lookup operation. For example, for a find operation, the client can just populate the ProductId or SKU field, and pass that to the service (obviously, with XML marshalling and unmarshalling in between). Similarly, for a search, the client can populate the price and/or other non-unique fields. (This interface does not account for the situation where you want to have multiple or ranges of values to search against for one field; for example, products whose price is greater than $4.95 and less than $9.95. As I don't have this business case right now, I haven't bothered to account for it.)

The next question you should be thinking is, "Why have two operations, find and search? Couldn't you accomplish the same with just one operation (like what you did with store)?"

Hmmm ... yeah, the longer I live with this duality, the less in favor of it I've become. Having two operations that, more or less, do the same thing seems not so optimized to me, especially as I went out of my way to make store a simple notion. Having this duality does require the client to distinguish between the two styles of entity lookup, but actually, that was part of the argument for the separation back in the day: If you know the exact entity you want, call operation A with the specific unique fields populated and you will only get back what you expect; else, call operation B and get an array. I think the argument does make sense, but I'm beginning to feel that perhaps the simpler interface is more expressive. If I merged the two operations, the service interface would become:
  • Product store(Product product)
  • Product[] find(Product product)