Thursday, December 27, 2007

Simplifying service interfaces

When I started building out services, I was lucky enough to miss out on the whole notion of distributed objects - the idea of treating a remote object as if it were local. I did, however, start off with thick clients and (XML) RPC-style calls, using JAX-RPC (<shudder>). In this style, of course, all the operations are very fine-grained, even when loading a single entity. After knocking my head on this for almost a year, and as REST and asynchronous messaging were coming into vogue, I decided to rethink my approach to service interfaces/contracts.

In my new direction, I chose to create services based around an "entity"; for example, a product. Essentially, the Product Service and similar services are data-centric. I don't quite want to call these CRUD-style services per se, but in the long run, perhaps they are. In any case, the manner in which I've designed the interface uses the notion of the entity, as a whole, for both input and output parameters. Let's look at the operations I've defined on the Product Service:
  • Product store(Product product)
  • Product find(Product product)
  • Product[] search(Product product)
(Note: Just because I've designed my service interface like this, it does not necessarily mean there is a one-to-one correspondence with the gateway, a/k/a facade, into my domain logic.)

As you can see, there are only three operations declared on this service contract: store, find, and search.

The store operation does what it implies; it stores the product being passed to it. Where this gets interesting is that this operation will do either a SQL insert or update, yet that detail is invisible to clients. In this way, clients do not need to be concerned with whether this is a new entity or an update, and are thus relieved from worrying about any corresponding semantics of an insert versus an update (or PUT versus POST, in REST lingo). Further, we get an idempotent operation.
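For illustration, here's a minimal sketch of how the service side might hide that distinction. This assumes a Hibernate-backed implementation (not necessarily how you'd build it), and the sessionFactory field is hypothetical:

// Inside the hypothetical service implementation; clients never see
// whether this call turns into a SQL INSERT or UPDATE.
public Product store(Product product) {
    Session session = sessionFactory.getCurrentSession();
    session.saveOrUpdate(product); // INSERT if the entity is new, UPDATE otherwise
    return product;
}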

The find and search operations are quite similar. find is intended to locate a given entity by a unique property of the entity. In this example, a SKU field or ProductId could be used as a unique field; this does, of course, require clients to know which fields are unique. Hence, find should only ever return one distinct entity. search, on the other hand, can return zero to many entities. The output parameters of these operations should be clear enough, but let's talk about the inputs. You'll notice the interface requires an entity instance to be passed in. I chose to employ a "query by example" style of interface for the service. This reduces the size and complexity of the interface because I do not need to create separate operations like:
  • Product findById(Long id)
  • Product findBySku(String sku)
  • Product findByName(String name)
  • ... and so on
In this paradigm, the client only populates the fields it is interested in using in a lookup operation. For example, for a find operation, the client can just populate the ProductId or SKU field and pass that to the service (obviously, with XML marshalling and unmarshalling in between). Similarly, for a search, the client can populate the price and/or other non-unique fields. (This interface does not account for the situation where you want multiple values, or ranges of values, to search against for one field; for example, products whose price is greater than $4.95 and less than $9.95. As I don't have this business case right now, I haven't bothered to account for it.)
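To make the calling convention concrete, here's a minimal client-side sketch; the Product and ProductService types are hypothetical stand-ins for the real contract and generated stubs:

import java.math.BigDecimal;

interface ProductService {
    Product store(Product product);
    Product find(Product product);
    Product[] search(Product product);
}

class Product {
    private String sku;
    private BigDecimal price;
    // id, name, and the other fields elided
    public void setSku(String sku) { this.sku = sku; }
    public void setPrice(BigDecimal price) { this.price = price; }
}

class QueryByExampleClient {
    void lookup(ProductService service) {
        // find: populate only a unique field, expect exactly one entity back
        Product bySku = new Product();
        bySku.setSku("ABC-123");
        Product match = service.find(bySku);

        // search: populate non-unique fields, expect zero-to-many entities back
        Product byPrice = new Product();
        byPrice.setPrice(new BigDecimal("4.95"));
        Product[] results = service.search(byPrice);
    }
}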

The next question you should be thinking is, "Why have two operations, find and search? Couldn't you accomplish the same with just one operation (like what you did with store)?"

Hmmm ... yeah, the longer I live with this duality, the less in favor of it I've become. Having two operations that, more or less, do the same thing seems not so optimized to me, especially as I went out of my way to make store a simple notion. Having this duality does require the client to distinguish between the two styles of entity lookup, but actually, that was part of the argument for the separation back in the day: If you know the exact entity you want, call operation A with the specific unique fields populated and you will only get back what you expect; else, call operation B and get an array. I think the argument does make sense, but I'm beginning to feel that perhaps the simpler interface is more expressive. If I merged the two operations, the service interface would become:
  • Product store(Product product)
  • Product[] find(Product product)

Thursday, December 6, 2007

Playing with Hibernate Shards

After hearing a lot of discussion at QCon and elsewhere about the notion of database sharding (a new, hipper name for the older, more mundane name "partitioning"), I decided to download the Hibernate Shards project and give it a whirl (what else do you do when you can't sleep?). I wrote an earlier blog entry on running multiple MySQL instances that will help when playing with Hibernate Shards.

As I'm reasonably proficient with Hibernate, I figured I would be able to jump in and get a pretty good understanding with just a few days' effort. Luckily, I was correct - my basic assumptions and paradigms for using Hibernate remain mostly intact (the Configuration, SessionFactory, and Session objects are almost exactly the same). The details, of course, lie more in the distribution of data between multiple database instances.

Just a few quick Hibernate tech notes before talking about the data implications. You need to have a little bit more going on in your Hibernate config file(s), one file for each shard. Nothing big there. When creating a SessionFactory, you have to provide implementations for the following interfaces:
  • ShardAccessStrategy - a strategy for accessing sharded databases for queries (as opposed to loading an item by its id). The provided implementations offer either sequential (including a round-robin, load-balanced version) or parallel access.
  • ShardResolutionStrategy - a strategy for determining which shard to access when loading an entity.
  • ShardSelectionStrategy - a strategy for determining which shard to store a new entity in.
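To give a feel for the wiring of those three strategies, here's a sketch of building a sharded SessionFactory along the lines of the Hibernate Shards documentation - the config file names are mine, and the exact API may differ between betas:

import java.util.ArrayList;
import java.util.List;

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;
import org.hibernate.shards.ShardId;
import org.hibernate.shards.ShardedConfiguration;
import org.hibernate.shards.cfg.ConfigurationToShardConfigurationAdapter;
import org.hibernate.shards.cfg.ShardConfiguration;
import org.hibernate.shards.loadbalance.RoundRobinShardLoadBalancer;
import org.hibernate.shards.strategy.*;
import org.hibernate.shards.strategy.access.*;
import org.hibernate.shards.strategy.resolution.*;
import org.hibernate.shards.strategy.selection.*;

public class ShardedSessionFactoryBuilder {

    public SessionFactory build() {
        // the prototype config supplies everything the shards have in common
        Configuration prototype = new Configuration().configure("shard0.hibernate.cfg.xml");

        List<ShardConfiguration> shardConfigs = new ArrayList<ShardConfiguration>();
        shardConfigs.add(new ConfigurationToShardConfigurationAdapter(
                new Configuration().configure("shard0.hibernate.cfg.xml")));
        shardConfigs.add(new ConfigurationToShardConfigurationAdapter(
                new Configuration().configure("shard1.hibernate.cfg.xml")));

        ShardedConfiguration shardedConfig =
                new ShardedConfiguration(prototype, shardConfigs, buildStrategyFactory());
        return shardedConfig.buildShardedSessionFactory();
    }

    private ShardStrategyFactory buildStrategyFactory() {
        return new ShardStrategyFactory() {
            public ShardStrategy newShardStrategy(List<ShardId> shardIds) {
                // round-robin new entities across the shards
                RoundRobinShardLoadBalancer loadBalancer =
                        new RoundRobinShardLoadBalancer(shardIds);
                ShardSelectionStrategy selection =
                        new RoundRobinShardSelectionStrategy(loadBalancer);
                // if the shard isn't encoded in the id, look everywhere
                ShardResolutionStrategy resolution =
                        new AllShardsShardResolutionStrategy(shardIds);
                // run queries against the shards one at a time
                ShardAccessStrategy access = new SequentialShardAccessStrategy();
                return new ShardStrategyImpl(selection, resolution, access);
            }
        };
    }
}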
As it relates to data, let's first discuss entity ID generation. As your data is now spread across multiple databases, you really can't use a database sequence for generating ids - unless you restrict each database's sequence to a distinct range (db1 gets ids 0 - 3000, db2 gets ids 3001 - 6000, and so on). Somehow, that sounds easy but rather self-limiting. Out of the box, Hibernate Shards provides two mechanisms for managing ID generation:
  • ShardedUUIDGenerator - this implementation basically creates a large random number, and also encodes the shard id into the id. What you get is a really big number (my sample tests had about 30-36 digits each).
  • ShardedTableHiLoGenerator - this implementation creates a hilo table in one of the shards and uses that to generate all IDs. Of course, that shard then becomes not only a potential performance bottleneck, but also a single point of failure.
After playing with it and thinking for a few days, the UUID generator seems the better choice, as long as your db is cool with big primary keys. MySQL worked fine with this, but I needed to make the column of type numeric(64,0) - 64 may be a bit large, but it made my sample code work so I left it. Of course, as your entity's id is shard specific, all the data related to that entity must reside on the same shard. This makes total sense and is definitely what you want from both management and performance perspectives.
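For reference, hooking the UUID generator into a mapping file looks something like this sketch - the property and column names are hypothetical, and big_integer is one Hibernate type that maps onto a big numeric column:

<id name="id" column="PRODUCT_ID" type="big_integer">
    <generator class="org.hibernate.shards.id.ShardedUUIDGenerator"/>
</id>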

Of course, deciding exactly how to spread data amongst the shards is a big decision. Unfortunately, Hibernate Shards does not provide a facility for live resharding (not that it would be easy, mind you). Essentially, you would have to do some sleight-of-hand while the data was being moved/redistributed, then update your Hibernate configurations and entity-to-shard mappings. Hibernate Shards does, however, provide a concept of virtual shard ids, and you can point multiple virtual shards at physical shards. This seems like a good idea, as you can define many virtual shards which map to just two or three physical shards; then, as you grow, it's only a slight config update to repoint a virtual shard. The documentation says "Virtual shards are cheap", so it's probably reasonable to create a bunch of them as long as it doesn't hurt performance.
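As a sketch, the virtual-to-physical mapping is just a map of ids that gets handed to ShardedConfiguration (the exact overload may vary by release); for example, sixteen virtual shards over two physical databases:

Map<Integer, Integer> virtualToPhysical = new HashMap<Integer, Integer>();
for (int virtualId = 0; virtualId < 16; virtualId++) {
    // even virtual shards on physical shard 0, odd ones on shard 1
    virtualToPhysical.put(virtualId, virtualId % 2);
}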

Hibernate Shards does have a few gotchas:
  • HQL is pretty much unusable at this point, and there seem to be some incomplete pieces with Criteria queries. This is mainly due to the immaturity of the project; I expect these will come along soon.
  • Master values from lookup tables. As Hibernate demands one and only one instance of an object with a given id, if you use master objects from a lookup table, you need to be careful that you don't get into strange cross-shard problems. This should only happen when saving/updating, but it is something to be aware of. I have another blog entry with some advice about this problem.
All in all, Hibernate Shards is a rather nice wrapper for hiding all the complexity of spreading your data around multiple databases. As it's still in beta, I'm not sure if it's production-ready. As web sites, especially e-commerce sites, are more than just read-only repositories for front-end HTML pages, there may be a need for more robust functionality before making the dive.

Running multiple instances of MySQL on the same machine

This may be documented elsewhere on the internet, but I thought I'd capture my own notes on how to set up and run multiple MySQL instances on the same box. I was playing with this mainly for experimenting with sharding (in particular with the Hibernate Shards project). I'm going to make several assumptions for this discussion: that the databases will, data excluded, be mirrors of each other in terms of essential config and schema, that we're running on *nix (I'm using solaris x86), and that we're setting up two MySQL databases.

After the obligatory "download and install", create two directories that will hold all of the database files. For simplicity, I created these directories in the MySQL home directory. Next, and this is the most important part, create two separate MySQL configuration files, one for each database. You only need to specify the attributes unique to each instance, which include the directory for files (datadir), socket, port, and the name of the pid file. Here's my first config file:

[mysqld]
datadir=/opt/csw/mysql5/shard0
port=3307
socket=/tmp/mysqld-shard0.sock
pid-file=/opt/csw/mysql5/shard0/pid


And here's my second:

[mysqld]
datadir=/opt/csw/mysql5/shard1
port=3308
socket=/tmp/mysqld-shard1.sock
pid-file=/opt/csw/mysql5/shard1/pid

As you can see, I've just tweaked the values slightly for the second instance.

The next thing to do is set up each database with the proper MySQL master tables and db files. This is the standard way to set up a database instance; here's the command, which you need to run twice, once for each instance:

<mysql_home>/bin/mysql_install_db --datadir=/opt/csw/mysql5/shard0
<mysql_home>/bin/mysql_install_db --datadir=/opt/csw/mysql5/shard1

All you need to do now is start up both instances, and you are good to go. The only special thing you need to do when starting up is to pass the name of the configuration file, like this:

<mysql_home>/bin/mysqld_safe --defaults-file=/opt/csw/mysql5/shard0/shard0.cnf

Note that, and this comes straight from the MySQL documentation, the defaults-file flag must be the first parameter on the command line.
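Finally, assuming the default root accounts, you can sanity-check that each instance is alive by connecting to each port with the standard client:

<mysql_home>/bin/mysql --host=127.0.0.1 --port=3307 -u root
<mysql_home>/bin/mysql --host=127.0.0.1 --port=3308 -u root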

And that's it!

Initial observations in India

(5 Dec 2007)

I should talk about the area where Tavant's office is located. This is a suburb of Delhi called Noida, and from our location it's about a 20-minute drive just to get to the Noida-Delhi border. Until ten years or so ago this area was just a rural village. But then some developers started building roads and office buildings, and starting two or three years ago, many software companies moved into Noida. Adobe just completed a large building here (looks quite nice, actually). So, Noida has this interesting mix (which I have found is quite common in India) where the pieces of Westernization/modernization - roads, offices, shopping malls (sooo many of these now in India) - occupy the same space as the local rural peasants, very poor people, stray dogs, and, of course, the ubiquitous sacred cows walking everywhere, including in the middle of heavy traffic. From my high school studies, I remember discussing how cows are treated as sacred animals and how they can walk anywhere and have the right of way. I never really expected to see them in the middle of Noida's nasty, traffic-thick roads, but they really do go anywhere, and nobody can (or does) move them, hit them, or ignore their presence.

Also, the driving here in India is pure chaos. If you think New York City cabbies are crazy, that's nothing compared to this (although NYC cabbies do make more sense to me now, as a lot of them are from India). Basically, there's no real notion of a driving lane here. Yes, there are lines painted on the ground, but they are wholly ignored. With sooo many bikes, rickshaws, auto rickshaws, buses, trucks, two-wheelers (motorcycles), and cars all sharing the same road, and everyone swerving and switching lanes and passing, you'd think I would have seen at least thirty accidents by now. Often the distance between the car I'm riding in and anything else on the road is less than two inches. All this alone is not too bad, but on top of it all, everybody is honking their horn. It gets insanely loud when traffic slows down.

Through my years of hearing about India and talking about the culture and food, I always had the feeling that just being here would be a spiritual experience. I must admit that for the first few days, I really didn't feel like I was in the India I had imagined. Mahesh and I were talking about this, and he says that the India of today is not the country he grew up in (he's from the south, in Madras). He says that life and culture used to be much simpler, but now with so many cars (compared to just ten years ago), all the Western-style shopping malls, and the changes to family life (mothers working late-night call-center jobs, for example), Indian life has become much more complicated, at least here in Delhi. So, he definitely understands my lack of spiritual-bliss-upon-arrival, and he recommends that we travel outside of Delhi to see more of the countryside and religious sites, where we can experience the more traditional side of India. I have found it interesting, though, that both of us, a native Indian (who has been away from India for more than seven years) and a dumb mid-Western guy, have the same impression. Not that there is anything wrong with change or modernization, but we both fear that maybe something uniquely Indian is being lost in this shuffle. I wonder if this is what happens to all countries on the route to modernization?

Wednesday, December 5, 2007

India Trip - Day 1, part 2

(evening of 2 Dec 2007)

After taking a much-needed nap, we were met by our friend Navin Goel, an employee at the company we are visiting here in India, Tavant. Navin picked us up in the afternoon and took us to a local Hindu shrine called Akshardham. Wow, that place is really stunning and quite beautiful. Akshardham is a temple dedicated to an eighteenth-century Hindu priest named Swaminarayan, and the complex has several large statues in his honor. There is some truly amazing sandstone and wood work - the sandstone elephant scenes around the main temple are absolutely beautiful. Interestingly, Akshardham was completed just two years ago, so it's extremely new. As night fell, there was a waterworks display with lights and music depicting the Hindu beliefs about the creation, duration, and end of life. It was quite elegant and not at all garish.

After Akshardham, we went for dinner at the Radisson hotel near our guest house. Mahesh thought we should be cautious and ease our way into the local food here in India - a wise choice. We ate at the buffet, and the food was extremely tasty. We were all pretty happy with the meal, and I finally had my first real meal of Indian food in India! After dinner Navin dropped us back at the guest house and I promptly fell asleep for the night.

Sunday, December 2, 2007

India trip - Day 1

(2 Dec 2007)

Well, we finally arrived in India! Mahesh and I are here to give training seminars to the employees of the outsourcing/staff augmentation company we work with here in Delhi, Tavant. We left Friday evening from New York JFK and arrived in Brussels on time (7:30 am on Saturday). Then in Brussels there was a three-hour delay due to some power outages - argh. We got into Delhi at 2:30 a.m. on Sunday, and after a not-too-slow immigration line, we waited an hour for our bags to come through in baggage claim. Then it was another hour and a half from the airport to the guest house in Noida (about 14-15 miles from Delhi proper). I then passed out at about 6 a.m. for six hours on the bed! The two legs of the flight were each about 7 or 8 hours long, so a more bite-sized trip compared with the straight 12 hours from New York to Tokyo.

We flew Jet Airways, and I must say I was very impressed. The Indian food they served on the flight was great - hands down the best in-flight meals I've ever eaten. The seats were quite comfortable, and in addition to the standard back recline, the seat moves forward so you can get a few more degrees closer to flat (it's still coach, though).

This afternoon we're just relaxing here at the guest house and will be picked up by Navin, one of our guys at Tavant. Not sure what the plan is, but hopefully something fun!

Wednesday, November 28, 2007

Survival Guide to Internationalization (I18N)

As software engineers, we typically don't bother with many "things you should design for" until they become a necessity. Things like security and internationalization fall into that bucket. The agilists would say, "Of course, stupid, don't build it until you have a business need for it" (sorry for my poor paraphrasing of a well-thought-out paradigm, but you get the picture). The older and less un-wise I get, the more correct that advice of the agilists sounds. Alas, when the day comes to tackle the issues that the software gods (of old) admonished us about (security, I18N), it's time to jump in and rip a new hole in your domain logic.

As regards I18N and the purpose of this blog entry, I recently completed a major upgrade for a major U.S. bank on their loyalty card reward program web site. It's essentially an e-commerce site with a product catalog and a checkout flow where you redeem earned loyalty points for goods/events. One of the primary requirements was to "create a Spanish version of the site" (it was previously only in English). I have some background with I18N/L10N (that is, localization) and knew some of the basics of the problem space (learn to love Unicode - a disputable assertion in some parts of East Asia - message properties files, ISO language codes, and so on), so I dove in and learned as much as I could and still get paid for it.

What follows is a synopsis of all the technical decisions I made to get the beast up and running. I'll offer some tangential advice and point out some gotchas and other merriments I discovered along the way. To give some form to this description, and also to trace my approach to the problem, I'll start all the way at the back-end database and move up through the layers to the JSP/HTML rendering layer.

Database
As the database we always use is Oracle, I was able to get some built-in features for free. The biggest is that Oracle supports UTF-8 as the default encoding for a schema; I remember that a few years back MySQL only supported ASCII, more or less, but that may have changed with the 5.x version. So, when creating the Oracle schema, you need to make the following settings:
  • NLS_CHARACTERSET = UTF8
  • NLS_NCHAR_CHARACTERSET = UTF8
The NCHAR character set acts like a secondary character encoding for a schema. From what I remember, though, it's largely deprecated. Luckily for me, my DBAs had already set up the databases and schemas to use UTF-8, so it was easy for me (unfortunately, that means I don't have an example to show here). To confirm the schema parameters, you can execute the following query:

select * from v$nls_parameters;

As for clients who actually want to retrieve anything meaningful from the database, the executing Java process must have the following environment variable defined:

NLS_LANG=AMERICAN_AMERICA.UTF8

Note that you can only set this as an environment variable and not as any JVM argument (I tried many, many times to get it to work with JVM args - why not, Oracle?).

Now that the database data dictionary is in place, you'll actually want to make the data within your database retrievable in a locale-specific manner, so here are a few suggestions. If you are working with a normalized data schema, you will probably have a product table that starts off like this:

[image: the original, non-localized product table schema]
Pretty straight-forward. Now, one possibility for making the data locale-specific is this:

[image: the localized schema - the product, product_details, and locales tables]
We've moved the user-displayable data into the product_details table and made it locale-specific by maintaining a foreign key to the locales table, which is a lookup table for all of the system-defined locales (which may or may not correspond to the ISO spec - but why not?).
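Since the images don't translate to text well, here's a hypothetical DDL sketch of both the starting table and the localized model (column names and types are illustrative only; in the localized model, the displayable columns would come out of product):

create table product (
    product_id   number(19)     primary key,
    sku          varchar2(64)   not null unique,
    name         varchar2(255),
    description  varchar2(2000),
    price        number(12,2)
);

create table locales (
    locale_id          number(19)  primary key,
    iso_language_code  char(2)     not null,
    iso_country_code   char(2)
);

-- the user-displayable fields move here, keyed by product and locale
create table product_details (
    product_id   number(19)     not null references product,
    locale_id    number(19)     not null references locales,
    name         varchar2(255),
    description  varchar2(2000),
    primary key (product_id, locale_id)
);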

This is the model I went with, as it seemed that if I was going to continue with a normalized schema, I may as well go whole hog. Of course, there are many pros and cons to this proposal, but the main impetus for me on this project was to maintain a standard approach to database design; otherwise you get mixed styles and nothing feels right (or cohesive). Looking back, as this product service is mainly a read-only sink, I'd probably do a bit (or a lot) of denormalization to really boost read performance - but couldn't a well-behaved/properly-designed cache net me the same performance boost/load reduction? Ahh, digressions....

On the other hand, for those rugged individuals who are moving to (or starting out with) a denormalized schema, if you start with the "before" version of the table definition from above, the table below would be the "after", localized version.

[image: the denormalized, localized product table]
Essentially, for every locale-variant of the product you will have another row in the table. Hence, you will need to drop the uniqueness constraint on the SKU column, as well as add locale-identifying fields (the ISO_*_CODE columns).
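Again as a hypothetical sketch, the denormalized "after" table might read:

create table product (
    product_id         number(19)     primary key,
    sku                varchar2(64)   not null,  -- uniqueness constraint dropped
    iso_language_code  char(2)        not null,
    iso_country_code   char(2),
    name               varchar2(255),
    description        varchar2(2000),
    price              number(12,2)
);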

Streams and Strings and character arrays, oh my...

Here's a little bit of handy advice: when using any kind of java.io class (like a reader/writer, or input/output stream), one thing you MUST always do is set a character encoding (if the API allows), and always set it to "UTF-8". For example, when you need to get the byte data from a String (to write to an output stream, for example), always use the overloaded operation for String.getBytes() that takes a string encoding parameter:

byte[] b = myString.getBytes("UTF-8");

If you do not do this for streams, according to the JavaDocs, the JVM will use the 'platform default' for encoding/decoding character data. Most likely, this will be ASCII or, at best, ISO-LATIN-1. Hence, all your well-laid plans for I18N will get shot to hell if even one subcomponent fails to do this - and the biggest problem, of course, is tracking down the source of the problem.
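To make it concrete, here's a small sketch of reading and writing a file with the encoding named explicitly rather than inherited from the platform:

import java.io.*;

public class Utf8FileIo {

    public static void write(File file, String text) throws IOException {
        // naming the charset here is the whole point; new FileWriter(file)
        // would silently fall back to the platform default
        Writer out = new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
        try {
            out.write(text);
        } finally {
            out.close();
        }
    }

    public static String read(File file) throws IOException {
        Reader in = new InputStreamReader(new FileInputStream(file), "UTF-8");
        try {
            StringWriter buffer = new StringWriter();
            char[] chunk = new char[4096];
            int count;
            while ((count = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, count);
            }
            return buffer.toString();
        } finally {
            in.close();
        }
    }
}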

Another thing: don't trust log files or standard output, as those already have their own streams that are most likely not set up to help you out when debugging encoding issues (I'm looking at you, log4j). You can, of course, write a straight binary dump of a String or XML file to a flat file on disk and examine it with a hex editor (and yeah, I've been there - ouch!). Unfortunately, this is sometimes the only way to absolutely ensure that data is correctly UTF-8 encoded.

Service

As our enterprise has moved to an SOA style of distributed services, the product service stands between the client web application and the database. Luckily, my web service environment, Axis2, already handles all streams as UTF-8 encoded. What was interesting, however, was deciding whether or not to expose the increased complexity in the database to the domain objects and, ultimately, the service interface. To spare you a rather lengthy and not altogether thrilling discussion, I chose to isolate the complexity to the database (and the Hibernate mappings/DAO layer) and keep the domain objects and service interface largely untouched. The client essentially just needs to indicate whether it wants a locale-specific version of a given product, and if the product/localization combination exists, that "limited-scope view" of the product is returned. There are several second-level and edge cases to worry about, but this is already getting to be a rather lengthy entry.

Web application

Luckily, the closer we get to the top, the easier the going gets, as all the hard lessons percolate upward quite nicely. The biggest thing to be aware of with the web-app is to ensure that all incoming data from clients, whether form data or query parameters, is read as UTF-8 characters (this is also known as URIEncoding). I did this rather cheaply in a servlet filter, the interesting part of which is:

httpServletRequest.setCharacterEncoding("UTF-8");
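The whole filter amounts to little more than that one line; here's a sketch (the class name is mine):

import java.io.IOException;
import javax.servlet.*;

public class Utf8EncodingFilter implements Filter {

    public void init(FilterConfig config) {
    }

    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        // tell the container to read the request's character data as UTF-8
        request.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    public void destroy() {
    }
}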


JSPs/HTML pages

The client/browser needs to be made aware of the content type coming at it. This can be set in several ways:
  • as an HTTP response header: Content-Type: your_mime_type/your_mime_subtype; charset=UTF-8
  • in a flat HTML document, within the head/meta element:
    <html>
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    </head>
    ....
    </html>
  • lastly, in a JSP:
    <%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%>

Ajax call

Alright, now that we've come all this way, we actually want to render a product on the page. Surprise, surprise, the product data is not returned as part of the normal page load, but instead via an Ajax call. We, as an e-commerce team, were, and to a degree still are, in the process of reevaluating how the front-end code (that is, what the JavaScript and HTML cats work with) interacts with objects coming from a service (products, campaigns, and so on). So, in the spirit of copying what all the cool kids are doing, we decided to call a servlet (an extension of our product service, although appendage may be a more apropos label) to load the product data, via shared domain logic from the service proper, and return it as a JSON string.

Admittedly, this JSON servlet is kind of a bastardized REST endpoint (meaning, it's not really REST, but I'll lie to myself anyways). I had initially gone down the road of creating a fully RESTful web service a la Resource Oriented Architecture (ROA), but then two things got in my way. The first was technical: the need to pass multiple meta-data points in addition to the main resource URI (which, for space considerations, I'll cover in a separate blog entry). The second was organizational: one of my fellow architects was quite opposed to the notion of ROA; he and I sparred for a few days, then I gave in because I had a project to finish. That being said, I still incorporated many of the ideals, if not the actual practices, into my bastardized REST service.

A couple of choices I made about the design of the "REST" service are:
  • input
    • put a version tag in the URI itself
    • added the locale information as HTTP params (see the companion blog entry about my conflicts on this issue)
  • output
    • once again, make sure you write a UTF-8 encoded stream
    • set the Content-Type HTTP response header
    • setting the Content-Length HTTP header works great for debugging (all of the above shows up in the sketch below)
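Putting those output choices together, the interesting part of the servlet looks something like this sketch (class and helper names are hypothetical):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.*;

public class ProductJsonServlet extends HttpServlet {

    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // the locale arrives as HTTP params; the version tag lives in the URI
        String json = loadProductAsJson(request.getPathInfo(),
                request.getParameter("language"), request.getParameter("country"));

        byte[] body = json.getBytes("UTF-8");      // explicit UTF-8, as always
        response.setContentType("application/json; charset=UTF-8");
        response.setContentLength(body.length);    // handy when debugging
        response.getOutputStream().write(body);
    }

    private String loadProductAsJson(String pathInfo, String language, String country) {
        // delegates to the domain logic shared with the product service proper
        return "{}";
    }
}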
Also, when testing out RESTful endpoints, the Unix tool curl was insanely helpful for sanity checks (in any dev, qa, or prod environment), as it's hard as hell to debug the output of a servlet when you are setting headers, setting response codes - ya know, all those REST-type things we should be using now that we've all rediscovered the Joy of HTTP. You could use FireBug in FireFox, but that's just too damned easy.
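For example, something as simple as the following prints the response headers (that's the -v flag) right alongside the body; the URL is illustrative:

curl -v "http://www.my_domain.com/item/v-1.2/123?language=en&country=US"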

Whew, that's a lot of information, and if you've made it this far, it's time for a (strong) drink. This entry is really the culmination of about three months of work, but many of the ideas had been fermenting for a long time. With a little bit of reflection and hindsight, there are some things I wish I had done better, more efficiently, and so on, but I feel rather proud of the project. Let me know if any of this was helpful, a complete waste of time, or if you've got some ideas you'd like to share!

As a footnote, one reference that was insanely helpful was this article on the Sun Java web site.

Meta-data and RESTful web services

I'd like to think out loud about REST and URIs and ROA. The concepts behind REST are ideas I've really been thinking about lately, but there are some issues I find in reconciling them with my current software needs. Let's say I have a URL/URI:

http://www.my_domain.com/item/123

which represents the addressable location of the item I am interested in (uniquely, and cleverly, identified by the identifier "123"). What happens when my service undergoes many updates and versioning needs to be introduced? I could add a version tag into the URI like this:

http://www.my_domain.com/item/v-1.2/123

Although the question remains: is this a version of the resource, or of the mechanism via which I access that resource? Let's leave that question alone and move on to more practical matters.

Let's say I now need to indicate a return (mime) type. The resource may be available as a JSON string or an XML document. The natural thing, or at least what we've been trained to do, is to stick a well-known extension on the end of the resource name, as in

http://www.my_domain.com/item/123.xml

and if I combine that with my versioning:

http://www.my_domain.com/item/v-1.2/123.xml

OK, this seems relatively reasonable, but what happens when I introduce another piece of meta-data, say, locale? Where, in my last URI, do I "RESTfully" add this meta-data? After the mime extension?

http://www.my_domain.com/item/v-1.2/123.xml.en_US

in the URI itself?

http://www.my_domain.com/item/v-1.2/en_US/123.xml

This version seems a little weird, so let's skip that one and just consider locale at the end of the URI.

If we choose a convention for the location of these points of meta-data and stick with it, we could be OK. But then what happens when a client doesn't care, or assumes a default response type, and drops the mime extension? The URI could become

http://www.my_domain.com/item/v-1.2/123.en_US

So now the service needs to figure out whether "en_US" is a document/mime type or a locale definition. Yeah, you could create a bad-ass regex engine to derive all the possible variants, but would you really want to?

I think I could go on and on in this vein, adding in more meta-data points (although I don't really have any more at this time), but suffice it to say that if your needs are simple (you don't care about versioning or locale or any of that other accoutrement), you could probably go through life quite happy with our original URI

http://www.my_domain.com/item/123

But when you need more expressiveness, I'm just not sure if making the URI bend over backwards (huge caveat) without creating your own, custom URI rules/syntax is going to get the job done in a reasonable, RESTful manner. I'm just not sure...

Tuesday, November 20, 2007

Test driving Scala

At the QCon conference in San Francisco two weeks ago, I heard Brian Goetz speak about concurrency, its past and where we're at now. It was a really great session (I should probably blog about it in the future), but one of the problems he identified with concurrency in Java is the notion of shared state between threads. Long story short, some alternative approaches to concurrency and safety include the techniques that functional languages like Erlang and Scala are taking.

In the interest of tooling around late at night after my wife and son have fallen asleep, I've been poking around with Scala. I'm just getting started really, but one of the nice things they have included in the basic download kit, available from the scala-lang site, is an interactive command line console. It works very much like Ruby's irb. Type in a command, and watch it instantly get pissed off about your syntax! Very handy for getting your feet wet - in between late night diaper changes.

More info later. Will probably play around some more on my trip to India next week, as I'll be there for two weeks.

** UPDATE: Just in case you are crazy enough to develop on open solaris (on an x86 laptop, no less), you will need to hack the shell scripts that launch the various command line utilities. Basically, if you aren't doing anything nuts, you can just yank the setting of the JAVA_OPTS env var where it invokes the JVM (typically on the last line of the script). I just kept the min/max JVM sizes as is (-Xmx, -Xms). Cheap and easy, but it works!

There is an open ticket on the scala site as a known issue, but this hack will keep bash from complaining "bad interpreter" every time you try to launch any scala command.

Monday, November 12, 2007

Backwards compatibility in services

With all the hoopla over web services (including WS-* and REST varieties) and asynchronous messaging, there seems to be little discussion on how to make these services (I'm using the term very generically here) compatible with existing service consumers. This issue is really not a problem for vendors or tools providers to solve, as it really pertains to the problem domain at hand and how the engineers and architects want to manage it. In other words, it's yer own damn problem.

After speaking with engineers and architects about how they approach service evolution, there seem to be some approaches that are common:
  • the no-harm, no-foul approach
  • the big-bang approach
  • the multiple service instances approach
  • the maintain-it-yourself approach
First up, the no-harm, no-foul approach (a/k/a close-your-eyes-and-hope-everything-is-cool) updates the services and expects the clients to "just work". Funny, life just doesn't work out that nicely. What frequently happens is that after the new service code rolls into qa (or, god forbid, production), clients start breaking - and then it's either revert the service or update the rogue clients.

Second, the big-bang approach updates the service and clients at the same time. While certainly handy, you need to be in control of both the service and all of the clients, and have qa and systems teams who can accommodate this approach. If all the clients are internal, this might not necessarily be a bad approach, but that point notwithstanding, I've found that either not all the clients get properly updated (whether from a code migration problem or forgetting to update some client's code) or, more likely, not every client gets tested thoroughly.

The multiple service instances approach is perhaps the most common deployment style I've seen. When you need to release a new version of the service, just deploy a new web-app with the updated code and a new URL (typically, a version number will appear in the URL itself). On the surface, this works rather well, as old services are not removed (hence, their clients keep on working as they should), but I feel there are two major drawbacks. The first is the database. Assuming the service uses a relational database (as most do), database schemas are not backwards compatible - unless you really work at it: triggers, synonyms, and so on (check out Scott Ambler's Refactoring Databases: Evolutionary Database Design; it's truly brilliant). The second problem is resource consumption - memory, disk space, power, etc. While this appears most obviously in small shops, many medium to large enterprises can ignore these problems. However, large industry players like Dan Pritchett, architect at eBay, are now intimately concerned with these issues as they build out more and more redundant data centers.

Lastly, the maintain-it-yourself approach deploys just a single version of the service (the most recent one), which is able to successfully process and reply to clients at all version levels. Typically, a service version is indicated somewhere by the client (in the URL, XML namespace, and so on); the service uses that piece of information to aggregate the incoming parameters as needed in a version-specific manner, performs its normal functions at the current operating version level, and finally returns the result to the client in the version they are expecting. This approach clearly has impacts on the engineers who need to create new functionality yet maintain backwards compatibility, and that is the biggest cost. It could be argued that the processing cycles required to determine the requested version, convert up, process, then convert down could be a drag on performance. I tend to think that, with everything else going on in a service, this cost should not be a significant factor - or something else squirrelly may be going on. This approach does, however, allow the database to evolve at a natural pace, and that cannot be overstated (at least while we use databases in the manner in which we traditionally have).
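To make that concrete, here's a minimal sketch of the dispatch skeleton, with every type name hypothetical:

import java.util.HashMap;
import java.util.Map;

public class VersionedServiceFrontDoor {

    // adapts an old-version request up to the current form, and the
    // current response back down to what that client version expects
    interface VersionAdapter {
        Object toCurrentRequest(Object oldRequest);
        Object toClientResponse(Object currentResponse);
    }

    private final Map<Integer, VersionAdapter> adapters =
            new HashMap<Integer, VersionAdapter>();

    public Object handle(int clientVersion, Object request) {
        // the version comes from the URL, the XML namespace, or the like
        VersionAdapter adapter = adapters.get(clientVersion);
        Object current = adapter.toCurrentRequest(request);
        Object response = process(current);
        return adapter.toClientResponse(response);
    }

    private Object process(Object currentRequest) {
        // normal domain logic, written only against the current version
        return currentRequest;
    }
}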

As a full disclaimer, I have, consciously or unconsciously, used all four of these approaches at one time or another - and in the order of discovery laid out above. After the initial birth of the body of services, I naturally needed to keep adding functionality. After making a whole lot of mistakes (I screwed up the production servers/services/web-apps and databases several times) in the process of maturing the services, the best solution for my system was to maintain backwards compatibility for all clients. Granted, we have quite a good view of what/who all the clients of the services are, so we can be a bit more proactive in phasing out old client implementations.