Software Carnival: December 2007

Thursday, December 27, 2007

Simplifying service interfaces

When I started building out services, I was lucky enough to miss out on the whole notion of distributed objects - the idea of treating a remote object as if it were local. I did, however, start off with thick clients and (XML) RPC-style calls, using JAX-RPC (<shudder>). In this style, of course, all the operations are very fine-grained, even when loading a single entity. After knocking my head on this for almost a year, and as REST and asynchronous messaging were coming into vogue, I decided to rethink my approach to service interfaces/contracts.

In my new direction, I chose to create services based around an "entity"; for example, a product. Essentially, the Product Service and similar services are data-centric. I don't quite want to call these CRUD-style services per se, but in the long run, perhaps they are. In any case, the manner in which I've designed the interface uses the notion of the entity, as a whole, for both input and output parameters. Let's look at the operations I've defined on the Product Service:

Product store(Product product)
Product find(Product product)
Product[] search(Product product)

(Note: Just because I've designed my service interface like this, it does not necessarily mean there is a one-to-one correspondence with the gateway, a/k/a facade, into my domain logic.)

As you can see, there are only three operations declared on this service contract: store, find, and search.

The store operation does what it implies; it stores the product being passed to it. Where this gets interesting is that this operation will do either a SQL insert or update, yet that detail is invisible to clients. In this way, clients do not need to be concerned with whether this is a new entity or an update, and thus are relieved from worrying about any corresponding semantics between an insert or update (or PUT or POST in REST lingo). Further, we have an idempotent operation.
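To make the insert-or-update idea concrete, here is a minimal in-memory sketch. It is not the actual service: the cut-down Product fields, the InMemoryProductStore class, and the choice of SKU as the unique key are all assumptions made for illustration, and a real implementation would issue SQL instead of touching a map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, cut-down entity; the real Product surely carries more fields.
class Product {
    Long productId;
    String sku;
    String name;

    Product(Long productId, String sku, String name) {
        this.productId = productId;
        this.sku = sku;
        this.name = name;
    }
}

// Hypothetical in-memory stand-in for the database-backed service.
class InMemoryProductStore {
    private final Map<String, Product> bySku = new HashMap<>();
    private long nextId = 1;

    // Inserts when the SKU is unknown and updates otherwise.
    // The caller cannot tell which one happened, and does not need to.
    Product store(Product product) {
        Product existing = bySku.get(product.sku);
        long id = (existing != null) ? existing.productId : nextId++;
        Product saved = new Product(id, product.sku, product.name);
        bySku.put(saved.sku, saved);
        return saved;
    }

    int size() {
        return bySku.size();
    }
}
```

Storing the same product twice leaves exactly one record with the same id, which is the idempotent behavior described above.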

The find and search operations are quite similar. find is intended to locate a given entity by a unique property of the entity. In this example, a SKU field or ProductId could be used as a unique field; this does, of course, require clients to know which fields are unique. Hence, find should only return one, distinct entity. search, on the other hand, can return zero to many entities. The output parameters to these operations should be clear enough, but let's talk about the inputs. You'll notice the interface requires an entity instance to be passed to it. I chose to employ a "query by example" style of interface for the service. This reduces the size and complexity of the interface because I do not need to create separate operations like:

Product findById(Long id)
Product findBySku(String sku)
Product findByName(String name)
... and so on

In this paradigm, the client only populates the fields it is interested in using in a lookup operation. For example, for a find operation, the client can just populate the ProductId or SKU field, and pass that to the service (obviously, with XML marshalling and unmarshalling in between). Similarly, for a search, the client can populate the price and/or other non-unique fields. (This interface does not account for the situation where you want to have multiple or ranges of values to search against for one field; for example, products whose price is greater than $4.95 and less than $9.95. As I don't have this business case right now, I haven't bothered to account for it.)
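The "populate only the fields you care about" rule can be sketched as a simple matcher. This is an illustration under assumptions, not the service's real lookup code: the cut-down Product fields (including priceInCents), the QueryByExample class, and the in-memory list are hypothetical, and a real service would translate the populated fields into a SQL WHERE clause.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, cut-down entity; a null field means "not part of the query".
class Product {
    String sku;
    String name;
    Integer priceInCents;

    Product(String sku, String name, Integer priceInCents) {
        this.sku = sku;
        this.name = name;
        this.priceInCents = priceInCents;
    }
}

class QueryByExample {
    // An unpopulated (null) example field matches anything;
    // a populated field must be equal to the candidate's value.
    private static boolean fieldMatches(Object wanted, Object actual) {
        return wanted == null || wanted.equals(actual);
    }

    static boolean matches(Product example, Product candidate) {
        return fieldMatches(example.sku, candidate.sku)
            && fieldMatches(example.name, candidate.name)
            && fieldMatches(example.priceInCents, candidate.priceInCents);
    }

    // Returns every candidate that agrees with all populated example fields.
    static List<Product> search(Product example, List<Product> all) {
        List<Product> hits = new ArrayList<>();
        for (Product candidate : all) {
            if (matches(example, candidate)) {
                hits.add(candidate);
            }
        }
        return hits;
    }
}
```

Because matching is by equality only, this sketch shares the limitation noted above: it cannot express ranges such as a price between two values.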

The next question you should be thinking is, "Why have two operations, find and search? Couldn't you accomplish the same with just one operation (like what you did with store)?"

Hmmm ... yeah, the longer I live with this duality, the less in favor of it I've become. Having two operations that, more or less, do the same thing seems not so optimized to me, especially as I went out of my way to make store a simple notion. Having this duality does require the client to distinguish between the two styles of entity lookup, but actually, that was part of the argument for the separation back in the day: If you know the exact entity you want, call operation A with the specific unique fields populated and you will only get back what you expect; else, call operation B and get an array. I think the argument does make sense, but I'm beginning to feel that perhaps the simpler interface is more expressive. If I merged the two operations, the service interface would become:

Product store(Product product)
Product[] find(Product product)

Thursday, December 6, 2007

Playing with Hibernate Shards

After hearing a lot of discussion at QCon and elsewhere about the notion of database sharding (a new, hipper name for the older, more mundane name "partitioning"), I decided to download the Hibernate Shards project and give it a whirl (what else do you do when you can't sleep?). I wrote an earlier blog entry on running multiple MySQL instances that will help when playing with Hibernate Shards.

As I'm reasonably proficient with Hibernate, I figured that I would be able to jump in and get a pretty good understanding with just a few days' effort. Luckily, I was correct - my basic assumptions and paradigms for using Hibernate remain mostly intact (the Configuration, SessionFactory, and Session objects are almost exactly the same). The details, of course, lie more in the distribution of data between multiple database instances.

Just a few quick Hibernate tech notes before talking about the data implications. You need to have a little bit more going on in your Hibernate config file(s), one file for each shard. Nothing big there. When creating a SessionFactory, you have to provide implementations for the following interfaces:

ShardAccessStrategy - a strategy for accessing sharded databases for queries (not loading an item by its id). The provided implementations offer either sequential (including a round-robin, load balanced version) or parallel access.
ShardResolutionStrategy - a strategy for determining which shard to access when loading an entity.
ShardSelectionStrategy - a strategy for determining which shard to store a new entity in.

As it relates to data, let's first discuss the idea of entity ID generation. As your data is now spread out across multiple databases, you really can't use a database sequence for generating ids - unless you restrict each database's valid sequence values to a restricted range (db1 gets ids 0 - 3000, db2 gets ids 3001 - 6000, and so on). Somehow, this sounds easy but rather self-limiting. Out of the box, Hibernate Shards provides two mechanisms for managing ID generation:

ShardedUUIDGenerator - this implementation basically creates a large random number, and also encodes the shard id into the id. What you get is a really big number (my sample tests had about 30-36 digits each).
ShardedTableHiLoGenerator - this implementation creates a hilo table in one of the shards and uses that to generate all IDs. Of course, the shard then becomes not only a bottleneck for performance, but also a single point of failure.

After playing with it and thinking for a few days, the UUID generator seems the better choice, as long as your db is cool with big primary keys. MySQL worked fine with this, but I needed to make the column of type numeric(64,0) - 64 may be a bit large, but it made my sample code work so I left it. Of course, as your entity's id is shard specific, all the data related to that entity must reside on the same shard. This makes total sense and is definitely what you want from both management and performance perspectives.
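The idea of encoding the shard id into a large random id can be sketched as follows. This is not Hibernate Shards' actual bit layout, which is not described here; the ShardAwareIds class, the 16 bits reserved for the shard id, and the 96 random bits are all assumptions chosen for illustration.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Hypothetical id scheme: 96 random bits followed by a 16-bit shard id.
class ShardAwareIds {
    private static final int SHARD_BITS = 16;
    private static final int RANDOM_BITS = 96;
    private static final BigInteger SHARD_MASK =
        BigInteger.ONE.shiftLeft(SHARD_BITS).subtract(BigInteger.ONE);
    private static final SecureRandom RANDOM = new SecureRandom();

    // Builds a non-negative id whose low 16 bits carry the shard id.
    static BigInteger generate(int shardId) {
        if (shardId < 0 || shardId >= (1 << SHARD_BITS)) {
            throw new IllegalArgumentException("shard id out of range: " + shardId);
        }
        BigInteger randomPart = new BigInteger(RANDOM_BITS, RANDOM);
        return randomPart.shiftLeft(SHARD_BITS).or(BigInteger.valueOf(shardId));
    }

    // Recovers the shard id from an id produced by generate().
    static int shardOf(BigInteger id) {
        return id.and(SHARD_MASK).intValue();
    }
}
```

With this layout an id is below 2^112, so it has at most 34 decimal digits and fits in a numeric(64,0) column. Being able to recover the shard from the id alone is what lets a resolution strategy go straight to the right database when loading an entity.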

Of course, deciding exactly how to spread data amongst the shards is a big decision. Unfortunately, Hibernate Shards does not provide a facility for live resharding (not that it would be easy, mind you). Essentially, you would have to do some sleight of hand while the data was being moved/redistributed, then update your Hibernate configurations and entity-to-shard mappings. Hibernate Shards does, however, provide a concept of virtual shard ids, and you can point multiple virtual shards to physical shards. This seems like a good idea, as you can define many virtual shards which map to just two or three physical shards; then, as you grow, it's only a slight config update to repoint the virtual shard. The documentation says "Virtual shards are cheap", so it's probably reasonable to create a bunch of virtual shards as long as it doesn't hurt performance.
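The virtual-to-physical indirection can be pictured with a small lookup table. This sketch does not use the Hibernate Shards API; the VirtualShardMap class, the hash-based assignment of entities to virtual shards, and the shard counts are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table from virtual shard ids to physical shard ids.
class VirtualShardMap {
    private final Map<Integer, Integer> virtualToPhysical = new HashMap<>();
    private final int virtualShardCount;

    VirtualShardMap(int virtualShardCount) {
        this.virtualShardCount = virtualShardCount;
    }

    // Points (or repoints) a virtual shard at a physical shard.
    void assign(int virtualShardId, int physicalShardId) {
        virtualToPhysical.put(virtualShardId, physicalShardId);
    }

    // An entity's virtual shard never changes, because the virtual shard count is fixed.
    int virtualShardFor(String entityKey) {
        return Math.floorMod(entityKey.hashCode(), virtualShardCount);
    }

    int physicalShardFor(int virtualShardId) {
        Integer physical = virtualToPhysical.get(virtualShardId);
        if (physical == null) {
            throw new IllegalArgumentException("unmapped virtual shard: " + virtualShardId);
        }
        return physical;
    }
}
```

Starting with, say, eight virtual shards spread over two physical ones, growing to a third database means calling assign for the virtual shards being moved. The entity-to-virtual-shard assignment stays put, although the data itself still has to be moved.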

Hibernate Shards does have a few gotchas:

HQL is pretty much unusable at this point, and there seem to be some incomplete pieces with Criteria queries. This is mainly due to the immaturity of the project; I expect they'll be coming along soon.
Master values from lookup tables. As Hibernate demands one and only one instance of an object with a given id, if you use master objects from a lookup table, you need to be careful that you don't get into strange cross-shard problems. This should only happen when saving/updating, but it is something to be aware of. I have another blog entry with some advice about this problem.

All in all, Hibernate Shards is a rather nice wrapper for hiding all the complexity of spreading your data around multiple databases. As it's still in beta, I'm not sure if it's production-ready. As web sites, especially e-commerce sites, are more than just read-repositories for front-end HTML pages, there may be a need for more robust functionality before making the dive.

Running multiple instances of MySQL on the same machine

This may be documented elsewhere on the internet, but I thought I'd capture my own notes about how to set up and run multiple MySQL instances on the same box. I was playing with this mainly for experimenting with sharding (in particular, with the Hibernate Shards project). I'm going to make several assumptions for this discussion: that the databases will, data excluded, be mirrors of each other in terms of essential config and schema, that we're running on *nix (I'm using Solaris x86), and that we're setting up two MySQL databases.

After the obligatory "download and install", create two directories that will hold all of the database files. For simplicity, I created these directories in the MySQL home directory. Next, and this is the most important part, create two separate MySQL configuration files, one for each database. You only need to specify the unique attributes per instance, which include the directory for files (datadir), socket, port, and the name of the pid file. Here's my first config file:

[mysqld]
datadir=/opt/csw/mysql5/shard0
port=3307
socket=/tmp/mysqld-shard0.sock
pid-file=/opt/csw/mysql5/shard0/pid

And here's my second:

[mysqld]
datadir=/opt/csw/mysql5/shard1
port=3308
socket=/tmp/mysqld-shard1.sock
pid-file=/opt/csw/mysql5/shard1/pid

As you can see, I've just tweaked the values slightly for the second instance.

The next thing to do is set up each database with the proper MySQL master tables and db files. As this is the standard way to set up a database instance, here's the command you need to run twice, one time for each instance:

<mysql_home>/bin/mysql_install_db --datadir=/opt/csw/mysql5/shard0
<mysql_home>/bin/mysql_install_db --datadir=/opt/csw/mysql5/shard1

All you need to do now is start up both instances, and you are good to go. The only special thing you need to do when starting up is pass the name of the configuration file, like this:

<mysql_home>/bin/mysqld_safe --defaults-file=/opt/csw/mysql5/shard0/shard0.cnf

Note that, and this comes straight from the MySQL documentation, the defaults-file flag must be the first parameter on the command line.
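Once both instances are running, a Java client (Hibernate included) reaches each one through its own JDBC URL, distinguished only by port. Here is a small sketch that builds those URLs; the ports come from the two config files, while the ShardJdbcUrls class, the host, and the database name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds one MySQL JDBC URL per local instance.
class ShardJdbcUrls {
    // Ports taken from the shard0 and shard1 config files.
    static final int[] SHARD_PORTS = {3307, 3308};

    // Standard MySQL Connector/J URL form: jdbc:mysql://host:port/database
    static String urlFor(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    // One URL per instance, in shard order.
    static List<String> allUrls(String host, String database) {
        List<String> urls = new ArrayList<>();
        for (int port : SHARD_PORTS) {
            urls.add(urlFor(host, port, database));
        }
        return urls;
    }
}
```

Each URL would then go into the connection settings of the Hibernate config file for the matching shard.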

And that's it!

Initial observations in India

(5 Dec 2007)

I should talk about the area where Tavant's office is located. This is a suburb of Delhi called Noida, and from our location it's about a 20 minute drive just to get to the Noida-Delhi border. Until ten years or so ago this area was just a rural village. But then some developers started building roads and office buildings, and starting two or three years ago, many software companies have moved into Noida. Adobe just completed a large building here (looks quite nice, actually). So, Noida has this interesting mix (which I have found out is quite common in India) where you have the local rural peasants and pieces of Westernization/modernization - roads, offices, shopping malls (sooo many of these now in India) occupying the same space as very poor people, stray dogs, and, of course, the ubiquitous sacred cows walking everywhere, including in the middle of heavy traffic. In my high school studies, I remember discussing the notion of cows and how they are treated as sacred animals and how they can walk anywhere and have the right of way. I never really expected to see them in the middle of Noida's nasty traffic-thick roads, but they really go anywhere and nobody can (or does) move them, hit them, or ignore their presence.

Also, the driving here in India is pure chaos. If you think New York City cabbies are crazy, that's nothing compared to this (although, NYC cabbies do make more sense to me now as a lot of them are from India). Basically, there's no real notion of a driving lane here. Yes, there are lines painted on the ground, but they are wholly ignored. With sooo many bikes, rickshaws, auto rickshaws, buses, trucks, two-wheelers (motorcycles), and cars all sharing the same road, and everyone swerving and switching lanes and passing, you'd think I would have seen at least thirty accidents by now. Often the distance between the car I'm riding in and anything else on the road is less than two inches. All this much is not too bad, but on top of it all, everybody is honking their horn. It gets insanely loud when traffic slows down.

Through my years of hearing about India, talking about the culture and food, I always had the feeling that just being here would be a spiritual experience. I must admit that for the first few days, I really don't feel like I'm in the India that I had imagined. Mahesh and I were talking about this, and he says that the India of today is not the country he grew up in (he's from the south in Madras). He says that life and culture used to be much simpler, but now with so many cars (compared to just ten years ago), all the Western-style shopping malls, and the changes to family life (mothers working late-night call-center jobs, for example), Indian life has become much more complicated, at least here in Delhi. So, he definitely understands my lack of spiritual-bliss-upon-arrival feeling and he recommends that we travel outside of Delhi to see more of the countryside and religious sites where we can experience the more traditional side of India. I have found it interesting, though, that both of us, a native Indian (who has been away from India for more than seven years) and a dumb mid-Western guy, have the same impression. Not that there is anything wrong with change or modernization, but we both fear that maybe something uniquely Indian is being lost in this shuffle. I wonder if this is what happens to all countries on the route to modernization?

Wednesday, December 5, 2007

India Trip - Day 1, part 2

(evening of 2 Dec 2007)

After taking a much needed nap, we were met by our friend Navin Goel, who is an employee at the company we are visiting here in India called Tavant. Navin picked us up in the afternoon and took us to a local Hindu shrine called Akshardham. Wow, that place is really stunning and quite beautiful. Akshardham is a temple dedicated to an eighteenth-century Hindu priest named Swaminarayan, and the complex has several large statues in his honor. There is some truly amazing sandstone and wood work that was done - the sandstone elephant scenery around the main temple is absolutely beautiful. Interestingly, Akshardham was just completed two years ago, so it's extremely new. As night fell, there was a water works display with lights and music that follows the Hindu beliefs for the creation, duration, and end of life. It was quite elegant and nothing garish.

After Akshardham, we went for dinner at the Radisson hotel near our guest house area. Mahesh thought we should be cautious and ease our way into the local food here in India - a wise choice. We ate at the buffet, and the food was extremely tasty. We were all pretty happy with the meal, and I finally had my first real meal in India of Indian food! After dinner Navin dropped us back off at the guest house and I promptly fell asleep for the night.

Sunday, December 2, 2007

India trip - Day 1

(2 Dec 2007)

Well, we finally arrived in India! Mahesh and I are here to give training seminars to the employees of the outsourcing/staff augmentation company we work with here in Delhi, Tavant. We left Friday evening from New York JFK and arrived in Brussels on time (7:30 a.m. on Saturday). Then in Brussels there was a three hour delay due to some power outages - argh. We got into Delhi at 2:30 a.m. on Sunday, and after a not-too-slow immigration line, we waited an hour for our bags to come through in baggage claim. Then, another hour and a half from the airport to the guest house in Noida (about 14-15 miles from Delhi proper). I then passed out at about 6 a.m. for six hours on the bed! The two legs of the flight were each about 7 or 8 hours in length, so a more bite-sized trip compared with the straight 12 hours from New York to Tokyo.

We flew Jet Airways, and I must say I was very impressed. The Indian food they served on the flight was great - hands down the best in-flight meals I've ever eaten. The seats were quite comfortable, and in addition to the standard back recline, the seat moves forward so you can get a few more degrees closer to flat (it's still coach, though).

This afternoon we're just relaxing here at the guest house and will be picked up by Navin, one of our guys at Tavant. Not sure what the plan is, but hopefully something fun!