Why use Event Sourcing?
[20/02/10]
Udi and I agree on probably 95% of what we talk about, but one of the places where we have differing opinions is the use of Event Sourcing. I use the term as described previously, to mean the rebuilding of objects based on events, not the definition that is currently on the bliki. To me this is an important distinction, and I figured it would be worthwhile to write a post on why I feel the way I do. I explained parts of it in the previous post about CQRS and Event Sourcing, but I wanted to talk not just about how the patterns are symbiotic but also about some of the other reasons I use Event Sourcing.
Using a RDBMS
To start with, let’s go through the alternative architecture, that is, to rebuild objects from something that saves current state on the write side. I say something because that something could be many things: bigtable, mongodb, a relational database, xml files, photos of the objects that are then scanned in … It really doesn’t matter for the sake of this discussion; what matters is the storing of current state. For the sake of discussion let’s imagine that it is a relational database, which Udi generally recommends; here is the slide he generally uses to talk about it.
Data is stored, as normal, in the relational database, and the domain is instrumented to send events as well, which can then be used with the read model. There are some really good things about this architecture that I would like to go through before I talk about some of the issues I have run into with it in the past.
First of all, this architecture is highly applicable to legacy systems that would like to move to a separated read model for some of their data (it is important to note that you may not move all of your data to a separated read model). It is also very familiar to development teams, operations, and management. We are well aware of how to deal with such a system; never underestimate the value of familiarity.
There are however some issues that exist with using something that is storing a snapshot of current state. The largest issue revolves around the fact that you have introduced two models to your data. You have an event model and a model representing current state. If you look in the above diagram you will see there are two operations, write and publish.
Any time that you represent things in more than one model you have to worry about keeping those models in sync, and this situation does not escape that. As an example, how do you rationalize that the data you saved to your database actually matches up with the events that you sent out? One can write unit tests to help keep things synchronized, but the models will eventually fall out of sync with each other, and when they do, you have a problem.
Another problem with having two models is that it is necessarily more work. One must write the code to save the current state of the objects, and one must write the code to generate and publish the events. No matter how you go about doing these things, it cannot possibly be easier than only publishing events; even if you had something that made storing current state completely trivial, say a document store, there is still the effort of bringing that into the project.
Beyond all of that, for me the focus tends to be on the current-state-based model when the system is really event centric. This may sound like a nitpicky issue, but I find that teams doing this treat events with less importance than teams who also use events as storage; the latter deal only with events and are extremely event centric.
Using Event Sourcing
Using Event Sourcing does not change the architecture above much; the primary difference is that behind the domain we have an Event Store holding the events needed to rebuild an object, as opposed to something storing the current state. There are, however, many interesting differences between the two architectures.
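To make the rebuilding concrete, here is a minimal sketch of replaying events to reconstitute an object’s current state. Everything here (the account object, the event types, the rebuild function) is hypothetical, purely for illustration, and not taken from any particular framework:

```javascript
// Minimal sketch: current state is derived by replaying stored events.
// All names (rebuild, Deposited, Withdrawn) are illustrative only.
function rebuild(events) {
    var account = { balance: 0, version: 0 };
    events.forEach(function(e) {
        if (e.type === "Deposited") { account.balance += e.amount; }
        if (e.type === "Withdrawn") { account.balance -= e.amount; }
        account.version += 1; // one version bump per applied event
    });
    return account;
}

var history = [
    { type: "Deposited", amount: 100 },
    { type: "Withdrawn", amount: 30 }
];
// rebuild(history) => { balance: 70, version: 2 }
```

The object itself is never stored; its state is always a fold over the event stream.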
The first major difference is that it solves the problem of having two models. It removes the cost of having to deal with synchronization between the two, and most importantly it removes the possibility that the two diverge. Having the models diverge could be very bad, as we use the event model as our integration model so others can create parallel models of our model. If we have a synchronization issue, they will have bad data while we have correct data, ouch. With Event Sourcing we only save the event (that can even be called “publishing” it).
There are, however, other benefits to using Event Sourcing as opposed to storing current state. I went through some of them briefly in another post. One of those benefits is that we can avoid having to use a two-phase commit (2PC) transaction between the data model and the message queue (if we are using one); the reason for this is that the event storage itself is also a queue. We could be tailing the event storage to place the items on the queue (or directly using the event storage as a queue).
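A rough sketch of the tailing idea, with all names invented for illustration: the only write-side operation is appending an event, and a dispatcher remembers how far into the store it has read, forwarding anything newer to subscribers. No distributed transaction is involved:

```javascript
// Sketch: the event store doubles as the queue. Illustrative names only.
var store = { events: [] };

// The only write-side operation: append the event to the store.
function append(event) {
    store.events.push(event);
}

// A dispatcher tails the store from its last-read position and
// forwards anything new to the publish callback.
function makeDispatcher(publish) {
    var position = 0;
    return function poll() {
        while (position < store.events.length) {
            publish(store.events[position]);
            position += 1;
        }
    };
}

var delivered = [];
var poll = makeDispatcher(function(e) { delivered.push(e); });

append({ type: "AppointmentScheduled" });
poll(); // delivered now holds the event; no 2PC was needed
```

Because the append and the “publish” read from the same storage, the write can never succeed while the publish is lost, which is exactly the divergence the dual-model architecture has to guard against.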
Testing, however, is a big win with Event Sourcing: since all of your state changes are done through events, you can simply test the events coming out the other side. This is particularly interesting when one considers that, because events are explicit about what is changing, you are also testing what is not happening. Very few people write tests to show what doesn’t happen in behaviors (also a prime place where models can lose sync). For example: I am calling ScheduleAppointment on an object; do you check to make sure the address hasn’t changed? In my tests I would simply assert that I only received an AppointmentScheduledEvent. Beyond that, since we only deal with events on the other side, we can view (and test) our domain as a rather complex finite state machine, which offers some very cool possibilities.
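A tiny sketch of that testing style, using a made-up scheduleAppointment behavior (the names and event shape are hypothetical, not from any real domain):

```javascript
// Sketch of event-based testing: run the behavior, then assert on the
// exact list of events produced. Names are illustrative only.
function scheduleAppointment(patient, time) {
    // the behavior's only output is its events
    return [{ type: "AppointmentScheduled", patient: patient, time: time }];
}

var produced = scheduleAppointment("John Doe", "10:00");

// Asserting on the FULL list, not just presence, also proves what did
// NOT happen: no AddressChanged event came out of this behavior.
if (produced.length !== 1 || produced[0].type !== "AppointmentScheduled") {
    throw new Error("unexpected events produced");
}
```

The single assertion on the complete event list covers both the positive case (the appointment was scheduled) and the negative one (nothing else changed).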
Of course we have left out what I think is one of the key values of Event Sourcing: we actually have the changes. Let’s say I get a concurrency violation (the request is from version 5 while the current data is at version 12). With Event Sourcing I can ask for all of the events in between and see if any of them actually conflict with the command that I want to run (I will write a post on more about how this works soon). In other words, we provide intelligent merging…
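The merging idea can be sketched as follows; the conflict rule and all names here are hypothetical placeholders for whatever your domain actually considers a conflict:

```javascript
// Sketch of intelligent merging: fetch the events between the command's
// version and the current version, and reject the command only if one
// of them genuinely conflicts. Illustrative names and rules only.
function conflictsWith(commandType, eventType) {
    // a domain-specific rule; here, only changes to the same field conflict
    return commandType === "ChangeAddress" && eventType === "AddressChanged";
}

function canCommit(command, intermediateEvents) {
    return !intermediateEvents.some(function(e) {
        return conflictsWith(command.type, e.type);
    });
}

// The command was built against version 5; the store is at version 12.
var between = [{ type: "AppointmentScheduled" }, { type: "NoteAdded" }];
canCommit({ type: "ChangeAddress" }, between); // true: no real conflict
```

With only current state, versions 5 and 12 simply differ and the command must fail; with the events in hand, most of those “conflicts” turn out to be mergeable.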
You will note that I have not talked about any of the general benefits of Event Sourcing, such as the business value of the events, the value of having a log, the fact that the Event Store is additive only. I am leaving these out because many of these can exist in both systems. In the former you can save all of your events historically as well.
I hope this explains a bit about the differences between the models. I also want to provide a quick list of some of the pros and cons of each.
Current State Backing
Pros
Easier to retrofit on a legacy project
Well known technology/tools/skill sets
Easier sell for an organization
Cons
Dual models cost more and contain risk
Non-Event Centric
Event Storage Backing
Pros
Single Event Centric Model
Simplified/Better Testing
Conflict Management
Cons
Harder to sell to an organization
Less known tools/technologies (though you can implement the Event Store in a RDBMS which kind of mitigates this)
Very difficult to migrate a legacy app to
Hope this helps people.
Web Forms Routing in ASP.NET 4
[28/02/10]
At our first Sarasota Web Developer Group meeting we discussed several of the new enhancements in ASP.NET 4 Web Forms. One of my favorite enhancements is the new routing support, which is very similar to the routing I have enjoyed so much in ASP.NET MVC.
Register Routes
This is old hat for those using ASP.NET MVC. Just register your routes at application startup. Rather than your endpoint being a controller, however, you associate a physical page as the handler of the request.
public class Global : System.Web.HttpApplication
{
    void Application_Start(object sender, EventArgs e)
    {
        RegisterRoutes(RouteTable.Routes);
    }

    void RegisterRoutes(RouteCollection routes)
    {
        routes.MapPageRoute(
            "Contact_Details",        // Route Name
            "Contacts/Details/{id}",  // Url and Parameters
            "~/Contacts/Details.aspx" // Page Handling Request
        );
    }
}
In this case we are telling the routing engine 3 things:
- Name of the Route: Contact_Details
- The Route: Contacts/Details/{id}
- The Physical Page Handling the Request: Details.aspx
Notice the id parameter (route value), which will be the id of the contact to display in the details page.
Expression Builders for Creating HyperLinks, etc.
With ASP.NET MVC we have strongly-typed View Helpers to help generate links. With ASP.NET 4 Web Forms you utilize Expression Builders to create links such as:
<asp:HyperLink NavigateUrl="<%$RouteUrl:RouteName=Contact_Details, id=1 %>" runat="server">John Doe</asp:HyperLink>
<asp:HyperLink NavigateUrl="<%$RouteUrl:id=1 %>" runat="server">John Doe</asp:HyperLink>
We can explicitly specify the name of the route or let the routing engine figure out the correct route based on the parameters.
Getting RouteData from the Page
You can access the RouteData from the Page by accessing the Page.RouteData Property, which is just a convenient access point to RequestContext.RouteData. Here are a couple of ways to get the id from the route to display the proper contact given its id:
public partial class Details : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        var id = Page.RouteData.GetRequiredString("id");
        var id2 = Page.RouteData.Values["id"];
    }
}
RouteParameter for use with DataSources
If you are displaying the contact in a DetailsView, for example, you can use the new RouteParameter with your DataSource to get values from the route as such:
<asp:ObjectDataSource ID="ObjectDataSource1" runat="server"
    SelectMethod="FindById" TypeName="Contact">
    <SelectParameters>
        <asp:RouteParameter Name="id" RouteKey="id" Type="Int32" />
    </SelectParameters>
</asp:ObjectDataSource>
Binding your DetailsView to the ObjectDataSource will now cause the contact to be displayed appropriately.
Response.RedirectToRoute and RedirectToRoutePermanent
A lot has been mentioned about using Response.RedirectPermanent for SEO, but even cooler is Response.RedirectToRoute and Response.RedirectToRoutePermanent for working with the new routing engine. Below I am specifying the route name and passing in any route values where necessary when redirecting:
Response.RedirectToRoute("Contact_Details", new { id = 1 });
Response.RedirectToRoutePermanent("Contact_Details", new { id = 1 });
Conclusion
There are lots of really neat things in ASP.NET 4 and ASP.NET 4 Web Forms. I am going to continue to post a few more of the enhancements we discussed during the first meeting.
For those interested, the second meeting of the Sarasota Web Developer Group will discuss a number of interesting topics: Leveraging ASP.NET MVC - Web Forms - DynamicData - Castle ActiveRecord.
Introduction to the Reactive Extensions for JavaScript – Creating Observers
[27/02/10]
Looking back at the previous post, we covered how to create observable sequences, the producers of our data. We have quite a number of ways of creating these, beyond the events we covered earlier. Now that we have these observable sequences, what next? We need to address the consumer side of this producer/consumer story in the form of an observer.
Before we get started, let’s get caught up to where we are today:
Creating Observers
Let’s go back to the Observer pattern definition once again before we get started. The idea here is that we have an object, called the Observable (or Subject) which keeps a list of its dependents, the observers, and notifies each of them automatically of any state changes. In the case of the Reactive Extensions for JavaScript, we’re talking more about observable sequences. As we discussed last time, the Observer has three parts:
- OnNext – when a new value is produced
- OnError – when an exception occurs
- OnCompleted – when the observable sequence terminates
When creating an observer, we should take all three into account and how we’re going to handle them.
In order to attach these observers to our observable sequence, we can invoke the Subscribe method on our observable while passing in our observer. And when we’re no longer interested in the subscription to the observable sequence, we can detach by calling Dispose on the result of the Subscribe method.
New Observer via Create
Let’s get started in creating an Observer by looking at the Observer.Create method. This method takes in three functions, one for the OnNext, one for the OnError and finally one for the OnCompleted. This method returns to us an Observer which we can then use for subscribing.
Rx.Observer.Create(
    function(next) { ... }, // OnNext
    function(err) { ... },  // OnError
    function() { ... }      // OnCompleted
);
Once we have an Observer, we can then attach to the Observable using the Subscribe method which takes our Observer. When we call Subscribe, we get back a disposable object with a single Dispose method which allows us to detach from the Observable.
Observable {
    Subscribe : function(observer) { ... }
}
One of the best ways I find to explore a new API is to write tests to show the expected behavior. By writing these, I have a comprehensive view of what each method does, especially if the code didn’t come with the tests already. So, let’s create a few tests to show the behavior of creating an Observer and then subscribing to an Observable sequence. I’ll use QUnit to write my tests, and in particular its asynchronous test feature, because we are testing asynchronous callbacks.
The first test will check the OnNext function parameter on Observer.Create. In this case, I’ll assert that the single value in my observable sequence is the value I receive when OnNext is invoked.
asyncTest("Observer should observe OnNext", function() {
    var observable = Rx.Observable.Return(0);
    var observer = Rx.Observer.Create(
        function(next) { equals(0, next); start(); },
        function(err) { },
        function() { });
    observable.Subscribe(observer);
});
In the next test, I will show how the OnError function parameter works. In this case, I’ll have an Observable throw an exception via the Throw method, and have my OnError function assert that the error it receives is the same error I threw.
asyncTest("Observer should observe OnError", function() {
    var someError = "FAIL!";
    var observable = Rx.Observable.Throw(someError);
    var observer = Rx.Observer.Create(
        function(next) { },
        function(err) { equals(someError, err); start(); },
        function() { });
    observable.Subscribe(observer);
});
Finally, in my last example, let’s create a simple test to show off the OnCompleted behavior. In order to do so we’ll create an empty observable which should not yield any values and instead only invoke the OnCompleted. Then we’ll create an Observer which has the test logic in the OnCompleted function parameter.
asyncTest("Observer should observe OnCompleted", function() {
    var observable = Rx.Observable.Empty();
    var observer = Rx.Observer.Create(
        function(next) { },
        function(err) { },
        function() { ok(true, "True when invoked on complete"); start(); });
    observable.Subscribe(observer);
});
Creating Observers this way is good for reusability, especially if you wish to attach to any number of observable sequences. But we’re not tied to creating them via Create; there are other ways.
Overloading Subscribe
In addition to creating an Observer via the Create method, we also have shortcuts which allow us to create an Observer on the fly with the Subscribe method. In addition to the Subscribe which takes an Observer, we have three other overloads which can take functions for our OnNext, OnError and OnCompleted. The first overload takes a function for OnNext, the second takes functions for OnNext and OnError, and finally the last overload takes functions for all three: the OnNext, OnError and OnCompleted.
Observable {
    Subscribe : function(function(next) { ... })
    Subscribe : function(function(next) { ... }, function(err) { ... })
    Subscribe : function(function(next) { ... }, function(err) { ... }, function() { ... })
}
These overloads, just as above, return a disposable object which allows us to unsubscribe at any time via the Dispose method.
Unsubscribing
As I’ve stated earlier, one of the great things about the design of Rx for JavaScript is that it’s quite easy to both subscribe and unsubscribe from an observable sequence. The design of Rx for JavaScript follows very closely the design of Rx for .NET, including subscribing and unsubscribing. Let’s step through an example of how we can use the Dispose method on our subscription. In this instance, we’ll have two observers, and after the first value has been produced, we unhook the first observer and continue listening on the second. We’ll assert that the first has indeed been unhooked while the second continues to listen.
asyncTest("Dispose should unhook observer", function() {
    var nextValue = 0;
    var observable = Rx.Observable.FromArray([1, 2, 3]);
    var disp1 = observable.Subscribe(
        function(next) { nextValue = next; });
    var disp2 = observable.Subscribe(
        function(next) {
            disp1.Dispose();
            equals(1, nextValue);
            start();
        });
});
Such scenarios could be quite helpful for unhooking handlers when other events happen, such as mouse or keyboard events, or even AJAX requests. We’ll cover some of those scenarios in upcoming posts.
Conclusion
So, now we’ve covered the basics of creating observable sequences, observers, and subscriptions. Now that we have some of the basics, what else can we do? That’s where some of the LINQ combinators come in handy, and we’ll pick that up next time.
This of course is only scratching the surface of what capabilities this library has and there is much more yet left to cover. The question you’re probably asking now is where can I get it? Well, for that you’ll have to stay tuned. I hope to have more announcements soon about its general availability.
What can I say? I love JavaScript, and I am very much looking forward to the upcoming JSConf 2010 here in Washington, DC, where the Reactive Extensions for JavaScript will be shown in its full glory with Jeffrey Van Gogh. Too many times, we’ve looked for abstractions over the natural language of the web (HTML, CSS and JavaScript) and created monstrosities instead of embracing the web for what it is. Libraries such as jQuery, and indeed the Reactive Extensions for JavaScript, give us better tools for dealing with the troubled child that is DOM manipulation, and especially events.
Certified Scrum Product Owner Training class on May 24, 2010 in Boulder.
[26/02/10]