Transitioning to Event Sourcing, part 9: additional tools and patterns
Une version française de ce billet est disponible ici.
I will pass briefly on topics of interest and additional features a CQRS / Event Sourcing infrastructure could implement. Each of those topics would probably deserve their own post or even a book. If you need more information, there are a lot of material on theses subjects in discussion groups. Greg Young will also soon release an entire book on these subjects. So here are brief introductions:
Obviously, pure event sourcing like we implemented in part 8 has its limitations. It is fine for aggregates with few thousands events, but after that, we need to implement snapshoting.
The idea is to allow the aggregate to create a snapshot. When loading an aggregate, we only fetch a snapshot and the events that occurred since the snapshotting. Snapshots are stored in a table that is containing both the latest version of the aggregate, the snapshot data, and the snapshot version.
You should delegate the snapshoting process to a standalone process. The application itself do not have to manage snapshot creation. The event store will just “consume” snapshots created by the snapshotter. The snapshotter just need to query the snapshot table for aggregates that had, say, 100 commits or more since their last snapshot. It can simply load the aggregate in question and ask it for a new snapshot.
What happen when you need to change an event? In most cases, you don’t want to change the events already committed in the event store. You simply create a new event version (for example CustomerAddressCorrectedv2), and make the aggregate consume this event instead of the old one. You then inserts a converter at the output of the event store that will convert the old ones to the new ones before passing it to the aggregate constructor.
How do you convert an old event to a new one? Let’s say you have a new field in the event. You will have to take the same kind of decision as when you insert a new column in a table: you have to calculate a default value for that new field.
What happen when this is completely impossible? As explained by Greg Young, this probably means that the new event is not a new version of the same type of event, but actually a new event type. In that case, no upconversion is possible.
Rinat Abdulin is covering a few examples and more details in his versioning bliki page.
Now that you are not tied to the transactional model for your views, you have much more liberty. In particular, why not store directly the object you will need to display in a page instead of trying to do a cumbersome paradigm shift from and to a relational database? Databases that are storing unrelated object trees are called document databases. On .NET, I would highly recommend RavenDb. On other platforms, there are numerous choices like MongoDb. Some of them are even producing JSON output, so you could transfer the object directly to your AJAX application, without any transformation taking place in your application layer.
Again, Rinat Abdulin is providing an interesting read on the subject.
Smart merging strategies
So far, our concurrency strategy has been quite dumb: we retry the entire command when we detect a conflict. And this is fine for most of the applications. But if we need, we could also get the events that have been committed in between and look at them. We may be able to commit the events even if the command has been raced by an other one. That is, if the other events are not incompatible with our commit. Since events are capturing a detailed and very high meaning of what happened, this is usually simple to creates compatibility rules between pair of events. A good starting rule is, if there are one instance of the same kind of event in the 2 commits, we consider that as a real concurrency issue. But we can add smarter rules that actually look at the content of events.
This is particularly useful for highly collaborative applications. But it also opens to additional possibilities in deconnected systems. This is the case in the domain of mining prospection or in the police. For those cases, the client can embark a copy of the event store, and work from it. When the client reconnects to the server later, the commits are merged. Only conflicting events are merged manually by the user. By the way, this is how most of the relational database systems are implementing replication, through their transaction log.
This is a big topic. I will here only scratch the surface. An immense amount of money is spent worldwide integrating services within companies and across companies. There are multiple reason why integration is hard. Most of the time, applications are not designed to be integrated, and are implemented as if there we an isolated component. Fortunately, with the trend in SOA, integration becomes more of a concern now. An other problem are that most of today’s integration technologies are synchronous by default: REST, web services, CORBA, RMI, WCF and so on. All this synchronous communication does not scale. First, network latency kicks in. So if a service needs to call an other service that needs to call an other service that… You get the idea. And then, the more components you add to the chain, the more likely you will get a failure along the way. And I am not talking about the maintenance nightmare that such a web of interconnected services are producing.
Event sourcing is allowing for a very clean approach. Combined with publish / subscribe, applications becomes disconnected in time, and decoupled in responsibilities. If other applications need to integrate with yours, just ask them to subscribe to your event stream. You can publish your event stream with components called Service Buses, that precisely allow that. Once you are publishing to a service bus, you don’t even need to be aware that a new application is now integrated with yours. All that application has to do is to download the history of your event stream, subscribes to the stream, and then derive whatever information they need from it, or react to events in your application in very little time:
This is solving the “always up” issue. If one or the other application goes down, the events can be queued in the service bus until the consumer system goes back up.
If you choose a serialization format carefully, it also solves most of the maintainability issues. The source application can create new types of event without impacting the consumer applications. Since the applications are responsible for projecting what they need from the event stream, they will be responsible for taking into account (or not) the new types of event. Finally, consumer applications will not need new features or data from your application, since the event stream already includes everything that ever happened in your system.
Eventual consistency is when there is a delay between the commit of the aggregate and the corresponding refresh of your view model. Or more precisely, when those 2 processes happen in different transactions. Concretely, you only commit events at commit time, and a different process is picking them up to execute the view model handlers. If you already have a service bus (see “Service integration”), you can use it. And indeed, the view model can be viewed as a separate service, almost a different bounded context if you will.
However, most of the applications don’t need such a separation. Until you need it, you should commit the changes to your view model along with the events.
When do you need to split things that way? One reason is scalability, and we will talk about this one later. One other factor could be performance. Your view model might be huge, or have parts of it that are slow to update. You may want to isolate the update of those parts in a separate process, in order to keep a good performance when executing commands.
Scaling the query side
Once you have eventual consistency, it is quite trivial to scale the read side. This is good news, because most applications have usually a lot more reads than writes. All you have to do is to put a copy of your view model on each web node. Each of those view models would subscribe to the event stream:
That way, you have an almost linear scalability of your read side: just add more nodes. The other benefit is the gained performance due to the local access to the view model. You don’t need a network call anymore because your web layer sits on the same node as its view model.
Now, if you want to scale at very large levels, your problem is reduced to scaling the publish / subscribe middle ware.
Scaling the write side
For scaling this one, you will first need to process commands asynchronously. The UI would need to send the command and not wait for the result. This usually triggers a very different kind of user experience. Think of it as ordering something on Amazon: what you get when you are clicking on the “submit” button is not a success or failure statement, but something more along the lines of “Your order has been taken into account. You will receive a confirmation email shortly”.
If you need it, there are however ways for the UI to asynchronously fetch a command result.
Once you are sending commands to a queue, you can have competing nodes treating those commands. But that is only scaling the business logic treatment, not the event storage. Luckily, by its structure, it is also quite simple to shard events by their aggregate id. Since commands contains the aggregate id, you can route your commands to the right queue:
Transitioning to Event Sourcing posts:
- Part 1: the DDD “light” application.
- Part 2: go CQRS with DTOs for the read side.
- Part 3: define commands explicitly.
- Part 4: track state changes.
- Part 5: use events for updating your domain database.
- Part 6: store events.
- Part 7: use events to build a view model.
- Part 8: remove the domain database, go event sourced.
- Part 9: a brief word on versioning, smart merging strategies, eventual consistency, scaling the read side, scaling the write side.