Database Migrations vs. Database Differencing

Paul Stovell just wrote a fantastic article on database deployment, if you haven’t read it, go and do it now. It underlines the critical thinking that is being applied to the relationship between database administrators, software developers, their roles and responsibilities and the artefacts that they manage.

This is actually a debate/conversation that has been going on for some time internally about the best way to manage database migrations, earlier this year that debate/conversation actually spawned this post, but I think Paul has made a really good case for why the database migrations approach (as opposed to database differencing) works for him.

Similar Solutions, Same Problem

To be honest, I’ve always been conflicted about this. The migrations approach is seductive and other than being a little bit more cumbersome to get started I think it yields some pretty good results – and most importantly it totally appeals to my inner developer. I think that DB Pro is very quick and easy to get up and running and if used properly you can get the desired approach even if there are a few scenarios where it falls down and would force me to use the migrations approach.

I disagree with Paul fundamentally that the state based approach to database updates is dumb. I consider the database differencing approach, and the database migrations approaches to be two valid solutions to the same problem. I’ve produced the following diagram to help put it into context.

image

There are those people out there that are happy to define the state that they want to get to, and trust the tool to get them to get them from that state by building the migration for them. Then there are those people who define the migration, and trust the tool to keep track of what migrations have been applied. What it comes down to us understanding “what could go wrong”.

Corrupted Starting State

Despite the development teams best efforts managing changes to the database schema in version control, there will be times where the starting state of a database prior to the upgrade process does not match what the developer intended. In an ideal world the developers would test the upgrades against a copy of the database, but I’ve seen situations where this is not feasible because the database might be 1TB in a remote datacentre which means there is a lag between database copies arriving (if at all).

Using the database migrations approach this can sometimes lead to a problem where someone has made a change to the production database directly which blocks the migration script from working. I’m not saying that is a reason not to use database migrations, but its one of the things that “could go wrong”.

Because tools like DB Pro use a database differencing approach they don’t suffer this particular problem because they can compensate and change the database state to the desired configuration. The problem is – what if that original change was made for a reason, and along comes DB Pro and wipes it out (well, not quite, it has protections against data loss), once again, this isn’t a reason not to use DB Pro, its just another thing that “could go wrong”.

The point I am trying to make is that no matter which tool you use, there are situations will will trigger update failure, regardless of whether you are using a differencing approach or a migrations approach. It isn’t sufficient to sit back and say if all my scripts run, the database is in a good state.

Update Failure Recovery

Once you’ve got yourself stuck with an update situation you need to know how to get yourself out. With DB Pro, it will normally abandon changes to the database if it detects a situation where data-loss will occur or the scripts won’t run. But not working does not equal recovered.

What you need to do with DB Pro is get the offending schema changes into your definition of what you want the state to look like so that you can continue to move forward. With the migrations tool you need to hope you have a tool that will detect failures and rollback the changes so you are not left in an unknown state. A decent migration tool will do this for you – so I’m not arguing this as a point of difference.

Where the migration story can suck is that if you have got an uncontrolled change in the production database, you need to figure out how to replicate it – ironically a differencing tool will help you here because you can migrate the database up to the point that the change was made, perform a difference between  your local developer database and the production database and insert a migration in the middle. The truth is you are unlikely to need that migration anywhere other than your development and test systems but it is important to capture that intermediate state.

Inconsistent Approaches

The problem with database migrations is that there are too many cooks in the kitchen. Within the average software development project you have software developers modifying the schema in development, testers mucking around with sample data in the test environment, and database administrators giving the schema some treatment post deployment in production. What makes it worse is that everyone has their own ideas on how they want to accept changes to the environment that they control.

Paul and I could argue all we like about our preferred migration tool, but unless the DBA accepts it as a valid way to push a schema forward then we are both hosed, and we need to present him with some diff scripts to run. In both the differencing scenario, and the migration scenario this is pretty simple, Paul would just hand over his T-SQL scripts, and I would use the differencing tool to pre-generate the T-SQL scripts. But wouldn’t it be nice if we all played nicely?

Paul hit the nail on the head with a few of his suggestions. You really must use version control, you really do need to test your updates on a regular basis, obviously the best way to do this is via a CI build that works against a copy of the production database (as fresh as you can get).

I would go one step further and say that database administrators should also be forced to use the same process that the development team are using (and ideally put them on the development team). This way if they want to change an index on the database, then they should be forced to make that change via either migration scripts or a tool like DB Pro and push it to production. Doing this would have a dramatic effect when trying to reduce the number of failed database deployments.

Multiple Application Databases

All of this is grand in theory, but there is one more thing to consider when it comes to database migrations. The database is not always completely owned by a single application. I’ve seen customers that have 50+ applications talking to the same database, and they don’t even attempt to understand the boundaries of each application – it is truly a shared, stateful component. In situations like this the database needs to be treated like a separate product with its own release schedule, but governed by all the same rules (version controlled, continuously tested). Various dependent projects submit changes to this database, and where possible they get rolled out in advance – which kinda discourages radical changes to the schema.

Final Thoughts

I think that this space is getting interesting at the moment and I’m glad that we are seeing some debate on this topic. What I hope is that we end with the changes to the base platforms and tools required to make this a smoother process. For example I would love to see a BCL change for .NET which introduced a “System.Configuration.Migration” to compliment “System.Configuration.Install”, and then have all of the Microsoft tools target producing “migration libraries” which plug directly into this framework extension.

I also think that migrations isn’t just a database problem. And I am really pleased to see the likes of Fluent Migrator and Machine.Migrations making early moves into solving the more global problem of how to go from state A to state B cleanly, whether you are using a relational database, a document database or any other kind of stateful system.

Finally, I’ve been disappointed by DB Pro. The attempts to get other database vendors to buy into this process has largely failed, Microsoft has relied on Quest to produce TeamFuze which integrates DB Pro with Oracle, and the IBM DB2 plug-in for DB Pro seemed to go no where. If there really is a strong argument for using the migration approach over the differencing approach it is because of lack of tooling. At least with a migration approach you can “roll-your-own” fairly easily.

One thought on “Database Migrations vs. Database Differencing

  1. Pingback: January 11, 2011–VS/TFS Links and Quick Hits | Learn TFS

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s