Here is a simple database migration model that I recently adopted at work, and it has worked out really well. With little effort, we can now create a new database, or upgrade an existing one, to match any specific commit in source control. And since production SQL changes are now migrated by an automated batch process, it has streamlined our deploys as well.
Start with a Baseline Script
A baseline script is necessary for creating the database from scratch. It's used for setting up new environments, or even in daily development work when you want to make sure you start with a clean slate.
If you already have an existing database, you will need to start by creating a baseline script. The best time to do that is right after a fresh deploy to production. That way you know your development version of the database should match production, and you have a clear starting point, or baseline. Here are the steps I took:
- Script the entire database schema with all objects
- Clean up the schema if necessary. (I had to remove replication directives and the creation of production users, since those are production-only settings.)
- Add any necessary seed data for testing purposes. A rebuilt database will most likely be used for development or QA, so you will probably need some starting data.
Add Migration scripts
Migration scripts are applied in the order in which they were created, and they only migrate the database up. This model does not involve downgrading, simply because we haven't found a need for it (the production database is never downgraded, and local/test versions can always be recreated from scratch).
When a developer is working on a feature or a bug that needs database changes, they create a new migration file with the next sequential number in the file name. For example: if the last migration file is Script-0048.sql, the next migration script will be Script-0049.sql.
The names have to be sequential because that's how we make sure migrations are applied in the order they were created, which guarantees consistency between environments.
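The naming convention is easy to script. Here is a minimal sketch (the function name is my own, assuming the Script-NNNN.sql convention described above) that picks the next sequential file name:

```python
import re

def next_script_name(existing):
    """Given the existing migration filenames, return the next sequential name."""
    pattern = re.compile(r"Script-(\d+)\.sql$")
    numbers = [int(m.group(1)) for name in existing
               if (m := pattern.search(name))]
    # Start at 1 when there are no migrations yet
    next_number = max(numbers, default=0) + 1
    return f"Script-{next_number:04d}.sql"

print(next_script_name(["Script-0047.sql", "Script-0048.sql"]))
# Script-0049.sql
```

A pre-commit hook or a small helper like this also catches two developers grabbing the same number before the conflict reaches source control.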
Version Control your SQL scripts
The next important piece is to version control your scripts. Source control plays the following roles:
- Source control becomes a mediator, so multiple developers cannot check in scripts with the same name. If there is a naming conflict, developers are forced to get the latest version and rename their script.
- Each branch has its own list of migration scripts, so there is no doubt about what your database should look like to match the code base in any branch or individual commit: it simply must have all of that branch's migration scripts applied on top of the baseline.
- It keeps track of changes, and we want to make sure a migration script never changes once it's in source control. (More on that in the Rules section.)
Keeping track of migrations
How do we know which migration scripts have been applied to the database? Simple: we create a table that keeps track of executed scripts. That way it's easy to compare what has already been executed against what needs to be applied to get to a specific point in time. A simple table like this will do.
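For example, a minimal journal table, sketched here with Python's sqlite3 for brevity (the table and column names are illustrative; DbUp maintains its own, similar journal table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real database
conn.execute("""
    CREATE TABLE IF NOT EXISTS MigrationLog (
        Id         INTEGER PRIMARY KEY,
        ScriptName TEXT NOT NULL UNIQUE,  -- e.g. 'Script-0049.sql'
        AppliedOn  TEXT NOT NULL          -- when the script was executed
    )
""")
conn.execute(
    "INSERT INTO MigrationLog (ScriptName, AppliedOn) VALUES (?, datetime('now'))",
    ("Script-0001.sql",))

# Which scripts has this database already seen?
applied = [row[0] for row in conn.execute("SELECT ScriptName FROM MigrationLog")]
print(applied)  # ['Script-0001.sql']
```

The UNIQUE constraint on the script name doubles as a safety net: the same migration can never be journaled twice.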
Finally, your migration application takes care of figuring out which scripts are missing, executing them, and recording the applied scripts in the log table.
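That logic fits in a few lines. Here is a sketch (using sqlite3 and my own function name; DbUp does the equivalent for SQL Server, with transactions and logging on top):

```python
import sqlite3
from pathlib import Path

def apply_migrations(conn, scripts_dir):
    """Apply any migration scripts not yet journaled, in file-name order."""
    conn.execute("""CREATE TABLE IF NOT EXISTS MigrationLog (
                        ScriptName TEXT PRIMARY KEY,
                        AppliedOn  TEXT NOT NULL DEFAULT (datetime('now')))""")
    applied = {row[0] for row in conn.execute("SELECT ScriptName FROM MigrationLog")}
    # Sequential names mean sorting by file name gives creation order
    for script in sorted(Path(scripts_dir).glob("Script-*.sql")):
        if script.name in applied:
            continue  # already ran against this database
        conn.executescript(script.read_text())  # run the migration
        conn.execute("INSERT INTO MigrationLog (ScriptName) VALUES (?)",
                     (script.name,))
    conn.commit()
```

Because the journal is consulted first, the same command is safe to run against any environment: it simply does nothing when the database is already up to date.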
Two Easy Rules for Stress Free Migrations
- Once a SQL migration script is committed and pushed to source control, it must never change. This eliminates potential inconsistencies between environments: once a script is out in the wild you have to assume it was already applied somewhere, and if the script changes, that environment will never get the update.
- Automate database migrations completely. There is absolutely no reason to run the update scripts manually; it's frustrating, error prone, and a waste of time. You can quickly write a batch process that executes each script and adds a record to the journal table, or you can use an existing open source project like DbUp. We opted for DbUp since it does exactly that and has other nice features, like wrapping all migration scripts in a transaction.
Rebuild or Migrate Your Database With One Click
We've created two PowerShell scripts that will either create or upgrade a local database with all the migration scripts in the current source control tree. Rebuild executes the baseline script plus migrations. Upgrade only applies missing migrations, and it's the same script that's used in production. There is no more need for a shared database; a developer can migrate or re-create their own copy of the database in a few seconds. I've also had an idea to include a module that checks on application start whether the database is out of date and applies the needed scripts. I wouldn't run it in production, but it's perfect for development.
After setting up automatic migrations, it was very easy to set up test environments for functional end-to-end testing with Selenium. The continuous integration server pulls the latest code, runs the database upgrade script, builds and publishes the site, and executes the functional tests.
Conclusion: A lot of impact for a little effort
I've been part of many overnight deployments that went wrong due to some missing stored procedure, and I've felt the agony of chasing down errors at 2 AM. It really doesn't take long to apply this model, even less so if you use an existing open source library like DbUp. While there is nothing radical about this practice, I know a lot of companies still deploy their SQL scripts manually. It's a small change with a big impact that will streamline your development and make production database migrations smooth, with guaranteed correctness. It worked out great for my company. How do you manage your database migrations?
What is continuous delivery? It's a methodology that lets you get new features and capabilities to market quickly and reliably. It's a practice that shortens work in progress and allows for rapid feedback on your new features or improvements. It's automation of many parts, including testing, creation of environments, and one-button-push deployments. It's a magical unicorn that allows companies to deploy 10 times a day without much effort or risk of breaking stuff!
The reason this story strikes a chord with me is that I strongly believe automation can streamline software development, make employees happy, and help organizations become high performers. And automation is a big part of the continuous delivery methodology. Throughout my career, I've participated in many painful deployments that lasted more than a day, usually stretching through the weekend. Nobody should have to be part of that, because there is a better way within reach of many companies. With it, developers can focus on work that matters, companies can deliver new features much more rapidly, and stakeholders can see their ideas spring to life very quickly.
First, let's dive deeper into the normal software delivery practice of an average company.
- Product managers come up with arbitrary due dates without doing technical capacity planning, making promises that cannot be kept. As a result, when the due date comes, the product is rushed and many shortcuts are taken, which results in lower quality and more fragile applications, which means more technical debt and unplanned work later.
- Security is not even in the picture, because new features are already not getting to market quickly enough.
- Technical debt continues to be stockpiled and is rarely paid off. Like financial interest, it grows over time until most of the time is spent just paying off the interest, in the form of unplanned work.
- Deployments are not automated, and manual deploys take a long time, so deployments happen far less frequently, which means a huge number of features get deployed at once. Finished work does not make it into production for months, sometimes years (scary), and that means no rapid feedback on performance, adoption, or business value. Compare that to a manufacturing plant where work stockpiles at the bottleneck station and everything has to stop just to catch up. At that point you cannot give any feedback because the other stations have already finished their work, it's very costly to change what has already been made (unplanned work), and the result is a lower quality product (technical debt).
- Due to the lack of automated testing, companies have to deploy even less frequently, since it takes an army of QA engineers to regression test the entire application, and the testing takes even longer as more features are deployed.
- Failed features don't get pruned; they are just left to rot and accumulate more security, technical, and maintenance debt.
Unplanned work is not free; it's actually really expensive. Another side effect is that when you spend all your time firefighting, there is no time or energy left for planning. When all you do is react, there's not enough time to do the hard mental work of figuring out whether you can accept new work. Which means more shortcuts, more multitasking. - The Phoenix Project
How Unicorns Work
The main idea behind continuous delivery is to reduce work in progress (WIP), which allows for quicker feedback on what goes into production. For example, if you work on a 9-month-long project, it will take you longer than 9 months to see your code in production, and if something has a problem, it will be very expensive to go back and change design decisions. There is a good chance that a fix will just be a hack rather than a proper solution, meaning more technical debt accumulates and more problems come later. And don't forget that after a 9-month project, it will take a whole weekend and a huge amount of agony to release it.
Gene's term for WIP in IT is "lead time", which measures how long it takes to go from code committed to code successfully running in production. Until code is in production, it has absolutely no value; it's just sitting there. Focusing on fast flow will improve quality, customer satisfaction, return on investment, and employee happiness.
You might think it must be crazy to release that many times a day. Isn't it dangerous to make all those changes? It's actually a lot less risky to deploy small changes incrementally, because you get rapid feedback. And even if there is a problem, it's much easier to fix a small problem sooner rather than later, when you are forced to take shortcuts.
To get there, you have to "improve at the bottlenecks"; any improvements not made at the bottlenecks are a waste of time. If your deployment process is a bottleneck, you have to automate it until it becomes a one-button-push deployment. There is absolutely no good reason why a developer cannot push a button to create a QA environment that exactly matches production with the code deployed, and, if it passes automated functional tests, push another button to put it into production. If regression testing is the bottleneck, then you need to pay off that debt by writing automated functional tests or end-to-end system tests.
"Automated tests transform fear into boredom." --Eran Messeri, Google
To become a high performer, you will also need to add monitoring hooks to your applications, so that any developer can add his or her own metrics at will. Then, when you release often, you get rapid feedback on performance, adoption, and value, so you can make informed decisions and roll back if necessary. It should be extremely easy for a developer to add any kind of monitoring metric to the code, and the data must be accessible from production.
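As a sketch of how low the friction should be, here is a tiny timing hook (the names and the in-memory store are my own invention, not any particular monitoring product; a real system would ship these values to a metrics backend):

```python
import time
from collections import defaultdict

metrics = defaultdict(list)  # illustrative in-memory store

def timed(name):
    """Decorator that records each call's duration under the given metric name."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                metrics[name].append(time.perf_counter() - start)
        return inner
    return wrap

@timed("checkout.duration")
def checkout():
    pass  # hypothetical feature code

checkout()
print(len(metrics["checkout.duration"]))  # 1
```

One decorator line per function is about the right cost: if instrumenting code takes more effort than that, developers simply won't do it.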
Gene proposes spending 20% of the time on non-functional improvements, or non-feature work, and I think any organization that adopted that would be on its way to becoming a high-performing unicorn. I honestly don't think it's much to invest compared to the opportunity cost of features not making it out for long periods of time, in a world where only about 10% of features are successful. How can you test anything when you can only release a couple of times a year?
Finally, you should be deploying to production with your new features turned off. That way releases are decoupled from deployments, and turning features on and off is a simple button click.
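In code, a feature flag can be as simple as a guarded branch. A minimal sketch (the flag store here is just a dict and the flow names are made up; real systems read flags from config or a database so they can flip without a redeploy):

```python
flags = {"new_checkout": False}  # deployed dark, flipped on at release time

def checkout(cart):
    # The new code path ships to production but stays dormant until released
    if flags.get("new_checkout"):
        return "new checkout flow"
    return "old checkout flow"

print(checkout([]))           # old checkout flow
flags["new_checkout"] = True  # the "release": no deployment involved
print(checkout([]))           # new checkout flow
```

The same switch works in reverse: if the new flow misbehaves, turning it off is an instant rollback with no emergency deploy.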
It's not art, it's production. If you can't out-experiment and beat your competitors in time to market and agility, you are sunk! Features are always a gamble. If you are lucky, ten percent will get the desired benefits. So the faster you get those features to market and test them, the better you will be. - The Phoenix Project
And if you think 10 deploys a day is crazy, consider Amazon, with a mean time between deployments of 11.6 seconds (insane). And it's not just them; companies like Intuit, Target, Google, Etsy, Flickr, Kansas State University, and many others have embraced continuous delivery.
It Does Not Have To Be Radical. Small Steps Are Just Fine.
In a perfect world, a company with problems would stop everything to fix the production line and pay off the technical debt. Some companies, like eBay, had to do exactly that to escape the vicious cycle, but I don't think it has to be so drastic for an average company. I believe that if you accept a culture of continuous improvement and focus first on the bottlenecks, you can get there soon. Small changes can bring a lot of improvement. If your deployment is a manual process, focus on automatically creating packages, and write a script that deploys them automatically. If a release requires database changes, add the scripts to the package and let your deployment script apply them automatically; there is no reason a DBA has to compile and execute scripts by hand when it can be automated. If you need many QA engineers to regression test the site, why not spend some of their time writing automated tests? I'm sure they would be happier finding new bugs rather than doing senseless testing of the same stuff.
Gene urges us to create a culture of genuine learning and experimentation; that's how the best companies get even better. In addition, here's a great quote if you think this isn't relevant to you:
Most of the money spent these days is on IT projects, and even if companies say it's not their core competency, that's not true. Everyone must learn, or you risk irrelevance in 10 years. - Gene Kim
Good luck and see you in the world of unicorns! :)
And it's not only startups that choose Node.js; recently PayPal announced that they chose Node.js as their application platform of choice. Some of the great benefits they reported so far:
- Built almost twice as fast with fewer people
- Written in 33% fewer lines of code
- Constructed with 40% fewer files
So here are the books that I've really enjoyed; they will get you up to speed if you are a seasoned pro in another language like C# or Java.
Backbone
What I like about Backbone is that it has an extensive suite of unit tests, the library itself is about 1,700 lines of code, and it has very good comments. It's pretty incredible that such a small library is the most widely used SPA library out there. I start by reading the unit tests to understand the specifications. Once I have a general understanding, I dive into the pieces I find most interesting.
Express
Whether you are looking to build a RESTful API or a traditional web application, Express is a great minimalist framework on top of Node.js. It's simple, pretty genius, and also has a small code base.