Sergey Maskalik's Blog

In the pursuit of mastery

There are mission-critical pieces in any business application that call external services through some sort of API, over the local network or the Internet. The problem with delivering messages over a network is that networks sometimes drop messages and cause timeouts. There is a great paper on Kyle Kingsbury's blog where he and Peter Bailis compile a list of evidence that failures can happen at many levels of the network and operating system. And it's not only the network that can fail: the receiving third-party application might also be down, or under load and slow to respond.

Processes, servers, NICs, switches, local and wide area networks can all fail, and the resulting economic consequences are real. … The consequences of these outages range from increased latency and temporary unavailability to inconsistency, corruption, and data loss.

Therefore, if we want reliable communication with external services, we need to implement some kind of retry mechanism that can redeliver messages and recover from faults.

Solving Guaranteed Delivery

A message queue is one solution that addresses this problem and guarantees delivery. Rather than calling an external API within the application process, like a web request, you place a message into a reliable and durable message queue that guarantees your message will be delivered to the consumer at least once. Since putting a message on the queue is usually a fast operation, it also improves your application's performance. Most message queues guarantee delivery by providing some mechanism for acknowledging that a message has been received by the consumer. If the consumer doesn't respond within a period of time, the message is returned to the queue so it can be processed again. This guarantees that a message will be delivered, or retried a pre-configured number of times.
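With the AWS SDK for .NET, that receive/acknowledge cycle looks roughly like this (a sketch; the queue URL is a placeholder and ProcessMessage stands in for your own handler):

```csharp
var sqsClient = new AmazonSQSClient();
var queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"; // placeholder

var response = sqsClient.ReceiveMessage(queueUrl);
foreach (var message in response.Messages)
{
    ProcessMessage(message.Body); // your own processing logic

    // Deleting the message is the acknowledgment. If the consumer dies before
    // this call, the message becomes visible again after the visibility
    // timeout and is delivered to another consumer.
    sqsClient.DeleteMessage(queueUrl, message.ReceiptHandle);
}
```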

There are numerous alternative message queue solutions, each addressing certain problems in its own way. Since the main requirement for calling critical services is guaranteed delivery, and we are not building something like a high-throughput trading application, Amazon SQS provides the best alternative to self-hosted MQs. One of the benefits is that you don't have to administer message queue servers, spend a lot of time figuring out how to set up redundant clusters for reliability, or worry about network partitions. Of course, you lose some speed when placing a message into the queue, but a 20ms average put call to SQS is good enough for this problem.

Until you figure out whether you actually need something faster and spend the time learning and setting up an MQ cluster, I think Amazon SQS provides the best bang for your time. It's easy to understand, has a great SDK, and is ready to go. It's also not very expensive: a million calls cost $1. Yes, you do have to poll the queue, but with long polling an idle consumer makes at most 1 call every 20 seconds, which is about 130,000 calls per month, or about 13 cents.
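Long polling is just a parameter on the receive call; in the .NET SDK it might look like this (a sketch; the queue URL is a placeholder):

```csharp
var request = new ReceiveMessageRequest
{
    QueueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue", // placeholder
    WaitTimeSeconds = 20,    // long polling: SQS holds the request open for up to 20s
    MaxNumberOfMessages = 10 // receive up to 10 messages per call
};
var response = sqsClient.ReceiveMessage(request);
```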

Putting a message into the queue is trivial and doesn't need more explanation. What is not trivial is creating a reliable application with workers to process the messages. We also want the ability to start multiple workers so they can process messages from the queue in parallel rather than one at a time. Since many workers can be started on one node and run in parallel, we need our service to use CPU resources efficiently, which means not blocking threads while waiting for I/O.

I wasn't able to find an open source application that can continuously listen to the queue and process messages as they come in, so I needed to write my own. A good candidate for this task is a Win32 service, since it provides a platform for an always-running process that can restart itself on failure and start automatically with Windows.

Creating Message Processor Win32 Service

The Windows service must always be running, meaning that each worker will have a main while loop that continues indefinitely. You also need to start multiple workers, so some sort of multi-threaded solution is required. My initial version newed up multiple Threads that invoked an asynchronous method, like this:

protected override void OnStart(string[] args)
{
    for (int i = 0; i < _workers; i++)
        new Thread(RunWorker).Start();
}

public async void RunWorker()
{
    while (true)
    {
        // get a message from Amazon SQS synchronously, about 20ms
        var message = sqsClient.ReceiveMessage();

        await PerformWebRequestAsync(message);
        await InsertIntoDbAsync(message);
        // ... log
        // continue to retry
    }
}

And it appeared to work fine; however, there is a problem with this code. Each of the created threads exits as soon as the method hits its first await, because an async method returns control to its caller at the first await that doesn't complete synchronously. So there was really no point in creating those threads. In addition, I wasn't passing a cancellation token to the workers, so I could not signal them to shut down whenever I wanted to gracefully stop the service. Thanks to Andrew Nosenko, who pointed out a better and cleaner way of accomplishing the same goal using tasks.

Rather than starting threads manually, you start each task and add it to a List collection. This way the thread pool efficiently manages its threads and schedules the work according to system resources.

List<Task> _workers = new List<Task>();
CancellationTokenSource _cts = new CancellationTokenSource();

protected override void OnStart(string[] args)
{
    // _workerCount is the configured number of parallel workers
    for (int i = 0; i < _workerCount; i++)
        _workers.Add(RunWorkerAsync(_cts.Token));
}

And inside RunWorkerAsync's while loop you call token.ThrowIfCancellationRequested(), which throws an OperationCanceledException and exits the loop when cancellation is requested.
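Putting that together, a worker body might look like this (a sketch; sqsClient, queueUrl, PerformWebRequestAsync, and InsertIntoDbAsync are the same hypothetical calls as in the earlier snippet):

```csharp
private async Task RunWorkerAsync(CancellationToken token)
{
    while (true)
    {
        // Exits the loop by throwing OperationCanceledException
        // once OnStop signals the token.
        token.ThrowIfCancellationRequested();

        var message = sqsClient.ReceiveMessage(queueUrl);

        await PerformWebRequestAsync(message);
        await InsertIntoDbAsync(message);
    }
}
```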

With a Windows service, the Service Control Manager gives you a limited amount of time to start your workers, and OnStart must return quickly rather than block. So OnStop is where you call Task.WaitAll(_workers.ToArray()), which blocks the current thread until all workers have completed their tasks. Once the OnStop method begins, you signal the cancellation token to cancel the tasks, and then you call Task.WaitAll and wait until all tasks run to completion. If all tasks have already completed before WaitAll is called, it returns immediately, so there is no risk of waiting longer than necessary. The OnStop method looks like this:

protected override void OnStop()
{
    _cts.Cancel();
    try
    {
        Task.WaitAll(_workers.ToArray());
    }
    catch (AggregateException ex)
    {
        ex.Handle(inner => inner is OperationCanceledException);
    }
}

It uses the AggregateException.Handle method, which rethrows any inner exceptions that the predicate does not mark as handled. Since we are only expecting OperationCanceledException, the handler simply returns.

Polishing it up

One problem with a Windows service application is that it's hard to debug; you cannot just launch it under the debugger like a normal executable. To work around this problem we will use the Topshelf project. Topshelf allows you to run your Windows service just like a console application, with the ability to debug and step through the code. It also makes it easier to configure, install, and uninstall the service.

Here is a quick sample that turns a message-processor console application into a Win32 service.

public class MessageProcessor
{
    List<Task> _workers;
    CancellationTokenSource _cts;

    public MessageProcessor()
    {
        _workers = new List<Task>();
        _cts = new CancellationTokenSource();
    }

    public void Start() { /* .. same as above */ }
    public void Stop() { /* .. same as above */ }
}

public class Program
{
    public static void Main()
    {
        HostFactory.Run(x =>
        {
            x.Service<MessageProcessor>(s =>
            {
                s.ConstructUsing(name => new MessageProcessor());
                s.WhenStarted(tc => tc.Start());
                s.WhenStopped(tc => tc.Stop());
            });

            x.SetDescription("Amazon SQS Message Processor");
        });
    }
}

Once you build the executable, you can run MessageProcessor.exe install from the command line and the service will be installed; adding -help will show all the commands available to you.


Incorporating queues into your application architecture can help with guaranteed delivery of business-critical messages. It can also speed up your application, since the work is offloaded to an external process. On the downside, your application becomes dependent on another application running in a separate process, and it's more code to maintain and deploy. To ensure your message processor doesn't become a single point of failure, you will also need at least 2 nodes running this Windows service for redundancy. However, if your business requires guaranteed delivery for mission-critical API calls, the overhead of maintaining a message queue solution is well worth it.

The problem is that most software engineers suck at sales. I personally felt like it was never my job, and never bothered to learn it, only to discover later that if I ever wanted to grow a successful business I would eventually have to take the bull by the horns and do it. And it's not only business: being a great salesman can help build a successful career as well.

Lacking knowledge and experience in selling, it was very important for me to hear a real story from someone in the software industry who was actually successful at selling his or her consulting services. Luckily, I came across a great interview on The Eventual Millionaire podcast where I learned about Ian Altman. He has built and sold software consulting businesses, and he has coined his own method of selling that made a lot of sense to me.

I've always had a bias that salespeople are sleazy and dishonest, but what Ian found is that it doesn't have to be that way. He explains that if you focus on FIT (Finding Impact Together) you become a problem solver who works together with the client rather than against them. You focus on results and impact, and if your service doesn't bring value to the client, you walk away.

Enter “Same Side Selling”

I was eager to learn more about his approach, so I picked up Ian's book right away. Since I had never closed a sale before, I found Same Side Selling very informative. I learned that rather than explaining all the things you offer, if you focus on actual problems, your client doesn't need to translate your pitch into his own language. And if what you are describing is an actual problem his business is having, he will be much more interested in what you have to say.

The book is full of wisdom on how to sit down with a customer to identify the problem and figure out whether it's worth solving. The author uses an analogy of puzzle pieces, where you need to ask a series of well-formulated questions to learn about the pieces your customer has and see if yours fit. Unless you actually get to the underlying issue, you won't know for sure that your solution will bring results.

Honesty and focusing on the outcome rather than a sale is the best way to get on the same side with a client, and it’s much more likely to benefit both parties in the long run.

This approach of problem solving fits my personality very well, and I will definitely practice it in the near future.

I will have to read this book over again, especially before a meeting with a client, so I can refresh and tune in on the same side selling method. I will let you know how it goes.

Honest Solution

This book opened my eyes to the fact that selling doesn't have to be adversarial, and that figuring out a problem with a client and earning his business should be mutually beneficial. Honest selling helps build a reputation and brings repeat business.

Here is a simple database migration model that I recently adopted at work, and it has worked out really well. With little effort, we are now able to create a new database or upgrade an existing one to any specific commit in source control. And since production SQL changes are now applied by an automated batch process, it has streamlined our deploys as well.

Start with a Baseline Script

A baseline script is necessary for creating the database from scratch. It's used for setting up new environments, or even for daily development work where you want to start with a clean slate.

If you already have an existing database, you will need to start by creating a baseline script. The best time to do that is right after a fresh deploy to production. That way you know your development version of the database should be the same as production, and you have a clear starting point, or baseline. Here are the steps I took.

  • Script entire database schema with objects
  • Clean up schema if necessary. (I had to remove replication directives, and creation of production users since those are production only settings.)
  • Add any necessary seed data for testing purposes. When you rebuild a database it will most likely be used for development or QA, so you will need some starting data.

Add Migration scripts

Migration scripts are applied in the order in which they are created, and they only migrate the database up. This model does not involve downgrading, simply because we haven't found a need for it (the production database is never downgraded, and local/test versions can always be recreated from scratch).

When a developer is working on a feature or a bug that needs database changes, he creates a new migration file with the next sequential number in the file name. For example, if the last migration file is Script-0048.sql, the new migration script will be Script-0049.sql. They have to be sequential because that's how we make sure migrations are applied in the order they were created, which guarantees consistency between environments.

Version Control your SQL scripts

The next important piece is to version control your scripts. Source control plays the following roles:

  • It becomes a mediator, so multiple developers cannot check in scripts with the same name. If there is a naming conflict, developers are forced to get the latest version and rename their script.
  • Each branch has its own list of migration scripts, so there is no doubt about what your database should look like to match the code base in any branch or individual commit: it simply must have all migration scripts applied on top of the baseline.
  • It keeps track of changes, and we want to make sure there are no changes once a migration script is in source control (more on that in the Rules section).

Keeping track of migrations

How do we know which migration scripts were applied to the database? Simple: we create a table that keeps track of the executed scripts. That way it's easy to compare what has already been executed against what still needs to be applied to get to a specific point in time. A simple table like this will do.
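A minimal sketch of such a journal table (names are illustrative; DbUp, mentioned below, creates an equivalent SchemaVersions table for you):

```sql
CREATE TABLE MigrationLog (
    Id         INT IDENTITY(1,1) PRIMARY KEY,
    ScriptName NVARCHAR(255) NOT NULL,          -- e.g. 'Script-0049.sql'
    AppliedOn  DATETIME NOT NULL DEFAULT GETDATE()
);
```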

Finally, your migration application takes care of figuring out which scripts are missing, executing them, and recording the applied scripts in the log table.

Two Easy Rules for Stress Free Migrations

  • Once a SQL migration script is committed and pushed to source control, it must never change. This eliminates potential inconsistencies between environments, because once a script is out in the wild you can assume it has already been applied somewhere, and if the script changes, that environment will never receive the update.
  • Automate database migrations completely. There is absolutely no reason to run the update scripts manually; it's frustrating, error-prone, and a waste of time. You can quickly write a batch process that executes each script and adds a record to the journal table, or you can use an existing open source project like DbUp. We opted for DbUp since it does exactly that and has other nice features, like wrapping all migration scripts in a transaction.
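For reference, a minimal DbUp runner is only a few lines (a sketch; the connection string and script folder are placeholders):

```csharp
var upgrader = DeployChanges.To
    .SqlDatabase(connectionString)                // placeholder connection string
    .WithScriptsFromFileSystem(@"..\Migrations")  // folder holding Script-NNNN.sql files
    .WithTransaction()                            // wrap the upgrade in a transaction
    .LogToConsole()
    .Build();

var result = upgrader.PerformUpgrade();           // runs only scripts not yet in the journal
```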

Rebuild or Migrate Your Database With One Click

We've created two PowerShell scripts that will either create or upgrade the local database with all migration scripts in the current source control tree. Rebuild executes the baseline script plus migrations; Upgrade applies only the missing migrations, and it's the same script that's used in production. There is no more need for a shared database: a developer can migrate or re-create his version of the database in a few seconds. I've also had an idea to include a module that checks on application start whether the database is out of date and applies the needed scripts. I wouldn't run it in production, but it's perfect for development.

After setting up automatic migrations, it was very easy to set up test environments for functional end-to-end testing with Selenium. The continuous integration server pulls the latest code, runs the database upgrade script, builds and publishes the site, and executes the functional tests.

Conclusion: A lot of impact for a little effort

I've been part of many overnight deployments that went wrong because of some missing stored procedure, and I've felt the agony of chasing down errors at 2 AM. It really doesn't take long to apply this model, even less so if you use an existing open source library like DbUp. While there is nothing radical about this practice, a lot of companies are still deploying their SQL scripts manually. It's a small change with a big impact that will streamline your development and make production database migrations smooth, with guaranteed correctness. It worked out great for my company. How do you manage your database migrations?