Monday, April 17, 2017

Decimal.MinValue costs more than you expect

Recently, during a profiling session, one method caught my eye. In the profiler's decompiled version it looked like this:
public static double dec2f(Decimal value)
{
    if (value == new Decimal(-1, -1, -1, true, (byte) 0))
        return double.MinValue;
    try
    {
      return (double) value;
    }
    catch
    {
      return double.MinValue;
    }
}
This is part of legacy code written years ago, and according to the profiler (which was in sampling mode) too much time was being spent in this method. My guess was that the try-catch block prevents inlining, so I spent some time refreshing my memory on the tricks of decimal-to-double conversion. After making sure this conversion can't throw, I removed the try-catch block.
But when I looked again at the simplified version of the source code, I wondered why the decompiled version shows a strange Decimal constructor where the source is a simple Decimal.MinValue usage:
public static double dec2f(decimal value)
{
    if (value == DecimalMinValue)
    {
        return Double.MinValue;
    }

    return (double)value;
}
At first I thought it was a decompiler bug, but fortunately the decompiler also shows the IL version of the method:
// public static double dec2f(Decimal value)
// {
//     if (value == new Decimal(-1, -1, -1, true, (byte) 0))
.maxstack 6
.locals init (
    [0] float64 V_0
)
IL_0000: ldarg.0      // 'value'
IL_0001: ldc.i4.m1
IL_0002: ldc.i4.m1
IL_0003: ldc.i4.m1
IL_0004: ldc.i4.1
IL_0005: ldc.i4.0
IL_0006: newobj       instance void [mscorlib]System.Decimal::.ctor(int32, int32, int32, bool, unsigned int8)
IL_000b: call         bool [mscorlib]System.Decimal::op_Equality(valuetype [mscorlib]System.Decimal, valuetype [mscorlib]System.Decimal)
IL_0010: brfalse.s    IL_001c
//         return double.MinValue;
Well, it looks like it's true: the Decimal constructor is involved EVERY time you use a constant like Decimal.MinValue! So the next question arises: what is inside this constructor, and is there any difference between using Decimal.MinValue and defining your own static field like:
static readonly decimal DecimalMinValue = Decimal.MinValue;
Well, the answer is: there is a difference, if every penny counts in your case:
                     Method |        Mean |    StdDev |
--------------------------- |------------ |---------- |
 CompareWithDecimalMinValue | 178.4235 ns | 0.4395 ns |
   CompareWithLocalMinValue |  98.0991 ns | 2.2803 ns |
The reason for this behavior is DecimalConstantAttribute, which specifies how to construct the decimal value every time you use one of the decimal constants.
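For reference, here is a minimal sketch of how such a measurement could be set up with BenchmarkDotNet (the method names match the table above; the test values and the rest of the harness are my assumptions):

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class DecimalMinValueBenchmarks
{
    // Cached copy: the constructor driven by DecimalConstantAttribute runs only once.
    private static readonly decimal DecimalMinValue = decimal.MinValue;

    private readonly decimal[] _values = { 0m, 1m, -1m, decimal.MaxValue, decimal.MinValue };

    [Benchmark]
    public int CompareWithDecimalMinValue()
    {
        var hits = 0;
        foreach (var value in _values)
            if (value == decimal.MinValue) hits++; // constructor call on every comparison
        return hits;
    }

    [Benchmark]
    public int CompareWithLocalMinValue()
    {
        var hits = 0;
        foreach (var value in _values)
            if (value == DecimalMinValue) hits++; // cached value, no constructor call
        return hits;
    }

    public static void Main() => BenchmarkRunner.Run<DecimalMinValueBenchmarks>();
}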

Sunday, November 22, 2015

Pitfall of TaskCreationOptions.LongRunning

Recently I had a conversation with a colleague about using TaskCreationOptions.LongRunning for a task that pings another service every N seconds for the lifetime of the application. Fortunately, just a few days before, I had glanced over the default task scheduler implementation in Reflector:

[SecurityCritical]
protected internal override void QueueTask(Task task)
{
    if ((task.Options & TaskCreationOptions.LongRunning) != TaskCreationOptions.None)
    {
        new Thread(s_longRunningThreadWork) { IsBackground = true }.Start(task);
    }
    else
    {
        bool forceGlobal = (task.Options & TaskCreationOptions.PreferFairness) != TaskCreationOptions.None;
        ThreadPool.UnsafeQueueCustomWorkItem(task, forceGlobal);
    }
}


The thing that caught my attention was the way the dedicated thread is created. So I built a small demo, and it immediately confirmed that a so-called long-running task on the default thread-pool scheduler uses a dedicated thread only until it hits the first await. After that it releases the 'dedicated' thread and executes its continuation on thread-pool threads, the same thread-pool threads used by all other tasks.
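Here is a minimal sketch of such a demo (my reconstruction, not the original sample): it prints whether the current thread belongs to the thread pool before and after the first await.

using System;
using System.Threading;
using System.Threading.Tasks;

internal static class Program
{
    private static void Main()
    {
        var task = Task.Factory.StartNew(async () =>
        {
            // Before the first await: runs on the dedicated (non-pool) thread.
            Console.WriteLine("Before await: IsThreadPoolThread = "
                + Thread.CurrentThread.IsThreadPoolThread); // False

            await Task.Delay(1000);

            // After the await: the continuation runs on an ordinary pool thread.
            Console.WriteLine("After await: IsThreadPoolThread = "
                + Thread.CurrentThread.IsThreadPoolThread); // True
        }, TaskCreationOptions.LongRunning);

        task.Unwrap().Wait();
    }
}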

Thursday, February 12, 2015

Declarative IoC container configuration

Nowadays it's hard to develop a modern application without an IoC container. The absence of a container leads to a coupled codebase and to expensive, hard-to-maintain unit tests (which aren't really "unit" at all), and, as a result, to slow development and low-quality software.

When using an IoC container, it's important to ease the process of component and dependency registration and to give the reader enough information (for example, during code reviews). While I'm a big fan of Convention over Configuration, it doesn't always fit my needs. In the applications I develop, I split components into at least two categories: components whose lifetime is bound to the lifetime of the application, and the majority of components with a lifetime per operation, such as an HTTP request or a service bus message. A cache or a message bus are examples of the former; command handlers, repositories and ApiControllers represent the latter.

Having a scope per operation brings the following benefits:
  • Atomicity and isolation. I usually treat this scope as a unit of work and try to develop persistence so that the result of an operation becomes visible to the outside world only after the operation completes successfully.
  • Performance and scalability. Each concurrently executing request has its own copy of state, handlers, repositories and so on, avoiding excessive contention. Watch and read Pat Helland's "Immutability Changes Everything".
  • Forget about exception safety. Writing exception-safe code is hard! It is much easier to throw away the scoped container with all its instances than to write exception-safe code (see the sketch after this list).
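To illustrate the last point, here is a hypothetical sketch of a scope-per-operation setup with Autofac (ICommandHandler and the surrounding method are my assumptions):

using Autofac;

// ICommandHandler is a hypothetical per-operation component.
public interface ICommandHandler
{
    void Handle(object message);
}

internal static class MessagePump
{
    // One lifetime scope per incoming message: disposing the scope releases
    // every per-operation instance (handlers, repositories, unit of work),
    // whether the handler completed or threw.
    public static void HandleMessage(IContainer container, object message)
    {
        using (var scope = container.BeginLifetimeScope())
        {
            var handler = scope.Resolve<ICommandHandler>();
            handler.Handle(message);
        }
    }
}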

So it is very important to understand the lifetime of a component when reading its code. It helps catch mistakes like using non-thread-safe structures inside a long-lived class:

[ComponentRegistration(Scope = Lifetime.Application)]
internal sealed class MagicWand : IMagicWand
{
    private HashSet<string> _state = new HashSet<string>();
    // ...
}

As you may have guessed from this example, I use an attribute to specify the lifetime of a component. Unfortunately, there are some pitfalls when it comes to building the container during application startup. When using Assembly.GetTypes() (or even GetExportedTypes()) on a few specific assemblies (selected, for example, by file mask) to discover components carrying the registration attribute, you may accidentally load a lot of assemblies, which significantly slows down application startup. That may be fine for a service running 24/7 that starts once in a while, but a desktop application consisting of 3-5 assemblies of "business logic" and 30-50 assemblies of UI controls (like Infragistics) will suffer a lot.
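A typical discovery routine looks something like this sketch (ComponentRegistrationAttribute comes from the example above; the file mask and everything else are my assumptions):

using System;
using System.IO;
using System.Linq;
using System.Reflection;

internal static class ComponentScanner
{
    // Sketch of attribute-based discovery. Calling GetTypes() forces the CLR
    // to resolve the base types and implemented interfaces of every type in
    // the assembly, which can transitively load referenced assemblies
    // (UI control libraries included) and slow down startup.
    public static Type[] FindComponents(string baseDirectory)
    {
        return Directory
            .EnumerateFiles(baseDirectory, "MyApp.*.dll") // hypothetical file mask
            .Select(Assembly.LoadFrom)
            .SelectMany(assembly => assembly.GetTypes())
            .Where(type => type.GetCustomAttribute<ComponentRegistrationAttribute>() != null)
            .ToArray();
    }
}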

This happens, for example, when the assembly you're scanning contains a type that derives from a type defined in another assembly. The problem can be reduced by moving components not related to DI into a separate assembly, but I was looking for a better solution. What if we generate the IoC-specific component registrations at compile time?! Roslyn looks like the answer to this and many other problems I have, but I reached for an old, proven hammer: the Text Template Transformation Toolkit, or T4.
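The idea is to let the template emit plain registration code at design time, so nothing has to be scanned at runtime. The generated output could look roughly like this (a StructureMap registry sketch; the namespace, type names and lifetime mapping are my assumptions):

using StructureMap.Configuration.DSL;

// Hypothetical example of what the T4 template could generate: explicit,
// compile-time registrations instead of runtime assembly scanning.
public sealed class GeneratedRegistry : Registry
{
    public GeneratedRegistry()
    {
        For<IMagicWand>().Singleton().Use<MagicWand>(); // Lifetime.Application -> singleton
    }
}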

Check out how I did it for StructureMap; meanwhile, I'm going to do the same for my beloved Autofac.

Friday, November 28, 2014

Service API: Batching

What if you need to transfer a large number of items to a remote service? The goal is to transfer the information as fast as we can. What would be the most efficient solution?

Well, the first naive approach obviously doesn't work:
AccountChange[] accountChanges = ...

foreach (var change in accountChanges)
{
    client.SendChange(change);
}


We accumulate network latency on every call, and the overall timing is awful:

Naive
======================
Elapsed: 00:00:30.4708671
Rate: 3281.82324683501

The next popular solution is to pack all items into a single batch:
AccountChange[] accountChanges = ...

client.SendChanges(accountChanges);
This gives much better timing:

OneBatch
======================
Elapsed: 00:00:02.4503693
Rate: 40810.1750213733

Could it be even better? Let's analyze what happens behind the scenes.
First, we need to serialize _all_ items, then we transfer the bits to the remote service, and then the service has to deserialize _all_ items before it can start processing them.
All of these steps run on a single thread, no matter how many idle cores you have. So with a single batch we removed all the network round-trips, but we added latency, and it doesn't scale out.

Now the winning strategy is clear: pack the items into multiple batches. Smaller batches lower the latency but increase the overall time, so choose the size carefully depending on your case.

MultiBatch
======================
Elapsed: 00:00:01.0337523
Rate: 96734.9721978853
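A minimal sketch of this multi-batch approach (the batch size, the IClient interface and the use of Task.Run are my assumptions; the real sample is linked below):

using System.Linq;
using System.Threading.Tasks;

// IClient.SendChanges is an assumed batch API, mirroring the call above.
public interface IClient
{
    void SendChanges(AccountChange[] changes);
}

internal static class MultiBatchSender
{
    // Split the items into fixed-size batches and send them concurrently, so
    // serialization, transfer and deserialization of different batches overlap
    // instead of running sequentially on a single thread.
    public static void SendInBatches(IClient client, AccountChange[] accountChanges, int batchSize)
    {
        var batches = accountChanges
            .Select((change, index) => new { change, index })
            .GroupBy(x => x.index / batchSize, x => x.change)
            .Select(g => g.ToArray());

        Task.WaitAll(batches.Select(batch => Task.Run(() => client.SendChanges(batch))).ToArray());
    }
}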

The sample is available here

But what if the last approach doesn't work for us? What if, for example, the sender is limited in the number of connections it can use, or it has to send the items in a single atomic batch? Could we still perform better than the single-batch approach?

To be continued...

Friday, November 21, 2014

Service API: Throughput and Latency

There are at least two things you should consider when designing scalable Service APIs: throughput and latency. Throughput is the number of requests a service can process per unit of time, for example per second. Latency is the time it takes to process a single request. Obvious, thank you, Captain! But how are these two related to scalability?

Well, throughput can theoretically be improved by scaling out your service. Add more instances and your throughput grows. I say "theoretically" because your multiple instances often end up waiting for each other on some shared resource, a database for example. That's why it is shouted on every corner that scalability should be designed into the product from the beginning.

Latency is another story. You can't reduce latency by scaling out. Of course, if your system is overloaded, latency goes up; you add instances and latency drops back down. But you can't push it below a certain floor, no matter how many instances you add: if processing a single request takes 100 ms, it still takes 100 ms with ten instances. That's what I'm talking about!

So the moral of the post: do not add latency "by design" - you can't mitigate it later by scaling out.

But what about Service APIs? How can we add latency "by design" in Service APIs?
To be continued...

Tuesday, October 8, 2013

Setting WCF MaxBufferPoolSize quota can cause memory leak

If you use WCF with TransferMode.Buffered, be careful with the MaxBufferPoolSize setting. If a non-zero value is used, the buffer manager won't release the memory it has allocated. This means that if you set a high value, let's say Int32.MaxValue, and you send or receive a large message, that memory won't be reclaimed by the GC.

Using 0 as MaxBufferPoolSize switches to GCBufferManager, which simply allocates and releases memory every time a buffer is requested.
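For example, a binding could be configured like this minimal sketch (the binding type and the limits are illustrative, not a recommendation):

using System.ServiceModel;

internal static class BindingFactory
{
    // MaxBufferPoolSize = 0 disables pooling: buffers allocated for large
    // messages become eligible for garbage collection again.
    public static NetTcpBinding CreateBuffered()
    {
        return new NetTcpBinding
        {
            TransferMode = TransferMode.Buffered,
            MaxBufferPoolSize = 0,                      // GC-backed buffer manager
            MaxReceivedMessageSize = 64 * 1024 * 1024   // illustrative large-message limit
        };
    }
}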

For more details, take a look at the System.ServiceModel.Channels.BufferManager internals.

Tuesday, September 24, 2013

IQueryable is root of many problems

It is often suggested that you avoid unnecessary abstractions in your projects. Some respected community members like to repeat it over and over again. But how do you distinguish between a required and an unnecessary abstraction? What value can an abstraction bring?

Let's take a look at a code sample that uses NHibernate's session to access data.

internal sealed class TaskCoordinator
{
    private readonly ISession _session;

    public TaskCoordinator(ISession session)
    {
        _session = session;
    }

    public void DoSomething()
    {
        /* ... */

        var tasks = GetRunningTasks(1);

        /* ... */

        foreach (var task in tasks)
        {
            var previousTask = GetPreviousTask(task);

            /* ... */
        }
    }

    private Task[] GetRunningTasks(int workflowId)
    {
        var utcNow = DateTime.UtcNow;
        return _session.Query<Task>()
            .Where(t => t.IsActive)
            .Where(t => t.WorkflowId == workflowId)
            .Where(t => t.StartsAt <= utcNow)
            .Where(t => t.ExpiresAt > utcNow)
            .ToArray();
    }

    private Task GetPreviousTask(Task task)
    {
        return _session.Query<Task>()
            .Where(t => t.WorkflowId == task.WorkflowId)
            .Where(t => t.StartsAt < task.StartsAt)
            .Where(t => t.ExpiresAt <= task.StartsAt)
            .FirstOrDefault();
    }
}


The problem is: should we use the session directly, or should we hide it behind another level of abstraction?
Before making any decision, let me ask you a few questions:

  1. Apart from type safety, what is the difference between these LINQ queries and SQL statements embedded directly in the code?
  2. Without studying the whole codebase, can you tell me what the use cases are and what database indices should be created to cover them?
  3. Can you estimate the L2 cache hit ratio for your queries?
  4. Can you easily (easily meaning in isolation, without building a two-page test setup) test the buggy GetPreviousTask method?
  5. Can you write unit tests (meaning without involving a database) for TaskCoordinator?
These questions and your answers will help you make an informed decision and choose between the previous implementation and introducing another level of abstraction:


public interface ITaskRepository
{
    Task[] GetRunningTasks(int workflowId);

    Task GetPreviousTask(Task task);
}

internal sealed class TaskCoordinator
{
    private readonly ITaskRepository _repository;

    public TaskCoordinator(ITaskRepository repository)
    {
        _repository = repository;
    }

    public void DoSomething()
    {
        /* ... */

        var tasks = _repository.GetRunningTasks(1);

        /* ... */

        foreach (var task in tasks)
        {
            var previousTask = _repository.GetPreviousTask(task);

            /* ... */
        }
    }
}
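With the repository in place, TaskCoordinator can finally be unit-tested without a database. A minimal sketch (the stub class is hypothetical):

// Hypothetical stub: canned answers, no NHibernate session, no database.
internal sealed class StubTaskRepository : ITaskRepository
{
    public Task[] GetRunningTasks(int workflowId)
    {
        return new Task[0]; // canned answer
    }

    public Task GetPreviousTask(Task task)
    {
        return null; // simulate "no previous task"
    }
}

// Usage in a unit test:
var coordinator = new TaskCoordinator(new StubTaskRepository());
coordinator.DoSomething(); // exercises the coordination logic in isolation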

Actually, the title is provocative: it's not an IQueryable problem, it's a problem with any wide, unbounded contract.