What if you need to transfer a large number of items to a remote service? The goal is to transfer the information as fast as we can. What would be the most efficient solution?
The first, naive approach sends the items one at a time, and it obviously doesn't perform well:
AccountChange[] accountChanges = ...
foreach (var change in accountChanges)
{
    client.SendChange(change);
}
We accumulate network latency on every call, and the overall timing is awful:
Naive
======================
Elapsed: 00:00:30.4708671
Rate: 3281.82324683501
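The figures in this post are consistent with roughly 100,000 items per run, since elapsed seconds multiplied by the rate comes to about 100,000 for each result. A minimal sketch of how such a report could be produced, assuming a `Stopwatch`-based helper; `Benchmark`, `Measure`, and `Report` are hypothetical names and are not taken from the original sample:

```csharp
using System;
using System.Diagnostics;

public static class Benchmark
{
    // Hypothetical helper: runs the send action once and returns
    // the elapsed time and the throughput in items per second.
    public static (TimeSpan Elapsed, double Rate) Measure(int itemCount, Action send)
    {
        var stopwatch = Stopwatch.StartNew();
        send();
        stopwatch.Stop();

        var elapsed = stopwatch.Elapsed;
        return (elapsed, itemCount / elapsed.TotalSeconds);
    }

    // Prints a result in the same layout as the blocks in this post.
    public static void Report(string name, TimeSpan elapsed, double rate)
    {
        Console.WriteLine(name);
        Console.WriteLine("======================");
        Console.WriteLine($"Elapsed: {elapsed}");
        Console.WriteLine($"Rate: {rate}");
    }
}
```

Measuring the whole send as one action keeps the comparison fair across strategies, because serialization, transfer, and the wait for the service all count toward the same elapsed time.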
The next popular solution is to pack all items into a single batch:
AccountChange[] accountChanges = ...
client.SendChange(accountChanges);
Which gives much better timing:
OneBatch
======================
Elapsed: 00:00:02.4503693
Rate: 40810.1750213733
Could it be even better? Let's analyze what happens behind the scenes.
First, we need to serialize _all_ items, then transfer the bits to the remote service, and then the service has to deserialize _all_ items before it can start processing them.
All these steps run on a single thread, no matter how many idle cores you have. So by using a single batch we removed the per-item network round-trips, but we added latency, and the approach doesn't scale out.
Now the winning strategy is clear: pack the items into multiple batches. Smaller batch sizes lower latency but increase overall time, so choose carefully depending on your case.
MultiBatch
======================
Elapsed: 00:00:01.0337523
Rate: 96734.9721978853
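The post does not show the multi-batch code, so the following is only a sketch of the idea under stated assumptions: the items are split into fixed-size batches and all batches are sent concurrently. `MultiBatchSender`, `Split`, `SendAsync`, and the `sendBatch` delegate are hypothetical names; `sendBatch` stands in for a genuinely asynchronous remote call, which the original `client` may or may not offer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class MultiBatchSender
{
    // Splits the items into consecutive batches of at most batchSize elements.
    public static List<T[]> Split<T>(IReadOnlyList<T> items, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batches = new List<T[]>();
        for (int offset = 0; offset < items.Count; offset += batchSize)
        {
            int size = Math.Min(batchSize, items.Count - offset);
            var batch = new T[size];
            for (int i = 0; i < size; i++) batch[i] = items[offset + i];
            batches.Add(batch);
        }
        return batches;
    }

    // Starts one send per batch and completes when all of them have finished.
    // Assumption: sendBatch returns promptly with a Task; a blocking
    // delegate would make the sends run one after another.
    public static Task SendAsync<T>(IReadOnlyList<T> items, int batchSize, Func<T[], Task> sendBatch)
    {
        return Task.WhenAll(Split(items, batchSize).Select(sendBatch));
    }
}
```

With an asynchronous client method (hypothetical here), a call could look like `await MultiBatchSender.SendAsync(accountChanges, 1000, batch => client.SendChangeAsync(batch));`. The sketch starts every batch at once, so a real sender would probably want to cap the number of batches in flight.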
The sample is available here
But what if the last approach doesn't work for us? What if, for example, the sender is limited in the number of connections it can use, or it has to send the items in a single atomic batch? Could we still perform better than the single-batch approach?
To be continued...