A few months ago, Stephen J. Bigelow published an article offering five tips for reducing hybrid cloud latency. The article has the merit of addressing a real issue: latency. However, we felt a few additions were needed to properly answer this core problem.
Processors wait all the time
Far too often, we mistakenly assume that the performance of an internet service is directly tied to the computing power of a platform. Yet when you take a closer look, most of the time processors are… waiting! Of course, we try to give them ever more to do (virtualization), but that is not the heart of the matter: we are wrong to think we know where the problem comes from. When you distribute information across systems in different geographic locations and break up the storage of data centers, you gain scalability and redundancy. But by adding latency at every level, you lose processing speed. Moving information from acquisition cards to main memory, then to the processor, then to the network interface, and finally from network to network: the time lost in these transfers has become significantly greater than the time spent processing the information itself. And it goes further: with all this back and forth, the TCP protocol adds latency to every single request, for connection establishment, acknowledgements, data transfer, and connection teardown.
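To make that connection-setup overhead concrete, here is a minimal sketch (Python, standard library only; the target host and port are placeholders, not anything from the original article) that times the TCP three-way handshake alone, with no payload exchanged:

```python
import socket
import time

HOST = "example.com"  # placeholder: substitute the service you are testing
PORT = 80
N = 10

def handshake_ms(host: str, port: int) -> float:
    """Time the TCP three-way handshake alone; no data is sent."""
    start = time.perf_counter()
    sock = socket.create_connection((host, port), timeout=5)
    sock.close()
    return (time.perf_counter() - start) * 1000

# Opening a fresh connection per request pays this cost every single time;
# a reused (keep-alive) connection pays it only once.
total = sum(handshake_ms(HOST, PORT) for _ in range(N))
print(f"{N} handshakes cost {total:.1f} ms of pure setup latency")
```

Run against a distant host, the setup cost alone often dwarfs the time the server spends actually processing each request.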
Among other things, Stephen J. Bigelow’s article insists on the need for proximity. While it is true that the distance between endpoints influences latency (especially if traffic travels around the world for lack of peering), it is wrong to treat proximity as a necessary and sufficient condition. In the Internet world…
… the fastest way is not always the shortest one.
At best, you can shorten the distance by a few thousand kilometers and gain a few milliseconds: light travels through fiber at roughly 200 km per millisecond, so removing 2,000 km saves only about 10 ms one way. There is much more to lose elsewhere. Why? Because of all the hops and the efficiency of the routers along the way (performance, quality of configuration, traffic prioritization, overload, questionable QoS management mechanisms). This is why I would rather have a 5,000 km path, as direct as possible, run by engineers who know what they are doing, than an 18-hop path winding a few hundred kilometers around Paris through overwhelmed routers on their knees.
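A quick way to see whether hops, rather than distance, dominate a path is to count them and look at per-hop round-trip times. A minimal sketch, assuming the common Linux/macOS `traceroute` utility is installed and using a placeholder destination:

```python
import re
import subprocess

TARGET = "example.com"  # placeholder destination

# -n: skip reverse DNS lookups; -q 1: one probe per hop
# (flags of the common Linux/macOS traceroute; adjust for your platform)
result = subprocess.run(
    ["traceroute", "-n", "-q", "1", TARGET],
    capture_output=True, text=True, timeout=120,
)

# Hop lines start with the hop number; the header line does not.
hops = [l for l in result.stdout.splitlines() if re.match(r"\s*\d+\s", l)]
rtts = [float(m.group(1)) for m in re.finditer(r"([\d.]+) ms", result.stdout)]

print(f"{len(hops)} hops to {TARGET}")
# A sudden jump between two consecutive hops usually marks the segment
# where an overloaded or badly configured router is eating your latency.
print("per-hop RTTs (ms):", ", ".join(f"{r:.1f}" for r in rtts))
```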
Let’s go back to those five tips for reducing latency.
Yes, when it comes to keeping your service quality under control, it is important to set up direct peering between data centers and cloud infrastructures. Not because it shortens the distance, but because direct peering reduces the number of hops and of “unsupervised” infrastructures in between, and therefore the latency.
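If the same infrastructure is reachable both over a direct peering link and over the public internet, the benefit is easy to measure. A sketch with purely hypothetical endpoint addresses:

```python
import socket
import statistics
import time

# Hypothetical endpoints: the same service reached over a direct peering
# link and over the public internet (both addresses are placeholders).
ROUTES = {
    "direct peering": ("10.0.0.5", 443),
    "public internet": ("203.0.113.7", 443),
}

def median_connect_ms(host: str, port: int, samples: int = 20) -> float:
    """Median TCP connect time over several samples, in milliseconds."""
    times = []
    for _ in range(samples):
        t0 = time.perf_counter()
        sock = socket.create_connection((host, port), timeout=5)
        sock.close()
        times.append((time.perf_counter() - t0) * 1000)
    return statistics.median(times)

for name, (host, port) in ROUTES.items():
    print(f"{name}: median TCP connect {median_connect_ms(host, port):.1f} ms")
```

Taking the median over several samples matters here: a single probe can be skewed by a transient queue on one router.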
In fact, the most important advice is in the conclusion: “However, developers have to take the time to assess the possible conceptual changes that might help”. And here is the catch: that does not help much if you have no way to evaluate those changes. Thankfully, there is a solution. The comments that follow draw on our experience as an Internet operator and on 16 years of helping major global players improve the performance of their services.
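One way to make such changes evaluable is to break a request’s latency into phases and measure each one before and after the change. A minimal sketch (Python standard library only; the host is a placeholder) timing DNS lookup, TCP connect, TLS handshake, and time to first byte:

```python
import socket
import ssl
import time

HOST = "example.com"  # placeholder: the service whose changes you evaluate
PORT = 443

t0 = time.perf_counter()
ip = socket.getaddrinfo(HOST, PORT, proto=socket.IPPROTO_TCP)[0][4][0]
t_dns = time.perf_counter()

sock = socket.create_connection((ip, PORT), timeout=5)
t_tcp = time.perf_counter()

tls = ssl.create_default_context().wrap_socket(sock, server_hostname=HOST)
t_tls = time.perf_counter()

tls.sendall(f"GET / HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n".encode())
tls.recv(1)  # block until the first byte of the response arrives
t_first = time.perf_counter()
tls.close()

for label, dt in [("DNS lookup", t_dns - t0), ("TCP connect", t_tcp - t_dns),
                  ("TLS handshake", t_tls - t_tcp), ("first byte", t_first - t_tls)]:
    print(f"{label:>13}: {dt * 1000:6.1f} ms")
```

With a breakdown like this, a conceptual change (fewer round trips, connection reuse, a different peering path) shows up in a specific phase instead of vanishing into an aggregate number.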