Introduction & setup
The environment for this Snapshot consisted of a Witbe “Video & Media” Mini Robot hosted in the Witbe San Jose Office and connected to an Apple TV (4th generation) running tvOS 12.1.
The six US OTT video streaming apps tested were: Netflix, Amazon Prime Video, Hulu, Sling TV, PlayStation Vue, and YouTubeTV. VOD assets were randomly selected from the Popular Content section of each app, keeping to similar types of content (no sports, no cartoons, etc.).
The internet service provider was Comcast Xfinity Business with a 20 Mbps download speed. The Apple TV was connected to the Xfinity router over Wi-Fi.
We measured and verified the following KPIs:
- Global Video Availability: percentage of viewing sessions for which the video both successfully started to play & successfully played for a minute
- Video Initial Buffering Time: time between the Robot launching the video and the video starting to play
- Witbe VQ-MOS score: a score representing the video quality of the first minute of video, after the pre-roll ad was played or skipped by the Robot, measured with our famous Video Quality Mean Opinion Score algorithm
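As an illustration of the first two definitions above, here is a minimal Python sketch that computes them from a list of viewing sessions. The `Session` record and its field names are hypothetical and are not the Witbe data model. Averaging the buffering times is also an assumption, since the aggregation used for the published KPI is not specified here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Session:
    """One viewing session (hypothetical record, not the Witbe format)."""
    started: bool                 # the video successfully started to play
    played_one_minute: bool       # the video then played for a full minute
    buffering_s: Optional[float]  # launch-to-first-frame time, None if it never started


def global_video_availability(sessions: List[Session]) -> float:
    """Percentage of sessions that both started and played for a minute."""
    ok = sum(1 for s in sessions if s.started and s.played_one_minute)
    return 100.0 * ok / len(sessions)


def mean_initial_buffering(sessions: List[Session]) -> float:
    """Average buffering time over the sessions that actually started (assumed aggregation)."""
    times = [s.buffering_s for s in sessions
             if s.started and s.buffering_s is not None]
    return mean(times)


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = [
        Session(True, True, 5.0),
        Session(True, False, 7.0),    # started, then broke during the first minute
        Session(False, False, None),  # never started
        Session(True, True, 6.0),
    ]
    print(global_video_availability(sample))
    print(mean_initial_buffering(sample))
```

A session counts as available only when both conditions hold, so a failure at the first frame and a failure during playback both lower the same KPI.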
The KPIs were measured every 30 minutes on every app, over a 72-hour period, from Friday November 23rd to Monday November 26th 2018, for a total of more than 100 viewing sessions per app.
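As a quick sanity check on the session count, the schedule can be worked out in a few lines of Python. The function name is illustrative, and the sketch assumes one session per app per 30-minute slot with no missed slots.

```python
from datetime import timedelta


def scheduled_sessions(period: timedelta, interval: timedelta) -> int:
    """Number of whole measurement slots that fit in the test period."""
    return period // interval


if __name__ == "__main__":
    # 72-hour period, one measurement every 30 minutes
    print(scheduled_sessions(timedelta(hours=72), timedelta(minutes=30)))
```

Under those assumptions the schedule allows 144 slots per app, which is consistent with the "more than 100 viewing sessions per app" reported above.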
All graphs and other visual representations of KPIs in this QoE Snapshot are generated with Witbe Datalab, our new restitution interface for fault finding, root cause analysis and advanced analytics.
The challenges OTT service providers face are threefold. They need to make sure that the assets they offer actually start to play when they should. They also need to reduce the video startup time as much as possible. Finally, once the video starts, in addition to the quality of the video, they need to make sure that the video will keep running and will not break because of a playback issue or an ad insertion.
Based on these challenges, the three most important KPIs for video services providers should be: Global Video Availability, Video Initial Buffering Time, and Video Quality.
Half of the six apps tested present an overall good Quality of Experience, with a Global Video Availability higher than 99%, a Video Initial Buffering Time lower than 6s and an acceptable VQ-MOS score.
An unusually long Video Initial Buffering Time
We were surprised to measure a very long Video Initial Buffering Time for the YouTubeTV app. The KPI is 2.5x higher than the second-longest Video Initial Buffering Time (Amazon Prime Video with a decent 6.4s). Looking at this graph, which represents the distribution of the Video Initial Buffering Time KPI in buckets of 1s, we notice that it takes either around 8s or around 19s to start a video on YouTubeTV. We even observed a maximum of 21s.
Our first hypothesis was that this pattern was dependent on the time of the day. The chart above displays the Video Initial Buffering Time KPI over time and clearly proved our first hypothesis wrong.
Our second hypothesis was that this pattern was dependent on the assets themselves. We filtered the previous chart on one unique asset (“A Million Little Things”), and again proved our hypothesis wrong.
Given that this unusual pattern is not correlated to the time of day or the asset, we theorize that it is either an application issue (timeout or retry on the query) or a network issue (network routes, streaming server selection, dynamic CDN selection, etc.). Activating the network tracing capabilities of the Witbe Robot would give the means to analyze further.
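The 1s bucketing behind the distribution graph discussed in this section can be sketched as follows. This is a minimal illustration and not the Witbe Datalab implementation. The function name is hypothetical, and the sample values are made up to mimic the two clusters (around 8s and around 19s) described above.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def bucket_by_second(times_s: Iterable[float]) -> Dict[int, int]:
    """Count buffering times per 1s bucket; bucket n covers [n, n+1) seconds."""
    counts = Counter(math.floor(t) for t in times_s)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up buffering times in seconds, for illustration only.
    sample = [7.8, 8.1, 8.4, 18.9, 19.2, 19.6, 21.0]
    for bucket, count in bucket_by_second(sample).items():
        print(f"{bucket:>2}s-{bucket + 1}s: {'#' * count}")
```

With real measurements, two well-separated groups of populated buckets and an empty range between them would show the same two-mode pattern without needing a chart.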
Video Availability is the first KPI to monitor
Another surprising conclusion was the poor Global Video Availability on Hulu. 8% of the viewing sessions revealed a problem during either the start of the video (First Frame Availability) or the first minute of the video (Playback Availability).
Global Video Availability is the single most important KPI for every video service provider: the best video quality and the best selection of content in the world mean nothing if the videos cannot be played. Based on our expertise, a platform where one video out of ten doesn’t play drastically increases the chance of customer churn.
The video above shows the kind of video playback failure the Robot regularly measured (Witbe Robots record video traces for every test, in order to replay the scenario and understand what the Robot measured — this video was re-encoded and compressed).
In our experience, such Global Video Availability issues are, most of the time, due to a faulty ad insertion. Adapting ads for every user is one of the biggest technological challenges video service providers face today.
An analysis of the Blur scores in VQ-MOS
For these OTT apps, the artifact that impacted the Witbe VQ-MOS scores the most was blurriness. Sling TV and PlayStation Vue were particularly affected with scores significantly lower than the other four OTT apps.
Witbe VQ-MOS is a ten-year R&D effort in psycho-acoustic and psycho-visual analysis of video streams. It works in real time, without a reference or any previous knowledge of the video. It is based on three main artifacts: jerkiness, blurriness and blockiness.
The four images above, losslessly encoded as PNG and uploaded to this page without further compression, particularly highlight these discrepancies. The first image also shows the ability of the Witbe VQ-MOS to differentiate an artistic blur from a real blur artifact, which may be due to encoding or upscaling issues.
This second snapshot reveals some issues that are unusual and surprising. And yet, we want to highlight the fact that the overall quality of all six OTT apps tested remains really good. Considering the sheer quantity of videos these apps stream to massive numbers of viewers every day, they represent the impressive efforts of many truly talented people spending a lot of time fine-tuning everything, using cutting-edge technology, and mobilized 24/7 to make it happen.
To conclude, the large and ever-expanding offer of video services in the US also comes with a high disparity in Quality of Experience. It seems that the app with the best overall quality is Netflix, with all three KPIs in the green. They are known for their heavy investment in Video Quality, which they announced over two years ago (see our very first blog post by Witbe founder Jean-Michel Planche). Amazon Prime Video is the runner-up with a very good Quality of Experience, which indicates that there are now a lot of excellent OTT video services to choose from in the United States. We hope that this number will only continue to grow.
For the next Witbe QoE Snapshot, we will compare OTT apps on mobile networks in Canada. Happy Holidays and see you all early January!
About Witbe QoE Snapshots
In the same way that a consumer report tests a product and publishes an analysis of its overall quality, the Witbe QoE Snapshots test digital services to make available to the market information on the true Quality of Experience delivered internationally. These QoE Snapshots should not serve as benchmarks, nor as rankings of operators by service, or by device. Rather, the goal of these QoE Snapshots is to provide a global overview of digital services, with multiple configurations and in various environments. The public will thus be able to better understand the technological complexity inherent to today’s services, like the distribution of video content. It is quite a technical feat – considering the efforts and means implemented – to broadcast videos on different devices and networks, with a quality that is acceptable by consumers with high expectations.
Since its origin, Witbe has relied on a non-intrusive technology, based on Robots measuring the quality truly delivered. The Witbe Robots are placed at the edge of distribution and connected to test devices, the same ones as those used by real users. The Robots measure the Quality of Experience actually delivered to the end-users by providing KPIs on the availability, performance and integrity of the service.
Each snapshot is composed of several analytical frames, highlighting interesting findings about the KPIs that were measured. In our last QoE Snapshot, we looked at who was the best OTT Video Mobile App in the UK. This time, we are publishing a look at the quality of six major OTT apps on Apple TV in the United States: Netflix, Amazon Prime Video, Hulu, Sling TV, PlayStation Vue, and YouTubeTV.