In an earlier post, we explained why the structural similarity (SSIM) index became the most popular objective model for predicting perceived image/video quality in both academia and industry. SSIM's contribution is certainly outstanding, but is it enough in real-world applications to serve as the ultimate image/video quality measure, or rather the ultimate image/video fidelity measure (since computing it requires a reference image/video)?

The simple answer is NO. The complete answer could be a 100-page thesis, if not longer.

Note that the original SSIM paper was published in 2004, and hundreds of papers published since then have claimed to outperform SSIM in certain aspects, including many direct and indirect extensions of SSIM. While the progress in academic research is obvious, even now it is hard to find another algorithm that strikes a better balance between accuracy and complexity than SSIM does.

But that is not the main point here. The question that really needs to be asked is:

“What factors could stop hands-on video engineers from using SSIM in their daily work?”

Below is a list of real-world scenarios (mostly based on use cases in the video delivery industry) where one would not be “happy” using SSIM to measure video Quality-of-Experience (QoE):

  • A high-quality, high-resolution (e.g., 4K) source video is transcoded into multiple streams of different resolutions (e.g., 1080p, 720p, 360p) and different bit rates, and we would like to know the quality of each transcoded video. However, SSIM cannot be computed, because the source (reference) and test videos have different spatial resolutions.
  • The same video stream shown on different display devices can result in very different perceptual QoE. For example, a strongly compressed video that exhibits very annoying artifacts on a 60-inch TV (especially when watched up close) may appear to have fine quality when it is scaled down and viewed on a smartphone. However, SSIM is independent of the viewing device and gives the same score in both cases.
  • The same compressed video stream is shown on a laptop twice, once at full screen and once in a small window. Again, the viewer’s QoE is very different in the two cases, but SSIM can only provide the same score.
  • A high dynamic range (HDR) video (10, 12, or 16 bits) is tone-mapped to a standard dynamic range (SDR) video (8 bits) and shown on an SDR display. There is certainly information loss that we would like to capture. However, SSIM does not apply, because it cannot compare images/videos with different dynamic ranges.
  • A high dynamic range (HDR) video is compressed and shown on an HDR display. SSIM is computable (because it allows setting the dynamic range of the image), but it has not been validated whether it provides meaningful scores on HDR video content.
  • A high frame rate (HFR) video (60fps, 120fps, etc.) is downsampled along the temporal direction to a low frame rate (LFR) video (30fps, 15fps, etc.). One would like to know how the downsampling affects visual QoE, or how much QoE benefit HFR video offers over LFR video. However, SSIM does not apply, because it cannot compare videos with different temporal resolutions.
  • An engineer wants to use a QoE measure to provide real-time control in a software-based video processing/compression system. Although SSIM is not computationally demanding, it is still difficult to compute in real time in software, especially for HD or UltraHD video.
  • An engineer works hard to improve a video processing/compression method, and the SSIM score of the resulting video improves from 0.90 to 0.94. But how should this be interpreted? Is the improvement negligible, or highly visible? If it is visible, does it raise the visual QoE from the “good” to the “excellent” range? The SSIM score itself does not give an answer, and the answer likely depends on the video content and the application scenario.
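
To make a few of these limitations concrete, here is a minimal Python sketch of the SSIM formula from the 2004 paper. It is an illustration under stated assumptions, not a production implementation: it computes a single score from whole-frame statistics rather than averaging over local windows as practical SSIM does, and the names `global_ssim`, `reference`, `distorted`, and `half_size` are ours, chosen for this example only.

```python
from statistics import fmean


def global_ssim(ref, test, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two grayscale frames given as lists of rows.

    Simplification (assumption): statistics are taken over the whole frame,
    whereas practical SSIM averages the same formula over local windows.
    """
    shape_ref = (len(ref), len(ref[0]))
    shape_test = (len(test), len(test[0]))
    if shape_ref != shape_test:
        # First scenario in the list: a reference and a lower-resolution
        # rendition cannot be compared pixel by pixel without rescaling.
        raise ValueError(f"resolution mismatch: {shape_ref} vs {shape_test}")

    # Flatten both frames into pixel lists of equal length.
    x = [float(p) for row in ref for p in row]
    y = [float(p) for row in test for p in row]

    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov_xy = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])

    # Stabilizing constants scale with the dynamic range L (255 for 8 bits).
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical 4x4 8-bit frame, a mildly distorted copy, and a 2x2 downscale.
reference = [
    [52, 55, 61, 66],
    [70, 61, 64, 73],
    [63, 59, 55, 90],
    [67, 61, 68, 104],
]
distorted = [[p + d for p, d in zip(row, (3, -2, 4, -5))] for row in reference]
half_size = [row[::2] for row in reference[::2]]

print(global_ssim(reference, reference))   # identical frames
print(global_ssim(reference, distorted))   # same resolution, some distortion
try:
    global_ssim(reference, half_size)       # different resolution
except ValueError as err:
    print("cannot compute:", err)
```

Notice what the function signature leaves out. There is no parameter for screen size, viewing distance, or window size, so the same pair of frames always yields the same score regardless of where they are watched. The `data_range` argument is a single value shared by both inputs, which is why an HDR reference and an SDR test video cannot be compared directly. And the returned number carries no mapping to categories such as “good” or “excellent”.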

The list could go on, but the scenarios above are sufficient to conclude that SSIM is NOT enough in real-world applications. At this point, the SSIMPLUS model developed by SSIMWAVE Inc. is the only existing method that addresses all of the above problems. However, the game never ends: new needs will keep emerging, and they will drive further development.