In real-world video delivery systems, video engineers often face major challenges when trying to monitor and control the quality of the videos being delivered. Over the years, people have come to accept that the quality-of-delivery (e.g., bandwidth allocation and smoothness of delivery, sometimes interpreted as quality-of-service) is not the only piece of the overall picture that matters. What ultimately matters is the perceptual quality-of-experience (QoE) of the users at the very end of the video delivery chain. QoE is highly personal and may be influenced by many factors, but certain aspects strongly affect QoE and are predictable and manageable even before the video reaches the end users. One such aspect is the viewing device, together with the resolution (the number of pixels in each row and column) used to display the video on that device. For example, a strongly compressed video may look fine on a smartphone but may exhibit annoying artifacts on a large-size TV.
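The smartphone-versus-TV contrast can be made concrete with simple viewing geometry. The following is a minimal sketch, assuming a 16:9 screen viewed head-on and illustrative viewing distances that are not from the source (12 inches for a 5.5-inch phone, 72 inches for a 60-inch TV). It computes how many video pixels fall within one degree of visual angle when a 360-line video fills the screen. Fewer pixels per degree means each pixel, and therefore each compression artifact, covers a larger part of the viewer's visual field. The function names are hypothetical.

```python
import math


def screen_height_from_diagonal(diagonal_in: float,
                                aspect_w: float = 16.0,
                                aspect_h: float = 9.0) -> float:
    """Physical screen height (inches) from its diagonal and aspect ratio."""
    return diagonal_in * aspect_h / math.hypot(aspect_w, aspect_h)


def pixels_per_degree(screen_height_in: float,
                      vertical_pixels: int,
                      viewing_distance_in: float) -> float:
    """Vertical video pixels per degree of visual angle (head-on viewing)."""
    # Full vertical visual angle subtended by the screen, in degrees.
    angle_deg = math.degrees(
        2.0 * math.atan(screen_height_in / (2.0 * viewing_distance_in)))
    # The video's rows are stretched to fill that angle.
    return vertical_pixels / angle_deg


# Hypothetical viewing setups: a 360p video shown full screen on each device.
phone = pixels_per_degree(screen_height_from_diagonal(5.5), 360, 12.0)
tv = pixels_per_degree(screen_height_from_diagonal(60.0), 360, 72.0)
print(f"phone: {phone:.1f} px/deg, TV: {tv:.1f} px/deg")
```

With these assumed numbers, the phone works out to roughly 28 pixels per degree and the TV to about 16, so the same artifact subtends nearly twice the visual angle on the TV. This is only a geometric proxy. It ignores contrast sensitivity and the other HVS factors that a full QoE model would need to account for.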

Fortunately, the impact of viewing device and resolution on visual QoE is predictable and manageable earlier in the video delivery chain. Many real-world systems allow user devices to send feedback about their device and resolution back to the network or video hosting server. Even where that is not possible, statistical data about the viewing device types and resolutions of a specific video service may be available. Such information could help the quality control modules at the video hosting or network servers make better compression and streaming decisions during both the video preparation and delivery stages. The question is how to put this information to use.


Clearly, to manage user QoE, one must first be able to measure it. Somewhat surprisingly, both the academic research literature and real-world industrial products for objective video quality assessment (VQA) largely ignore the impact of viewing device and resolution. The most popular VQA metrics, such as PSNR, SSIM, MS-SSIM, VIF, MOVIE, and VQM, give a video stream the same score regardless of the device and resolution on which it is viewed. Even worse, these metrics often cannot be computed at all when evaluating the output of video transcoders: transcoding frequently changes the video resolution, and these metrics can only compare videos of the same size.
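To see why cross-resolution comparisons break down, consider how a full-reference metric such as PSNR is computed. The sketch below uses NumPy and assumes an 8-bit peak value of 255. It compares the two frames pixel by pixel, so it cannot handle frames of different sizes, and none of its inputs describe the display device or the resolution at which the video will be watched.

```python
import numpy as np


def psnr(reference: np.ndarray, distorted: np.ndarray,
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-sized frames."""
    # Pixel i of the reference is compared with pixel i of the distorted
    # frame, so the two frames must have exactly the same dimensions.
    if reference.shape != distorted.shape:
        raise ValueError(
            f"shape mismatch: {reference.shape} vs {distorted.shape}")
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(max_value ** 2 / mse))


# A 1080p source and a 720p transcode cannot be compared directly.
source = np.zeros((1080, 1920), dtype=np.uint8)
transcoded = np.zeros((720, 1280), dtype=np.uint8)
try:
    psnr(source, transcoded)
except ValueError as err:
    print("PSNR not computable:", err)
```

Note that the function signature has no parameter for screen size, viewing distance, or display resolution. Whatever device the video ends up on, the score is the same.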

In practice, engineers often resort to some quick remedies. For example, in the case of cross-resolution video transcoding where the videos are transcoded to a lower resolution and a lower bit rate than the source video, one can re-scale either the source video down or the transcoded video up to match the size of the other, followed by comparisons using standard VQA metrics. Such a remedy is problematic for multiple reasons. For example,

  • Re-scaling of an image or video frame inevitably creates aliasing (caused by downsampling), blurring (caused by low-pass filtering for anti-aliasing and/or interpolation), and/or misalignment problems. Any of them could produce further distortions, which may sometimes dominate the transcoding artifacts (e.g., even half-pixel misalignment could drop PSNR or SSIM drastically but would not lead to significant perceptual quality degradation). As a result, the scores given by the VQA metrics fail to reflect the actual quality degradation created by transcoding.
  • Moreover, only one score is obtained for each transcoded video, yet the video may eventually be displayed on screens of different physical sizes and resolutions, where another stage of re-scaling may occur and result in different visual QoE. For example, when a low-resolution video (say, 360p) free of compression artifacts is shown on a large high-definition TV (say, 60 inches at 1080p), the interpolation/scaling process could produce highly blurry pictures, which cannot be captured by running the video quality metrics on the low-resolution video content.
  • The scores produced by the VQA metrics do not allow meaningful comparisons across different resolutions. For example, a PSNR value of 35 dB may correspond to very different perceptual QoE for videos at two different resolutions. This drawback is particularly problematic when a streaming decision requires choosing among different resolutions for a given request.
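The half-pixel misalignment problem in the first bullet is easy to reproduce. This is a minimal sketch built on a synthetic vertical-stripe pattern, produced by the hypothetical helper `sinusoid_frame`. The "distorted" frame is the same continuous pattern sampled half a pixel to the right, so it contains no compression artifacts and no interpolation blur, yet PSNR still drops sharply, and the drop is larger for finer texture.

```python
import numpy as np


def psnr(a: np.ndarray, b: np.ndarray, max_value: float = 255.0) -> float:
    """PSNR in dB between two same-sized frames."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_value ** 2 / mse))


def sinusoid_frame(period_px: float, offset_px: float = 0.0,
                   height: int = 64, width: int = 256) -> np.ndarray:
    """Hypothetical test pattern: vertical stripes sampled at x + offset_px.

    Sampling the continuous pattern on a shifted grid gives an exact
    sub-pixel shift with no interpolation blur, so the only difference
    between the two frames is the misalignment itself.
    """
    x = np.arange(width, dtype=np.float64) + offset_px
    row = 128.0 + 60.0 * np.sin(2.0 * np.pi * x / period_px)
    return np.tile(row, (height, 1))


for period in (16.0, 4.0):
    ref = sinusoid_frame(period)
    shifted = sinusoid_frame(period, offset_px=0.5)
    print(f"stripe period {period:>4.0f} px, half-pixel shift: "
          f"PSNR = {psnr(ref, shifted):.1f} dB")
```

Under these assumptions, the 16-pixel-period pattern scores roughly 30 dB and the 4-pixel-period pattern roughly 18 dB, even though each shifted frame shows exactly the same content as its reference.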

In the end, quick remedies are like shooting in the dark. The only way to do it right is to employ a better video QoE measure, one built upon a deep understanding of not only the human visual system (HVS) but also the impact of the display device, video resolution, and viewing conditions on perceptual video quality. Fortunately, a new solution named SSIMPLUS has emerged in the past few years and has been made available to the industry by SSIMWAVE Inc. SSIMPLUS addresses the cross-device and cross-resolution issues in a highly accurate and efficient manner, along with many other desirable features such as cross-frame-rate assessment and high dynamic range video quality assessment. It has been used for transcoder testing and development, encoding ladder design, and real-time video quality monitoring, among many other applications. To get an idea of how SSIMPLUS works, readers can refer to

  • A. Rehman, K. Zeng, and Z. Wang, “Display device-adapted video quality-of-experience assessment,” IS&T/SPIE Electronic Imaging: Human Vision and Electronic Imaging, San Francisco, CA, Feb. 2015.

Readers may also contact SSIMWAVE Inc. about various uses of the SSIMPLUS metric.