It’s so easy that we take it for granted. We grab our cell phone, click on a Netflix or YouTube video, press play, and presto! We have what amounts to television in the palm of our hands, coming to us from anywhere in the world.
But the truth is that the journey a video takes, from its creation with a camera to a viewer’s screen – glass to glass, as those in the industry like to call it – is long and complex. It’s a technological miracle that the process works as seamlessly as it does and, from the viewer’s vantage point, is so simple to use.
The miracle, largely hidden from the viewer, is that sending a video over the Internet requires moving “enormous amounts of data,” says SSIMWAVE CEO and co-founder Abdul Rehman.
Moving that much data at a cost that isn’t prohibitive for the sender means that the data must be compressed, usually more than once. At the other end, all that data must be reconstructed into pixels suited to the kind of screen or device the video is being viewed on. Picture acuity, colour, dynamic range and brightness all require different treatments. Different viewing options – TV, tablet, phone – require that multiple versions of the video, each tailored to a device’s unique properties, be instantly available.
SSIMWAVE’s technology monitors a video’s journey at each stage from the point of view of the human visual system – in effect deciding if a viewer would be pleased with the product – or not – before it gets to an actual human being’s eyes. SSIMWAVE, then, helps ensure the video quality remains intact through the journey and that the viewer will ultimately receive a pleasing product.
“SSIMWAVE’s business is multiple point monitoring – that’s the first thing we have to do, and do properly,” says Dr. Zhou Wang, SSIMWAVE’s co-founder and Chief Science Officer.
Compression – or encoding as it’s known – is the first stage of the video journey.
“Basically, you convert that data to something more manageable,” says Dr. Rehman. “Let’s say, from gigabits per second you come down to megabits per second. So, compressed by an order of 10, at least.
“Encoding is a very tedious, complex job.”
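To put rough numbers on that drop, here is a back-of-the-envelope sketch in Python. The frame size, frame rate, bit depth and the 5 Mbps delivery target are illustrative assumptions, not figures from SSIMWAVE.

```python
# Back-of-the-envelope estimate of raw versus compressed video bitrate.
# All figures are illustrative assumptions, not SSIMWAVE measurements.

def raw_bitrate_bps(width: int, height: int, fps: int, bits_per_pixel: int) -> int:
    """Bits per second needed to send every pixel of every frame uncompressed."""
    return width * height * fps * bits_per_pixel


def compression_ratio(raw_bps: float, encoded_bps: float) -> float:
    """How many times smaller the encoded stream is than the raw one."""
    return raw_bps / encoded_bps


# Assumed source: 1080p at 60 frames per second, 24 bits per pixel (8-bit RGB).
raw = raw_bitrate_bps(1920, 1080, 60, 24)
# Assumed delivery target: a 5 Mbps stream.
encoded = 5_000_000

print(f"raw:   {raw / 1e9:.2f} Gbps")                     # raw:   2.99 Gbps
print(f"ratio: {compression_ratio(raw, encoded):.0f}x")   # ratio: 597x
```

Under these assumptions the stream shrinks by a factor of several hundred, which fits the “at least” in Dr. Rehman’s estimate.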
After encoding, the video moves along to a data centre – perhaps one controlled by Rogers, or Amazon – and from there it is distributed to those users who want to view it.
The problem here is the inherent unpredictability of the viewer’s choice of device – will the command be to view it on a TV or a cell phone? – as well as the unpredictability of the size of the pipe that will deliver the product: will it be the thick fibre available at a residence or the relatively thin pipe available to a passenger’s cell phone while driving along the 401?
To manage those requests, multiple versions of the video are created that are of different sizes. Creating those versions requires transcoding, which is to say, further compressing a file that has already been compressed.
After transcoding, the video is packaged: the live streams are converted into files, or segments. That makes the video easier to deliver and lets playback cope with a dropped connection or a change in bandwidth – perhaps because several people in a house suddenly began watching on several different devices.
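The idea of keeping several versions on hand and switching between them can be sketched in a few lines of Python. The ladder values, the `Rendition` type, the `pick_rendition` helper and the 0.8 safety margin are all hypothetical, chosen for illustration. Real players use more elaborate logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # average bitrate of this version


# Hypothetical ladder of versions of the same video (illustrative values only).
LADDER = [
    Rendition("240p", 240, 400),
    Rendition("480p", 480, 1200),
    Rendition("720p", 720, 3000),
    Rendition("1080p", 1080, 6000),
]


def pick_rendition(bandwidth_kbps: float, screen_height: int,
                   safety: float = 0.8) -> Rendition:
    """Pick the highest-bitrate version that fits both the pipe and the screen.

    `safety` leaves headroom so a small dip in bandwidth does not stall playback.
    Falls back to the smallest version when nothing fits.
    """
    budget = bandwidth_kbps * safety
    fitting = [r for r in LADDER
               if r.bitrate_kbps <= budget and r.height <= screen_height]
    return max(fitting, key=lambda r: r.bitrate_kbps) if fitting else LADDER[0]


# Assumed bandwidths in kbps: home fibre, a phone on the highway, a weak signal.
for bandwidth in (20_000, 2_000, 300):
    print(bandwidth, pick_rendition(bandwidth, screen_height=1080).name)
```

Because the video is cut into segments, a player could repeat this choice at every segment boundary, moving up or down the ladder as the connection changes.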
As the compressed video – essentially ones and zeros – is reconstructed into pixels on a device, impossibly fast decisions are made. Similar blocks within each video frame are copied in order to save time and bandwidth.
From SSIMWAVE’s point of view, the most important places to monitor quality and deploy probes are the encoding/transcoding junctions.
“From a decision-making perspective, you have millions of decisions that are being made in this delivery chain. The most complicated decision-making engines are the compression ones. The encoding devices have to find smart ways of squeezing the data while keeping quality as high [as possible],” says Dr. Rehman.
“We entirely rely on visual data to directly say [to the encoder]: ‘This task was given to you, this is what you accomplished, and this is the quality at the end of the day.’”
“The other parts [of the delivery chain] are not as complicated. There is live traffic coming in, divided into packages. Here you’re checking package health, and the relationship between multiple packages. You are also checking so that if a video player is going to switch from [one resolution to another], is it going to be a smooth transition or not?”
“The important thing is, what is the viewer’s experience at the end of the day? Let’s say everything that’s supposed to get there, from [the beginning of the delivery chain to the end], gets there, but viewers are still not happy.”
“Perhaps you have a wire cut. Or you have a server down. You have to make alternative decisions. What was happening?”
“The nuances and the amount of problems that can happen across the delivery chain are unique.”
Once SSIMWAVE’s probes detect a problem, its technology can then prescribe a fix.
In the past, engineers would bring to bear years of encoding experience and make adjustments according to what they believed would work – less science than art.
“These engineers, they’ve been working on encoding configuration for 20, 30 years,” says Dr. Wang. “They operate these encoders based on their experience. They’re engineers, but they work like artists, based on their experience. They adjust here and there. They try something – ‘this I like, this I don’t like’ – and they keep trying until they find something that seems good. But this is not scientific.
“There’s an optimal point to compromise with respect to resolution versus bitrate; the thing is, this optimal point is highly content dependent. For different content, it will be a different pair or combination of the two.
“Higher resolution doesn’t necessarily mean better quality. More pixels [make it] harder to transmit that video to your TV.”
SSIMPLUS’ metric, he says, can identify that optimization point.
“With SSIMPLUS, you don’t need to rely on the experience of the engineers. They can only guess. It might be a good guess, but if you have many different kinds of content, it’s hard, and they won’t be accurate. They’re artists.
“SSIMPLUS is converting the artist’s job to a scientist’s job. If you model the human visual system correctly, you get the right metric that tells you where the optimization point is. It’s automatic. It’s fast. That’s the power of science.”
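As a rough illustration of what an automatic search for that point could look like, the Python sketch below tries each candidate resolution at a fixed bitrate and keeps the one with the highest score. The `toy_score` function and its table of numbers are hypothetical stand-ins invented for this example. They are not SSIMPLUS, whose internals are not described here.

```python
from typing import Callable, Dict, Iterable, Tuple

# (content type, resolution height, bitrate in kbps) -> made-up quality score, 0-100.
# These numbers are invented for illustration and are NOT SSIMPLUS output.
TOY_SCORES: Dict[Tuple[str, int, int], float] = {
    ("sports", 1080, 3000): 62, ("sports", 720, 3000): 74, ("sports", 480, 3000): 70,
    ("cartoon", 1080, 3000): 88, ("cartoon", 720, 3000): 83, ("cartoon", 480, 3000): 71,
}


def toy_score(content: str, height: int, bitrate_kbps: int) -> float:
    """Hypothetical stand-in for a perceptual quality metric."""
    return TOY_SCORES[(content, height, bitrate_kbps)]


def best_resolution(content: str, bitrate_kbps: int, heights: Iterable[int],
                    score: Callable[[str, int, int], float] = toy_score) -> int:
    """Return the resolution that scores highest at the given bitrate."""
    return max(heights, key=lambda h: score(content, h, bitrate_kbps))


for content in ("sports", "cartoon"):
    print(content, best_resolution(content, 3000, (480, 720, 1080)))
```

With these made-up scores, the fast-moving sports clip does best at 720p while the cartoon does best at 1080p at the same bitrate, which is the kind of content dependence Dr. Wang describes.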
“I think our major contribution to this particular question, about operating encoder and transcoder, is as simple as converting art into science.”
And in so doing, delivering quality video to the palm of our hand from anywhere in the world.