Stereoscopic vision comes from having two eyes. Because we see the world twice over, from two slightly offset viewpoints, we understand distance, perspective, speed, acceleration and a whole slew of other things without conscious effort. Cinema has been trying, and trying because it's still not perfect, to make movies that give us that same experience in the theatre. In fact, we've been trying since the day we invented the camera.
For VR, this idea gets especially tricky. Capturing a scene in a way that creates not one but two panoramic images that, when shown one to each eye, create a consistent and believable feeling of depth is hard. But maybe not for the reasons you'd imagine.
What makes stereo so hard?
The problem with stereo is twofold. On the one hand, we have trouble, immense trouble in fact, in creating a perfectly parallax-free image of the world for just one eye. If that process is doubled, and there are inconsistencies in where things appear to each eye, not only is the illusion lost, but you've created a sure-fire headache for your audience as their brains work overtime to reconcile those differences.
On the other hand, we have a problem of viewpoints. Ideally, for any orientation our viewer can take, wherever they look, we need to give them two pictures. Those pictures always have to have been captured slightly offset by the same distance (our eyes cannot move around in our heads; they sit a fixed distance apart). When making a conventional movie, that's rather simple in principle: just put two cameras side by side and start filming. If we did that in VR, things would go pear-shaped the moment someone started to turn their head.
As we turn our heads, the effect disappears: by 90 degrees the baseline between the cameras lines up with the viewing direction and both cameras see almost the same thing. Worse than that, when we turn all the way around, the left and right views become swapped!
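To see why, it helps to put numbers on it. A minimal sketch, assuming a typical ~64 mm eye separation: the depth we perceive is driven by the component of the camera baseline that is perpendicular to the viewing direction, which shrinks to nothing at 90 degrees and goes negative, swapped eyes, at 180.

```python
import math

def effective_baseline(baseline, yaw_deg):
    """Component of the camera baseline perpendicular to the viewing
    direction -- this is what produces the sensation of stereo depth.
    Positive: normal stereo; near zero: flat; negative: eyes swapped."""
    return baseline * math.cos(math.radians(yaw_deg))

ipd = 0.064  # ~64 mm, a typical human interpupillary distance
print(effective_baseline(ipd, 0))    # 0.064: full stereo looking straight ahead
print(effective_baseline(ipd, 90))   # ~0: stereo collapses when looking side-on
print(effective_baseline(ipd, 180))  # -0.064: left and right eyes swapped
```

The cosine here is the whole story: a fixed side-by-side pair only delivers correct stereo for the one direction it was aimed at.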
To solve this, many filmmakers are trying to emulate cinema’s way of capturing stereo – placing pairs of cameras side by side all the way around a sphere. The idea is this – if we capture enough pairs, and then stitch all the left eyes for one view and all the right eyes for the other – we’ll get stereo. And you do… to an extent.
You can only fit so many cameras into a rig, no matter how small they are. Human eyes can occupy millions of possible positions, even if all we do is slowly rotate our heads. But a rig of cameras can only represent a few of those, so what happens to all the small increments in between? Nothing; they don't exist.
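A rough sketch of what a fixed rig does to your viewpoint. The eight-pair ring here is hypothetical and the numbers purely illustrative, but the effect is general: every possible head yaw gets snapped to the nearest camera pair, and the increments in between are simply lost.

```python
def nearest_rig_view(yaw_deg, num_pairs=8):
    """Snap a viewer's head yaw to the nearest of num_pairs camera
    pairs spaced evenly around a ring; return (snapped yaw, error)."""
    step = 360.0 / num_pairs
    snapped = (round(yaw_deg / step) * step) % 360.0
    error = abs((yaw_deg - snapped + 180.0) % 360.0 - 180.0)
    return snapped, error

print(nearest_rig_view(50.0))  # (45.0, 5.0): served by the 45-degree pair
print(nearest_rig_view(66.0))  # (45.0, 21.0): over 20 degrees off
```

With eight pairs, the worst case is 22.5 degrees of error; adding cameras shrinks it but can never remove it.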
We would need many more cameras, pointing in many more directions, stitched in a far more complicated way, to even approach the level of ordinary stereo filmmaking. When we consider the fact that NO physical camera system can capture a perfectly parallax-free image in 360 degrees, the problem becomes a little easier to comprehend. If we can't adequately ensure that even one view is correct, how do we do it for all possible views?
Even if we could build a camera that captured hundreds of views, more elaborate than even the most ambitious Jaunt or Google or hobby camera, there are fundamental problems in capturing that many viewpoints. How do we know which views should be stitched together? Eyes are not always perfectly level, and neither are they always looking straight ahead. We can tilt our heads, it's second nature, but it ruins everything from a stereo playback perspective: the images were stitched assuming a certain posture on your behalf, matched as though your eyes were level, and now they aren't.
This problem, and its related challenges, are neatly described by the hairy ball theorem. The idea is roughly this: you have a ball; picture a long-haired tennis ball, for example, though any ball will do. It has to be completely covered in hair, everywhere. And you have to comb all the hair so that it lies flat.
No matter how well you comb, or which way you go about it, there will be one spot where the hair stands up, away from the ball. This is such a simple but fundamentally frustrating idea.
The problem of the hairy ball is the same as that of stereo 360 capture, if you imagine that each strand of hair is a possible view into the world and that the strands need to be paired together to create stereo. It doesn't matter how, or in how many ways, you go about pairing the cameras up to create the panoramas; at some stage you're going to end up with views that can't be paired anymore. Vi from EleVR wrote an exceptionally easy-to-follow explanation of the theorem, and I recommend reading it here.
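You can watch the theorem bite in a few lines of code. One natural combing strategy, projecting a single fixed direction onto the ball's surface at every point, works almost everywhere, but the hair collapses to zero length, a cowlick, wherever the surface points straight along the combing direction. (This sketch is illustrative; the theorem guarantees that every strategy fails somewhere, not just this one.)

```python
import math

def combed_hair(lat_deg, lon_deg):
    """Attempt to comb the sphere: project one fixed combing direction
    (the x-axis) onto the tangent plane at the given point, and return
    the length of the flattened 'hair' there."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    p = (math.cos(lat) * math.cos(lon),   # point on the unit sphere
         math.cos(lat) * math.sin(lon),
         math.sin(lat))
    comb = (1.0, 0.0, 0.0)                # the fixed combing direction
    dot = sum(c * q for c, q in zip(comb, p))
    tangent = tuple(c - dot * q for c, q in zip(comb, p))
    return math.sqrt(sum(t * t for t in tangent))

print(combed_hair(45.0, 90.0))  # ~1.0: the hair lies flat here
print(combed_hair(0.0, 0.0))    # 0.0: a cowlick -- the hair cannot lie flat
```

In stereo terms, that zero-length cowlick is a direction for which no valid pairing of views exists.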
There are more than a few stereo VR videos being made, and to be honest, a lot of them are breathtaking. But all of them suffer from this problem. Fundamentally, it is not possible to represent all the millions of possible view combinations our eyes can create using only two flat images. If the viewer, the person with the headset, plays along and doesn't go too wild with their head movements, all is well. But lean in, tilt your head around or lean all the way back (flipping the world), as some more adventurous viewers immediately do, and the illusion is broken.
Using a 3D renderer helps. The ability to capture parallax-free with an unlimited number of cameras lets you do things that are literally impossible in the real world. The stereo panoramic render mode for Arnold, for example, has incredible fidelity and can handle mild head tilting as well, because of the complexity of the "views" it combines. But even these idealized virtual cameras run into the hairy ball problem at the poles, and it is recommended that the "stereo-ness", the amount of disparity between the two eyes, be reduced as we near the top and the bottom of the picture. The problem, fundamentally, lies in the storage and playback of stereoscopic information, more than in our ability to create it.
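One common way to implement that falloff, sketched here as the general idea rather than Arnold's actual code (the 60-degree start of the fade is my assumption), is to hold full eye separation over most of the sphere and fade it smoothly to zero, pure mono, at the poles:

```python
import math

def eye_offset(ipd, lat_deg, fade_start_deg=60.0):
    """Half of the eye separation to use for a ray at a given latitude.
    Full stereo below fade_start_deg, then a smooth cosine fade to zero
    (mono) at the poles. fade_start_deg=60 is an assumed default."""
    lat = abs(lat_deg)
    if lat <= fade_start_deg:
        return ipd / 2.0
    t = (lat - fade_start_deg) / (90.0 - fade_start_deg)
    return (ipd / 2.0) * 0.5 * (1.0 + math.cos(math.pi * t))

print(eye_offset(0.064, 0.0))   # 0.032: full stereo at the horizon
print(eye_offset(0.064, 90.0))  # 0.0: pure mono at the pole
```

Trading depth for comfort this way hides the cowlick: where no consistent stereo pairing exists, the two eyes are simply shown the same image.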
So is truly stereo panoramic video impossible? Yes.
But only if you think video has to stay fundamentally a flat medium.