Stanford’s Neural Holography Boosts Real-Time VR, AR Experiences
The history of VR/AR displays is one of tradeoffs. Most of the market is dominated by single-plane models, which force the viewer to focus unnaturally at a single distance no matter how far away objects in the scene are supposed to be. Waveguide-based multifocal displays, like those from Magic Leap, are expensive and have a limited field of view. As a result, there has been growing interest in alternatives. One of the most promising areas of research is holographic displays, which promise an experience that is easy on the eyes and produces realistic results.
Unfortunately, generating images for holographic displays is a complicated and time-consuming process. That’s where new research from Stanford University Assistant Professor Gordon Wetzstein’s lab, presented at SIGGRAPH this month, comes in. The technique, called neural holography, uses a specialized neural network trained with a camera-in-the-loop simulator to generate high-quality results, and it runs at essentially real-time rates: around 30 fps currently.
How Holographic Displays Work, Simplified
For many of us, our first memory of a hologram was a dim, monochrome image of a common household object hidden behind glass in a museum display case. So it seems nearly magical that holograms can now be projected in color through a personal viewing system. But the basic principles haven’t changed: light from a laser is collimated (so that all the light waves travel in parallel) and then passed through a spatial light modulator (SLM), which varies its phase on a per-pixel basis.
The result is a light field whose interference patterns create a 3D image of the scene. The user views the image through a lens, which projects it in 2D onto the retina. In the simplest case, the SLM applies a fixed transform, but better results require a more sophisticated one. Stanford’s effort, for example, treats each pixel individually.
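To make the pipeline above concrete, here is a minimal simulation sketch of what happens after the SLM: a unit-amplitude (collimated) field gets a per-pixel phase, then propagates through free space to an image plane, where its intensity is what the eye would see. It assumes the angular spectrum propagation method and illustrative values (520 nm green laser, 6.4 µm SLM pixel pitch, 10 cm propagation distance); the function names are hypothetical and this is not the Stanford team’s code.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pitch, distance):
    """Propagate a complex field by `distance` metres (angular spectrum method)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)  # spatial frequencies in cycles per metre
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)      # both arrays have shape (ny, nx)
    # Longitudinal wavenumber; evanescent components (negative argument) are dropped.
    arg = (1.0 / wavelength) ** 2 - FX ** 2 - FY ** 2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.where(arg > 0, np.exp(1j * kz * distance), 0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def slm_to_intensity(phase, wavelength=520e-9, pitch=6.4e-6, distance=0.1):
    """Phase-only SLM pattern -> intensity at a plane `distance` metres away."""
    field = np.exp(1j * phase)  # collimated laser: unit amplitude, phase set per pixel
    return np.abs(angular_spectrum_propagate(field, wavelength, pitch, distance)) ** 2

# Usage: a random phase pattern produces a speckle-like intensity image.
rng = np.random.default_rng(0)
intensity = slm_to_intensity(rng.uniform(0, 2 * np.pi, size=(256, 256)))
```

Because a phase-only SLM cannot absorb light, all of the image’s brightness has to come from redirecting energy through interference, which is why choosing the phase pattern is the hard part.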
How Neural Holography Improves the Holographic Display Pipeline
CGH (computer-generated holography) is the process of recreating a scene as a hologram projected by a display: in this case, a near-eye, typically head-mounted display. Hardware aside, the biggest challenge in creating a realistic image is the transform applied by the SLM. It must produce a believable holographic projection using only phase changes to the light passing through it.
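One common way to state this phase-only problem is as an optimization. The notation here is our own, for illustration: $\phi$ is the per-pixel SLM phase, $e^{i\phi}$ is the unit-amplitude field leaving the SLM, $f$ models propagation to the image plane, $s$ is a brightness scale factor, and $\mathcal{L}$ is an image loss such as mean squared error.

```latex
\phi^{\ast} = \arg\min_{\phi}\; \mathcal{L}\!\left(s\,\bigl|f\bigl(e^{i\phi}\bigr)\bigr|,\; a_{\text{target}}\right)
```

Direct methods try to compute $\phi^{\ast}$ in a single pass; iterative methods refine an initial guess until the loss stops improving.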
Existing algorithms for calculating that transform typically fall into two camps: direct methods, which are fast but deliver mediocre quality, and iterative methods, which are too slow for real-time use. The Stanford team’s paper provides numerous examples of existing methods and their shortcomings. To address them, the team focused on two complementary innovations.
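The classic iterative approach is the Gerchberg-Saxton (GS) algorithm, one of the baselines the team compared against. The minimal sketch below bounces between the SLM plane and the image plane, enforcing the target amplitude in one and the phase-only constraint in the other. It assumes a simplified far-field model in which the image plane is just the 2D Fourier transform of the SLM field; the function names are hypothetical.

```python
import numpy as np

def gerchberg_saxton(target_amp, iterations=50, seed=0):
    """Gerchberg-Saxton loop for a phase-only hologram (far-field FFT model)."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0, 2 * np.pi, target_amp.shape)  # random initial SLM phase
    for _ in range(iterations):
        image = np.fft.fft2(np.exp(1j * phase), norm="ortho")  # SLM -> image plane
        image = target_amp * np.exp(1j * np.angle(image))      # impose target amplitude
        slm = np.fft.ifft2(image, norm="ortho")                # image plane -> SLM
        phase = np.angle(slm)                                  # keep phase only (SLM constraint)
    return phase

def reconstruction_error(phase, target_amp):
    """Mean squared amplitude error of the simulated image."""
    recon = np.abs(np.fft.fft2(np.exp(1j * phase), norm="ortho"))
    return np.mean((recon - target_amp) ** 2)

# Usage: a bright square; scale it so its energy matches a unit-amplitude SLM.
target = np.zeros((64, 64))
target[16:48, 16:48] = 1.0
target *= np.sqrt(target.size / np.sum(target ** 2))
phase = gerchberg_saxton(target, iterations=50)
```

Each pass needs two full-frame FFTs, and good results typically take many passes, which illustrates why iterative methods struggle to hit real-time frame rates.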
First, they added an actual camera to the typical holographic simulation rig to improve calibration and training. Because the camera captures the image after the optics, the rig is a better analog for a real display and the human eye than traditional systems that only look at the SLM’s output image. By using optimization techniques such as stochastic gradient descent (SGD) to “learn” how to create high-quality transforms for a display’s SLM, they created iterative algorithms that improved on other published results. The camera is needed only for calibration and training; once that step is complete, the results can be used with a simpler system for display.
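The gradient-descent idea can be sketched as follows: treat the SLM phase as the parameters, simulate the image, and step the phase downhill on an amplitude loss. This is a simplified illustration, not the team’s optimizer. It assumes the same hypothetical far-field FFT model as a stand-in for the real optics, uses a hand-derived gradient and a plain full-gradient step in place of SGD, and uses only the simulated image; in a camera-in-the-loop setup the loss would instead be driven by captured photos.

```python
import numpy as np

def amplitude_loss_and_grad(phase, target_amp):
    """Loss sum((|U| - A)^2) for U = FFT(exp(i*phase)), plus d(loss)/d(phase)."""
    u = np.exp(1j * phase)
    U = np.fft.fft2(u, norm="ortho")
    amp = np.abs(U)
    loss = np.sum((amp - target_amp) ** 2)
    # Wirtinger derivative d(loss)/dU* = (|U| - A) * U / |U|; epsilon avoids 0/0.
    g = (amp - target_amp) * U / (amp + 1e-12)
    G = np.fft.ifft2(g, norm="ortho")      # adjoint of the unitary FFT
    grad = 2.0 * np.imag(G * np.conj(u))   # chain rule through u = exp(i*phase)
    return loss, grad

def optimize_phase(target_amp, steps=200, lr=0.1, seed=0):
    """Plain gradient descent on the SLM phase (a stand-in for SGD)."""
    rng = np.random.default_rng(seed)
    phase = rng.uniform(0, 2 * np.pi, target_amp.shape)
    losses = []
    for _ in range(steps):
        loss, grad = amplitude_loss_and_grad(phase, target_amp)
        losses.append(loss)
        phase -= lr * grad
    return phase, losses

# Usage: same energy-normalized square target as a toy example.
target = np.zeros((64, 64))
target[16:48, 16:48] = 1.0
target *= np.sqrt(target.size / np.sum(target ** 2))
phase, losses = optimize_phase(target)
```

Framing the problem this way is what makes it easy to swap in a more realistic forward model, including one calibrated from camera captures, since only the loss and its gradient need to change.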
Second, the team built an efficient neural network, Holonet, which they trained to model the system itself, including both the SLM transform and optical aberrations. That model is then used to display images, including ones that were not in the training set. Because inference is fast, Holonet can calculate the needed transforms in real time, even for images as large as 1080p. As a result, the team achieves direct results that are as good as or better than those of previous iterative algorithms, and nearly as good as their own camera-in-the-loop iterative results.
Neural Holography Shows Impressive Quality With Excellent Performance
The team compared results from Holonet to a number of leading previously published algorithms, including Wirtinger Holography, DPAC (double phase-amplitude coding), and GS (Gerchberg-Saxton), as well as to their own earlier CITL (camera-in-the-loop) results. Holonet outperformed the previously published methods while delivering impressive performance. Above is one frame from their comparison video, but you can see the full comparisons and Wetzstein’s SIGGRAPH talk online at Stanford’s Computational Imaging site.
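For contrast with Holonet, here is a sketch of one of the direct baselines, double phase-amplitude coding (DPAC), in its textbook form. It relies on the identity that any complex value a·e^(ip) with a ≤ 1 equals the average of two pure phases, e^(i(p + arccos a)) and e^(i(p − arccos a)); interleaving those two phases in a checkerboard lets neighbouring pixels average optically. The function names are hypothetical, the input is assumed to be a non-zero complex field already propagated back to the SLM plane, and the team’s DPAC implementation may differ.

```python
import numpy as np

def double_phase_pair(field):
    """Split a complex field into two phase-only fields whose average equals it
    (after normalizing the amplitude so its maximum is 1)."""
    amp = np.abs(field)
    amp = np.clip(amp / max(amp.max(), 1e-12), 0.0, 1.0)  # arccos needs values in [0, 1]
    p = np.angle(field)
    offset = np.arccos(amp)
    return p + offset, p - offset

def double_phase_encode(field):
    """Interleave the two phase fields in a checkerboard to get one SLM pattern."""
    theta1, theta2 = double_phase_pair(field)
    yy, xx = np.indices(field.shape)
    checker = (yy + xx) % 2 == 0
    return np.where(checker, theta1, theta2)
```

The encoding is a single closed-form pass with no iterations, which is why direct methods like this are fast; the cost is that the checkerboard trick only approximates the desired field, which tends to hurt image quality.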
Holonet Isn’t Limited to Holographic Displays
Wetzstein sees holographic displays as one of the most interesting areas of research in AR/VR displays, because they have not been developed nearly as much as more traditional options. However, he doesn’t see his team’s Holonet work as useful only for existing holographic displays, since rendering for varifocal and multifocal displays faces similar challenges. The team is exploring ways to combine its results with varifocal and multifocal designs, creating holographic versions of those approaches that could improve realism and help address common issues such as the vergence-accommodation conflict.