The need for inexpensive, reliable, 3D, 360-degree display technologies grows as augmented reality applications continue to increase in popularity. There is room for innovation in the field, as many volumetric displays have moving components, are prohibitively expensive, do not cover 360 degrees, or suffer some combination of those drawbacks. By creating a 360-degree autostereoscopic display that is cost-effective and reliable, we could expand the audience who would benefit from collaborative, interactive experiences. Additionally, while augmented reality headsets are increasingly popular, a volumetric display readily encourages accessible, collaborative interaction without separating users by wearable technology. This technology could impact a variety of industries and demographics as the connected media landscape continues to expand.
Approach - Hardware
This system comprises three main components. First, a 4K monitor displays properly distorted, lenticularized content using a custom shader. That light-field information is then partitioned by a refracting medium, either a radial lenticular array or a holographic optical element. After the rays of light from the pixels bend through the refracting medium, they bounce off the conical mirror, producing the proper image for each viewing angle.
Hardware - Radial Lenticular
The radial lenticular design is dictated by several parameters that all trade off against one another. For the design of this lenticular, I plan to focus on maximizing the resolution and size of the image rendered in the cone while maintaining a reasonable number of viewing angles. Variables that can be changed include the number of cameras (which corresponds to the number of discrete viewing zones), the number of views per lenticular slice, and the number of times a particular sequence of views should repeat. Those variables dictate the minimum viewing radius, the angular size of each lenticular slice, the angle of influence for each viewing zone, and the size of the image rendered in the cone. It is also important to consider what a user can reasonably see reflected in the cone before the rays no longer bounce back to the user.
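As a sketch of how these parameters might interact, the relationships below assume one lenticular slice per camera per repetition of the view sequence. The formulas and names (`num_slices`, `views_per_slice`, etc.) are illustrative assumptions, not the final optical design:

```python
from dataclasses import dataclass

@dataclass
class LenticularDesign:
    num_cameras: int      # number of discrete viewing zones around the display
    views_per_slice: int  # views rendered under each lenticular slice
    repeats: int          # times the view sequence repeats around the circle

    @property
    def num_slices(self) -> int:
        # assumption: one slice per camera per repetition of the sequence
        return self.num_cameras * self.repeats

    @property
    def slice_angle_deg(self) -> float:
        # angular width of each radial lenticular slice
        return 360.0 / self.num_slices

    @property
    def zone_angle_deg(self) -> float:
        # angle of influence of each discrete viewing zone
        return 360.0 / self.num_cameras

design = LenticularDesign(num_cameras=24, views_per_slice=8, repeats=4)
```

Sweeping these parameters makes the trade-off concrete: adding cameras narrows each zone's angle of influence, while adding repeats shrinks each slice and therefore the per-view resolution.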
Resolution scales as the inverse of the number of views under each lenticular wedge. As the user moves radially around the display, the rendered views opposite the user (i.e., on the other side of the radial mirror) cannot be seen. Because the user does not need to see all of the potential views from a static position, radial lenticulars operate differently than linear lenticulars. With a linear lenticular, every view that will be displayed needs to exist under every lenticular lens for the effect to work. For example, if eight images are lenticularized, a slice of each of those eight views sits underneath each lenticular lens. With this radial setup, however, it is not necessary to produce every view under every lenticular lens, because each lens only needs to display the views that the user could potentially see from that lens. This enables a rolling priority system for deciding which views to display under each lenticular lens. If the user is standing directly in front of the viewing zone for a particular image, the lenticular lenses closest to the user should have that view centrally located underneath them. The lenses adjacent to that area should still carry information that can reach the user, but their views have to shift to accommodate the off-axis position. In that way, the view closest to the user appears centrally under the lens, moves incrementally off-center, and eventually disappears as the user moves radially around the display.
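The rolling priority above can be sketched as a small selection function: the view whose zone faces a given slice sits centrally under it, and neighboring views shift progressively off-center. The function name and parameterization are illustrative, not the shader's actual implementation:

```python
def views_for_slice(slice_angle_deg: float, num_views: int, views_per_slice: int):
    """Return the view indices stacked under one lenticular slice.

    The view whose zone faces the slice sits centrally; neighbors are
    placed symmetrically off-center, wrapping around the circle.
    """
    step = 360.0 / num_views                      # angular spacing of viewing zones
    center = round(slice_angle_deg / step) % num_views
    half = views_per_slice // 2
    # symmetric window of view indices around the central view
    return [(center + off) % num_views
            for off in range(-half, half + (views_per_slice % 2))]
```

For example, with 24 views and 5 views per slice, the slice at 0 degrees would hold views 22, 23, 0, 1, and 2, with view 0 central; as the slice angle increases, the window slides around the circle, matching the rolling behavior described above.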
Hardware - Interactivity
For hardware integration for interactivity, I plan to use Intel Realsense cameras, microphones, and Arduino-powered LEDs that all communicate with Unity. There is a plugin for using Intel Realsense in the Unity environment, along with libraries that allow for more detailed signal processing and pose analysis. I will use serial messaging in Unity to send messages to the Arduino, which will control the LEDs wrapped around the housing of the device. The LEDs will indicate important signals to users: whether the system can see the user, whether the user appears to be engaging, and when the program is performing a request that requires time. Additionally, I will use Unity to receive and process microphone input.
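The serial messages to the Arduino might be packed as in the sketch below (written in Python for brevity; the Unity side would send the same bytes from C#). The byte layout and state names are assumptions, not a finalized protocol:

```python
# Hypothetical LED states the display communicates to users
LED_STATES = {"idle": 0, "user_visible": 1, "user_engaged": 2, "processing": 3}

def encode_led_message(state: str, brightness: int) -> bytes:
    """Pack one LED command: [0xFF header byte, state code, brightness].

    The Arduino sketch would read three bytes at a time, sync on the
    header, and update the LED strip accordingly.
    """
    if state not in LED_STATES:
        raise ValueError(f"unknown LED state: {state}")
    if not 0 <= brightness <= 255:
        raise ValueError("brightness must fit in one byte")
    return bytes([0xFF, LED_STATES[state], brightness])
```

A fixed-size message with a header byte keeps the Arduino-side parser trivial and resilient to dropped bytes on the serial line.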
Software - Custom Shader
Unity has built-in support for creating complex shaders that can be applied in real time. In Unity, I will write a shader that appropriately partitions the views based on the number of cameras in the scene, the number of views per lenticular lens, and the number of times each sequence of views should repeat. Before the views can be lenticularized, the media from each camera must be rotated around the display to match where it will physically appear on the reflected cone. Without this step, all of the views render on top of each other, producing an incorrect image. By rotating the views to their appropriate locations, the imagery you should see when standing to the left of the display appears on the left, and likewise for all other directions. After the views have been rotated into place, each camera's view is sliced and reordered within the shader to produce the lenticularized result.
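The slicing-and-reordering step can be illustrated with a simplified linear interleave, shown here in Python on columns of pixels rather than per-pixel in a shader. Output column i samples view i mod N; the radial version would additionally apply the rolling priority and work in polar coordinates:

```python
def interleave_views(views):
    """Linear-lenticular interleave of equally wide views.

    `views` is a list of N views, each a list of pixel columns; the
    output cycles through the source views column by column, so a slice
    of every view ends up under every lenticular lens.
    """
    n = len(views)
    width = len(views[0])
    return [views[i % n][i] for i in range(width)]
```

For two views of four columns each, `interleave_views([[0, 0, 0, 0], [1, 1, 1, 1]])` yields `[0, 1, 0, 1]`, i.e., alternating slices of each view, which is exactly the pattern a linear lenticular lens separates back into discrete viewing zones.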
Software - Interactivity
Intel Realsense and similar depth-sensing devices like Kinect and Leap Motion provide SDKs that allow developers to readily stream depth and color data into Unity. The Intel Realsense data can be further processed and interpreted using software like Nuitrack for skeleton tracking, which exposes body pose and joint positions. This allows the system to know when a user is close enough to the display or standing in a particular place, and it supports gesture tracking as well. The color data from the Intel Realsense camera can be processed in OpenCV, which has a Unity SDK. With OpenCV, the color data can be analyzed for object detection, face detection and recognition, and face-pose analysis. This allows the system to recognize objects, people, and face pose (which could be used to interpret affective state). Face and gaze detection could be used in lieu of trigger words like “OK Google” and “Alexa,” as presumably users intend to interact with the system if they are looking at the character/sensors.
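A minimal sketch of how the skeleton data could gate interaction, assuming the tracked head joint arrives as an (x, y, z) position in meters in the camera's frame. The thresholds, coordinate convention, and function name are all hypothetical:

```python
import math

def user_in_range(head_position, max_distance_m=1.5, max_angle_deg=60.0):
    """Decide whether a tracked user is close enough and roughly facing
    the display.

    head_position: (x, y, z) in meters, camera-centered, z pointing
    outward from the sensor (illustrative convention).
    """
    x, _, z = head_position
    distance = math.hypot(x, z)           # ground-plane distance to the user
    # angular offset of the user from the sensor's forward axis
    angle = math.degrees(math.atan2(abs(x), z)) if z > 0 else 180.0
    return distance <= max_distance_m and angle <= max_angle_deg
```

In Unity, a check like this could run per frame on the Nuitrack joint stream to decide when the character should wake up, track the user, or signal via the LEDs that it can see them.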
The sound recordings are used to detect volume, pitch, and speech. The speech is analyzed using a cloud-based service, which then streams the input and a response back to Unity to influence how the character animates and responds. The speech analysis could be used to interpret specially assigned words that trigger activities, games, or special animations or content in the display. The response generated by the cloud service can be used to animate the character's mouth when the character audibly responds to users. Coupling depth/RGB data with audio allows for a more nuanced understanding of a user's intent and affective state, which in combination could drive sympathetic animations from the character. Because the RGB data allows for face recognition, the character can potentially store information about users to be retrieved whenever that user interacts with the system.
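Mapping specially assigned words to activities could start as a simple keyword lookup over the returned transcript; the trigger words and activity names below are placeholders, not a committed vocabulary:

```python
# Hypothetical trigger-word vocabulary mapping words to activities
TRIGGERS = {
    "fetch": "play_fetch_activity",
    "sit": "sit_animation",
    "trick": "trick_sequence",
}

def activity_for_transcript(transcript: str):
    """Scan a speech transcript for assigned trigger words.

    Returns the first matching activity name, or None when the
    transcript contains no trigger word.
    """
    for word in transcript.lower().split():
        if word in TRIGGERS:
            return TRIGGERS[word]
    return None
```

Because the character is gaze-activated rather than wake-word-activated, this lookup would only run on transcripts captured while the user is detected as facing the display.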
Software - Procedural Character
I chose an animated dog character as the embodied representative for several key affordances. The face and body of this character have already been rigged, and those parameters are accessible in Unity. The design of the character is ideal for this system and for expressing emotion: the head is large relative to the body, and the facial features are high-contrast, which will read better on this device. Additionally, because this character is familiar and has recognizable physical embodiments of emotion, the user will more readily understand key animation poses. For example, if the character is happy, it will wag its tail and stick out its tongue, but if it is nervous or upset, it will lower its head and ears. The dog character will animate based on the signals received and processed from the Intel Realsense camera and microphone. Those inputs will impact the character's emotional pose, LookAt( ) behavior, and special animation sequences.
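The signal-to-pose mapping could begin as a lookup from interpreted affective state to rig parameters, with a neutral fallback when the state is uncertain. The parameter names (e.g., `tail_wag_speed`) are hypothetical stand-ins for the rig's actual controls:

```python
# Illustrative mapping from affective state to rig parameters
POSES = {
    "happy":   {"tail_wag_speed": 1.0, "tongue_out": True,  "ear_height": 1.0},
    "nervous": {"tail_wag_speed": 0.0, "tongue_out": False, "ear_height": 0.2},
    "neutral": {"tail_wag_speed": 0.3, "tongue_out": False, "ear_height": 0.7},
}

def pose_for_state(state: str) -> dict:
    """Look up rig parameters for an affective state.

    Unknown or low-confidence states fall back to the neutral pose so
    the character never freezes in an unmapped expression.
    """
    return POSES.get(state, POSES["neutral"])
```

In Unity, the returned values would be blended toward over a few frames rather than snapped, so transitions between emotional poses read as deliberate body language.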
The ultimate goal of this project is to produce a low-cost, reliable, 3D, 360-degree autostereoscopic display. I will visually inspect the hardware/software implementation to determine whether the desired 360-degree display is produced and compare the specifications of this display against those of closely related devices. I will evaluate the software/hardware implementation by creating test patterns that discretize each individual viewing zone. Additionally, I will perform user studies to verify that the interaction system is intuitive and engaging.
The questions I’d like to answer about this project lie at the intersection of the affordances of the display itself and the media users can interact with. Because I’m designing a procedural character for this display, I want to explore believability and engagement through the user study, as well as whether placing a character in situ in a space impacts user experience. The nearest-neighbor devices for the study will be an AR headset and a 2D monitor. In this individual-use user study, participants will engage in activities that evaluate the character’s ability to engage. The participants will complete a close-ended warm-up task (e.g., playing catch) meant both to acclimate them and to provide a concrete objective. After the warm-up task is completed, the participants will be asked to engage in a more exploratory capacity with the character on each of the displays. The activity will involve engaging the character’s LookAt( ) behavior, object detection, face-pose detection, speech recognition, and visual accessories (like the LED light strip). The order in which the participants use each display will be randomized to avoid conflating intuitive use with prior experience with the activity they’re asked to complete.
If this device were to become commercially viable, it would have to be accessible for a wide variety of ages and expertise levels. Because of that, I will look for a diverse group of participants with varying degrees of prior experience with technologies like video games, AR/VR, and voice assistants. As I design and implement features for my project, I will carefully prioritize scope that builds toward the user-study experience.
If the radial lenticular works optimally, it would be exciting to explore a portable version of this novel volumetric display. Because the optical elements of this device have no moving parts and are lightweight, the display is a strong candidate for portability. It would also be fascinating to expand the AI character’s capabilities by adding context-aware infrastructure that changes how the character responds depending on localized metadata curation.
Ethically, this technology has the potential to make volumetric displays widely accessible at scale, because both the mylar cone and the radial lenticular would cost less than one dollar to fabricate when productized. This would dramatically reduce costs in a market where displays typically cost thousands of dollars per unit.