LIV Mobile

The first all-in-one easy-to-use mixed reality capture solution requiring only a Quest headset and a phone. The first (and only) mixed reality capture solution for Android.

I championed and started the project, and worked on every aspect of its development, including being its sole maintainer after the team moved on to other projects.

LIV Mobile (aka LIV.tv) has been one of my proudest career achievements yet. When I started tinkering with mixed reality capture tech back in 2016, one of my first experiments was taping a Vive controller to the back of an old iPad, and using it as a “magic window” into VR. The person holding it could walk around the room and get a better idea of what’s happening in the virtual world, making VR a much less isolating experience. The Quest, ARKit, and real-time AI background removal didn’t exist at the time, but four years later, the tech was finally in place to realize my dream, and I got to build it at LIV.

In many ways, the LIV Mobile was a continuation and combination of the LIV for Quest and LIV Camera iOS app that came before it. I wrote the first prototype on top of the LIV Camera app’s codebase in Swift, Objective-C, C++, and Metal, and utilized its existing AI background removal functionality from ARKit. It's alive!

Once we knew our vision was possible, it was time to build the real thing. After much deliberation, we settled on building the app in Unity. This had several advantages:

Supporting both iOS and Android became a much simpler task for our small team. Instead of building two completely separate apps, we only had to build one with video decoding and encoding backends specific to each platform.
When thinking about how we would scale the team, we thought finding graphics/XR engineers with Unity experience was going to be much easier than finding ones with native iOS and Android experience.
Unity’s slightly-higher-level graphics API wrappers for blitting textures and similar tasks meant the camera + game compositing system would be less daunting to non-graphics developers than one implemented with Metal/OpenGL.
Unity is a powerful 3D engine that would give us flexibility to try out more advanced features in the future that would be more work to implement if we had a custom engine on top of native graphics APIs.

We took my Quest video streaming and H.264 decoding code out of the previous prototype, and not long after, we had a Unity-based version up and running on iOS, which evolved into the publicly-released LIV Mobile app. Later, I created a new network video streaming protocol, and wrote an H.264 encoder with Apple’s VideoToolbox API, to send composited video back to the headset for an in-VR viewfinder feature.

In this screenshot:

The Quest headset is rendering a frame of the 3D world from the perspective of the iPad’s camera, and encoding into an H.264 stream.
The iPad receives that video stream over WiFi, and decodes each frame using the VideoToolbox API (or MediaCodec on Android).
The iPad is using AI to cut my body out from the background, and then layer that cutout on top of the game frame that it just decoded, creating a composited frame.
The iPad encodes the composited frame into another H.264 stream (again, using VideoToolbox/MediaCodec), which is sent back to the headset.
The headset decodes the frame it just received, and shows it in VR so that the player can see themselves without taking the headset off.

Calibration

There’s one important bit I haven’t mentioned in making the app work: aligning the coordinate space of the Quest with the coordinate space of the phone. While ARKit does a somewhat-decent job of tracking the phone’s position and orientation in space, and the Quest headset does a very good job of tracking itself and its controllers, it doesn’t know where the phone is relative to itself.

Making the VR system aware of the location of the camera is absolutely crucial for mixed reality capture, since the virtual camera needs to be positioned relative to the headset and controllers in the virtual world the exact same way it’s positioned relative to the headset and controllers in the physical world. If this is inaccurate, objects you’re holding in VR won’t look like they’re actually in your hands.

I came up with a simple-but-novel approach to solving this problem, which took advantage of several unique things when using a Quest headset and a phone together:

The Quest controllers have a flat ring at the ends, and we can easily determine where that ring is in the headset’s coordinate space.
Smartphones have flat screens.
The cameras on phones are nearly always perpendicular to their screens, and they don’t move relative to the screen.
The Quest and the phone’s tracking both have the same gravity vector since they are both in the same room on Earth, so we can be certain their pitch and roll are identical.

This meant that by pressing the controller against the screen, we could determine the location of the screen in the headset’s coordinate space. With that information, determining the location of the camera in the headset’s coordinate space was trivial. We could then show a diagram on the screen to tell the user where to put their controller. I implemented it with Unity’s math library, and you can see it all in action in the video at the top!

It’s also worth noting that this was the first ever mixed reality capture app that allowed for the camera to be moved around, instead of just locked in to one location in space, without using a dedicated tracker from the VR system. Typical MRC setups often determine the camera pose permanently by using the pose of the controller during calibration. During the calibration click action, I saved a 4x4 matrix representing the controller ring’s pose, and calculated (and saved) the inverse of the 4x4 matrix representing the phone’s pose at that point in time. The phone’s saved inverse matrix could then be used to transform the phone’s current pose into its pose relative to where the controller was during the click, and the saved controller matrix could then be used to transform that pose into one in the headset’s coordinate space. I also zeroed out the pitch and roll components of the transformation matrices–otherwise, accurate calibration could only be achieved if the controller was both touching and rotated a specific way relative to the image on screen, which would have slowed down users, increased failure points, and required more complex setup guidance.

Porting to Android

Android presented a new set of challenges though. First, we had to come up with a code architecture that allowed us reliably share code across platforms. For our C# code in Unity this was obviously straightforward, but we had to come up with a way to share our C++ networking codebase, while separating out our platform-specific video decode/encode codebases. I chose CMake as our build system, since it simplified the process of building the native library in different ways for different platforms (I even got the app running on macOS and Windows later on to simplify the development workflow).

The real issue with Android however, was the lack of background removal functionality in ARCore–something crucial to making the app work. While I would have loved to develop our own AI model, it was too time-consuming of a project for us to engage in, so I began a search for companies licensing similar technology we could use. I was integral to forming our partnership with Segmentive, an amazing team that built us some excellent-quality, high-performance models we could use on any platform.

Getting an H.264 feed out of Android’s MediaCodec decoding API and rendering in Unity also required some real voodoo magic, that I spent quite some time tearing my hair out over (and this might be the only place on the internet it’s documented…):

First, a SurfaceTexture needed to be created. This object ties together an OpenGL texture and an Android Surface, allowing us to set the decoder, which outputs to a Surface, to give us something we could render in Unity (or any other OpenGL pipeline).
Next, a Java class had to be created that implements the SurfaceTexture’s onFrameAvailable callback. This part was particularly annoying, since up until this point, our codebase had zero Java code in it–only C++, and C# (we used the Android NDK APIs for MediaCodec).
Get an ASurfaceTexture* object from the Java SurfaceTexture, so we could work with it in our NDK/C++ code.
Create a C/C++ function that gets called from Unity’s render thread (using IssuePluginEvent–if it’s called with a normal function call, it won’t run on the render thread, and Unity’s OpenGL context will be inaccessible). When initializing, create a new empty OpenGL texture with glGenTextures, and pass it to ASurfaceTexture_attachToGLContext() to link a GLES texture to the SurfaceTexture.
Get an ANativeWindow object (the NDK version of a Surface), and pass it to MediaCodec as the output surface.
When onFrameAvailable is called, set a flag to indicate a new frame is available. In every frame in Unity’s render loop, a function should be run on the render thread before the texture is drawn to check if a new frame is available, and if so, calling ASurfaceTexture_updateTexImage.
Finally, calling CreateExternalTexture in Unity to create a Unity texture from the GLES texture.

The one other annoying bit, albeit slightly less undocumented, was the need to create a GLSL shader to draw the texture. I wrote all of our shaders in Unity’s ShaderLab language, which compiled down to GLSL or MSL and avoided some duplication of work, but this was not possible here. I found no way of defining a samplerExternalOES uniform in ShaderLab, so Android needed to have its own shaders that couldn’t be run/tested on Windows/macOS/iOS. Regardless, I felt like a kid in a candy store once I had gotten this all working.

Bonus Content!

While I don’t have as many cool stories to tell, I also did some fun work on UX and marketing for the app. I worked together with my good friend StealthShampoo to produce the official announcement for the app below, and the teaser trailer for Reddit at the top of this post which rapidly became the top post on nearly all VR subreddits. Seeing the attention it got was certainly a thrill.

I learned a lot from working on this project. Not just technical knowledge, but also the importance of doing user testing early on. While the project drew a lot of attention (and downloads), it ultimately was less successful than I’d hoped. My laser focus on shipping what I thought was important made me lose track of what makes a product successful. I thought I knew what needed to be done from having used the app myself, but my extensive knowledge of how the app worked led to me subconsciously avoiding the app’s issues during my own use. While developing a project in stealth mode was exciting, the balance of getting feedback against saving surprises for launch is something I’ll be more mindful about going forward.

LIV Mobile

Calibration

Porting to Android

Bonus Content!

LIV for Quest + LIV Camera for iOS