The simplest mixed reality capture app ever, and the first for Android and iOS (Professional Work)
The first all-in-one easy-to-use mixed reality capture solution requiring only a Quest headset and a phone. The first (and only) mixed reality capture solution for Android.
I championed and started the project, and worked on every aspect of its development, including being its sole maintainer after the team moved on to other projects.
LIV Mobile (aka LIV.tv) has been one of my proudest career achievements yet. When I started tinkering with mixed reality capture tech back in 2016, one of my first experiments was taping a Vive controller to the back of an old iPad and using it as a “magic window” into VR. The person holding it could walk around the room and get a better idea of what’s happening in the virtual world, making VR a much less isolating experience. The Quest, ARKit, and real-time AI background removal didn’t exist at the time, but four years later, the tech was finally in place to realize my dream, and I got to build it at LIV.
In many ways, LIV Mobile was a continuation and combination of the LIV for Quest and LIV Camera iOS apps that came before it. I wrote the first prototype on top of the LIV Camera app’s codebase in Swift, Objective-C, C++, and Metal, and reused its existing ARKit-based AI background removal.
Once we knew our vision was possible, it was time to build the real thing. After much deliberation, we settled on building the app in Unity. This had several advantages:
We took my Quest video streaming and H.264 decoding code out of the previous prototype, and before long we had a Unity-based version up and running on iOS, which evolved into the publicly released LIV Mobile app. Later, I created a new network video streaming protocol and wrote an H.264 encoder with Apple’s VideoToolbox API to send composited video back to the headset for an in-VR viewfinder feature.
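Neither the streaming protocol nor the decoder code appears in this post. As a hedged illustration of one step such a pipeline typically needs, the C++ sketch below splits a raw H.264 Annex B byte stream into NAL units and repackages them with 4-byte big-endian length prefixes, the AVCC-style layout VideoToolbox generally expects (MediaCodec, by contrast, accepts start-code-delimited input). It assumes the headset’s stream arrives in Annex B form, which the post doesn’t state, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Split an Annex B H.264 byte stream into NAL units (start codes removed).
// Handles both 3-byte (00 00 01) and 4-byte (00 00 00 01) start codes.
std::vector<Bytes> splitAnnexB(const Bytes& data) {
    std::vector<Bytes> units;
    const std::size_t n = data.size();
    bool inUnit = false;
    std::size_t start = 0;
    std::size_t i = 0;
    while (i + 2 < n) {
        if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
            if (inUnit) {
                std::size_t end = i;
                // A zero byte right before 00 00 01 is treated as the first
                // byte of a 4-byte start code, not part of the previous unit.
                if (end > start && data[end - 1] == 0) --end;
                if (end > start) units.emplace_back(data.begin() + start, data.begin() + end);
            }
            i += 3;
            start = i;
            inUnit = true;
        } else {
            ++i;
        }
    }
    if (inUnit && start < n) units.emplace_back(data.begin() + start, data.end());
    return units;
}

// nal_unit_type is the low 5 bits of the first byte (5 = IDR slice, 7 = SPS, 8 = PPS).
int nalType(const Bytes& unit) { return unit.empty() ? -1 : (unit[0] & 0x1F); }

// Re-emit NAL units with 4-byte big-endian length prefixes (AVCC-style).
Bytes toLengthPrefixed(const std::vector<Bytes>& units) {
    Bytes out;
    for (const Bytes& unit : units) {
        const auto len = static_cast<std::uint32_t>(unit.size());
        for (int shift = 24; shift >= 0; shift -= 8)
            out.push_back(static_cast<std::uint8_t>((len >> shift) & 0xFF));
        out.insert(out.end(), unit.begin(), unit.end());
    }
    return out;
}
```

In a real VideoToolbox pipeline, the SPS and PPS units (types 7 and 8) would typically be pulled out to build the stream’s format description rather than being packed into per-frame sample data.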
In this screenshot:
There’s one important piece I haven’t mentioned yet: aligning the coordinate space of the Quest with the coordinate space of the phone. While ARKit does a reasonably decent job of tracking the phone’s position and orientation in space, and the Quest headset does a very good job of tracking itself and its controllers, the headset has no idea where the phone is relative to itself.
Making the VR system aware of the location of the camera is absolutely crucial for mixed reality capture, since the virtual camera needs to be positioned relative to the headset and controllers in the virtual world the exact same way it’s positioned relative to the headset and controllers in the physical world. If this is inaccurate, objects you’re holding in VR won’t look like they’re actually in your hands.
I came up with a simple but novel approach to solving this problem, which took advantage of several properties unique to using a Quest headset and a phone together:
This meant that by pressing the controller against the screen, we could determine the location of the screen in the headset’s coordinate space. With that information, determining the location of the camera in the headset’s coordinate space was trivial. To guide the user, we showed a diagram on the screen indicating where to put their controller. I implemented it with Unity’s math library, and you can see it all in action in the video at the top!
It’s also worth noting that this was the first-ever mixed reality capture app that allowed the camera to be moved around, rather than locked into one location in space, without using a dedicated tracker from the VR system. Typical MRC setups determine the camera pose once, from the pose of the controller during calibration, and keep it fixed from then on. During the calibration click, I saved a 4x4 matrix representing the controller ring’s pose, and calculated (and saved) the inverse of the 4x4 matrix representing the phone’s pose at that moment. The phone’s saved inverse matrix could then be used to transform the phone’s current pose into its pose relative to where the controller was during the click, and the saved controller matrix could then be used to transform that relative pose into the headset’s coordinate space. I also zeroed out the pitch and roll components of the transformation matrices; otherwise, accurate calibration could only be achieved if the controller was both touching the screen and rotated a specific way relative to the image on it, which would have slowed users down, added failure points, and required more complex setup guidance.
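The calibration described above can be sketched in C++ as follows. This is a minimal illustration, not the app’s actual Unity code: it uses a hand-rolled row-major 4x4 matrix, and the names (Calibration, yawOnly, toHeadsetSpace) are hypothetical. It assumes both poses are already expressed in a common y-up convention with the same handedness (ARKit is right-handed and Unity left-handed, so a conversion would come first), and that both tracking spaces are gravity-aligned, which is why only yaw and translation need to be recovered.

```cpp
#include <array>
#include <cmath>

// Row-major 4x4 transform acting on column vectors; translation in the last column.
using Mat4 = std::array<std::array<double, 4>, 4>;

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r[i][j] += a[i][k] * b[k][j];
    return r;
}

Mat4 translation(double x, double y, double z) {
    Mat4 m = identity();
    m[0][3] = x; m[1][3] = y; m[2][3] = z;
    return m;
}

// Rotation about the gravity-aligned +Y axis.
Mat4 rotationY(double yaw) {
    Mat4 m = identity();
    m[0][0] = std::cos(yaw);  m[0][2] = std::sin(yaw);
    m[2][0] = -std::sin(yaw); m[2][2] = std::cos(yaw);
    return m;
}

// Inverse of a rigid transform [R t; 0 1] is [R^T  -R^T t; 0 1].
Mat4 rigidInverse(const Mat4& m) {
    Mat4 r = identity();
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) r[i][j] = m[j][i];
    for (int i = 0; i < 3; ++i) {
        r[i][3] = 0.0;
        for (int j = 0; j < 3; ++j) r[i][3] -= r[i][j] * m[j][3];
    }
    return r;
}

// Drop pitch and roll: keep only the yaw of the forward (+Z) axis and the translation.
// (Degenerate if the forward axis points straight up or down.)
Mat4 yawOnly(const Mat4& m) {
    Mat4 r = rotationY(std::atan2(m[0][2], m[2][2]));
    for (int i = 0; i < 3; ++i) r[i][3] = m[i][3];
    return r;
}

struct Calibration {
    Mat4 controllerAtClick;    // controller ring pose in headset space (yaw only)
    Mat4 phoneAtClickInverse;  // inverse of the phone's AR-space pose (yaw only)

    static Calibration capture(const Mat4& controllerPose, const Mat4& phonePose) {
        return {yawOnly(controllerPose), rigidInverse(yawOnly(phonePose))};
    }

    // Phone pose now -> pose relative to the click -> headset space.
    Mat4 toHeadsetSpace(const Mat4& phonePoseNow) const {
        return multiply(controllerAtClick, multiply(phoneAtClickInverse, phonePoseNow));
    }
};
```

At the moment of the click, the result equals the (yaw-only) controller pose; any movement the phone tracks afterwards is carried through relative to it, which is what lets the camera move freely after a single calibration.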
Android presented a new set of challenges, though. First, we had to come up with a code architecture that allowed us to reliably share code across platforms. For our C# code in Unity, this was straightforward, but we had to find a way to share our C++ networking codebase while separating out our platform-specific video decode/encode code. I chose CMake as our build system, since it simplified building the native library in different ways for different platforms (I even got the app running on macOS and Windows later on to simplify the development workflow).
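The post doesn’t show the native library’s layout, so the following is only a hedged C++ sketch of the split it describes: a shared networking layer that depends solely on an abstract decoder interface, with one platform-specific backend compiled in per target. All class and function names are hypothetical, and the backends are empty placeholders rather than real MediaCodec or VideoToolbox code.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Platform-agnostic decoder interface: the only thing the shared
// networking code knows about video decoding.
class VideoDecoder {
public:
    virtual ~VideoDecoder() = default;
    virtual std::string backendName() const = 0;
    virtual void decode(const std::vector<std::uint8_t>& nalUnit) = 0;
};

// Shared networking layer, compiled unchanged for every platform.
class StreamReceiver {
public:
    explicit StreamReceiver(std::unique_ptr<VideoDecoder> decoder)
        : decoder_(std::move(decoder)) {}

    // Called for each video payload received from the network.
    void onPacket(const std::vector<std::uint8_t>& payload) {
        ++packets_;
        decoder_->decode(payload);
    }

    std::size_t packetsReceived() const { return packets_; }
    const VideoDecoder& decoder() const { return *decoder_; }

private:
    std::unique_ptr<VideoDecoder> decoder_;
    std::size_t packets_ = 0;
};

// Platform-specific backends (hypothetical placeholders). In a CMake build each
// would normally live in its own source file, added only for its platform;
// the #if chain just keeps this sketch in a single file.
#if defined(__ANDROID__)
class PlatformDecoder final : public VideoDecoder {
public:
    std::string backendName() const override { return "MediaCodec"; }
    void decode(const std::vector<std::uint8_t>&) override { /* real decode path omitted */ }
};
#elif defined(__APPLE__)
class PlatformDecoder final : public VideoDecoder {
public:
    std::string backendName() const override { return "VideoToolbox"; }
    void decode(const std::vector<std::uint8_t>&) override { /* real decode path omitted */ }
};
#else
// Desktop/development fallback that accepts data but decodes nothing.
class PlatformDecoder final : public VideoDecoder {
public:
    std::string backendName() const override { return "stub"; }
    void decode(const std::vector<std::uint8_t>&) override {}
};
#endif

// Factory the shared code calls; only one backend is compiled per platform.
std::unique_ptr<VideoDecoder> makePlatformDecoder() {
    return std::make_unique<PlatformDecoder>();
}
```

Keeping the platform choice behind a single factory means the shared networking code never needs platform checks of its own; the build system only has to decide which backend source goes into each target.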
The real issue with Android, however, was the lack of background removal functionality in ARCore, something crucial to making the app work. While I would have loved to develop our own AI model, that was too time-consuming a project for us to take on, so I began searching for companies licensing similar technology we could use. I was integral to forming our partnership with Segmentive, an amazing team that built us excellent-quality, high-performance models we could use on any platform.
Getting an H.264 feed out of Android’s MediaCodec decoding API and rendering it in Unity also required some real voodoo magic that I spent quite some time tearing my hair out over (and this might be the only place on the internet it’s documented…):

1. A SurfaceTexture needed to be created. This object ties together an OpenGL texture and an Android Surface, allowing us to point the decoder (which outputs to a Surface) at something we could render in Unity (or any other OpenGL pipeline).
2. Register a listener for the SurfaceTexture’s onFrameAvailable callback. This part was particularly annoying, since up until this point our codebase had zero Java code in it, only C++ and C# (we used the Android NDK APIs for MediaCodec).
3. Get an ASurfaceTexture* object from the Java SurfaceTexture, so we could work with it in our NDK/C++ code.
4. Run the GL setup on Unity’s render thread by invoking it through IssuePluginEvent (if the function is called normally, it won’t run on the render thread, and Unity’s OpenGL context will be inaccessible). When initializing, create a new empty OpenGL texture with glGenTextures, and pass it to ASurfaceTexture_attachToGLContext() to link the GLES texture to the SurfaceTexture.
5. Get an ANativeWindow object (the NDK version of a Surface) for the SurfaceTexture, and pass it to MediaCodec as the output surface.
6. When onFrameAvailable is called, set a flag to indicate a new frame is available. Every frame of Unity’s render loop, run a function on the render thread, before the texture is drawn, that checks whether a new frame is available and, if so, calls ASurfaceTexture_updateTexImage.
7. Use CreateExternalTexture in Unity to create a Unity texture from the GLES texture.

The one other annoying bit, albeit slightly better documented, was the need to write a GLSL shader to draw the texture. I wrote all of our other shaders in Unity’s ShaderLab language, which compiled down to GLSL or MSL and avoided some duplication of work, but that was not possible here. I found no way of declaring a samplerExternalOES uniform in ShaderLab, so Android needed its own shaders that couldn’t be run or tested on Windows, macOS, or iOS. Regardless, I felt like a kid in a candy store once I had all of this working.
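The flag handoff between onFrameAvailable and the render thread, plus the kind of samplerExternalOES shader mentioned above, can be sketched in C++ as follows. This is a minimal illustration, not the app’s code: FrameSignal, updateIfFrameAvailable, the JNI function (including its Java package and class name), and the render-event function are all hypothetical, and the Android-only section assumes API level 28 or later, where the NDK’s ASurfaceTexture functions exist. The portable part compiles anywhere, with a callback standing in for ASurfaceTexture_updateTexImage.

```cpp
#include <atomic>
#include <string_view>

#if defined(__ANDROID__)
#include <android/surface_texture.h>
#include <jni.h>
#endif

// Set when onFrameAvailable fires (on an arbitrary thread) and consumed on
// Unity's render thread. Several callbacks between two render frames collapse
// into a single pending flag.
class FrameSignal {
public:
    void notifyFrameAvailable() { pending_.store(true, std::memory_order_release); }

    // Returns true if a frame arrived since the last call, clearing the flag.
    bool consume() { return pending_.exchange(false, std::memory_order_acq_rel); }

private:
    std::atomic<bool> pending_{false};
};

// Render-thread step, run every frame before the video texture is drawn.
// `latchLatestFrame` stands in for ASurfaceTexture_updateTexImage.
template <typename LatchFn>
bool updateIfFrameAvailable(FrameSignal& signal, LatchFn&& latchLatestFrame) {
    if (!signal.consume()) return false;
    latchLatestFrame();
    return true;
}

#if defined(__ANDROID__)
FrameSignal gFrameSignal;
ASurfaceTexture* gSurfaceTexture = nullptr;  // obtained from the Java SurfaceTexture

// Hypothetical JNI hook called by the Java onFrameAvailable listener.
extern "C" JNIEXPORT void JNICALL
Java_com_example_video_VideoBridge_nativeOnFrameAvailable(JNIEnv*, jclass) {
    gFrameSignal.notifyFrameAvailable();
}

// Hypothetical render-event callback, reached via GL.IssuePluginEvent so that
// Unity's GL context is current on this thread.
void onRenderThreadUpdate(int /*eventId*/) {
    if (gSurfaceTexture == nullptr) return;
    updateIfFrameAvailable(gFrameSignal, [] { ASurfaceTexture_updateTexImage(gSurfaceTexture); });
}
#endif

// Minimal GLSL ES 1.00 fragment shader that samples the external texture.
constexpr std::string_view kExternalOesFragmentShader = R"(#extension GL_OES_EGL_image_external : require
precision mediump float;
uniform samplerExternalOES uTexture;
varying vec2 vTexCoord;
void main() {
    gl_FragColor = texture2D(uTexture, vTexCoord);
}
)";
```

Collapsing several callbacks into one flag is fine for live video, because updateTexImage latches the most recent frame from the stream anyway. The shader shows only the fragment stage; texture coordinates would normally also be multiplied by the matrix from ASurfaceTexture_getTransformMatrix in the vertex stage.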
While I don’t have as many cool stories to tell here, I also did some fun work on UX and marketing for the app. I worked with my good friend StealthShampoo to produce the official announcement for the app below, as well as the Reddit teaser trailer at the top of this post, which rapidly became the top post on nearly all VR subreddits. Seeing the attention it got was certainly a thrill.
I learned a lot from working on this project: not just technical knowledge, but also the importance of doing user testing early on. While the project drew a lot of attention (and downloads), it was ultimately less successful than I’d hoped. My laser focus on shipping what I thought was important made me lose track of what makes a product successful. I thought I knew what needed to be done from using the app myself, but my extensive knowledge of how the app worked led me to subconsciously avoid its issues during my own use. While developing a project in stealth mode was exciting, balancing early feedback against saving surprises for launch is something I’ll be more mindful of going forward.