Our team is exploring what lies at the intersection of Computer Vision and Augmented Reality

Our project explored using face detection in augmented reality to extract the 3D positions of detected faces. Although the idea is simple, this proof of concept shows that the hardware in augmented reality headsets is ready for developers to design and implement solutions that increase productivity in industries such as construction, aerospace, and medicine, to name a few.

We were lucky enough to source the Magic Leap 2 augmented reality headset, which is among the most cutting-edge headsets currently on the market. The headset comes with a plethora of sensors (see image below) that are accessible through the Magic Leap API. For our use case, we utilized the CV Camera, Depth Camera, World Camera, and IMU. The Magic Leap 2 is specifically designed for industry applications, which sets it apart from other consumer-grade headsets and consequently afforded us comprehensive control of the device.

Magic Leap offers development for their devices in Unity, Unreal Engine, and Native. We chose Unity because it seemed to have the most support from the Magic Leap developers and a large ecosystem of third-party packages. Prior to this project, none of us had experience with Unity, so much of the preliminary research went into learning Unity development. After getting a grasp of the engine, we set up our devices to develop for the Magic Leap 2 using the Magic Leap Hub. Magic Leap provides several Unity example projects that showcase their APIs for features such as camera capture, hand tracking, and voice intents. Our team referred to these example projects and the Magic Leap documentation to create barebones projects, each accessing a single sensor on the Magic Leap 2 in a clean, understandable manner without any boilerplate. After gaining an understanding of how to properly communicate with the sensors on the Magic Leap 2, we integrated OpenCV for Unity into our projects.

OpenCV is a powerful computer vision library that provides many functions to perform CV-related tasks quickly and efficiently, which is why we used it in this project. However, due to the nature of the Magic Leap API, we had to make many modifications to use OpenCV's functionality effectively. The first challenge was the format of the data coming from the RGB and depth cameras. From the RGB camera we are interested in the frame data, the camera intrinsics, and the distortion coefficients. The frame data arrives as a byte array in RGBA format, so we wrote code to convert it into a format that OpenCV's functions could understand. The camera intrinsics and distortion coefficients were easier to deal with, but still required conversion into OpenCV-typed matrices. The Depth Camera's frame data is a byte array encoding 32-bit floats, which took a while to figure out since the documentation isn't very clear about the data type. Converting the depth and RGB data to OpenCV formats was challenging because of the enormous number of image pixel data types available in the library.

After bridging the Magic Leap data to OpenCV, we could run useful algorithms such as face detection to get the pixel coordinates of a person's head. The next step is to sample that head pixel coordinate in the depth map, which gives us the 3D position of the detected head. The problem is that the RGB and depth data come from two different cameras with different positions, orientations, intrinsics, distortion coefficients, and resolutions. We first used OpenCV to align the two cameras, but it was far too slow for our target of a 300 ms update time, so we opted for a more primitive approach. Instead, we created a testing tool to manually adjust the two camera frames until they visually aligned with each other, allowing us to map from the RGB image to the depth image extremely quickly with a simple Lerp function and a few multiplications. This method does not take into account the camera intrinsics and distortion coefficients, which can change over time due to environmental factors like temperature, but for our case it worked perfectly.
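To make the bridging step concrete, here is a rough sketch of how this pipeline can look with OpenCV for Unity. The `HeadLocator` class, the simplified parameter list, and the alignment constants are illustrative assumptions rather than our exact code, and the namespaces follow the current OpenCV for Unity module layout (older versions of the asset place everything under `OpenCVForUnity`).

```csharp
using OpenCVForUnity.CoreModule;
using OpenCVForUnity.ImgprocModule;
using OpenCVForUnity.ObjdetectModule;
using UnityEngine;
using CvRect = OpenCVForUnity.CoreModule.Rect;

public static class HeadLocator
{
    // Hypothetical alignment constants found with our manual calibration tool.
    // They linearly map a normalized RGB pixel coordinate into the depth image.
    static readonly Vector2 depthMin = new Vector2(0.05f, 0.08f);
    static readonly Vector2 depthMax = new Vector2(0.93f, 0.95f);

    // rgbaBytes: RGBA frame from the CV Camera; depthFloats: 32-bit float depth frame.
    // fx, fy, cx, cy: depth camera intrinsics. Returns null if no face is found.
    public static Vector3? LocateHead(
        byte[] rgbaBytes, int rgbW, int rgbH,
        float[] depthFloats, int depthW, int depthH,
        float fx, float fy, float cx, float cy,
        CascadeClassifier faceCascade)
    {
        // 1. Wrap the raw RGBA bytes in an OpenCV Mat and convert to grayscale.
        using (Mat rgba = new Mat(rgbH, rgbW, CvType.CV_8UC4))
        using (Mat gray = new Mat())
        using (MatOfRect faces = new MatOfRect())
        using (Mat depth = new Mat(depthH, depthW, CvType.CV_32FC1))
        {
            rgba.put(0, 0, rgbaBytes);
            Imgproc.cvtColor(rgba, gray, Imgproc.COLOR_RGBA2GRAY);

            // 2. Detect faces and take the first hit.
            faceCascade.detectMultiScale(gray, faces);
            CvRect[] hits = faces.toArray();
            if (hits.Length == 0) return null;
            CvRect face = hits[0];
            float u = face.x + face.width * 0.5f;   // face center in RGB pixels
            float v = face.y + face.height * 0.5f;

            // 3. Map the RGB pixel into the depth image with the pre-calibrated Lerp.
            int du = (int)Mathf.Lerp(depthMin.x * depthW, depthMax.x * depthW, u / rgbW);
            int dv = (int)Mathf.Lerp(depthMin.y * depthH, depthMax.y * depthH, v / rgbH);
            du = Mathf.Clamp(du, 0, depthW - 1);
            dv = Mathf.Clamp(dv, 0, depthH - 1);

            // 4. Sample the depth map and unproject with the pinhole camera model.
            depth.put(0, 0, depthFloats);
            float[] z = new float[1];
            depth.get(dv, du, z);
            if (z[0] <= 0f) return null;
            return new Vector3((du - cx) * z[0] / fx, (dv - cy) * z[0] / fy, z[0]);
        }
    }
}
```

The alignment constants stand in for the values we dialed in with the manual calibration tool; a production version would also undistort both frames before mapping between them.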

A large part of the development process was system architecture design, as we wanted our code to be modular and partitioned by function. Our stack was separated into four parts: hardware, computer-vision functions, UI, and a general manager that passes CV data between the other parts. The hardware classes interface with the sensors on the Magic Leap 2; they are responsible for ensuring permissions are granted, starting and stopping the device, and collecting data in a thread-safe manner so other resources are never blocked. The computer vision classes handle functionality such as face detection, hand tracking, and image transformations such as undistorting images and projecting RGB and depth data into 3D space.

We also developed multiple user interface classes that let the user interact with the software using their hands. Since Magic Leap doesn't provide built-in classes for this type of UI interaction (they mainly rely on the handheld controller), we had to develop everything from scratch, which required many iterations to get the "feeling" just right. UI design choices included debouncing times, disabled-after-press times, the size and shape of UI bounding boxes, and the visual design of the buttons. We mainly used the UI for debugging purposes so we could change values while in simulation, but the code is readily available to incorporate into any project. The last class, the CV Manager, acts as the main loop that requests data from the sensors and hands it to the other CV classes for processing.
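As a rough illustration of how the layers talk to each other (class and method names here are hypothetical, not our actual code): the hardware layer hands frames across threads behind a lock, and the manager polls it on a timer from Unity's main thread before passing data on to the CV layer.

```csharp
using System.Collections;
using UnityEngine;

// Hardware layer: owns a sensor and exposes the latest frame thread-safely.
public class RgbCameraDevice
{
    readonly object frameLock = new object();
    byte[] latestFrame;

    // Called from the camera's capture callback thread.
    public void OnFrameReceived(byte[] frame)
    {
        lock (frameLock) { latestFrame = frame; }
    }

    // Called from the main thread by the manager; returns null if nothing new arrived.
    public byte[] TakeLatestFrame()
    {
        lock (frameLock)
        {
            byte[] frame = latestFrame;
            latestFrame = null;
            return frame;
        }
    }
}

// Manager layer: the main loop that ties hardware, CV, and UI together.
public class CvManager : MonoBehaviour
{
    public float updateIntervalSeconds = 0.3f;   // our ~300 ms target

    RgbCameraDevice rgbCamera = new RgbCameraDevice();
    // Face detection, depth sampling, and UI classes would be referenced here as well.

    IEnumerator Start()
    {
        while (true)
        {
            byte[] frame = rgbCamera.TakeLatestFrame();
            if (frame != null)
            {
                // Hand the frame to the CV layer (e.g., a face locator like the
                // HeadLocator sketch above), then forward the result to the UI layer.
            }
            yield return new WaitForSeconds(updateIntervalSeconds);
        }
    }
}
```

Keeping the lock scope to a single reference swap means the camera callback thread never waits on CV processing, which is the point of the thread-safe collection in the hardware classes.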

Ethics

Per the CVPR guidelines, we have taken into account ethical concerns that could arise from further development of the technology presented here. From the very beginning, we knew that AR combined with CV has the capability to do harm (our logo at the top of this website was designed to reflect that possibility), but we decided that the benefits are too great to pass up. We will first discuss what we did to prevent misuse of our code and then go into more detail about what other developers need to take into account.

The Magic Leap 2's power comes from the sheer amount of data a developer has access to from the many sensors on board the device. This data, if leaked or mishandled, can be extremely dangerous to the user: it includes RGB and depth frames of the user's environment, eye positions, audio from the environment, and the user's exact position. Fortunately, Magic Leap provides functionality for developers to ensure that the user has granted the relevant permissions before data is accessed from any sensor. We included these checks in our code before accessing RGB and depth data.
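For reference, this is a minimal sketch of what such a permission gate looks like, assuming the MLPermissions / MLPermission API from the Magic Leap Unity SDK (event and constant names may differ slightly between SDK releases).

```csharp
using UnityEngine;
using UnityEngine.XR.MagicLeap;

public class SensorPermissionGate : MonoBehaviour
{
    readonly MLPermissions.Callbacks callbacks = new MLPermissions.Callbacks();

    void Start()
    {
        callbacks.OnPermissionGranted += OnGranted;
        callbacks.OnPermissionDenied += OnDenied;

        // Ask the user before touching either camera; sensors stay off until granted.
        MLPermissions.RequestPermission(MLPermission.Camera, callbacks);
        MLPermissions.RequestPermission(MLPermission.DepthCamera, callbacks);
    }

    void OnGranted(string permission)
    {
        Debug.Log($"{permission} granted; safe to start the corresponding sensor.");
    }

    void OnDenied(string permission)
    {
        Debug.Log($"{permission} denied; the corresponding sensor is never started.");
    }
}
```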

Even if the user allows our app to collect their data, it is also our responsibility as developers to make sure the data doesn't end up stored where it shouldn't be. As a team, we agreed not to use any services that would connect the Magic Leap headset to external networks, eliminating the possibility of data ever leaving the device.

During our development we realized just how many possibilities stem from this simple project. There is potential for medical imaging, architectural visualizers, and manufacturing guidance systems. However, there is also potential for more harmful technologies related to the military. AR technology is already used in fighter pilot helmets to help visualize targets, and this technology is now ready to be used by large numbers of infantry. Our simple project is able to derive a 3D position relative to the user, which could be used to initiate automated strikes against detected targets. This is the central issue with AR and CV in general: there are many avenues to take, and as developers we must take care which path we go down.