Polidea Labs #3: Augmenting Reality with an iPhone
In the next episode of the Polidea Labs series, we take on the subject of Augmented Reality. The article on Web AR is coming soon, and if you want to read all about our VR experiment, you can find it here. This time, we are dealing with augmenting reality on iOS.
The dark ages of Apple shamelessly keeping quiet about AI topics are gone forever. Apple’s recent WWDC truly burst with news of frameworks that harness machine learning. These come in many flavors. At Polidea Labs, we took a quick look at two of these tools: ARKit and Vision.
You might want to keep in mind that these are all betas. (We hope you love Apple’s betas just like we do.)
Before diving into augmented reality with the aforementioned tools, there are several steps to follow.
- Install Xcode 9 Beta on your Mac and iOS 11 Beta on your device. By the way, do you know what that means…? Wireless builds, yay! Finally! Except not really. Both Vision and ARKit do a lot of heavy lifting and as a result, they eat up the battery like crazy. Therefore…
- Keep the charger close at hand.
- Be patient: Xcode 9 does its best, but it tends to crash a lot.
- Forget the documentation: Apple engineers had better things to do than write it.
After wading through all this, you’re ready to go.
ARKit provides a platform for developing AR (augmented reality) experiences in iOS apps. This means adding 2D or 3D elements to the live camera view of an iPhone or iPad in such a way that these elements feel as if they inhabit the real world. ARKit combines the device’s camera and motion sensing to create the augmented reality experience. It also integrates with SceneKit and SpriteKit, and offers lower-level control through Metal 2.
ARKit can be broken down into three main layers:
- Tracking, which provides real-time information about the device’s position in the physical environment. ARKit uses visual-inertial odometry, which sounds somewhat like rocket science, right? In simple words, it uses AVFoundation and CoreMotion under the hood, combining what the camera sees with motion sensor data, to estimate the 3D position of the device relative to its starting position.
- Scene understanding, which offers features like plane detection, hit testing and light estimation. With these, ARKit lets us integrate virtual content into the physical world.
- Rendering, which is a nod especially to SpriteKit and SceneKit developers: ARKit implements most of the rendering for you. The good news is that both Unity and Unreal are said to support all ARKit features (which opens up some interesting opportunities for us after our recent Polidea Labs investigation).
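To make the scene-understanding layer a little more concrete, here is a minimal sketch of plane detection plus hit testing, assuming a SceneKit-backed `ARSCNView` and the API names of the released iOS 11 SDK. The helpers `makePlaneDetectingConfiguration` and `placeBox` are hypothetical, and `hitTest(_:types:)` is the iOS 11-era call that later iOS versions deprecate in favor of raycasting.

```swift
import UIKit
import ARKit
import SceneKit

/// Hypothetical helper: a world-tracking configuration that also detects
/// horizontal planes (floors, tables) and estimates scene lighting.
func makePlaneDetectingConfiguration() -> ARWorldTrackingConfiguration {
    let configuration = ARWorldTrackingConfiguration()
    configuration.planeDetection = .horizontal
    configuration.isLightEstimationEnabled = true // on by default; set here for clarity
    return configuration
}

/// Hypothetical helper: hit-tests a screen point against planes ARKit has
/// already detected and, on success, drops a 10 cm box onto that plane.
@discardableResult
func placeBox(at point: CGPoint, in sceneView: ARSCNView) -> SCNNode? {
    // Only consider detected planes, and only within their detected extent.
    guard let hit = sceneView.hitTest(point, types: .existingPlaneUsingExtent).first else {
        return nil // no plane detected under this point (yet)
    }
    let size: CGFloat = 0.1 // ARKit units are meters
    let box = SCNNode(geometry: SCNBox(width: size, height: size, length: size, chamferRadius: 0))
    // The last column of the 4x4 world transform holds the hit position.
    let position = hit.worldTransform.columns.3
    // SCNBox is centered on its node, so lift it by half its height to sit on the plane.
    box.position = SCNVector3(position.x, position.y + Float(size / 2), position.z)
    sceneView.scene.rootNode.addChildNode(box)
    return box
}
```

In an app, a tap gesture recognizer might call `placeBox(at: gesture.location(in: sceneView), in: sceneView)` once the session is running with `makePlaneDetectingConfiguration()`; until ARKit has found a plane, the helper simply returns `nil`.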
All you need to do is tell the session of your ARSCNView (which shows up as a regular camera view) to run with a specified configuration, and ARKit handles all the processing. You can create an ARWorldTrackingSessionConfiguration, which enables tracking the device’s movement with six degrees of freedom: the three rotation axes (roll, pitch, and yaw) and the three translation axes (movement in x, y, and z).
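The setup described above can be sketched as a small view controller. This is a minimal sketch, not our lab code: it uses `ARWorldTrackingConfiguration`, the name under which the beta's `ARWorldTrackingSessionConfiguration` shipped in the final iOS 11 SDK, and the class name `ARViewController` is hypothetical.

```swift
import UIKit
import ARKit
import SceneKit

/// Hypothetical view controller: shows the live camera feed through an
/// ARSCNView and runs a six-degrees-of-freedom world-tracking session.
final class ARViewController: UIViewController {
    private let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        sceneView.autoresizingMask = [.flexibleWidth, .flexibleHeight]
        sceneView.scene = SCNScene() // empty scene to hold virtual content
        view.addSubview(sceneView)
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        // World tracking requires an A9 chip or newer.
        guard ARWorldTrackingConfiguration.isSupported else { return }
        sceneView.session.run(ARWorldTrackingConfiguration())
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        // Pausing stops camera and motion processing, which helps the battery.
        sceneView.session.pause()
    }
}
```

Pausing in `viewWillDisappear` is a deliberate choice given how hungry tracking is. The app also needs an `NSCameraUsageDescription` entry in its Info.plist before iOS will grant camera access.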
The ARSession object outputs snapshots (ARFrame objects) that contain all the data about the state of the session. What you do with that content is entirely up to you: imagination is the limit (or, more likely, an enigmatic ARKit crash).
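One way to consume those snapshots is an `ARSessionDelegate` that reads the camera pose from each `ARFrame`. The sketch below assumes iOS 11+ API names; `FrameObserver` and `translation(of:)` are hypothetical helpers. It copies the values it needs instead of keeping the `ARFrame` itself, since holding on to frames keeps their camera buffers alive.

```swift
import ARKit
import simd

/// Extracts the translation (x, y, z in meters) from a 4x4 transform:
/// it lives in the last column, while the upper-left 3x3 block holds rotation.
func translation(of transform: simd_float4x4) -> simd_float3 {
    let column = transform.columns.3
    return simd_float3(column.x, column.y, column.z)
}

/// Hypothetical delegate: receives ARFrame snapshots and keeps the latest
/// camera pose, relative to where the session started.
final class FrameObserver: NSObject, ARSessionDelegate {
    private(set) var position = simd_float3(0, 0, 0)
    private(set) var eulerAngles = simd_float3(0, 0, 0) // camera rotation as Euler angles, radians
    private(set) var isTrackingNormal = false

    func session(_ session: ARSession, didUpdate frame: ARFrame) {
        let camera = frame.camera
        position = translation(of: camera.transform)
        eulerAngles = camera.eulerAngles
        // Poses reported while tracking is limited or unavailable are less reliable.
        if case .normal = camera.trackingState {
            isTrackingNormal = true
        } else {
            isTrackingNormal = false
        }
    }
}
```

`ARSession.delegate` is a weak reference, so the observer has to be retained elsewhere, e.g. `let observer = FrameObserver()` stored as a property and then `sceneView.session.delegate = observer`.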
Special thanks to Mr Kipling for a delicious virtual cupcake
The Vision framework introduced by Apple provides powerful image analysis and computer vision techniques that let you identify objects in images and video captured by the device’s camera. Vision is built on top of the CoreML framework (also a crisp new machine learning tool from Apple), which lets you use self-trained models (or ready-made ones, e.g. those offered by Apple) that come in the Xcode-friendly .mlmodel format.
How do we play with Vision? Basically, we create a request handler (VNImageRequestHandler) that performs all the requests from a passed array. These request objects inherit from the VNImageBasedRequest class and cover tasks such as:
- face detection
- horizon detection
- aligning the content of two images
- scene classification
- barcode or text detection
- image tracking
While creating such a request, we pass a completion handler in which we can then handle the VNObservation-based objects produced while executing the request. What these observations contain depends on which of the above request types was performed.
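The handler, request and completion-handler flow described above can be sketched as follows. This is a minimal sketch assuming iOS 11+ Vision API names; `analyze` is a hypothetical helper that runs a face-detection and a horizon-detection request through one `VNImageRequestHandler`.

```swift
import CoreGraphics
import Vision

/// Hypothetical helper: runs two Vision requests over one image through a
/// single VNImageRequestHandler and collects what their completion handlers see.
func analyze(_ image: CGImage,
             orientation: CGImagePropertyOrientation) throws -> (faces: [VNFaceObservation], horizonAngle: CGFloat?) {
    var faces: [VNFaceObservation] = []
    var horizonAngle: CGFloat?

    // Face detection yields VNFaceObservation objects whose boundingBox is
    // normalized to 0...1, with the origin in the lower-left corner.
    let faceRequest = VNDetectFaceRectanglesRequest { request, _ in
        faces = (request.results as? [VNFaceObservation]) ?? []
    }
    // Horizon detection yields a VNHorizonObservation carrying the tilt angle in radians.
    let horizonRequest = VNDetectHorizonRequest { request, _ in
        horizonAngle = (request.results?.first as? VNHorizonObservation)?.angle
    }

    let handler = VNImageRequestHandler(cgImage: image, orientation: orientation, options: [:])
    // perform(_:) blocks until every request has completed or failed, so the
    // variables above are filled in by the time it returns.
    try handler.perform([faceRequest, horizonRequest])
    return (faces, horizonAngle)
}
```

Because `perform(_:)` is synchronous, a real app would call it off the main thread, for example on a background `DispatchQueue`, and hop back to the main queue to update the UI.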
The funny thing about Vision is that while it copes seamlessly with image size (you don’t need to do any scaling), it’s completely helpless when it comes to determining the orientation of the image. You have to specify the orientation explicitly; otherwise, all image-based requests will come unstuck.
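A `UIImage` keeps its pixel data unrotated and records how it should be displayed in `imageOrientation`; the underlying `cgImage` does not carry that flag, which is why Vision must be told explicitly. Below is a sketch, assuming UIKit input, that converts `UIImage.Orientation` into the `CGImagePropertyOrientation` Vision expects. The extension initializer and `makeHandler` are hypothetical helpers, not Vision API.

```swift
import ImageIO
import UIKit
import Vision

extension CGImagePropertyOrientation {
    /// Maps UIKit's orientation flag onto the EXIF-style orientation Vision uses.
    /// The cases correspond one-to-one, but the raw values differ, so converting
    /// via rawValue would silently produce the wrong orientation.
    init(_ orientation: UIImage.Orientation) {
        switch orientation {
        case .up: self = .up
        case .upMirrored: self = .upMirrored
        case .down: self = .down
        case .downMirrored: self = .downMirrored
        case .left: self = .left
        case .leftMirrored: self = .leftMirrored
        case .right: self = .right
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}

/// Hypothetical helper: builds a request handler that knows how the image is rotated.
func makeHandler(for image: UIImage) -> VNImageRequestHandler? {
    guard let cgImage = image.cgImage else { return nil } // e.g. CIImage-backed images
    return VNImageRequestHandler(cgImage: cgImage,
                                 orientation: CGImagePropertyOrientation(image.imageOrientation),
                                 options: [:])
}
```

The explicit `switch` is more verbose than a raw-value conversion, but the two enums number their cases differently, so a raw-value shortcut would be wrong.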
So, to sum up, Vision hands us some high-level, on-device solutions accessible via a simple API. To be honest, some of them, like the barcode detection request, leave a lot to be desired. But others, like classification requests, rely entirely on the CoreML model used, so there is more room to play with by creating, or choosing, the right one.
The InceptionV3 model recognizes a lipstick with 88% confidence.
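A classification request backed by a CoreML model, such as the one behind the lipstick result above, might look like this. It is a minimal sketch: `makeClassificationRequest` and `describe` are hypothetical helpers, and instead of assuming the results arrive sorted, it picks the most confident observation explicitly.

```swift
import CoreML
import Foundation
import Vision

/// Hypothetical helper: formats a label and confidence like "lipstick: 88%".
func describe(label: String, confidence: Float) -> String {
    return "\(label): \(Int((confidence * 100).rounded()))%"
}

/// Hypothetical helper: wraps a CoreML classifier in a Vision request and
/// hands a description of the most confident label to `onResult`.
func makeClassificationRequest(model: MLModel,
                               onResult: @escaping (String) -> Void) throws -> VNCoreMLRequest {
    let visionModel = try VNCoreMLModel(for: model)
    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // Classifier models produce VNClassificationObservation results.
        guard let observations = request.results as? [VNClassificationObservation],
              let best = observations.max(by: { $0.confidence < $1.confidence }) else { return }
        onResult(describe(label: best.identifier, confidence: best.confidence))
    }
    // Vision resizes the input to the model's expected size; this picks how.
    request.imageCropAndScaleOption = .centerCrop
    return request
}
```

Assuming Apple's Inceptionv3.mlmodel has been added to the Xcode project (Xcode then generates a class assumed here to be named `Inceptionv3`), the request could be created with `try makeClassificationRequest(model: Inceptionv3().model) { print($0) }` and passed to a `VNImageRequestHandler` like any other request.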
If these are the first steps in making mobile augmented reality easy for developers, I think we are heading in the right direction. The forthcoming updates of the ARKit and Vision frameworks will surely bring better performance, lower battery consumption and more accuracy. We’re counting down the days until the official release in September.
Creating a good augmented reality experience requires a fairly large amount of domain knowledge. The designers will have a field day with this for sure. From the developer’s point of view, thankfully, tools like ARKit and Vision take care of much of that fuss and sew it all up neatly, even though they’re still quite unpolished.
Have you ever dreamed about virtually designing your living room in real time, or depicting bedtime stories right on the floor of your kids’ bedroom? People around the globe are already expressing incredible passion for the new toys from Apple. This fall, the App Store will surely dazzle us with tons of fun and useful concepts.
Senior Software Engineer