First look: Build HoloLens apps with Unity

Programming for Microsoft's holographic headset leverages familiar tools -- Visual Studio and the Unity 3D framework -- in very unfamiliar ways

Martin Heller

Microsoft's HoloLens is an intriguing device that changes the relationship between user and user interface by mixing the real and the virtual. But in the 100 days or so since its launch, not much has been said about how we'll build apps for it.

That's all changed. Last week, at Microsoft's Build 2015 event, I had the chance to build my own HoloLens application. While the hands-on labs that Microsoft was holding in an offsite hotel were heavily scripted, they were using real development tools and, we were told, "near-final" hardware. Certainly the device I used was very different from the prototype rig I tried out at the launch back in January. Instead of a computer hung around my neck and a tether to a PC, the latest HoloLens prototypes are stand-alone devices that let me walk around freely.

There's still a lot of secrecy around HoloLens. No electronics were allowed in the room that had been set up with development workstations. Photographs were out of the question. All you could take in was a notebook and a pen; everything else had to be locked up in a rack of lockers. It was like going back to school.

My hands-on session was relatively short, only 90 minutes, and much of the development I did was limited to placing objects on a Unity stage, attaching pre-written C# scripts to those objects, and adjusting object properties. Even so, it was possible to get a feel for how more complex applications could be built and, more important, to see how Microsoft was approaching the problems of bringing 1:1 scale 3D to developers and designers.

For a basic interactive app, using a common 3D application development framework like Unity makes a lot of sense. It lets you create 3D objects, place them on a stage, and apply textures. Getting the balance between texture and the underlying shape right will be important in a mixed-reality model like HoloLens because users will be expecting to interact directly with objects. As a developer, you're going to have a lot less control than you might in a more traditional 3D model.

Microsoft has chosen to give HoloLens developers familiar tools. Visual Studio is at the heart of the process, as all HoloLens apps are Windows 10 apps. (A keynote demo showed Universal Windows apps being pinned to physical objects and following a user as they walked through a room.) Key HoloLens methods and objects are part of the Windows 10 SDK, so you can start writing code before the hardware ships. Remote debugging tools mean you can work with a tethered HoloLens, or simply load the software and go, much like using a Raspberry Pi running Windows IoT Core.

But Windows is at heart a 2D platform, and HoloLens is most definitely a 3D device. That's why Microsoft's Unity partnership makes sense. It's a familiar tool for building interactive 3D environments -- and not only for gaming. Unity has become key to developing interactive 3D visualizations for everything from oil exploration to financial services.

So what was it like building an app for HoloLens?

Like all development, HoloLens programming starts at a keyboard in front of a monitor. You don't wear a HoloLens headset to write code or even to test the code you've written. Working with Unity, you'll create a stage, add objects, then start to attach scripts to those objects.

If you've not used a tool like Unity before, it's more like working with Flash or Silverlight than building a traditional desktop application. Objects persist on a stage until they interact with each other or with user input. You're best off thinking of them as independent entities in an event-driven environment, where events come from other objects or from a user -- and where the "user" may itself be code running outside the stage.
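The event-driven model described above can be sketched as a minimal Unity script. This is an illustrative example, not code from the lab: `SpinOnSelect` and its `OnSelect` message are assumed names, standing in for the pre-written scripts we attached to objects.

```csharp
using UnityEngine;

// A minimal sketch of the Unity scripting model: an object on the
// stage that sits idle until an event arrives, rather than running
// its own main loop. Attach this component to any GameObject.
public class SpinOnSelect : MonoBehaviour
{
    bool selected;

    // Unity calls Update once per frame on every active object.
    void Update()
    {
        if (selected)
            transform.Rotate(Vector3.up, 30f * Time.deltaTime);
    }

    // Called from outside -- by another object's script or by an
    // input handler -- when the user selects this object.
    public void OnSelect()
    {
        selected = !selected;
    }
}
```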

Our hands-on lab session made use of Unity for application design and development, drawing on Visual Studio as a code editor for Unity C# scripts and as a tool for loading code onto HoloLens hardware.

Code from Unity was exported into Visual Studio as a Windows Store Universal app, then delivered to a HoloLens via Visual Studio's remote debugging tools -- much like working with Windows Phone. Despite the fact that it's a display device, HoloLens' Windows Holographic is essentially UI-less. Devices are connected to a host PC via USB and managed through a set of Web pages. Apps are loaded over the USB connection, started from Visual Studio, and stopped from the HoloLens Web UI.

One of the first details to note about HoloLens is that interaction with an app is limited; Microsoft currently provides three basic tools for developers to build into their apps. The first, gaze, is the most complex, as it requires understanding how 3D objects relate to each other and to the position of the user's head. It offers what Microsoft calls "the measure of intent." Gaze lets you select specific objects, while the other two options, an "air-click" gesture and voice recognition, handle direct interactions with the selected object.

I was initially skeptical about how much control that would give me, but the combination of gaze and a basic air-click is surprisingly effective. Microsoft has demonstrated it in combination with other gestures, including an air-drag, but we didn't get access to them in the lab. Instead we were able to use gaze to direct a cursor onto an object, air-clicks to select and deselect, and head motions to move selected objects in 3D space.
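Wiring up the air-click might look something like the sketch below. It uses Unity's `GestureRecognizer` tap event; the lab relied on pre-written scripts, so treat this as the general shape of the approach rather than the session's actual code, and note that `GazeCursor.FocusedObject` is a hypothetical helper tracking whatever object gaze has selected.

```csharp
using UnityEngine;
using UnityEngine.VR.WSA.Input; // Unity 5.x; later versions moved this to UnityEngine.XR.WSA.Input

// A sketch of handling the "air-click" (air-tap) gesture and
// forwarding it to the currently gazed-at object.
public class TapHandler : MonoBehaviour
{
    GestureRecognizer recognizer;

    void Start()
    {
        recognizer = new GestureRecognizer();
        recognizer.TappedEvent += (source, tapCount, headRay) =>
        {
            // GazeCursor.FocusedObject is a hypothetical helper that
            // returns the object the gaze cursor currently rests on.
            var focused = GazeCursor.FocusedObject;
            if (focused != null)
                focused.SendMessage("OnSelect");
        };
        recognizer.StartCapturingGestures();
    }
}
```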

One of the most important objects in your code is the user's head position, an instance of the HoloLens SDK's stereocamera object. The stereocamera is constantly updated as the user moves, so you'll need to poll it regularly in your camera scripts. HoloLens is best considered a camera in the Unity stage, and you'll replace the default camera with your HoloLens code. From the head object you can extract head.position and head.forward, which are used to create a raycast that's then used to find the point of intersection with your 3D models.
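The gaze raycast described above can be sketched with Unity's stock camera and physics APIs. This is an assumption-laden illustration: here the article's head.position and head.forward map onto the camera transform of whatever camera object stands in for the user's head on the stage.

```csharp
using UnityEngine;

// Polling the head position each frame and casting a ray along the
// user's gaze to find the point of intersection with 3D models.
public class GazeRaycaster : MonoBehaviour
{
    void Update()
    {
        Vector3 headPosition = Camera.main.transform.position; // head.position
        Vector3 gazeDirection = Camera.main.transform.forward; // head.forward

        RaycastHit hit;
        if (Physics.Raycast(headPosition, gazeDirection, out hit))
        {
            // hit.point is where gaze meets a model's collider; a ring
            // cursor object can be moved to this point every frame.
            Debug.Log("Gazing at: " + hit.collider.name);
        }
    }
}
```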

You'll use Unity transforms to place a stage in relation to your HoloLens; and as it's a 1:1 mapping, you really are placing your stage two meters out and half a meter down. Getting that initial placement right is important. You don't want to destroy the mixed-reality illusion by dropping a user right into the middle of an object.
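Because the mapping is 1:1, that initial placement is a literal offset from the head in world units. A sketch of the transform, assuming `stage` is the root GameObject of your scene and `head` the user's head transform:

```csharp
using UnityEngine;

// Placing the stage relative to the user at startup: two meters out
// along the gaze direction and half a meter down. Distances are in
// meters because HoloLens maps Unity units 1:1 onto the real world.
public class StagePlacer : MonoBehaviour
{
    public void PlaceStage(Transform stage, Transform head)
    {
        stage.position = head.position
                       + head.forward * 2.0f   // two meters out
                       + Vector3.down * 0.5f;  // half a meter down
    }
}
```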

Using Unity scripts, I could then trigger actions on objects, defining a ring cursor that was attached to my gaze raycast. By handling intersections with objects on the stage I could add actions to objects that could be triggered by air-clicks or by voice commands. It's clear that voice is going to be a key interaction mode for HoloLens, and the SDK's recognition tools work well, able to recognize arbitrary phrases.
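Voice commands of the kind described above might be wired up as below, using Unity's `KeywordRecognizer` from the Windows speech APIs. The lab's pre-written scripts may have differed, and the keyword list here is invented for illustration.

```csharp
using UnityEngine;
using UnityEngine.Windows.Speech;

// A sketch of attaching voice commands to stage actions. The phrases
// "select", "drop," and "reset" are placeholder examples.
public class VoiceCommands : MonoBehaviour
{
    KeywordRecognizer recognizer;

    void Start()
    {
        recognizer = new KeywordRecognizer(new[] { "select", "drop", "reset" });
        recognizer.OnPhraseRecognized += args =>
        {
            Debug.Log("Heard: " + args.text);
            // Dispatch the recognized phrase to the gazed-at object here.
        };
        recognizer.Start();
    }
}
```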

It's easy to imagine HoloLens apps ending up like something out of "Harry Potter," built around keywords and phrases you're unlikely to utter by accident while talking with the people around you as you wear the headset and discuss the object you're exploring.

One of HoloLens' default tools is its free-space mapping. At the launch in January, I suggested jokingly that it was like wearing an inside-out Kinect, as it's able to produce a 3D scan of the environment you're looking at, then use it as a frame for your applications. Turning on the mapping mesh that HoloLens generates shows that that Kinect comparison was more accurate than I realized at the time; it's very similar to the tools used in the Kinect Windows SDK (which makes me wonder if features like the Kinect's machine-learning powered gesture definition tools will make their way to the HoloLens SDK). The mesh makes a useful debugging tool, as it shows how stages and objects will interact with the real world: where they can be placed, what hides them, and how a user can navigate around them.

Unity's scripting model makes it easy to attach code to objects. This includes applying physics models and using HoloLens' spatialized audio. Audio can be triggered by actions, or it can be a background source -- perhaps music or sounds related to an immersive environment.
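Attaching physics and spatialized audio to an object is a matter of adding standard Unity components, roughly as sketched below. This is an assumed illustration: Unity's `spatialBlend` setting stands in for whatever spatialized-audio plumbing the HoloLens SDK provides.

```csharp
using UnityEngine;

// A sketch of an object with a physics model and spatialized audio.
// The Rigidbody gives it gravity and collisions; the AudioSource,
// with spatialBlend at 1, plays fully 3D sound that appears to come
// from the object's position in the room.
[RequireComponent(typeof(Rigidbody), typeof(AudioSource))]
public class NoisyProp : MonoBehaviour
{
    AudioSource audioSource;

    void Start()
    {
        audioSource = GetComponent<AudioSource>();
        audioSource.spatialBlend = 1.0f; // fully spatialized
    }

    // Triggered by an action -- for instance, an air-click handler
    // calling SendMessage("OnSelect") on this object.
    public void OnSelect()
    {
        audioSource.Play();
    }
}
```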

There's a link between HoloLens apps and the underlying hardware that really isn't there with any other platform. Yes, phone and tablet apps are personal, but the experience of seeing code you've written floating there in front of you, as large as, well, life, is quite different from simply pressing F5 and watching your code build. It's even more immersive when you realize you can stand up and walk around your application, looking at it from all sides -- even from underneath!

The experience isn't perfect; the field of view in particular is limited. It's unclear whether that limitation was a function of the hardware, the code we were writing, or both. Even so, I found the process fascinating, and the results were surprisingly convincing when combined with spatialized audio effects and Unity's physics model. While Microsoft made it clear we were using "near final" hardware, it wasn't clear how much work would be needed to make it final or what constraints had been placed on the application we were building.

As first looks at hardware and software go, this one was very different. Microsoft is going to great lengths to point out that HoloLens is a platform, and like all platforms it needs software. That's why showing Build attendees how to write that software makes a lot of sense. They're HoloLens' initial audience, and without them and their code, the HoloLens won't be anything more than a niche product.

Copyright © 2015 IDG Communications, Inc.