Time to review the definition of AR...again
At our recent #ARSyd meetup I started the event, as usual, by reviewing the current definition of Augmented Reality on Wikipedia. It's interesting to check back on this regularly to see how the common definition of AR is changing and evolving.
"Augmented reality (AR) is a term for a live direct or an indirect view of a physical, real-world environment whose elements are augmented by computer-generated sensory input, such as sound or graphics."http://en.wikipedia.org/wiki/Augmented_reality This time it was interesting to note that it has expanded beyond the last version that simply included "computer-generated imagery". So at least now we're starting to use other senses beyond vision. However, there's a few other attempts to define areas around, or related to Augmented Reality that I think are worth discussing. I'd like to do a little conceptual algebra in an attempt to simplify this down to a more useful definition that provides a broader and more useful scope for the AR UX discussion. First, lets look at the work of Steve Mann. He is one of the early pioneers in the field of wearable computing and has amassed a rich body of work based on action based research. He's spent years wearing and living with this type of technology. He has made some significant contributions to the discussion of the impacts of this new technology on privacy and is one of the founders of the glogging community. Glogging is short for cyborg-logging and is an extension of blogging into a more pervasive and realtime format. What I think is particularly relevant to this discussion is a venn diagram he created and posted on the Mediated Reality wikipedia page. In this diagram he positions AR as a superset of Virtual Reality and positions them both as a form of Mixed Reality. Then he defines the whole group as a subset of Mediated Reality.
Part of this analysis is obviously a reference to Azuma's definition and Milgram's Mixed Reality Continuum which is a great tool for communicating the differences between Reality, Augmented Reality and Virtual Reality to people new to these concepts.
However, all of this is really based on some very subtle and blurry differences between constantly evolving technologies. It's also strongly based on the realist assumption that "Reality" exists "out there". But I think there is quite a strong case to be made that Reality is actually generated inside our minds. All our experiences are processed through our senses so in that way all Reality is Mediated.
I know this is quite an abstract philosophical discussion, but I think it's directly relevant to how we define Augmented Reality. Before I get to that point let's look at the experience of one of the other early pioneers of wearable computing. Thad Starner has been wearing a computer continuously since 1993. He has a persistent monocular display and a one-handed keyboard that lets him continuously and relatively silently interact with his personal information system. He can quickly access the time and date he first met you and the notes from the conversation, and then share information related to that. His key focus, from his experience of using a pervasive wearable display for such a long time, is not just how it can Augment your view of the world around you...but how it can Augment how you think and the information resources available to you. This is a more "Things that make us smart" approach than a "Holodeck" approach to defining and using these technologies.
And this brings us back to the philosophical discussion above. If we accept that all Reality happens in our mind and is, by definition, Mediated by our senses...then surely any technology that extends our memory, provides additional information, clues, hints or expanded interaction is a form of Augmented Reality. Personally I think the "live direct or an indirect view of a physical, real-world environment whose elements are augmented by computer-generated sensory input, such as sound or graphics" type of Augmented Reality that people are now used to seeing on YouTube is really just the start of a new type of User Interface. As computing devices continue to shrink and dissolve into the world around us this type of AR just becomes the natural User Experience for this new Internet of Things. But that's just the first step, and I think if we limit ourselves to just overlaying 3D images using GPS and basic computer vision then we're really selling the potential of the AR industry very short indeed.
What do you think the term Augmented Reality should really mean?
User Experience sessions at the Augmented Reality Event
The ARE conference team have released a sneak preview of the schedule and they've organised an outstanding lineup.
http://augmentedrealityevent.com/schedule/
On May 18th in the afternoon is a session dedicated to ARUX and I'm hoping that we can kick the use of the #ARUX tag up a level on twitter. If you're reading this then you should definitely follow the event and join in the discussion.
So far it looks like Sally Applin is doing a presentation entitled "AR and Social and Sensors - Oh My!". I'm hoping this relates back to the Poly Social Reality work she's been developing. http://www.dfki.de/LAMDa/accepted/ACulturalPerspective.pdf
Jon Cabiria is doing a presentation entitled "AR and the Human Factor: Designing for the mind". His focus is on the social cognition perspective of AR and AR as a vehicle for change. Amber Case, one of the co-founders of GeoLoqi, is doing a presentation entitled "Non Visual AR". She describes herself as a "Cyborg Anthropologist" so I'm really looking forward to hearing what she has to say. And Clark Dodsworth from Osage is doing a presentation entitled "Context is King". I couldn't agree more...I think the whole "Content is King" statement was merely a typo. The topic of his presentation is AI, Salience and the Constant Next Scenario.
There's another ARUX session just before with Tish Shute from UgoTrade, Ivan Franco from YDreams, Brendan Scully from Metaio and Mike Kuniavsky from ThingM. And Paige Saez, an artist from San Francisco, is also presenting in the AR and Storytelling session.
UX is one of the critical components that will define how Augmented Reality is adopted by the broader market. This looks to be a pivotal point for the ARUX community and I'm really looking forward to the discussion it generates. I'll be presenting too, so I look forward to seeing you there or talking to you online.
An exploration of User Experience for Augmented Reality
Panagiotis Ritsos is an AR researcher with Synthetic Toys and a former member of the Vision and Synthetic Environments laboratory (VASE) of the University of Essex.
Introduction
Although Augmented Reality has been the subject of research and scientific literature for almost two decades, the public and media attention that the field has received over the last couple of years, with the appearance of Handheld AR (HAR) applications and services, has been significant and unprecedented. The concept was brought out of the laboratories of universities and research institutions and into the smartphone user's hands. Indeed, many researchers, innovators and developers feel that HAR may be the backbone on which the field, overall, will attain 'household' popularity.
The first step in examining UX in an AR context is to determine the factors that affect interaction with AR systems, taking into account the multifarious nature of AR: the ergonomic requirements for a military-type AR simulator, for example, are different from those for a handheld AR tour-guide. In an attempt to provide a starting point, we identified certain ‘core’ concept groupings that we feel can lay the groundwork and can be expanded upon on a case-by-case basis. The reader must note that, in practice, most concepts intertwine and the underlying correlations must ultimately also be taken into account. Overall, the collection of aspects discussed can form a theoretical foundation within which UX in AR can be examined further.
Factors of UX in AR
Use Cases
One recent contribution from the community working on AR standards is the introduction of “use cases”. Use cases describe an application or service by classifying it into one of three major categories (Guide, Create and Play) and must meet the criteria of augmented reality, as described by Azuma. Grouping UX in AR factors using use cases offers semantic structure to a field that has many different paradigms. It is also important to consider that each use case may have unique requirements and underlying dynamics, in terms of UX.
Context Awareness
AR’s contextual and in situ nature demonstrates how the concept is intertwined with Context Awareness (CA). Extracting information about a user’s location, posture, intentions and environment has direct consequences for how convincingly AR systems ‘meld’ real and virtual. Context comprises more than the user’s location, including environmental features, nearby resources and social conditions. Naturally, the means of interpreting and detecting all this information, as well as the method through which it becomes explicitly or implicitly available to the user, has a direct impact on the resulting user experience.
UX Factors Map
Arguably, properly placing and registering synthetic information spatially and temporally is the greatest challenge in AR, with direct consequences for UX. Humans have an extremely sensitive perceptual system, able to detect small anomalies and irregularities in the synthetic world, such as mis-registrations and delays when motion is involved. Naturally, the ideal solution for accurate registration and localisation is a positional error of less than 1 mm and angular error of less than 0.5 degrees with no drift. Nonetheless, since that is virtually unattainable with current technology, we tend to ‘accept’ lesser accuracy dependent on the application. Realistically, being able to negotiate virtual doorways may be an acceptable level of AR registration; however, even that may cause disruptions and in severe cases nausea and dizziness, thus negatively affecting UX. Moreover, the temporal positional stability and consistency of the synthetic environment (for example, whether a synthetic chair is in the ‘same place’ when entering and exiting a room after some time) are equally important. Depending on the use case, registration requirements, and therefore the associated UX requirements, may be less stringent.
AR Input UX factors
Input can be separated into four major categories. The reader must take note that the categories described below, although presented separately, are often parts of a complementary registration mechanism.
Visual Input: Visual input usually implies the use of a camera for tracking and context identification as well as motion detection. Examples can be marker-based (QR codes etc) or marker-less tracking. Applications range from simple placement of virtual objects on top of those targets to more complex environmental feature detection and kinæsthetic tracking. With modalities like the aptly-named Kinect from Microsoft, a system can recognise movement and posture, which arguably has high potential for use in an AR context. Ease of use, responsiveness and accuracy are factors that affect UX.
Auditory: Sound can also be useful as input, both as voice commands and for inferring the user’s context, say from ambient noise. The same UX factors that affect visual input apply in this case too. However, privacy and social comfort matter more here, as voice commands may not be appropriate in some settings, such as cinemas or libraries.
Tactile: By tactile we classify all interfaces that require contact (touch) with a surface, like keyboards, touch-screens, joysticks or mice. Once again the aspects affecting user experience are similar to those above, with one added issue involving obtrusiveness for mobile, untethered systems. Touchscreens, chord and small-form keyboards have been used in the past but are often a hindrance and quite tiring to use after a while.
Other Sensory Modalities: Researchers have investigated the use of various schemes in order to increase the accuracy of secondary sensory modalities (accelerometers, compasses, gyros, active badges etc) that detect user context, most notably combining more than one (sensor fusion) into hybrid sensors.
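As a rough illustration of what sensor fusion can mean in practice, the sketch below blends a gyro with a compass using a complementary filter. This is only one simple fusion scheme, chosen here for brevity and not taken from the paper; the function names, the `alpha` weighting and the units (degrees, degrees per second) are all assumptions made for the example.

```python
def wrap_degrees(angle):
    """Wrap an angle difference into the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def fuse_heading(previous_heading, gyro_rate_dps, compass_heading, dt, alpha=0.98):
    """One complementary-filter step (hypothetical helper, not a library API).

    previous_heading: last fused heading in degrees
    gyro_rate_dps:    gyro turn rate in degrees per second (smooth, but biased)
    compass_heading:  absolute compass heading in degrees (noisy, but drift-free)
    dt:               time since the last step in seconds
    alpha:            trust placed in the gyro prediction (assumed value)
    """
    # Predict the new heading by integrating the gyro rate.
    predicted = previous_heading + gyro_rate_dps * dt
    # Pull the prediction a little towards the compass, taking the short way round.
    correction = wrap_degrees(compass_heading - predicted)
    return (predicted + (1.0 - alpha) * correction) % 360.0


if __name__ == "__main__":
    # A stationary device facing 90 degrees, with a gyro bias of 1 degree/second.
    heading = 90.0
    for _ in range(6000):  # 60 seconds at 100 Hz
        heading = fuse_heading(heading, 1.0, 90.0, 0.01)
    print(round(heading, 2))
```

Integrating the biased gyro alone for the same 60 seconds would report a heading 60 degrees off, whereas the fused estimate stays within a degree of the compass reading.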
Active (sensor-emitter) tracking systems require powered-device installation and are often susceptible to interference, whereas inertial sensors, although completely passive, exhibit drift. Complex solutions employing computer vision-based modalities are often computationally demanding and in some cases unusable (occlusion). One additional factor affecting UX in this case, apart from the overall accuracy of the systems, is seamless switching between modalities when that is needed, for example when location tracking has to switch between indoors and outdoors.
AR Output UX factors
Output, likewise, can be separated into three major categories.
Visual: Visual output is probably the most important aspect of AR, present through the history of the field and in almost all incarnations and concepts. However, the information that may be visually presented ranges from simple annotations to complex 3D architectural and humanoid modelling, each with its own specific UX requirements. Depending on the use case and the required abstraction level, high fidelity and accurate representation of modelled entities may or may not be needed. Consequently, the presentation medium’s characteristics are quite important to UX in AR. For instance, Head Mounted Display (HMD) issues like narrow field-of-view, inadequate depth perception, low display brightness and poor ergonomics can hinder the sense of presence. Moreover, visual output encompasses the aforementioned problem of registration. Overall, things that should, primarily, be assessed from a UX perspective regarding visual output include:
- presentation media quality (field of view, brightness, contrast, depth perception, ease of use etc.)
- content quality (realism, abstraction, frame rates etc)
- synthetic world consistency and stability (registration, temporal and spatial stability).
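To give a feel for the registration tolerances quoted earlier (under 1 mm positional and under 0.5 degrees angular error), here is a minimal sketch that converts an angular tracking error into the apparent sideways shift of a virtual object at a given distance. The function names and the 2 m viewing distance are assumptions chosen for illustration; only the two tolerance values come from the text.

```python
import math

# Ideal tolerances quoted in the registration discussion.
MAX_POSITION_ERROR_MM = 1.0
MAX_ANGULAR_ERROR_DEG = 0.5


def apparent_offset_mm(angular_error_deg, distance_m):
    """Sideways shift (in mm) of a virtual object placed distance_m away,
    caused purely by an angular tracking error of angular_error_deg."""
    return distance_m * 1000.0 * math.tan(math.radians(angular_error_deg))


def meets_ideal_registration(position_error_mm, angular_error_deg):
    """True only when both errors are strictly below the ideal tolerances."""
    return (position_error_mm < MAX_POSITION_ERROR_MM
            and angular_error_deg < MAX_ANGULAR_ERROR_DEG)


if __name__ == "__main__":
    # Hypothetical case: an object anchored 2 m from the viewer.
    print(round(apparent_offset_mm(0.5, 2.0), 2))  # about 17.45
    print(meets_ideal_registration(0.5, 0.2))
```

Under these assumptions, an error right at the 0.5 degree limit still moves an object 2 m away by roughly 17 mm, which helps explain why acceptable accuracy ends up depending so heavily on the use case.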
Auditory: Auditory output is somewhat simpler to implement with reasonable quality as high fidelity, directional sound can be employed with small cost and with relatively non-obtrusive gear (headphones, etc). However, much like in the case of input, sound can be disruptive, having noteworthy safety and privacy implications.
Health and safety considerations are of paramount importance in almost all AR scenarios. Unwise implementations of AR systems can be potentially disruptive, cause accidents and in extreme cases have health implications. HMDs, for instance, have long been at the centre of attention from researchers for ocular and non-ocular symptoms of usage (nausea, dizziness etc). Naturally, any form of UX study, whether this involves assessment or standardisation, must take these aspects into account. Collaboration and feedback from healthcare and medical field experts would be immensely helpful in identifying potential issues and -- why not -- solutions and recommended practices.
Integrity, Privacy and Security
AR currently remains a ‘personal’ experience to a great extent. Nonetheless, we can anticipate that AR will eventually evolve towards shared environments, much like users today tend to ‘meet’ in various shared spaces (social networks, massively multiplayer online (MMO) games etc.), with important implications regarding interaction. The ultimate incarnation of the ‘Play’ use case described above is a shared synthetic environment, where participants can interact with each other and with the real/virtual environment. Nonetheless, shared environments have intrinsic integrity, security and privacy implications. Just as Vernor Vinge describes in Rainbows End, sharing or accessing someone’s ‘view’ of things may or may not always be desirable. Likewise, hiding important information behind synthetic ‘cloaks’, or allowing ‘virtual’ access to protected resources, are potential problems. Granted, this level of augmentation may appear technologically distant, but such implications have been examined for quite some time in shared environments research, and any complete study of UX in AR will eventually include these aspects when fully-featured AR shared spaces appear.
Sense of Immersion
All of the above concepts contribute to varying degrees to what we call “sense of immersion”, otherwise known as “presence”. Many assessments, concerning both VR and AR environments, try to quantify immersion. One could say that the sense of immersion is the integration of attitudes towards a system, evaluated by the user in terms of importance. For some people the poor fidelity of the synthetic world is restrictive, while others find registration problems and spatial instability more disruptive. In any case, we would not hesitate to say that, to a large extent, sense of immersion is the archetype of user experience in AR when it comes to usage of a system or service.
Conclusion
User experience describes how a user feels about using a system, encompassing feelings, motivation, satisfaction and overall attitude. In an AR usage context, UX can be associated with the feeling of immersion, also denoted as sense of presence. However, a more complete view of UX also includes concepts like branding, marketing image, standards compliance support and overall quality of service. In theory, AR enthusiasts, developers and researchers can identify, through assessments and analysis, the underlying patterns and correlated factors that affect the user’s overall experience and refine the notion of UX in AR even further. AR is foremost a human-centered technology. It is a concept whose sole purpose is to enhance one’s, or a group’s, reality. A human-centered approach is of paramount importance to AR evolution and an excellent starting point to enhance the field’s technological and marketing reality.
Note: This post is based on a paper titled “Standards for Augmented Reality-A User Experience Perspective”, submitted to the Second International AR Standards Meeting in February 2011. All referenced material can be found in the original document.
The 4 key User Experience modes of Augmented Reality
AR is already a very diverse field and it seems like there are new tools and examples springing up every day. However, within all of this chaos and rapid change there are 4 key physical modes that can be used to classify these user experiences. There are some notable examples that don't fit into these 4 modes, however these are more exploratory and fringe examples that have not as yet been widely distributed or adopted. The modes outlined below are the dominant forms that seem to appear again and again throughout the AR landscape. While these modes closely relate to the differences between computer vision based AR and locative AR, we believe this location/vision based divide is dissolving rapidly and that the spatial relationships of the ARUX modes outlined below will remain. However, we will revisit this post on a regular basis and look forward to your feedback and discussion, so please add your comments below.
Overview
These 4 modes are based on the physical relationship between the user, the camera, the camera's field of view (fov) and the visual/auditory display. These relationships are primarily based upon the visual and auditory sensory channels, as these also dominate the current AR experiences, with visual by far leading the way. NOTE: Experimentation across the other sensory channels is an interesting area that is definitely open for new UX research, especially in terms of accessibility. But of all the structural aspects of these modes, the relationship between the user and other people is the most important. It is this relationship that has been used to define the names for each of these modes.
Public
The public ARUX mode is commonly presented within a retail, installation art, public kiosk or family lounge environment. The Kinect game platform is an example of the new breed of AR that's quickly moving into our lounges. The camera is often encased within a pedestal, tower, wall mounted box, set top box or point of sale display. This camera (or sometimes multiple cameras) has a wide field of view that the user can interact with, often expanding to the user's whole body. The AR image output can be presented using a projector, large flat screen display or tv. These displays are often on or up against a wall and tend to naturally create a sense of "experience space" in front of them. This structure or set of relationships creates what tends to feel like a public experience which often draws a crowd or in some cases even extra participants. It is often used to gain people's attention and create a buzz, or at home can create a sense of group participation - it is at all levels a shared experience. The public ARUX mode is dominated by computer vision based AR and because of its installation format has a fairly limited distribution, however gaming platforms are quite likely to change this.
Intimate
The intimate ARUX mode is the dominant form seen on YouTube or Vimeo and commonly involves a desktop or laptop PC with a webcam. This really exploded with the release of the Adobe Flash based FLARToolkit software, which uses fiducial markers and is based upon the seminal ARToolkit. The experiences themselves are usually used by a single individual sitting in front of the PC, with one or two other spectators sitting next to the user or standing behind them. The camera is often the built-in device at the top of the laptop screen or an external device designed to sit on top of the monitor. Many examples also show users moving these external cameras around, however they are still usually tethered by a cable. It will be interesting to see how the spread of wireless cameras changes this spatial dynamic. The display is usually the built-in laptop screen or a desktop monitor so is commonly between 15 and 24 inches, which can limit the image quality and size. This obviously has an impact on the immersiveness of the experience, but it is an environment that most internet users are now very comfortable with. The structure of this experience is quite similar to web browsing, however it would probably benefit more from leaning closer to the PC gaming experience. It is often a single user experience and there are very few, if any, examples of experiences that create long or ongoing engagement. It is almost exclusively based upon fiducial markers and computer vision software, however with the release of browser based applications using FLARToolkit or similar it has benefited from the massive distribution delivered by the web. The big challenge is that it also often requires that the user prints out a marker or image, or already has a magazine or printed promotional material.
While a large number of users on the internet are now aware of, or even comfortable with, this ARUX mode, the majority of them seem to have simply watched videos of these applications rather than actually printing out the markers and using them themselves.
Personal
The personal ARUX mode has spread rapidly along with the recent rapid adoption of camera, compass and GPS enabled smartphones. According to some market assessments the number of smartphones has now surpassed the number of PCs and looks set to become the most widely distributed internet access method. The modern smartphone is a very personal device and the small screen format makes for a very personal media experience. Smartphones now come with both forward and rearward facing cameras. By far the most broadly used experience is using the rearward facing camera in Mobile AR browsers like Layar, Junaio and Wikitude. Tablets like the iPad2 now come with 2 cameras and are likely to impact this mode heavily. All of the major Mobile AR browsers support locative based AR utilising the GPS and orientation/compass sensors in the phone. Junaio Glue is also notable in its use of computer vision based Natural Feature Tracking that uses the camera to recognise relatively natural images. Some applications are now making good use of the front facing camera to augment the view of your face or even hands for gestural input. Smartphones currently provide a "keyhole" AR experience with limited image quality, however the iPhone4 retina display has pushed the image quality to a whole new level and this trend will only continue. The rapidly growing new segment of tablet based computers is also helping to stretch this "keyhole" into a "window" and may soon even justify adding a new ARUX mode of its own. The structure of this experience creates a very personal and pervasive feeling. It is always in your pocket and with 3G and wifi constantly available it provides an "always on" platform. However, the need to constantly hold the phone (and now tablet) up can be quite tiring for some users. It can also create quite a sense of social stigma in public places or large crowds.
With Layar now being used by millions of users, alongside Junaio, Wikitude and the wide range of other developing Mobile AR apps, the Personal ARUX mode may currently be the most broadly adopted of all the modes. The combination of rapidly improving built-in sensors, faster mobile broadband speeds, bigger better quality screens (including the new Tablets) and lower priced devices means that this trend is likely to continue for some time.
Private
The private ARUX mode is based upon the newly developing wearable or head mounted displays from companies such as Vuzix. While these devices are now at the "early adopter consumer" level price point, there are still a number of building blocks and integration steps required to really achieve this type of experience. This largely limits it to R&D labs and academic institutions. However, it is possible that this could change quite quickly over the next 1-2 years. Combining this with smartphones is likely to accelerate the rate of adoption once the price is reduced to a suitable level. The camera or cameras are usually mounted on the head as part of, or attached to, the wearable display. There is a wide range of issues in terms of smoothly capturing images, encoding them and sending them to the smartphone or PC, combining them with the augmented information and then returning them to the display without creating a significant lag. The displays currently come in two main formats: either "video see through", where the images from the camera are presented on small displays in front of the user's eyes, or "optical see through", where the user looks directly at the real world environment and images are presented upon transparent/translucent lenses. Video see through devices currently have issues in terms of field of view, resolution, energy requirements and refresh rate. Optical see through devices face challenges with occlusion (e.g. completely blocking elements from the real world) and image registration. However, both fields are making rapid advances. On top of this there are a range of other technologies that are currently further out, including contact lens based displays and retina projectors. All of these devices are usually used in conjunction with "6 degrees of freedom" or orientation tracking sensors.
The structure of this experience has the potential to be the most immersive of all the ARUX modes, however it is currently the most invasive and often involves many cables and attachments. This too is rapidly changing and improving. Based on the cost, the technical skills required to integrate all the elements and the physical size and nature of these wearable displays, this is currently the least distributed of all the ARUX modes. However, it is likely to be the one that most fully integrates locative AR and computer vision based AR into a truly engaging and immersive experience. It's also certainly the most fun to experiment with.
Summary
This analysis is designed to be a foundation resource for our community and we hope it will stimulate some interesting and ongoing discussion. Based on that discussion we'll review the structure and content, and update it as the market evolves. So let us know what you think by sharing your perspective in the comments below.
Welcome to AR-UX.com
Augmented Reality has been developing for around 30 years and over the last 1-2 years it really seems to have captured people's imaginations. YouTube and Vimeo are overflowing with rough and ready "new tech demos" and highly polished/post-produced "future of AR" videos. AR is now soaking in to the mainstream psyche. But in terms of User Experience research AR is still pretty close to the wild west. This is as much a challenge as it is an opportunity.
This site aims to build a rich and ongoing discussion among the people who want to explore, define and refine the User Experiences created by Augmented Reality.
Join the discussion, add your comments here, submit your ideas for posts and tweet out interesting links using the #arux hashtag.


