Of all cultural heritage subjects, buildings present the difficulty of their size, which makes it less straightforward to confront the real site with any virtual representation of it. The obvious solutions are the installation of on-site devices or the use of mobile devices.
On-site augmented reality applications dedicated to cultural heritage appeared more than ten years ago. A well-known application which pioneered this research field is the Archeoguide project [Vassilios et al. 2001] (figure 1), which aimed at giving visitors an on-site view of what the famous buildings of Ancient Greece looked like. The user, who had to wear a see-through head-mounted display (HMD), was geo-localized, and 3D models of the buildings were displayed on the HMD. Computation was handled by a laptop the user had to carry in a backpack. Later on, Zoellner et al. [Zoellner et al. 2007] used an ultra mobile PC (UMPC) to handle computation and display in the iTACITUS project. Tracking was done through GPS localization and refined by video tracking. The goal of this project was to overlay old prints on what was captured in the live video feed.
Conventional OpenGL projection of textured 3D meshed models is performed to get augmented stereoscopic images. The rendering is done using the Oculus DK2 Software Development Kit, which generates suitable camera projections for the two eyes. The SDK also provides a shader program that post-processes the rendered images to compensate for the chromatic aberrations caused by the low-cost optical system of the device. The synthetic images generated by the DIBR module are used as background, while the synthetic depth maps serve to initialize the Z-buffer. As the OpenGL rendering cameras rely on the same geometric models as those of the DIBR, the 3D objects are easily rendered and integrated inside the real environment. The Z-buffer test determines, for each rendered pixel, whether or not it is occluded by the real scene.
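The per-pixel occlusion logic of this Z-buffer test can be sketched outside OpenGL. A minimal numpy illustration (the image and depth arrays below are toy placeholders, not actual DIBR output): a virtual fragment replaces the real background only where its depth is smaller, i.e., where the virtual object lies in front of the real scene.

```python
import numpy as np

def composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth):
    """Per-pixel Z-test: keep the virtual fragment only where it is
    closer to the camera than the real scene (smaller depth wins),
    mimicking an OpenGL Z-buffer initialized from the DIBR depth map."""
    mask = virt_depth < real_depth          # virtual object in front
    out = real_rgb.copy()
    out[mask] = virt_rgb[mask]
    return out

# Toy 2x2 example: only the pixel at (0, 0) has the virtual object in front.
real_rgb   = np.zeros((2, 2, 3), dtype=np.uint8)       # black background
virt_rgb   = np.full((2, 2, 3), 255, dtype=np.uint8)   # white virtual object
real_depth = np.array([[5.0, 1.0], [1.0, 1.0]])
virt_depth = np.array([[2.0, 3.0], [3.0, 3.0]])
out = composite_with_occlusion(real_rgb, real_depth, virt_rgb, virt_depth)
```

Initializing the hardware Z-buffer from the DIBR depth map lets the GPU perform exactly this test for free during rasterization.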
Finally, we attempt to fit VARI 3 into Stoev and Schmalstieg's taxonomy. The output window clearly falls into case O3, because it is a mobile handheld display. As mentioned above, the VARI 3 tablet is similar to the handheld pad in the PIP system in terms of interaction possibilities, but not in terms of perception, due to the use of two distinguishable display devices. Recall that the PIP system incorporated a dumb tablet whose position could be tracked, but the associated graphics were displayed on a see-through head-mounted display, the same device used to display all other virtual objects in the scene. The secondary environment (that seen through the tablet) in VARI 3 can be classified into different V states depending on the current mode of operation. In normal see-through operation, the secondary environment can be classified as case V1 because the two environments are fixed with respect to one another. It is also conceivable that systems like VARI 3 may allow unlinking the coordinate systems for other reasons, such as a magnification or perspective-taking function. When using the displacement function, a teleport destination is chosen from a top-down schematic view. This map view is fixed regardless of the user's viewpoint or the tablet position and orientation; viewing the tablet from an angle does not change the displayed schematic. This means that the displacement function is a special case of cases V2 and V3, in which the tablet output window displays a 2D projection of the secondary environment. Because this projection is at the same depth as the tablet display, it is not possible to view different parts of the secondary environment by viewing the tablet from an angle.
Regarding eye-tracking technology, Feit et al.  studied its limitations to understand the extent of its functionality and its usage. They found that accuracy and precision varied not only between users (especially if they wore glasses or contact lenses), but also with lighting conditions. First, they observed that eye tracking tends to have a large variability in accuracy (which, in our study, justifies the use of depth cues to improve eye-tracking precision); second, they found that in environments with greater luminance, eye-tracking measures were more accurate. Finally, regarding the measurement precision of the eye-tracking device, Antonya  studied the relation between the accuracy of the measure and the distance between the observer and the fixated object, but all the studied distances were within the personal space. The targeted objects were projected images, so the eye-tracking device used was similar to an augmented reality head-mounted display (as the user is able to see both the real world and virtual objects). The results indicated that the farther away the objects were, the higher the relative error of the depth measure.
This demonstration addresses a wide spectrum of the technological challenges a project faces, including audio-visual scene analysis to understand the user's context, the collection, creation, fusion and delivery of AR content, 3D audio rendering, mobile human-machine interactions and, finally, the provision of a high-accuracy localization system. To this end, the VENTURI project consortium brings together different ERCIM members such as Inria, Fraunhofer and FBK, mobile device manufacturers SONY and STMicroelectronics, and software companies like Metaio GmbH and EDIAM Sistemas. The common goal of all these partners is to design a hardware and software platform dedicated to such applications.
Augmented Reality assistance for R&D assembly in Aeronautics
Martin PRUVOST, Arts et Métiers ParisTech, France
Pierre MIALOCQ, Safran Helicopter Engines, France
Fakhreddine ABABSA, Arts et Métiers ParisTech, France
email@example.com pierre.mialocq@saf-
In this work we explore the possibilities of reduced order modeling for augmented reality applications. We consider parametric reduced order models based upon separate (affine) parametric dependence so as to speed up the associated data assimilation problems, which involve in a natural manner the minimization of a distance functional. The use of reduced order methods allows for an important reduction in computational cost, thus making it possible to comply with the stringent real-time constraints of video streams, i.e., around 30 Hz. Examples are included that show the potential of the proposed technique in different situations.
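The computational saving of affine parametric reduced order models can be sketched as follows. This is a generic reduced basis illustration with invented toy matrices, not the models of the paper: the operator A(μ) = θ₁(μ)A₁ + θ₂(μ)A₂ is projected onto a snapshot basis offline, so that online only a tiny n×n system is assembled and solved per frame.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 50  # full-order dimension (illustrative)

# Affine parametric operator A(mu) = theta1(mu)*A1 + theta2(mu)*A2 (SPD toy terms).
A1 = np.eye(N)
A2 = np.diag(rng.uniform(1.0, 2.0, N))
b  = rng.standard_normal(N)
theta = lambda mu: (1.0, mu)

def full_solve(mu):
    """Full-order solve: O(N^3), too slow for a 30 Hz video loop when N is large."""
    t1, t2 = theta(mu)
    return np.linalg.solve(t1 * A1 + t2 * A2, b)

# Offline stage: snapshots at training parameters -> orthonormal basis V,
# plus parameter-independent reduced matrices (Galerkin projection).
mus = [0.5, 1.0, 2.0]
V, _ = np.linalg.qr(np.column_stack([full_solve(m) for m in mus]))
A1r, A2r, br = V.T @ A1 @ V, V.T @ A2 @ V, V.T @ b

def rom_solve(mu):
    """Online stage: thanks to the affine dependence, only an n x n system
    (here n = 3) is assembled and solved, independently of N."""
    t1, t2 = theta(mu)
    return V @ np.linalg.solve(t1 * A1r + t2 * A2r, br)
```

At a training parameter the exact solution lies in the span of V, so the Galerkin projection reproduces it; between training parameters the reduced solution is an approximation whose quality drives the basis-selection strategy.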
Fig. 27. Pose from a single, closed object contour (MSER) (see video).
5.4 Choosing the "best" matching technique
It is difficult to state that one approach is better than another. There is usually a trade-off between stability, number of extracted keypoints, recall, percentage of outliers, computational cost, etc. It has to be noted that most of these low-level matching methods are available in OpenCV or VLFeat [142]. This is the case for SIFT, SURF, FAST, BRIEF, ORB, MSER, etc. It is then easy to test each method in a specific context and choose the most efficient one. SIFT, which is patented in the US, has proved for years [72] to be very efficient and a good choice (although quite heavy to compute). From a practical point of view, it seems that FAST is often used in augmented reality libraries; it is, for example, used in Vuforia from Qualcomm or in Zappar.
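FAST's popularity in AR libraries comes from the simplicity of its segment test: a pixel is flagged as a corner when at least n contiguous pixels on a radius-3 Bresenham circle are all brighter or all darker than the center by a threshold t. A minimal pure-numpy sketch of that test (the production implementations in OpenCV or VLFeat add the high-speed early-rejection test and non-maximal suppression, both omitted here):

```python
import numpy as np

# The 16 offsets (row, col) of the radius-3 Bresenham circle used by FAST.
CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
          (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

def fast_corners(img, t=50, n=9):
    """Segment-test FAST: (r, c) is a corner if at least n contiguous circle
    pixels are all brighter than img[r,c]+t or all darker than img[r,c]-t."""
    corners = []
    rows, cols = img.shape
    for r in range(3, rows - 3):
        for c in range(3, cols - 3):
            p = int(img[r, c])
            ring = [int(img[r + dr, c + dc]) for dr, dc in CIRCLE]
            for flags in ([v > p + t for v in ring], [v < p - t for v in ring]):
                doubled = flags + flags  # duplicate to handle wrap-around runs
                if any(all(doubled[i:i + n]) for i in range(16)):
                    corners.append((r, c))
                    break
    return corners

# Bright square on a dark background: its corners pass the test, edge
# midpoints (only ~7 contiguous darker pixels) and flat regions do not.
img = np.zeros((20, 20), dtype=np.uint8)
img[5:15, 5:15] = 255
kp = fast_corners(img)
```

With n = 9 (the common FAST-9 variant), a straight edge produces too short a contiguous arc to fire, which is precisely what makes the test discriminative despite its low cost.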
a virtual object in a video stream implies being able to generate synthetic images, compliant with the laws of physics, at those 30 fps so as to make it realistic.
In this work we aim at producing physically-based synthetic images at such rates. This could have a vast range of possible applications, from monitoring and decision making in the framework of Industry 4.0 to assistance for laparoscopic surgery, to name but a few . For instance, augmented reality could allow us to project, on different devices (such as tablets or smartphones, glasses, ...), hidden information within a manufacturing process (stresses, defects, ...). In laparoscopic surgery, on the other hand, critical information such as the precise location of blood vessels is often hidden from the surgeon . In this work we consider the possibility of enriching existing objects with hidden information as well as incorporating synthetic objects into the scene so as to help in the industrial design process, for instance. Projecting all this information on top of video streams could be of utmost interest for these and many other applications. Very few applications consider, on the other side of the virtuality continuum, the possibility of manipulating reality so as to make it appear modified by virtual objects. The sole exception seems to be , where each video frame is considered as a two-dimensional elastic continuum that can be deformed according to the presence of a virtual object. Reality thus appears modified by the presence of virtual objects if we see it solely through the video stream.
A practical implementation of a SLAM solution was tested for augmented reality: visual position measurements are issued from an adaptation of PTAM , where the missing scale factor is provided by the depth component of the processed RGBD data. Velocity measurements are obtained from our novel geometrical flow estimation method, and combined with these visual position measurements and inertial data to reconstruct a continuous and smooth trajectory that meets the augmented reality requirements. Cartography of the environment is still subject to long-term drifts or distortions when loop closure occurs, which is one of the biggest limitations of the current system. This might be fixed by the addition of better features for detection and description: for example, the ORB detector and descriptor [Rublee et al., 2011] is comparable to SIFT in terms of accuracy, but runs about 100 times faster. The pose prediction of the tracking module is highly improved by our trajectory estimation, which means that subsequent pose estimation should not suffer from the reduction in the number of "low-quality" features and the addition of "good-quality" ones. In addition, better descriptors could be used in the relocalization module instead of the current FSBI , and could be of great help for loop-closure detection [Angeli et al., 2008]. After loop closure, the map should be globally corrected, by distributing the error along the loop either uniformly or weighted by the path length, or by more sophisticated relaxation methods [Sprickerhof et al., 2011]. If the map can be efficiently corrected by loop-closure detection, the algorithm could benefit in two ways.
First, in a situation where pose estimation is subject to large drifts, the local nature of pose optimization in the tracking module of PTAM may lead to a duplication of the environment in the cartography, and consequently to oscillations between two positions in the camera pose estimation, one for each representation in the cartography. On the contrary, if loop closure is detected, this harmful effect is avoided. Secondly, drifts of the cartography yield drifts in the visual attitude measurements: even if the cartography is initially aligned with the inertial reference frame, the visual measurements used in the attitude observer (9.2) eventually lead to erroneous attitude estimation. This has a direct consequence in the position observer (9.5), where the gravity acceleration is expressed in the camera frame through this attitude estimate, and makes it impossible to rely on accelerometers alone when neither velocity nor position measurements are available.
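The path-length-weighted error distribution mentioned above for loop closure can be sketched in a few lines. This is a naive illustration on invented 2-D positions, not the relaxation methods of [Sprickerhof et al., 2011]: once the residual at the final pose is known, each pose is shifted by a fraction of it proportional to the distance travelled up to that pose.

```python
import numpy as np

def distribute_loop_error(positions, target_end):
    """Spread the loop-closure residual along the trajectory, weighting
    each pose by its cumulative path length (0 at start, 1 at the end)."""
    positions = np.asarray(positions, dtype=float)
    residual = np.asarray(target_end, dtype=float) - positions[-1]
    seg = np.linalg.norm(np.diff(positions, axis=0), axis=1)  # segment lengths
    s = np.concatenate(([0.0], np.cumsum(seg)))               # cumulative length
    w = s / s[-1] if s[-1] > 0 else s
    return positions + w[:, None] * residual

# Drifted square loop: the last pose should coincide with the first one.
drifted = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.2, 0.1)]
corrected = distribute_loop_error(drifted, target_end=(0.0, 0.0))
```

The weighting reflects the assumption that drift accumulates roughly in proportion to distance travelled, so poses far along the loop absorb most of the correction while the starting pose is left untouched.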
as strong as ever. This paper discusses how imagery and Augmented Reality (AR) techniques can be of great help not only when discovering a new urban environment but also when observing the evolution of the natural environment. The study is applied on the smartphone, which is currently our most familiar device: it is used in our daily lives because of its low weight, ease of communication, and other valuable applications. In this chapter, we discuss technical issues of augmented reality, especially building recognition. Our building recognition method is based on an efficient hybrid approach, which combines the potential of Speeded Up Robust Features (SURF) points and lines. Our method relies on an Approximate Nearest Neighbors Search (ANNS) approach. Although ANNS approaches are fast, they are less accurate than linear algorithms. To ensure an optimal trade-off between speed and accuracy, the proposed method performs a filtering step on top of the ANNS. Finally, our method applies the Hausdorff measure  to line models.
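The final Hausdorff step can be illustrated in isolation. The point coordinates below are invented for illustration (standing in for points sampled along building-edge line models): the directed Hausdorff distance measures how far, at worst, one set is from its best match in the other, and the symmetric version is the usual score for ranking candidate models.

```python
import numpy as np

def directed_hausdorff(A, B):
    """h(A, B) = max over a in A of the distance from a to its nearest b in B."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    return d.min(axis=1).max()

def hausdorff(A, B):
    """Symmetric Hausdorff distance: the worst of the two directed distances."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))

# Two toy 2D point sets sampled along line segments.
query = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
model = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
```

Note that the measure is asymmetric in its directed form: the extra model point (3, 1) has no nearby query point, so the model-to-query direction dominates the symmetric score.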
2.2. Natural Environment
2.2.1. Related work
Environmental changes are a subject of interest for researchers, managers and the public at large. Changes in the environment affect people's lives and are therefore a subject of great concern. J. Danado et al.  present a mobile system enabling the quantity of water and the pollution levels in artificial lakes and natural rivers to be visualised. The proposed system is based on a client-server architecture and consists of two modules: an augmented reality module and a geo-referencing module. A geographical information system (GIS) was used in addition to augmented reality to model natural landscapes and show their evolution over time. The case study presented in  illustrates the propagation of weeds.
ity on a PC  and on mobile devices , companies such as Metaio GmbH, 13th Lab or Qualcomm provide industrial and cost-effective frameworks relying on computer vision algorithms. After a dedicated initialization protocol, they offer a way for the user to automatically reconstruct and track the environment and define a plane where augmented objects can be displayed. Nevertheless, such approaches lack absolute localization and are computationally expensive in large environments.
linked contexts, but where the task’s objects lie in the world of computing, states that
the systems considered aim to make interaction more realistic.
All the definitions proposed in the literature leave little room for multimodality. However, augmented reality nowadays has not only moved beyond the stage of repositioning virtual indices in a video stream but also proposes sound and even tactile augmentations. To take into account the multimodal aspect of the real world, we also propose a new definition of augmented reality: augmented reality is the superposition of sensory data
B. State of the art
Over the last few years, the boom in augmented reality in industry has especially given rise to projects devoted to automatic task assistance. In particular, the prototype KARMA  can be cited as being at the origin of such a concept as early as 1993: the user let themselves be guided by the system in order to carry out repair work on printers. Other, more ambitious projects later followed, such as ARVIKA , whose purpose was to introduce AR into the life cycle of industrial products, Starmate , to assist an operator during maintenance tasks on complex mechanical systems, and more recently ARMA , which aims to implement a mobile AR system in an industrial setting. More recently, Platonov  has offered a more developed system that belongs to a new generation of assembly-disassembly systems for maintenance based on the use of markerless AR. Using a Head-Mounted Display (HMD) equipped with a camera, the operator is guided, step by step, through the assembly procedure thanks to the virtual information that is superimposed onto the image (Fig. 1). KUKA may also be cited as an example: training in the programming of their robots by enhancing the operator's view with different information systems and the simulation of tool trajectories .
A promising alternative is to mix real objects (e.g., physical prototypes, tools, machines, etc.) with virtual objects to create a mixed reality interface. This mixed prototyping (MP) concept is a powerful potential methodology for assembly evaluation and product development in the next manufacturing generation. The underlying technology is called Augmented Reality (AR)  and has the goal of enhancing a person's perception of the surrounding world rather than replacing it with an artificial one. In an AR interface, it is possible to realize the concept of mixed prototyping, where part of the design is available as a physical prototype and part of the design exists only in virtual form. With such an interface, it is possible to combine some of the benefits of both physical and virtual prototyping.
- STUDIERSTUBE (University of Vienna, 1997): one of the first architectures dedicated to augmented reality applications. It allows the exploration of 3D interaction methods in a work space where many tasks are carried out simultaneously. This project was developed in order to find a powerful metaphor for 3D interaction similar to the classic PC metaphor.
- Tinmith (University of South Australia, 1998): a library of hierarchical objects similar to the UNIX file system. It includes a set of classes that manage the data flow from various sensors, filtering and the rendering process. The classes are developed in C++ and use callbacks and serialization of data streams through XML technology. A dataflow graph manages communications between the different objects.
In mobile outdoor augmented reality applications, accurate localization is critical to register virtual augmentations over a real scene. Vision-based approaches provide accurate localization estimates but are still too sensitive to outdoor conditions (brightness changes, occlusions, etc.). This drawback can be overcome by adding other types of sensors. In this work, we combine a GPS and an inertial sensor with a camera to provide accurate localization. We present the calibration process and discuss how to quantify the 3D localization accuracy. Experimental results on real data are presented.
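As a toy illustration of such sensor fusion (not the calibration procedure of this work), a 1-D constant-velocity Kalman filter can combine absolute GPS-like position fixes with inertial acceleration used as the input of the prediction step. All the numerical values are invented and the measurements below are noise-free, so the estimate should converge to the true motion:

```python
import numpy as np

def kalman_fuse(z_positions, accels, dt=0.1, q=0.01, r=0.01):
    """1-D Kalman filter: state [position, velocity]; acceleration drives
    the prediction (inertial input), GPS-like fixes correct the position."""
    F = np.array([[1.0, dt], [0.0, 1.0]])      # constant-velocity model
    B = np.array([0.5 * dt ** 2, dt])          # acceleration input vector
    H = np.array([[1.0, 0.0]])                 # GPS observes position only
    x = np.zeros(2)                            # initial state estimate
    P = np.eye(2) * 10.0                       # large initial uncertainty
    Q = np.eye(2) * q
    for z, a in zip(z_positions, accels):
        x = F @ x + B * a                      # predict with inertial data
        P = F @ P @ F.T + Q
        y = z - H @ x                          # innovation from GPS fix
        S = H @ P @ H.T + r
        K = (P @ H.T) / S                      # Kalman gain
        x = x + (K * y).ravel()
        P = (np.eye(2) - K @ H) @ P
    return x

# Walk at a constant 2 m/s observed for 20 s: the filter should converge
# to position ~40 m and velocity ~2 m/s despite starting from zero.
t = np.arange(1, 201) * 0.1
x = kalman_fuse(z_positions=2.0 * t, accels=np.zeros(200))
```

In a real outdoor system the same structure generalizes to 3-D with the camera supplying additional pose measurements, and the noise covariances Q and r are what the calibration step quantifies.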
Many existing systems target peripheral vision as a gateway for information delivery to the user. Early successful implementations of peripheral cues in high-load tasks include the peripheral vision horizon display, first proposed by NASA in the 1960s  and later formalized to deliver orientation information to aircraft pilots . Current applications have evolved in response to the deluge of digital information available to human sensory modalities, in search of new ways to optimize presentation. Peripheral head-mounted displays have evolved to deliver information without obstructing the central field of view, though they traditionally require gaze deviation to convey complex concepts . Flicker, both highly perceptible and easily implemented via low-power LED arrays, has been integrated into several glasses-based designs for near-eye peripheral notification . While a robust method in varied environments, flicker is fundamentally limited in the complexity of information it can convey. Beyond notification tasks, glasses-based peripheral augmentation has also been implemented for navigation or situational awareness for pedestrians  or cyclists .
In this paper we describe the implementation of an optical corner tracker for an augmented reality system that is precise, fast, and robust, and which can be implemented using a standard, consumer-level camera and PC. In Section 2, planar homographies are first reviewed, since they form the mathematical core of the 2D tracker. Section 3 then provides a description of a fast and reliable region detector that allows the system to self-identify predetermined planar patterns consisting of black and white corners. Section 3.2 then proposes an accurate corner tracker which uses a robustly computed homography to predict corner positions that are then refined using localized search windows. Experimental results are then presented in Section 4, which show the tracker's stability and robustness to occlusion, scale, orientation, and lighting changes. Additionally, a comparison between corner tracking and commonly used blob-tracking techniques is made.
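The homography-based prediction step rests on two standard ingredients that can be sketched compactly: estimating H from point correspondences with the Direct Linear Transform (DLT), and applying H in homogeneous coordinates to predict where pattern corners should appear. The point data below are invented for illustration, and the unnormalized DLT shown here is a simplification of the robust estimation used by the tracker:

```python
import numpy as np

def homography_dlt(src, dst):
    """Direct Linear Transform: estimate H (up to scale) from >= 4 point
    correspondences src -> dst; each pair contributes two linear equations."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, float))
    H = Vt[-1].reshape(3, 3)        # null-space vector = flattened H
    return H / H[2, 2]

def project(H, pts):
    """Predict corner positions by applying H in homogeneous coordinates."""
    pts = np.asarray(pts, float)
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]   # perspective division

# A known homography maps the unit square; H is re-estimated from the four
# corner correspondences and then used to predict a fifth pattern point.
H_true = np.array([[1.2, 0.1, 5.0], [-0.1, 0.9, 3.0], [0.001, 0.002, 1.0]])
src = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
dst = project(H_true, src)
H_est = homography_dlt(src, dst)
pred = project(H_est, [(0.5, 0.5)])
```

In the tracker, predictions like `pred` seed small localized search windows in which the corner positions are refined, which is what keeps the per-frame cost low.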