OpenXR [4] is an API developed by the Khronos Group for building XR applications that target a wide range of XR devices. XR refers to a mix of real and virtual environments that are generated by computers and with which humans interact. XR includes technologies such as virtual reality (VR), augmented reality (AR) and mixed reality (MR). OpenXR is the interface between an application and an XR runtime. The runtime handles functionality such as frame composition, user-triggered actions, and tracking information.
OpenXR is designed to be a layered API, which means that a user or application may insert API layers between the application and the runtime implementation. These API layers provide additional functionality by intercepting OpenXR functions from the layer above and then performing different operations than would otherwise be performed without the layer. In the simplest cases, the layer simply calls the next layer down with the same arguments, but a more complex layer may implement API functionality that is not present in the layers or runtime below it. This mechanism is essentially an architected "function shimming" or "intercept" feature that is designed into OpenXR and meant to replace more informal methods of "hooking" API calls.
Applications may determine the API layers that are available to them by calling the xrEnumerateApiLayerProperties function to obtain a list of available API layers. Applications may then select the desired API layers from this list and provide them to the xrCreateInstance function when creating an instance.
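A minimal C sketch of this enumeration, using OpenXR's usual two-call idiom (first query the count, then fill the array), might look as follows; return-value checking is omitted for brevity:

```c
#include <stdio.h>
#include <stdlib.h>
#include <openxr/openxr.h>

int main(void) {
    /* First call with capacityInput = 0 to query the number of layers. */
    uint32_t layerCount = 0;
    xrEnumerateApiLayerProperties(0, &layerCount, NULL);

    XrApiLayerProperties *layers =
        malloc(layerCount * sizeof(XrApiLayerProperties));
    for (uint32_t i = 0; i < layerCount; i++) {
        layers[i].type = XR_TYPE_API_LAYER_PROPERTIES;
        layers[i].next = NULL;
    }

    /* Second call fills the array with the available layers. */
    xrEnumerateApiLayerProperties(layerCount, &layerCount, layers);

    for (uint32_t i = 0; i < layerCount; i++)
        printf("%s\n", layers[i].layerName);

    free(layers);
    return 0;
}
```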
API layers may implement OpenXR functions that may or may not be supported by the underlying runtime. In order to expose these new features, the API layer must expose this functionality in the form of an OpenXR extension. It must not expose new OpenXR functions without an associated extension.
An OpenXR instance is an object that allows an OpenXR application to communicate with an OpenXR runtime. The application accomplishes this communication by calling xrCreateInstance and receiving a handle to the resulting XrInstance object.
The XrInstance object stores and tracks OpenXR-related application state, without storing any such state in the application's global address space. This allows the application to create multiple instances as well as safely encapsulate the application's OpenXR state since this object is opaque to the application. OpenXR runtimes may limit the number of simultaneous XrInstance objects that may be created and used, but they must support the creation and usage of at least one XrInstance object per process.
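A minimal C sketch of instance creation might look as follows; the application name is a placeholder, and the API layers selected as described above would be listed in the create info:

```c
#include <string.h>
#include <openxr/openxr.h>

XrInstance create_instance(void) {
    XrInstanceCreateInfo createInfo = { XR_TYPE_INSTANCE_CREATE_INFO };
    strncpy(createInfo.applicationInfo.applicationName, "Sample App",
            XR_MAX_APPLICATION_NAME_SIZE);
    createInfo.applicationInfo.applicationVersion = 1;
    createInfo.applicationInfo.apiVersion = XR_CURRENT_API_VERSION;
    /* Requested API layers and extensions would be listed here. */
    createInfo.enabledApiLayerCount = 0;
    createInfo.enabledExtensionCount = 0;

    XrInstance instance = XR_NULL_HANDLE;
    XrResult res = xrCreateInstance(&createInfo, &instance);
    return XR_SUCCEEDED(res) ? instance : XR_NULL_HANDLE;
}
```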
Spaces are represented by XrSpace handles, which the application creates and then uses in API calls. Whenever an application calls a function that returns coordinates, it provides an XrSpace to specify the frame of reference in which those coordinates will be expressed. Similarly, when providing coordinates to a function, the application specifies which XrSpace the runtime should use to interpret those coordinates.
OpenXR defines a set of well-known reference spaces that applications use to bootstrap their spatial reasoning. These reference spaces are: VIEW, LOCAL and STAGE. Each reference space has a well-defined meaning, which establishes where its origin is positioned and how its axes are oriented.
Runtimes whose tracking systems improve their understanding of the world over time may track spaces independently. For example, even though a LOCAL space and a STAGE space each map their origin to a static position in the world, a runtime with an inside-out tracking system may introduce slight adjustments to the origin of each space on a continuous basis to keep each origin in place.
Beyond the well-known reference spaces, runtimes expose other independently tracked spaces, such as a pose action space that tracks the pose of a motion controller over time.
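As an illustrative C sketch, the following creates LOCAL and VIEW reference spaces and locates the viewer's pose relative to the LOCAL space at a given time; error handling is omitted:

```c
#include <openxr/openxr.h>

/* Locate the VIEW space (the user's head pose) relative to a newly
 * created LOCAL reference space at the given time. */
XrSpaceLocation locate_view_in_local(XrSession session, XrTime time) {
    XrReferenceSpaceCreateInfo info = { XR_TYPE_REFERENCE_SPACE_CREATE_INFO };
    info.referenceSpaceType = XR_REFERENCE_SPACE_TYPE_LOCAL;
    info.poseInReferenceSpace.orientation.w = 1.0f;  /* identity pose */

    XrSpace localSpace = XR_NULL_HANDLE, viewSpace = XR_NULL_HANDLE;
    xrCreateReferenceSpace(session, &info, &localSpace);

    info.referenceSpaceType = XR_REFERENCE_SPACE_TYPE_VIEW;
    xrCreateReferenceSpace(session, &info, &viewSpace);

    /* Express the VIEW space's pose in LOCAL-space coordinates. */
    XrSpaceLocation location = { XR_TYPE_SPACE_LOCATION };
    xrLocateSpace(viewSpace, localSpace, time, &location);
    return location;
}
```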
Figure 4.6.4.1-1 depicts the lifecycle of an application that uses OpenXR for interaction with, and rendering to, an HMD.
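In outline, the per-frame portion of that lifecycle might look like the following simplified C sketch; event polling, session state handling, view location and swapchain rendering are omitted:

```c
#include <openxr/openxr.h>

/* Simplified per-frame loop, assuming `session` is already running. */
void render_one_frame(XrSession session) {
    XrFrameWaitInfo waitInfo = { XR_TYPE_FRAME_WAIT_INFO };
    XrFrameState frameState = { XR_TYPE_FRAME_STATE };
    xrWaitFrame(session, &waitInfo, &frameState);  /* throttle to display */

    XrFrameBeginInfo beginInfo = { XR_TYPE_FRAME_BEGIN_INFO };
    xrBeginFrame(session, &beginInfo);

    /* ... locate views for frameState.predictedDisplayTime and render
       into swapchain images here ... */

    XrFrameEndInfo endInfo = { XR_TYPE_FRAME_END_INFO };
    endInfo.displayTime = frameState.predictedDisplayTime;
    endInfo.environmentBlendMode = XR_ENVIRONMENT_BLEND_MODE_OPAQUE;
    endInfo.layerCount = 0;  /* composition layers omitted in this sketch */
    xrEndFrame(session, &endInfo);
}
```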
WebXR [5] is a set of APIs developed by the W3C to provide support for augmented reality (AR) and virtual reality (VR) in web environments, hence the name WebXR for cross reality on the web. When a WebXR session is created, the mode of the session is indicated, i.e. whether it is an AR or VR session. VR sessions may be consumed in two ways: inline and immersive. In inline mode, the VR content is rendered on the 2D screen as part of the web document. In immersive mode, the content is rendered on an HMD as an immersive experience. AR sessions are always immersive.
A typical lifecycle of a WebXR application starts by checking for the availability of WebXR API support in the current browser. When the user requests the activation of WebXR functionality, an XRSession with the desired mode is created. The XRSession instance is then used to request a frame to render using the requestAnimationFrame call. Complex scenes may require threaded rendering, which may be achieved through the use of Worker instances. WebGL is then ultimately used to render to the provided frame. When calling requestAnimationFrame, the application provides a callback function that will be called when a new frame is about to be rendered. The callback function receives a timestamp, indicating the point in time to which the XR poses of the frame correspond. It also receives an XRFrame, which holds information about the current XR poses of all objects that are being tracked by the session. This information is then used by the application to perform correct rendering. The XRFrame offers two main functions: getPose and getViewerPose. The getPose function returns the relationship between two XRSpaces, which are passed as input to that function. The getViewerPose function returns the viewer's pose relative to a reference XRSpace that is passed to the function call.
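A minimal TypeScript sketch of this lifecycle might look as follows, assuming WebXR type definitions (e.g. @types/webxr) are available; error handling and the WebGL layer setup are omitted for brevity:

```typescript
async function startImmersiveVr(): Promise<void> {
  if (!navigator.xr) {
    return; // WebXR is not supported by this browser
  }
  const session = await navigator.xr.requestSession("immersive-vr");
  const refSpace = await session.requestReferenceSpace("local");

  const onFrame = (time: DOMHighResTimeStamp, frame: XRFrame): void => {
    // Pose of the viewer relative to the chosen reference space.
    const viewerPose = frame.getViewerPose(refSpace);
    if (viewerPose) {
      for (const view of viewerPose.views) {
        // Render this view (e.g. one eye) with WebGL, using
        // view.transform and view.projectionMatrix ...
      }
    }
    // Schedule the callback for the next frame.
    session.requestAnimationFrame(onFrame);
  };
  session.requestAnimationFrame(onFrame);
}
```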
WebXR defines a set of reference XRSpace types, as described in Table 4.6.4.2-1: