The 3D XYZ coordinate system shown in Figure A.1 can be used to describe the 3D geometry of the ERP and CMP projection format representations. Starting from the center of the sphere, the X axis points toward the front of the sphere, the Z axis points toward the top of the sphere, and the Y axis points toward the left of the sphere.
The coordinate system is specified for defining the sphere coordinates azimuth (Φ) and elevation (θ), which identify the location of a point on the unit sphere. The azimuth Φ is in the range [−π, π], and the elevation θ is in the range [−π/2, π/2], where π is the ratio of a circle's circumference to its diameter. The azimuth (Φ) is defined as the angle measured from the X axis in the counter-clockwise direction, as shown in Figure A.1. The elevation (θ) is defined as the angle from the equator toward the Z axis, as shown in Figure A.1. The (X, Y, Z) coordinates on the unit sphere can be evaluated from (Φ, θ) using the following equations:
X = cos(θ)*cos(Φ)
Y = cos(θ)*sin(Φ)
Z = sin(θ)
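The three equations above can be sketched in Python as follows. This is a minimal illustration rather than a normative implementation; the function name `sphere_to_xyz` is hypothetical, and the angles are assumed to be given in radians.

```python
import math


def sphere_to_xyz(azimuth, elevation):
    """Map sphere coordinates (radians) to a point on the unit sphere.

    azimuth   -- angle from the X axis, counter-clockwise, in [-pi, pi]
    elevation -- angle from the equator toward the Z axis, in [-pi/2, pi/2]
    """
    x = math.cos(elevation) * math.cos(azimuth)
    y = math.cos(elevation) * math.sin(azimuth)
    z = math.sin(elevation)
    return x, y, z
```

For example, `sphere_to_xyz(math.pi / 2, 0.0)` yields a point that is approximately (0, 1, 0), i.e. on the Y axis, which points toward the left of the sphere.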
Inversely, the longitude and latitude (Φ, θ) can be evaluated from (X, Y, Z) coordinates using:
Φ = tan⁻¹(Y/X)
θ = sin⁻¹(Z / sqrt(X² + Y² + Z²))
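A possible Python sketch of the inverse mapping is given below. It makes one deliberate substitution: a plain arctangent of Y/X only covers half of the azimuth range and is undefined for X = 0, so the sketch uses the two-argument `math.atan2(y, x)`, which returns values over the full range [−π, π]. The function name `xyz_to_sphere` is hypothetical.

```python
import math


def xyz_to_sphere(x, y, z):
    """Map a 3D point (not necessarily unit length) to (azimuth, elevation) in radians."""
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("the origin has no defined azimuth or elevation")
    # atan2 resolves the quadrant that tan^-1(Y/X) alone cannot distinguish.
    azimuth = math.atan2(y, x)
    # Clamp guards against rounding pushing the ratio just outside [-1, 1].
    elevation = math.asin(max(-1.0, min(1.0, z / norm)))
    return azimuth, elevation
```

Because the elevation formula divides by the vector length, the input does not need to be normalized first; a point on a cube face can be passed directly.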
A 2D plane coordinate system is defined for each face in the 2D projection plane. Whereas Equirectangular Projection (ERP) has only one face, Cubemap Projection (CMP) has six faces. In order to generalize the 2D coordinate system, a face index is defined for each face in the 2D projection plane. Each face is mapped to a 2D plane associated with one face index.
Equirectangular mapping is the most commonly used mapping from spherical video to a 2D texture signal. The mapping is bijective, i.e. it may be expressed in both directions, and is illustrated in Figure A.2.
ERP has only one face and the face index f for ERP is always set to 0. The sphere coordinates (Φ, θ) for a sample location (i, j), in degrees, are given by the following equations:
Φ = (0.5 - i/pictureWidth)*360
θ = (0.5 - j/pictureHeight)*180
Finally, (X, Y, Z) can be calculated from the equations given above.
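The ERP derivation can be sketched in Python as follows. This is an illustration under stated assumptions: the function name `erp_sample_to_xyz` is hypothetical, the sample location (i, j) is used exactly as given (the equations above contain no half-sample offset, so none is added), and the angles are converted from degrees to radians before the trigonometric functions are applied.

```python
import math


def erp_sample_to_xyz(i, j, picture_width, picture_height):
    """Map an ERP sample location (i, j) to a point on the unit sphere."""
    # Sphere coordinates in degrees, as in the ERP equations.
    azimuth_deg = (0.5 - i / picture_width) * 360.0
    elevation_deg = (0.5 - j / picture_height) * 180.0

    # The sphere-to-XYZ equations expect radians.
    azimuth = math.radians(azimuth_deg)
    elevation = math.radians(elevation_deg)

    x = math.cos(elevation) * math.cos(azimuth)
    y = math.cos(elevation) * math.sin(azimuth)
    z = math.sin(elevation)
    return x, y, z
```

With these equations the center of the picture maps to the front of the sphere (the positive X axis), and the top row (j = 0) maps to an elevation of 90 degrees, i.e. the top of the sphere.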
Figure A.3 shows the CMP projection with 6 square faces, labelled as PX, PY, PZ, NX, NY, NZ (with "P" standing for "positive" and "N" standing for "negative").
Table A.2-4 specifies the face index values corresponding to each of the six CMP faces.
The 3D coordinates (X, Y, Z) for a sample location (i, j) are derived using the following equations:
lw = pictureWidth / 3
lh = pictureHeight / 2
tmpHorVal = i − Floor( i ÷ lw ) * lw
tmpVerVal = j − Floor( j ÷ lh ) * lh
i' = −( 2 * tmpHorVal ÷ lw ) + 1
j' = −( 2 * tmpVerVal ÷ lh ) + 1
w = Floor( i ÷ lw )
h = Floor( j ÷ lh )
if( w == 1 && h == 0 ) { // PX: positive x front face
    X = 1.0
    Y = i'
    Z = j'
} else if( w == 1 && h == 1 ) { // NX: negative x back face
    X = −1.0
    Y = −j'
    Z = −i'
} else if( w == 2 && h == 1 ) { // PZ: positive z top face
    X = −i'
    Y = −j'
    Z = 1.0
} else if( w == 0 && h == 1 ) { // NZ: negative z bottom face
    X = i'
    Y = −j'
    Z = −1.0
} else if( w == 0 && h == 0 ) { // PY: positive y left face
    X = −i'
    Y = 1.0
    Z = j'
} else { // ( w == 2 && h == 0 ), NY: negative y right face
    X = i'
    Y = −1.0
    Z = j'
}
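The derivation above can be transcribed into Python roughly as follows. This is a sketch, not a normative implementation: the function name `cmp_sample_to_xyz` is hypothetical, the 3x2 face layout is taken directly from the (w, h) conditions above, and the explicit range check is an addition that the pseudocode does not contain.

```python
import math


def cmp_sample_to_xyz(i, j, picture_width, picture_height):
    """Map a CMP sample location (i, j) to a point on the cube surface."""
    if not (0 <= i < picture_width and 0 <= j < picture_height):
        raise ValueError("sample location lies outside the picture")

    lw = picture_width / 3   # face width
    lh = picture_height / 2  # face height
    w = math.floor(i / lw)   # face column, 0..2
    h = math.floor(j / lh)   # face row, 0..1

    # Position inside the face, rescaled from [0, lw) or [0, lh) to (-1, 1].
    ip = 1 - 2 * (i - w * lw) / lw
    jp = 1 - 2 * (j - h * lh) / lh

    faces = {
        (1, 0): (1.0, ip, jp),     # PX: front
        (1, 1): (-1.0, -jp, -ip),  # NX: back
        (2, 1): (-ip, -jp, 1.0),   # PZ: top
        (0, 1): (ip, -jp, -1.0),   # NZ: bottom
        (0, 0): (-ip, 1.0, jp),    # PY: left
        (2, 0): (ip, -1.0, jp),    # NY: right
    }
    return faces[(w, h)]
```

Note that the returned point lies on the surface of the cube rather than on the unit sphere, so its length is generally greater than 1; the elevation formula given earlier accounts for this by dividing by the vector length.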
Denote (fd, id, jd) as a point (id, jd) on face fd in the destination projection format, and (fs, is, js) as a point (is, js) on face fs in the source projection format. Denote (X, Y, Z) as the corresponding coordinates in the 3D XYZ space. The conversion process starts from each sample position (fd, id, jd) on the destination projection plane, maps it to the corresponding (X, Y, Z) in the 3D coordinate system, finds the corresponding sample position (fs, is, js) on the source projection plane, and sets the sample value at (fd, id, jd) based on the sample value at (fs, is, js).
Therefore, the projection format conversion process from ERP source format to CMP destination format is performed in the following three steps:
1. Map the destination 2D sampling point (fd, id, jd) to 3D space coordinates (X, Y, Z) based on the CMP format.
2. Map (X, Y, Z) from step 1 to the 2D sampling point (f0, is, js) based on the ERP format.
3. Calculate the sample value at (f0, is, js) by interpolating from neighboring samples at integer positions on face f0, and place the interpolated sample value at (fd, id, jd) in the destination projection format.
The above steps are repeated until all sample positions (fd, id, jd) in the destination projection format are filled. Note that Step 1 and Step 2 can be pre-calculated at the sequence level and stored as a lookup table, so that only Step 3 needs to be performed per sample position for each picture in order to render the sample values.
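The three steps and the lookup-table observation can be illustrated with the following Python sketch. Several choices here are assumptions rather than part of the description above: all function names are hypothetical, pictures are plain lists of rows holding one sample component, the ERP position is obtained by inverting the ERP equations (with `math.atan2` used for the azimuth), and bilinear interpolation with horizontal wrap-around and vertical clamping stands in for the unspecified interpolation filter.

```python
import math


def cmp_to_xyz(i, j, width, height):
    """Step 1: CMP sample position -> point on the cube surface."""
    lw, lh = width / 3, height / 2
    w, h = math.floor(i / lw), math.floor(j / lh)
    ip = 1 - 2 * (i - w * lw) / lw
    jp = 1 - 2 * (j - h * lh) / lh
    return {
        (1, 0): (1.0, ip, jp),     # PX
        (1, 1): (-1.0, -jp, -ip),  # NX
        (2, 1): (-ip, -jp, 1.0),   # PZ
        (0, 1): (ip, -jp, -1.0),   # NZ
        (0, 0): (-ip, 1.0, jp),    # PY
        (2, 0): (ip, -1.0, jp),    # NY
    }[(w, h)]


def xyz_to_erp(x, y, z, width, height):
    """Step 2: 3D point -> fractional ERP position (inverse of the ERP equations)."""
    norm = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    return (0.5 - azimuth / 360.0) * width, (0.5 - elevation / 180.0) * height


def build_lookup_table(dst_w, dst_h, src_w, src_h):
    """Pre-compute steps 1 and 2 once per sequence; indexed as table[jd][id]."""
    return [[xyz_to_erp(*cmp_to_xyz(i, j, dst_w, dst_h), src_w, src_h)
             for i in range(dst_w)]
            for j in range(dst_h)]


def bilinear(picture, i_s, j_s):
    """Step 3: interpolate from the four neighbouring integer positions."""
    height, width = len(picture), len(picture[0])
    i0, j0 = math.floor(i_s), math.floor(j_s)
    a, b = i_s - i0, j_s - j0

    def at(ii, jj):
        # Assumption: wrap in azimuth, clamp at the poles.
        return picture[min(max(jj, 0), height - 1)][ii % width]

    top = (1 - a) * at(i0, j0) + a * at(i0 + 1, j0)
    bottom = (1 - a) * at(i0, j0 + 1) + a * at(i0 + 1, j0 + 1)
    return (1 - b) * top + b * bottom


def convert_erp_to_cmp(erp_picture, lookup):
    """Apply step 3 per picture, reusing the pre-computed lookup table."""
    return [[bilinear(erp_picture, i_s, j_s) for (i_s, j_s) in row]
            for row in lookup]
```

In this arrangement `build_lookup_table` is called once for a given pair of picture sizes, and `convert_erp_to_cmp` is then called for each picture, so the trigonometric work is not repeated per picture.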