Transformation Notes

October 26, 2021

Here are my notes on transformations between 3D spaces. The math tries to follow textbook notation.

Rotation

Inverse of 3x3 Matrix

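The inverse of a 3x3 matrix can be written in terms of dot and cross products of its rows, which makes it easier to remember. Let the rows of $M$ be the vectors $U$, $V$ and $W$; all three denominators in $M^{-1}$ are then the same scalar triple product, which equals $\det M$.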

$ \quad U = \begin{bmatrix} u_x \\ u_y \\ u_z \end{bmatrix} \quad V = \begin{bmatrix} v_x \\ v_y \\ v_z \end{bmatrix} \quad W = \begin{bmatrix} w_x \\ w_y \\ w_z \end{bmatrix} $

$ \begin{aligned} \quad M = \begin{bmatrix} U \\ V \\ W \\ \end{bmatrix} = \begin{bmatrix} u_x & u_y & u_z \\ v_x & v_y & v_z \\ w_x & w_y & w_z \\ \end{bmatrix} \\ \end{aligned} $

$ \displaystyle \quad M^{-1} = \begin{bmatrix} \frac{V \times W}{U \cdot (V \times W)} & \frac{W \times U}{V \cdot (W \times U)} & \frac{U \times V}{W \cdot (U \times V)} \end{bmatrix} $
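As a quick sanity check of the formula, here is a minimal C++ sketch. The names `Vec3`, `Columns3` and `inverseFromRows` are made up for illustration, and the code relies on the fact that all three denominators are the same determinant, so it divides only once.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

inline Vec3 cross(Vec3 a, Vec3 b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

inline Vec3 scale(Vec3 a, float s) { return { a.x * s, a.y * s, a.z * s }; }

// The three COLUMNS of the inverse matrix.
struct Columns3 { Vec3 c0, c1, c2; };

// U, V, W are the ROWS of M. Returns the columns of M^-1.
inline Columns3 inverseFromRows(Vec3 U, Vec3 V, Vec3 W)
{
    const Vec3 vw = cross(V, W);
    // U . (V x W) = V . (W x U) = W . (U x V) = det(M)
    const float det = dot(U, vw);
    assert(det != 0.0f && "M must be invertible");
    const float invDet = 1.0f / det;
    return { scale(vw, invDet),
             scale(cross(W, U), invDet),
             scale(cross(U, V), invDet) };
}
```

Since row $i$ of $M$ dotted with column $j$ of $M^{-1}$ must give the identity, checking `dot(U, c0) == 1`, `dot(V, c0) == 0` and so on is enough to validate the result.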

View Matrix

We can think of constructing the View Matrix as translating the camera to the origin and then aligning the camera axes with the standard axes. With camera axes $U, V, W$ and camera position $P = (p_x, p_y, p_z)$, this can be written as:

$ \quad V = R^T \times T^{-1} = \begin{bmatrix} u_x & u_y & u_z & 0 \\ v_x & v_y & v_z & 0 \\ w_x & w_y & w_z & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \times \begin{bmatrix} 1 & 0 & 0 & -p_x \\ 0 & 1 & 0 & -p_y \\ 0 & 0 & 1 & -p_z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} = \begin{bmatrix} u_x & u_y & u_z & -(U \cdot P) \\ v_x & v_y & v_z & -(V \cdot P) \\ w_x & w_y & w_z & -(W \cdot P) \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} $

The inverse is simply:

$ \quad V^{-1} = T \times (R^T)^{-1} = \begin{bmatrix} 1 & 0 & 0 & p_x \\ 0 & 1 & 0 & p_y \\ 0 & 0 & 1 & p_z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \times \begin{bmatrix} \frac{V \times W}{U \cdot (V \times W)} & \frac{W \times U}{V \cdot (W \times U)} & \frac{U \times V}{W \cdot (U \times V)} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} $

If the axes $U, V, W$ are orthonormal then the inverse simplifies to:

$ \quad V^{-1} = T \times R = \begin{bmatrix} u_x & v_x & w_x & p_x \\ u_y & v_y & w_y & p_y \\ u_z & v_z & w_z & p_z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} $
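The orthonormal case can be sketched in C++ as follows. This is only an illustration: `Mat4`, `makeView`, `makeViewInverse` and `transformPoint` are hypothetical names, and the matrix is stored as `m[row][col]` and applied to column vectors, matching the notation above.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };  // m[row][col], applied to column vectors

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// View matrix R^T * T^-1: rows are the camera axes, last column is -(axis . P).
inline Mat4 makeView(Vec3 U, Vec3 V, Vec3 W, Vec3 P)
{
    return {{
        { U.x, U.y, U.z, -dot(U, P) },
        { V.x, V.y, V.z, -dot(V, P) },
        { W.x, W.y, W.z, -dot(W, P) },
        { 0.0f, 0.0f, 0.0f, 1.0f },
    }};
}

// Inverse view matrix T * R. Only valid when U, V, W are orthonormal.
inline Mat4 makeViewInverse(Vec3 U, Vec3 V, Vec3 W, Vec3 P)
{
    const float eps = 1e-4f;
    assert(std::fabs(dot(U, V)) < eps && std::fabs(dot(V, W)) < eps && std::fabs(dot(W, U)) < eps);
    assert(std::fabs(dot(U, U) - 1.0f) < eps && std::fabs(dot(V, V) - 1.0f) < eps &&
           std::fabs(dot(W, W) - 1.0f) < eps);
    return {{
        { U.x, V.x, W.x, P.x },
        { U.y, V.y, W.y, P.y },
        { U.z, V.z, W.z, P.z },
        { 0.0f, 0.0f, 0.0f, 1.0f },
    }};
}

// Transforms the point (p, 1) by an affine matrix.
inline Vec3 transformPoint(const Mat4& M, Vec3 p)
{
    return { M.m[0][0] * p.x + M.m[0][1] * p.y + M.m[0][2] * p.z + M.m[0][3],
             M.m[1][0] * p.x + M.m[1][1] * p.y + M.m[1][2] * p.z + M.m[1][3],
             M.m[2][0] * p.x + M.m[2][1] * p.y + M.m[2][2] * p.z + M.m[2][3] };
}
```

Two easy checks: the camera position should map to the origin in view space, and a point one unit along $W$ from the camera should map to $(0, 0, 1)$.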

Projection Matrix

Infinite Projection Matrix

Most people seem to be moving towards an infinite projection matrix with reversed Z, meaning depth is 1 at the near plane and approaches 0 at infinity. Another nice thing is that it has a simple inverse:

$ \quad P = \begin{bmatrix} \frac{1}{aspect \; * \; tan(0.5 * FOV_y)} & 0 & 0 & 0 \\ 0 & \frac{1}{tan(0.5 * FOV_y)} & 0 & 0 \\ 0 & 0 & 0 & near \\ 0 & 0 & 1 & 0 \\ \end{bmatrix} $

$ \quad P^{-1} = \begin{bmatrix} aspect \; * \; tan(0.5 * FOV_y) & 0 & 0 & 0 \\ 0 & tan(0.5 * FOV_y) & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & \frac{1}{near} & 0 \\ \end{bmatrix} $

$aspect = \frac{viewport_{width}}{viewport_{height}}$
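To see the reversed-Z behavior in numbers, here is a small C++ sketch that applies $P$ followed by the perspective divide, and $P^{-1}$ for the way back. It assumes view space looks down $+z$ (which is what the $1$ in the last row of $P$ implies); `projectToNdc` and `unprojectFromNdc` are illustrative names only.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Applies P to (viewPos, 1) and divides by w. Assumes the camera looks down +z.
inline Vec3 projectToNdc(Vec3 viewPos, float fovY, float aspect, float nearZ)
{
    assert(viewPos.z > 0.0f && "point must be in front of the camera");
    const float t = std::tan(0.5f * fovY);
    const float clipX = viewPos.x / (aspect * t);
    const float clipY = viewPos.y / t;
    const float clipZ = nearZ;      // third row of P:  (0, 0, 0, near)
    const float clipW = viewPos.z;  // fourth row of P: (0, 0, 1, 0)
    return { clipX / clipW, clipY / clipW, clipZ / clipW };
}

// Applies P^-1 to (ndc, 1) and divides by w to get back to view space.
inline Vec3 unprojectFromNdc(Vec3 ndc, float fovY, float aspect, float nearZ)
{
    assert(ndc.z > 0.0f && "depth 0 corresponds to a point at infinity");
    const float t = std::tan(0.5f * fovY);
    const float w = ndc.z / nearZ;  // fourth row of P^-1: (0, 0, 1/near, 0)
    return { aspect * t * ndc.x / w, t * ndc.y / w, 1.0f / w };
}
```

The resulting depth is simply $near / z_{view}$: exactly 1 on the near plane and shrinking towards 0 as the point moves away.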

Viewport Matrix

The Viewport Matrix transforms NDC coordinates to screen positions. Let the bounds in screen space be defined as:

$ \quad x_{min} = left \\ \quad x_{max} = right \\ \quad y_{min} = top \\ \quad y_{max} = bottom \\ $

and

$ \quad offset = 0.5 * \begin{bmatrix} x_{min} + x_{max} \\ y_{min} + y_{max} \end{bmatrix} \\ \quad extent = 0.5 * \begin{bmatrix} x_{max} - x_{min} \\ y_{max} - y_{min} \end{bmatrix} $

then the Viewport Matrix $V_p$ is defined as (note that the y-axis is flipped):

$ \begin{aligned} \quad V_p &= \begin{bmatrix} extent_x & 0 & 0 & offset_x \\ 0 & -extent_y & 0 & offset_y \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \\ \quad V_p^{-1} &= \begin{bmatrix} \frac{1}{extent_x} & 0 & 0 & -\frac{offset_x}{extent_x} \\ 0 & -\frac{1}{extent_y} & 0 & \frac{offset_y}{extent_y} \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{aligned} $

For a full-screen viewport, we have:

$ \begin{aligned} \quad x_{min} &= 0 \quad\quad x_{max} = width \\ \quad y_{min} &= 0 \quad\quad y_{max} = height \\ \quad offset &= extent = 0.5 * \begin{bmatrix} width \\ height \end{bmatrix} \\ \end{aligned} $

which yields:

$ \begin{aligned} \quad V_p &= \begin{bmatrix} 0.5 * width & 0 & 0 & 0.5 * width \\ 0 & -0.5 * height & 0 & 0.5 * height \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ \quad V_p^{-1} &= \begin{bmatrix} \frac{2}{width} & 0 & 0 & -1 \\ 0 & -\frac{2}{height} & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \end{aligned} $
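The x and y rows of the viewport transform and its inverse can be sketched in C++ like this. The `Viewport` struct and the function names are assumptions for the sake of the example; depth passes through unchanged, so it is left out.

```cpp
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Screen-space bounds of the viewport, with y growing downwards.
struct Viewport { float left, top, right, bottom; };

// NDC -> screen position, i.e. the x and y rows of V_p.
inline Vec2 ndcToScreen(Vec2 ndc, Viewport vp)
{
    const float offsetX = 0.5f * (vp.left + vp.right);
    const float offsetY = 0.5f * (vp.top + vp.bottom);
    const float extentX = 0.5f * (vp.right - vp.left);
    const float extentY = 0.5f * (vp.bottom - vp.top);
    // The minus sign flips y: NDC +y is up, screen +y is down.
    return { extentX * ndc.x + offsetX, -extentY * ndc.y + offsetY };
}

// Screen position -> NDC, i.e. the x and y rows of V_p^-1.
inline Vec2 screenToNdc(Vec2 screenPos, Viewport vp)
{
    const float offsetX = 0.5f * (vp.left + vp.right);
    const float offsetY = 0.5f * (vp.top + vp.bottom);
    const float extentX = 0.5f * (vp.right - vp.left);
    const float extentY = 0.5f * (vp.bottom - vp.top);
    assert(extentX != 0.0f && extentY != 0.0f && "viewport must have non-zero size");
    return { (screenPos.x - offsetX) / extentX, -(screenPos.y - offsetY) / extentY };
}
```

For a full-screen 1920 x 1080 viewport, NDC $(-1, 1)$ lands on the top-left corner $(0, 0)$ and NDC $(1, -1)$ on the bottom-right corner $(1920, 1080)$.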

Rotation From To Matrix

One common problem in computer graphics is computing the matrix that rotates a unit vector $a$ (from) to another unit vector $b$ (to). The most intuitive way is to use the formula for the rotation matrix about an axis $u$ by an angle $\theta$ [wiki]:

$ \begin{aligned} \quad R &= \begin{bmatrix} u_x u_x (1 - cos\theta) + cos\theta & u_x u_y (1 - cos\theta) - u_z sin\theta & u_x u_z (1 - cos\theta) + u_y sin\theta \\ u_y u_x (1 - cos\theta) + u_z sin\theta & u_y u_y (1 - cos\theta) + cos\theta & u_y u_z (1 - cos\theta) - u_x sin\theta \\ u_z u_x (1 - cos\theta) - u_y sin\theta & u_z u_y (1 - cos\theta) + u_x sin\theta & u_z u_z (1 - cos\theta) + cos\theta \end{bmatrix} \\ &= u u^T (1 - cos\theta) + cos\theta \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} + sin\theta \begin{bmatrix} 0 & -u_z & u_y \\ u_z & 0 & -u_x \\ -u_y & u_x & 0 \\ \end{bmatrix} \end{aligned} $

The steps are:

compute rotation axis: $\quad u = \frac{from \times to}{\lVert from \times to \rVert}$
compute rotation angle: $\quad \theta = acos(from \cdot to)$
compute rotation matrix: $\quad R(u, \theta)$

Plugging in the following identities:

$ \begin{aligned} \quad v &= a \times b \\ \quad u &= \frac{a \times b}{\lVert a \times b \rVert} = \frac{v}{sin\theta} \\ \quad uu^T &= \frac{vv^T}{sin^2\theta} = \frac{vv^T}{1 - cos^2\theta} = \frac{vv^T}{(1 - cos\theta)(1 + cos\theta)} \\ \quad cos\theta &= a \cdot b \end{aligned} $

we can obtain the final rotation matrix:

$ \begin{aligned} \quad R &= u u^T (1 - cos\theta) + cos\theta \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} + sin\theta \begin{bmatrix} 0 & -u_z & u_y \\ u_z & 0 & -u_x \\ -u_y & u_x & 0 \\ \end{bmatrix} \\ &= \frac{vv^T}{1 + a \cdot b} + \begin{bmatrix} a \cdot b & 0 & 0 \\ 0 & a \cdot b & 0 \\ 0 & 0 & a \cdot b \end{bmatrix} + \begin{bmatrix} 0 & -v_z & v_y \\ v_z & 0 & -v_x \\ -v_y & v_x & 0 \end{bmatrix} \\ &= \frac{vv^T}{1 + a \cdot b} + \begin{bmatrix} a \cdot b & -v_z & v_y \\ v_z & a \cdot b & -v_x \\ -v_y & v_x & a \cdot b \end{bmatrix} \end{aligned} $

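The final result is exactly the same as [Inigo’s]. Note that it is undefined when $a \cdot b = -1$, i.e. when the two vectors point in opposite directions.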
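The final expression translates almost directly into code. Below is a minimal C++ sketch of it; `Mat3`, `rotationFromTo` and `mul` are illustrative names, both inputs are assumed to be unit length, and the case $a \cdot b = -1$ (where the denominator vanishes) is rejected with an assert rather than handled.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Mat3 { float m[3][3]; };  // m[row][col], applied to column vectors

inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

inline Vec3 cross(Vec3 a, Vec3 b)
{
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// Rotation taking unit vector a to unit vector b:
// R = v v^T / (1 + a.b) + (a.b) I + [v]_x, with v = a x b.
inline Mat3 rotationFromTo(Vec3 a, Vec3 b)
{
    const Vec3 v = cross(a, b);
    const float c = dot(a, b);
    assert(c > -1.0f && "undefined for opposite vectors");
    const float k = 1.0f / (1.0f + c);
    return {{
        { v.x * v.x * k + c,   v.x * v.y * k - v.z, v.x * v.z * k + v.y },
        { v.y * v.x * k + v.z, v.y * v.y * k + c,   v.y * v.z * k - v.x },
        { v.z * v.x * k - v.y, v.z * v.y * k + v.x, v.z * v.z * k + c   },
    }};
}

inline Vec3 mul(const Mat3& M, Vec3 p)
{
    return { M.m[0][0] * p.x + M.m[0][1] * p.y + M.m[0][2] * p.z,
             M.m[1][0] * p.x + M.m[1][1] * p.y + M.m[1][2] * p.z,
             M.m[2][0] * p.x + M.m[2][1] * p.y + M.m[2][2] * p.z };
}
```

For example, with $a = (1, 0, 0)$ and $b = (0, 1, 0)$ we get $v = (0, 0, 1)$ and $a \cdot b = 0$, and the matrix reduces to a 90 degree rotation about the z-axis, so `mul(R, a)` gives back $b$.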

Screen Position and Pixel

I make a distinction between screen position and pixel so it’s clear in my head. This is also part of my own vocabulary:

Screen Space is a continuous coordinate system over [0, width] x [0, height]. Screen Position refers to the center of a pixel and is usually obtained from SV_Position.xy.
Pixel is a picture element identified by a pair of integers (x, y) ranging over [0, width) x [0, height).
Given the pixelId (2, 5), the screen position is (2.5, 5.5).

Floating point operations

When working with screen positions and pixels, it is helpful to understand the different behaviors of common floating-point operations and conversions:

floor, ceil, frac are well understood.
round rounds a number to the nearest integer. How ties are broken depends on the language; in some it means “round half away from zero”. The various rounding modes are well explained in [wiki].
trunc rounds towards zero, i.e. $trunc(x) = sgn(x) \lfloor |x| \rfloor$. int(floatValue) also performs truncation.
fmod can be implemented as follows:

float fmod(float x, float y)
{
    // return fma(trunc(x/y), -y, x);
    return x - trunc(x/y) * y;
}
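To make the differences concrete, here is a small C++ sketch (C++ rather than HLSL so it can be run on the CPU). `fmodTrunc` is a hypothetical name for a port of the fmod definition above, and the comment table lists what the standard `<cmath>` functions return for ±2.5.

```cpp
#include <cassert>
#include <cmath>

//   x     | floor | ceil | trunc | round | (int)x
//  -------+-------+------+-------+-------+--------
//   -2.5  |  -3   |  -2  |  -2   |  -3   |   -2
//    2.5  |   2   |   3  |   2   |   3   |    2
//
// std::round breaks ties away from zero; a float-to-int cast truncates.

// Port of the fmod definition above. The quotient is truncated, so the
// result takes the sign of x (the same convention as std::fmod).
inline float fmodTrunc(float x, float y)
{
    assert(y != 0.0f && "division by zero");
    return x - std::trunc(x / y) * y;
}
```

For instance, `fmodTrunc(-5.5f, 2.0f)` is `-1.5f`: the quotient `-2.75` truncates to `-2`, and `-5.5 - (-2 * 2) = -1.5`.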

snapToPixel snaps a screen position to the nearest pixel center:

float2 snapToPixel(float2 screenPos)
{
    //  screen | -1.00 | -0.75 | -0.50 | -0.25 | 0.00 | 0.25 | 0.50 | 0.75 | 1.00  |
    // --------+-------+-------+-------+-------+------+------+------+------+-------+
    //  pixel  | -1.5  | -0.5  | -0.5  | -0.5  | -0.5 | 0.5  | 0.5  | 0.5  | 0.5   |
    // --------+-------+-------+-------+-------+------+------+------+------+-------+
    return ceil(screenPos - 1.0f) + 0.5f;
}

wrapInt wraps an integer index (including negative values) into the range [0, count). This behaves intuitively, unlike the % operator, whose result takes the sign of the dividend.

int wrapInt(int index, int count)
{
    //             idx  | -5   -4 | -3   -2   -1 |  0    1    2 |  3    4    5 |
    // -----------------+---------+--------------+--------------+--------------+
    //  wrapInt(idx, 3) |  1    2 |  0    1    2 |  0    1    2 |  0    1    2 |
    //  index % 3       | -2   -1 |  0   -2   -1 |  0    1    2 |  0    1    2 |
    // -----------------+---------+--------------+--------------+--------------+
    return (index % count + count) % count;
}

vpIntersect intersects a ray with the viewport in screen space. This is useful for making sure a sample position doesn’t go outside the screen.

// ro has to be inside rect
float rectIntersect(float2 ro, float2 rd, float2 rectMin, float2 rectMax)
{
    float2 rcp_rd = rcp(rd);
    float2 t_min = (rectMin - ro) * rcp_rd;
    float2 t_max = (rectMax - ro) * rcp_rd;
    float2 t_front = max(t_min, t_max);
    float t = min(t_front.x, t_front.y);
    return t;
}

// viewport intersect in screen space
float vpIntersect(float2 screenPos, float2 dir, float2 vpMin, float2 vpMax)
{
    return rectIntersect(screenPos, dir, vpMin + 0.5, vpMax - 0.5);
}
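As a usage example, here is a rough C++ port of the two functions together with a hypothetical helper, `marchInsideViewport`, that walks from a screen position along a direction but stops at the last row or column of pixel centers. The helper and the `Vec2` type are assumptions made for this sketch, not part of the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// Distance t at which the ray ro + t * rd leaves the rectangle.
// ro has to be inside the rectangle (the boundary is allowed).
inline float rectIntersect(Vec2 ro, Vec2 rd, Vec2 rectMin, Vec2 rectMax)
{
    assert(ro.x >= rectMin.x && ro.x <= rectMax.x &&
           ro.y >= rectMin.y && ro.y <= rectMax.y);
    // A zero component in rd gives +/-infinity under IEEE 754, which max/min
    // handle, except when ro sits exactly on that boundary (0 / 0 is NaN).
    const float tx = std::max((rectMin.x - ro.x) / rd.x, (rectMax.x - ro.x) / rd.x);
    const float ty = std::max((rectMin.y - ro.y) / rd.y, (rectMax.y - ro.y) / rd.y);
    return std::min(tx, ty);
}

// Same as above, with the rectangle shrunk by half a pixel so that the
// hit point is still a valid pixel center.
inline float vpIntersect(Vec2 screenPos, Vec2 dir, Vec2 vpMin, Vec2 vpMax)
{
    return rectIntersect(screenPos, dir,
                         Vec2{ vpMin.x + 0.5f, vpMin.y + 0.5f },
                         Vec2{ vpMax.x - 0.5f, vpMax.y - 0.5f });
}

// Hypothetical helper: move by at most maxDist (in units of |dir|) along dir
// without leaving the viewport.
inline Vec2 marchInsideViewport(Vec2 screenPos, Vec2 dir, float maxDist,
                                Vec2 vpMin, Vec2 vpMax)
{
    const float t = std::min(maxDist, vpIntersect(screenPos, dir, vpMin, vpMax));
    return { screenPos.x + t * dir.x, screenPos.y + t * dir.y };
}
```

In a 10 x 10 viewport, starting at (4.5, 4.5) and marching along (1, 0.5) for up to 100 units stops at (9.5, 7.0), the center of the last pixel column, while a march of 2 units is left untouched and ends at (6.5, 5.5).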