siware.dev

# Line Fitting

## Line Fitting via Least Squares

Given a dataset with $n$ samples $\{(x_0, y_0), (x_1, y_1), \dots, (x_{n-1}, y_{n-1})\}$, we want to approximate it with a line $y = f(x) = mx + c$.

We want to minimize the squared-error function:

\begin{aligned} \quad E &= \sum_i{(f(x_i) - y_i)^2} \\ &= \sum_i{(m x_i + c - y_i)^2} \\ \end{aligned}
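As a minimal sketch of this error function in code (the data points and the candidate line $m = 2$, $c = 1$ below are illustrative, not from the text):

```python
def squared_error(m, c, xs, ys):
    # E = sum over i of (m*x_i + c - y_i)^2
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # these lie exactly on y = 2x + 1

print(squared_error(2.0, 1.0, xs, ys))  # -> 0.0, a perfect fit
print(squared_error(0.0, 0.0, xs, ys))  # -> 84.0, a poor fit
```

A perfect fit drives every residual $f(x_i) - y_i$ to zero, so $E = 0$; any other line gives a strictly positive error.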

Note that the unknowns are $m$ and $c$; they can be found by setting each partial derivative to $0$.

Let’s find the first equation from $\frac{\partial E}{\partial m}$:

\begin{aligned} \quad \frac{\partial E}{\partial m} &= 0 \\ \sum_i{2 (m x_i + c - y_i) x_i} &= 0 \\ \sum_i{2 (m x_i^2 + c x_i - x_i y_i)} &= 0 \\ 2 (m \sum_i{x_i^2} + c \sum_i{x_i} - \sum_i{x_i y_i}) &= 0 \\ m \sum_i{x_i^2} + c \sum_i{x_i} &= \sum_i{x_i y_i} \\ \end{aligned}

The second equation comes from $\frac{\partial E}{\partial c}$:

\begin{aligned} \quad \frac{\partial E}{\partial c} &= 0 \\ \sum_i{2 (m x_i + c - y_i)} &= 0 \\ 2 (m \sum_i{x_i} + c \sum_i{1} - \sum_i{y_i}) &= 0 \\ m \sum_i{x_i} + c n &= \sum_i{y_i} \\ \end{aligned}

We have two unknowns, $m$ and $c$, and two equations; solving this system gives:

\begin{aligned} \quad m &= \frac{n(\sum_i{x_i y_i}) - (\sum_i{x_i})(\sum_i{y_i})}{n(\sum_i{x_i^2}) - (\sum_i{x_i})^2} \\ \quad c &= \frac{(\sum_i{x_i^2})(\sum_i{y_i}) - (\sum_i{x_i})(\sum_i{x_i y_i})}{n(\sum_i{x_i^2}) - (\sum_i{x_i})^2} \\ \end{aligned}
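The closed-form solution translates directly into code. This is a sketch with illustrative data (the same $y = 2x + 1$ points as assumed earlier, not from the text); each sum maps to one term in the formulas above:

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit; returns the slope m and intercept c."""
    n = len(xs)
    sx = sum(xs)                                # sum of x_i
    sy = sum(ys)                                # sum of y_i
    sxx = sum(x * x for x in xs)                # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))    # sum of x_i * y_i
    d = n * sxx - sx * sx                       # shared denominator
    m = (n * sxy - sx * sy) / d
    c = (sxx * sy - sx * sxy) / d
    return m, c

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1

print(fit_line(xs, ys))  # -> (2.0, 1.0)
```

The denominator $n\sum_i{x_i^2} - (\sum_i{x_i})^2$ is zero only when all $x_i$ are identical, in which case no unique line exists.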