AnovaGLM
Define a vector of models $\mathbf{M}$ and the corresponding base models $\mathbf{B}$
\[\begin{aligned} \mathbf{M} &= (M_1, ..., M_n)\\\\ \mathbf{B} &= (B_1, ..., B_n) \end{aligned}\]
where $M_1$ is the simplest model and $M_n$ is the most complex model.
When $m$ models $(M_1, ..., M_m)$ are given, $\mathbf{M} = (M_2, ..., M_m)$ and $\mathbf{B} = (M_1, ..., M_{m-1})$, so that $n = m - 1$.
When one model is given, $n$ is the number of predictors, excluding those already in the simplest model; $\mathbf M$ and $\mathbf B$ depend on the type of ANOVA.
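For instance, under the multiple-model convention above, forming $\mathbf{M}$ and $\mathbf{B}$ is just a pair of slices; a minimal Python sketch (the fitted-model objects are placeholders):

```python
# Hypothetical nested fits, ordered from simplest to most complex.
models = ["M1", "M2", "M3", "M4"]  # placeholders for m = 4 fitted models

M = models[1:]    # (M_2, ..., M_m): the models being tested
B = models[:-1]   # (M_1, ..., M_{m-1}): the corresponding base models
assert len(M) == len(B) == len(models) - 1   # n = m - 1
```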
Let $m$ be the number of columns of $M_n$'s model matrix and $l$ the number of predictors of $M_n$.
Define two index sets: $\mathcal{C} = \{x \in \mathbb{N}\, |\, 1 \leq x \leq m\}$, the indices of columns, and $\mathcal{P} = \{x \in \mathbb{N}\, |\, 1 \leq x \leq l\}$, the indices of predictors.
A map $id_X: \mathcal{C} \to \mathcal{P}$ maps each column index to the index of the corresponding predictor, preserving order, i.e.,
\[\begin{aligned} \forall i \in \mathcal{C},\ id_X(i) = k &\implies \text{the } i\text{th column is a level of the } k\text{th predictor}\\\\ \forall i, j \in \mathcal{C},\ i < j &\implies id_X(i) \leq id_X(j) \end{aligned}\]
The sets of predictors included in $M_j$ and $B_j$ are $\mathcal{M}_j \subseteq \mathcal{P}$ and $\mathcal{B}_j \subseteq \mathcal{P}$, respectively.
We can define a vector of index sets for each model and calculate the degrees of freedom (df) of each predictor
\[\begin{aligned} \mathbf{I} &= (I_1, ..., I_n)\\\\ \mathbf{df} &= (n(I_1), ..., n(I_n)) \end{aligned}\]
where $I_k = \{i \in \mathcal{C}\, |\, id_X(i) \in \mathcal{M}_k \setminus \mathcal{B}_k\}$ and $n(I)$ is the number of elements of $I$.
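As a concrete example, suppose $M_n$'s model matrix has an intercept, one continuous predictor, and one three-level categorical predictor coded as two dummy columns; $id_X$, $I_k$, and the degrees of freedom can then be written down directly. A minimal Python sketch (the predictor layout is assumed for illustration):

```python
# Hypothetical column layout of M_n's model matrix:
# column 1 -> intercept (predictor 1), column 2 -> continuous A (predictor 2),
# columns 3-4 -> dummy-coded levels of a 3-level categorical B (predictor 3).
id_X = {1: 1, 2: 2, 3: 3, 4: 3}  # column index -> predictor index

# One ANOVA step: M_k adds predictor B to a base model with intercept and A.
M_k = {1, 2, 3}
B_k = {1, 2}
I_k = {i for i, p in id_X.items() if p in M_k - B_k}  # {3, 4}
df_k = len(I_k)                                       # n(I_k) = 2
```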
The deviance vectors $\mathbf{D}$ for $\mathbf{M}$ and $\mathbf{S}$ for $\mathbf{B}$ are defined elementwise as the sum of squared deviance residuals (unit deviances) of each model; for ordinary linear regression this is the residual sum of squares.
The explained deviance of each predictor is the difference between the base-model deviance and the model deviance
\[\mathbf{E} = \mathbf{S} - \mathbf{D}\]
The mean explained deviance $\epsilon_i^2$ is therefore
\[\epsilon_i^2 = \frac{E_i}{df_i}\]
The mean residual deviance $\sigma^2$ is the squared estimated dispersion (or scale) parameter of $M_n$'s distribution.
For ordinary linear regression,
\[\sigma^2 =\frac{D_n}{df_r}\]
where $D_n$ is the residual sum of squares of $M_n$ and $df_r$ is the residual degrees of freedom, i.e. $df_r = nob - n(\mathcal{C})$, where $nob$ is the number of observations.
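For ordinary linear regression these quantities can be checked by fitting the nested models explicitly; a self-contained numpy sketch with made-up data (not the package's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
nob = 80
x1, x2 = rng.normal(size=(2, nob))
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=nob)

def deviance(X, y):
    """For ordinary linear regression the deviance is the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

# Nested model matrices: intercept -> + x1 -> + x2 (Type I sequence).
ones = np.ones(nob)
Xs = [np.column_stack(c) for c in ([ones], [ones, x1], [ones, x1, x2])]
dev = np.array([deviance(X, y) for X in Xs])

df = np.array([1, 1])            # one column added per step
E = dev[:-1] - dev[1:]           # explained deviance E = S - D
df_r = nob - Xs[-1].shape[1]     # residual degrees of freedom
sigma2 = dev[-1] / df_r          # mean residual deviance
```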
F-test
The F-values form a vector distributed elementwise as
\[\mathbf{F} \sim \mathcal{F}_{\mathbf{df}, df_r}\]
where
\[F_i = \frac{\epsilon_i^2}{\sigma^2}\]
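Continuing the sketch above (reusing `E`, `df`, `sigma2`, and `df_r`), the F-values and their p-values follow directly; `scipy.stats` is used here only for the F tail probability:

```python
from scipy import stats

eps2 = E / df                    # mean explained deviance per predictor
F = eps2 / sigma2                # F-values
p = stats.f.sf(F, df, df_r)      # upper-tail p-values from F(df_j, df_r)
```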
For a single model, the F-values are computed directly from the variance-covariance matrix ($\boldsymbol \Sigma$) and the coefficients ($\boldsymbol \beta$) of the most complex model, and the explained deviances are recovered from them rather than by refitting each submodel; each $M_j$ corresponds to one predictor $p_j$, i.e. $id_X[I_j] = \{j\}$.
Type I
Predictors are sequentially added to the null model $B_1$, i.e.,
\[\begin{aligned} \forall i, j \in \{x \in \mathbb{N}\, |\, 1\leq x\leq n\}, i < j &\implies (\mathcal{B}_i \subset \mathcal{B}_j) \land (\mathcal{M}_i \subset \mathcal{M}_j)\\\\ \mathcal{M}_i &= \mathcal{B}_i \cup \{p_i\}\\\\ \mathcal{B}_{i+1} &= \mathcal{M}_i \end{aligned}\]
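The recurrence $\mathcal{B}_{i+1} = \mathcal{M}_i$ simply grows the included set one predictor at a time; a tiny sketch:

```python
predictors = [1, 2, 3]     # p_1, ..., p_n, in model-matrix order
B_sets, M_sets = [], []
included = set()           # B_1 is the null model
for p in predictors:
    B_sets.append(set(included))   # B_i
    included.add(p)                # M_i = B_i ∪ {p_i}
    M_sets.append(set(included))   # and B_{i+1} = M_i
```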
Calculate the F-values from the upper triangular factor $\mathbf{U}$ of the Cholesky factorization of $\boldsymbol \Sigma^{-1}$ and the coefficients $\boldsymbol \beta$
\[\begin{aligned} \boldsymbol{\Sigma}^{-1} &= \mathbf{U}^T\mathbf{U}\\\\ \boldsymbol{\eta} &= \mathbf{U}\boldsymbol{\beta}\\\\ F_j &= \frac{\sum_{k \in I_j}{\eta_k^2}}{df_j} \end{aligned}\]
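A numpy sketch of the Cholesky route for ordinary linear regression, where $\boldsymbol\Sigma = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$; it reproduces the sequential F-values obtained by refitting (made-up data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
nob = 80
x1, x2 = rng.normal(size=(2, nob))
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=nob)
X = np.column_stack([np.ones(nob), x1, x2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta
sigma2 = (r @ r) / (nob - X.shape[1])
Sigma = sigma2 * np.linalg.inv(X.T @ X)      # variance-covariance matrix

# Sigma^{-1} = U^T U; numpy returns the lower factor, so transpose it.
U = np.linalg.cholesky(np.linalg.inv(Sigma)).T
eta = U @ beta
F = eta**2        # F_j = (sum over I_j of eta_k^2) / df_j, with df_j = 1 here
```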
Type II
The included predictors are defined as follows,
\[\begin{aligned} \mathcal{B}_j &= \{k \in \mathcal{P}\, |\, k \text{ is not an interaction term of }p_j \text{ and other terms}\}\\\\ \mathcal{M}_j &= \mathcal{B}_j \cup \{p_j\} \end{aligned}\]
Define two vectors of index sets $\mathbf J$ and $\mathbf K$ where
\[\begin{aligned} J_j &= \{i \in \mathcal{C}\, |\, id_X(i) \text{ is an interaction term of }p_j \text{ and other terms}\}\\\\ K_j &= J_j \cup I_j \end{aligned}\]
The F-value is then
\[F_j = \frac{\boldsymbol{\beta}_{K_j}^T \boldsymbol{\Sigma}_{K_j; K_j}^{-1} \boldsymbol{\beta}_{K_j} - \boldsymbol{\beta}_{J_j}^T \boldsymbol{\Sigma}_{J_j; J_j}^{-1} \boldsymbol{\beta}_{J_j}}{df_j}\]
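A sketch of the Type II computation for a hypothetical model with an interaction, `y ~ x1 + x2 + x1:x2`, testing $p_j$ = `x1` (so $J_j$ holds the interaction column and $K_j = J_j \cup I_j$); made-up data, illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)
nob = 80
x1, x2 = rng.normal(size=(2, nob))
y = 1.0 + 2.0 * x1 + 0.5 * x2 + 0.3 * x1 * x2 + rng.normal(size=nob)
# Columns: 0 intercept, 1 x1, 2 x2, 3 x1:x2.
X = np.column_stack([np.ones(nob), x1, x2, x1 * x2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta
sigma2 = (r @ r) / (nob - X.shape[1])
Sigma = sigma2 * np.linalg.inv(X.T @ X)

def wald(idx):
    """beta_idx^T (Sigma_{idx,idx})^{-1} beta_idx for an index set idx."""
    b = beta[idx]
    return b @ np.linalg.solve(Sigma[np.ix_(idx, idx)], b)

I_j = [1]               # column of x1
J_j = [3]               # interaction columns containing x1
K_j = J_j + I_j
F_j = (wald(K_j) - wald(J_j)) / len(I_j)   # Type II F-value for x1
```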
Type III
All elements of $\mathbf{M}$ are the most complex model, and each base model omits a single predictor, i.e.
\[\begin{aligned} \mathcal{M}_j &= \mathcal{P}\\\\ \mathcal{B}_j &= \mathcal{P} \setminus \{p_j\} \end{aligned}\]
The F-value is then
\[F_j = \frac{\boldsymbol{\beta}_{I_j}^T \boldsymbol{\Sigma}_{I_j; I_j}^{-1} \boldsymbol{\beta}_{I_j}}{df_j}\]
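Reusing `wald`, `beta`, and `Sigma` from the Type II sketch above, the Type III statistic for the same predictor drops the subtraction entirely:

```python
F_x1_type3 = wald(I_j) / len(I_j)   # beta_{I_j}^T Sigma_{I_j,I_j}^{-1} beta_{I_j} / df_j
```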
LRT
The likelihood-ratio statistics form a vector
\[\begin{aligned} \mathbf{L} &= \frac{\mathbf{E}}{\sigma^2}\\\\ \mathbf{L} &\sim \chi^2_{\mathbf{df}} \end{aligned}\]
When a single model is provided, $\mathbf{L}$ is computed directly from the variance-covariance matrix.
Calculate the explained deviances from the upper triangular factor $\mathbf{U}$ of the Cholesky factorization of $\sigma^2 \boldsymbol{\Sigma}^{-1}$ and the coefficients $\boldsymbol \beta$, then scale by the dispersion
\[\begin{aligned} \sigma^2 \boldsymbol{\Sigma}^{-1} &= \mathbf{U}^T\mathbf{U}\\\\ \boldsymbol{\eta} &= \mathbf{U}\boldsymbol{\beta}\\\\ L_j &= \frac{\sum_{k \in I_j}{\eta_k^2}}{\sigma^2} \end{aligned}\]
For fixed-dispersion distributions such as binomial and Poisson, $\sigma^2 = 1$, so $L_j$ reduces to $\sum_{k \in I_j}{\eta_k^2}$.
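A numpy sketch for the normal case, where $\sigma^2\boldsymbol{\Sigma}^{-1} = \mathbf{X}^T\mathbf{X}$; chi-square p-values via `scipy.stats` are included for completeness (made-up data, illustrative only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
nob = 80
x1, x2 = rng.normal(size=(2, nob))
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(size=nob)
X = np.column_stack([np.ones(nob), x1, x2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta
sigma2 = (r @ r) / (nob - X.shape[1])

# sigma^2 * Sigma^{-1} = X^T X for ordinary linear regression.
U = np.linalg.cholesky(X.T @ X).T   # upper factor: X^T X = U^T U
eta = U @ beta
L = eta**2 / sigma2                 # likelihood-ratio statistics, df_j = 1 each
p = stats.chi2.sf(L, df=1)          # upper-tail chi-square p-values
```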