AnovaFixedEffectModels
Define a vector of models $\mathbf{M}$ and the corresponding base models $\mathbf{B}$
\[\begin{aligned} \mathbf{M} &= (M_1, ..., M_n)\\\\ \mathbf{B} &= (B_1, ..., B_n) \end{aligned}\]
where $M_1$ is the simplest model with fixed effects and $M_n$ is the most complex model.
When $m$ models, $(M_1, ..., M_m)$, are given, $\mathbf{M} = (M_2, ..., M_m)$, $\mathbf{B} = (M_1, ..., M_{m-1})$.
When one model is given, $n$ is the number of predictors except for the predictors used in the simplest model. The $\mathbf M$ and $\mathbf B$ depends on the type of ANOVA.
Let $m$, the number of columns of $M_n$'s model matrix; $l$, the number of predictors of $M_n$.
Define two sets, $\mathcal{C} = \{x \in \mathbb{N}\, |\, 1 \leq x \leq m\}$, the index of columns and $\mathcal{P} = \{x \in \mathbb{N}\, |\, 1 \leq x \leq l\}$, the index of predictors.
A map $id_X: \mathcal{C} \mapsto \mathcal{P}$ maps the index of columns into the corresponding predictor sequentially, i.e.,
\[\begin{aligned} \forall i \in \mathcal{C}, id_X(i) = k &\implies i\text{th column} \text{ is a level of } k\text{th predictor}\\\\ \forall i, j \in \mathcal{C}, i \lt j &\implies id_X(i) \leq id_X(j) \end{aligned}\]
The included predictors of $M_j$ and $B_j$ are $\mathcal{M}_j \subset \mathcal{P}$, $\mathcal{B}_j \subset \mathcal{P}$, respectively.
We can define a vector of index sets for each model, and calulate degrees of freedom (dof) of each predictor
\[\begin{aligned} \mathbf{I} &= (I_1, ..., I_n)\\\\ \mathbf{df} &= (n(I_1), ..., n(I_n)) \end{aligned}\]
where $\forall i \in I_k, id_X(i) \in \mathcal{M}_k\setminus \mathcal{B}_k$, and $n(I)$ is the size of $I$.
The explained deviance of each predictor is the difference of $\mathbf{D}$ and $\mathbf{S}$
\[\mathbf{E} = \mathbf{D} - \mathbf{S}\]
The mean explained deviance $\epsilon_i^2$ is therefore
\[\epsilon_i^2 = \frac{E_i}{df_i}\]
The mean residual deviance $\sigma^2$
\[\sigma^2 =\frac{D_n}{df_r}\]
where $D_n$ is the residual sum of squares of $M_n$; $df_r$ is the degrees of freedom of the residuals, i.e. $df_r = nob - n(\mathcal{C})$, where $nob$ is number of observations.
F-test
F-value is a vector
\[\mathbf{F} \sim \mathcal{F}_{\mathbf{df}, df_r}\]
where
\[F_i = \frac{\epsilon_i^2}{\sigma^2}\]
For a single model, F-value is computed directly by the variance-covariance matrix ($\boldsymbol \Sigma$) and the coefficients ($\boldsymbol \beta$) of the model, the deviance is calculated backward; each $M_j$ corresponds to a predictor $p_j$, i.e. $id_X[I_j] = \{j\}$.
Type I
Predictors are sequentially added to the null model with fixed effects $B_1$, i.e.,
\[\begin{aligned} \forall i, j \in \{x \in \mathbb{N}\, |\, 1\leq x\leq n\}, i < j &\implies (\mathcal{B}_i \subset \mathcal{B}_j) \land (\mathcal{M}_i \subset \mathcal{M}_j)\\\\ \mathcal{M}_i &= \mathcal{B}_i \cup \{p_i\}\\\\ \mathcal{B}_{i+1} &= \mathcal{M}_i \end{aligned}\]
Calculate F-value by the the upper factor of Cholesky factorization of $\boldsymbol \Sigma^{-1}$ and multiplying with $\boldsymbol \beta$:
\[\begin{aligned} \boldsymbol{\Sigma}^{-1} &= \mathbf{LU}\\\\ \boldsymbol{\eta} &= \mathbf{U}\boldsymbol{\beta}\\\\ F_j &= \frac{\sum_{k \in I_j}{\eta_k^2}}{df_j} \end{aligned}\]
Type II
The included predictors are defined as follows,
\[\begin{aligned} \mathcal{B}_j &= \{k \in \mathcal{P}\, |\, k \text{ is not an interaction term of }p_j \text{ and other terms}\}\\\\ \mathcal{M}_j &= \mathcal{B}_j \cup \{p_j\} \end{aligned}\]
Define two vectors of index sets $\mathbf J$ and $\mathbf K$ where
\[\begin{aligned} J_j &= \{i \in \mathcal{C}\, |\, id_X(i) \text{ is an interaction term of }p_j \text{ and other terms}\}\\\\ K_j &= J_j \cup I_j \end{aligned}\]
And F-value is
\[F_j = \frac{\boldsymbol{\beta}_{K_j}^T \boldsymbol{\Sigma}_{K_j; K_j}^{-1} \boldsymbol{\beta}_{K_j} - \boldsymbol{\beta}_{J_j}^T \boldsymbol{\Sigma}_{J_j; J_j}^{-1} \boldsymbol{\beta}_{J_j}}{df_j}\]
Type III
All elements of $\mathbf{M}$ are the most complex model, and the base models are models without each predictors, i.e.
\[\begin{aligned} \mathcal{M}_j &= \mathcal{P}\\\\ \mathcal{B}_j &= \mathcal{P} \setminus \{p_j\} \end{aligned}\]
And F-value is
\[F_j = \frac{\boldsymbol{\beta}_{I_j}^T \boldsymbol{\Sigma}_{I_j; I_j}^{-1} \boldsymbol{\beta}_{I_j}}{df_j}\]
LRT
The likelihood ratio is a vector
\[\begin{aligned} \mathbf{L} &= \frac{\mathbf{E}}{\sigma^2}\\\\ \mathbf{L} &\sim \chi^2_{\mathbf{df}} \end{aligned}\]