In statistics, the theory of minimum norm quadratic unbiased estimation (MINQUE)[1][2][3] was developed by C. R. Rao. MINQUE is a theory alongside other estimation methods in estimation theory, such as the method of moments or maximum likelihood estimation. Similar to the theory of best linear unbiased estimation, MINQUE is specifically concerned with linear regression models.[1] The method was originally conceived to estimate heteroscedastic error variance in multiple linear regression.[1] MINQUE estimators also provide an alternative to maximum likelihood estimators or restricted maximum likelihood estimators for variance components in mixed effects models.[3] MINQUE estimators are quadratic forms of the response variable and are used to estimate a linear function of the variances.  
Principles

We are concerned with a mixed effects model for the random vector $\mathbf{Y}$ with the following linear structure:

$$\mathbf{Y} = \mathbf{X}{\boldsymbol{\beta}} + \mathbf{U}_1{\boldsymbol{\xi}}_1 + \cdots + \mathbf{U}_k{\boldsymbol{\xi}}_k$$

Here, $\mathbf{X}$ is a design matrix for the fixed effects, ${\boldsymbol{\beta}}$ represents the unknown fixed-effect parameters, $\mathbf{U}_i$ is a design matrix for the $i$-th random-effect component, and ${\boldsymbol{\xi}}_i$ is a random vector for the $i$-th random-effect component. The random effects are assumed to have zero mean ($\mathbb{E}[{\boldsymbol{\xi}}_i] = \mathbf{0}$) and to be uncorrelated within each component ($\mathbb{V}[{\boldsymbol{\xi}}_i] = \sigma_i^2\mathbf{I}_{c_i}$). Furthermore, any two random-effect vectors are also uncorrelated ($\mathbb{V}[{\boldsymbol{\xi}}_i, {\boldsymbol{\xi}}_j] = \mathbf{0}$ for all $i \neq j$). The unknown variances $\sigma_1^2, \ldots, \sigma_k^2$ represent the variance components of the model.
This is a general model that captures commonly used linear regression models.

- Gauss-Markov model[3]: if we consider a one-component model with $\mathbf{U}_1 = \mathbf{I}_n$, then the model is equivalent to the Gauss-Markov model $\mathbf{Y} = \mathbf{X}{\boldsymbol{\beta}} + {\boldsymbol{\epsilon}}$ with $\mathbb{E}[{\boldsymbol{\epsilon}}] = \mathbf{0}$ and $\mathbb{V}[{\boldsymbol{\epsilon}}] = \sigma_1^2\mathbf{I}_n$.
- Heteroscedastic model[1]: each set of random variables in ${\boldsymbol{\epsilon}}$ that shares a common variance can be modeled as an individual variance component with an appropriate $\mathbf{U}_i$.
A compact representation for the model is the following, where $\mathbf{U} = \left[\mathbf{U}_1 \mid \cdots \mid \mathbf{U}_k\right]$ and ${\boldsymbol{\xi}}^\top = \left[{\boldsymbol{\xi}}_1^\top \mid \cdots \mid {\boldsymbol{\xi}}_k^\top\right]$:

$$\mathbf{Y} = \mathbf{X}{\boldsymbol{\beta}} + \mathbf{U}{\boldsymbol{\xi}}$$

Note that this model makes no distributional assumptions about $\mathbf{Y}$ other than the first and second moments:[3]

$$\mathbb{E}[\mathbf{Y}] = \mathbf{X}{\boldsymbol{\beta}}$$

$$\mathbb{V}[\mathbf{Y}] = \sigma_1^2\mathbf{U}_1\mathbf{U}_1^\top + \cdots + \sigma_k^2\mathbf{U}_k\mathbf{U}_k^\top \equiv \sigma_1^2\mathbf{V}_1 + \cdots + \sigma_k^2\mathbf{V}_k$$
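To make the compact representation concrete, the following numpy sketch builds $\mathbf{U}$ and the matrices $\mathbf{V}_i = \mathbf{U}_i\mathbf{U}_i^\top$ for a small, made-up two-component layout (a grouped random effect plus an observation-level error term) and checks that $\mathbf{U}\mathbf{U}^\top = \mathbf{V}_1 + \mathbf{V}_2$:

```python
import numpy as np

# Made-up two-component layout: 6 observations in 3 groups of 2,
# with a group-level random effect (U1) and an error term (U2 = I).
n = 6
U1 = np.kron(np.eye(3), np.ones((2, 1)))  # 6x3 group-membership indicators
U2 = np.eye(n)                            # 6x6 identity for the error term

U = np.hstack([U1, U2])                   # compact design U = [U1 | U2]
V1, V2 = U1 @ U1.T, U2 @ U2.T             # V_i = U_i U_i^T

# With unit variance components, V[Y] = V1 + V2, which equals U U^T.
assert np.allclose(U @ U.T, V1 + V2)
```

This is just the identity $\mathbf{U}\mathbf{U}^\top = \sum_i \mathbf{U}_i\mathbf{U}_i^\top$ for the block-partitioned $\mathbf{U}$; with general variance components the covariance is the weighted sum $\sum_i \sigma_i^2\mathbf{V}_i$.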
 
The goal in MINQUE is to estimate $\theta = \sum_{i=1}^k p_i\sigma_i^2$ using a quadratic form $\hat{\theta} = \mathbf{Y}^\top\mathbf{A}\mathbf{Y}$. MINQUE estimators are derived by identifying a matrix $\mathbf{A}$ such that the estimator has some desirable properties,[2][3] described below.
Optimal Estimator Properties to Constrain MINQUE

Invariance to translation of the fixed effects

Consider a new fixed-effect parameter ${\boldsymbol{\gamma}} = {\boldsymbol{\beta}} - {\boldsymbol{\beta}}_0$, which represents a translation of the original fixed effect. The new, equivalent model is now the following:

$$\mathbf{Y} - \mathbf{X}{\boldsymbol{\beta}}_0 = \mathbf{X}{\boldsymbol{\gamma}} + \mathbf{U}{\boldsymbol{\xi}}$$

Under this equivalent model, the MINQUE estimator is now $(\mathbf{Y} - \mathbf{X}{\boldsymbol{\beta}}_0)^\top\mathbf{A}(\mathbf{Y} - \mathbf{X}{\boldsymbol{\beta}}_0)$. Rao argued that since the underlying models are equivalent, this estimator should be equal to $\mathbf{Y}^\top\mathbf{A}\mathbf{Y}$.[2][3] This can be achieved by constraining $\mathbf{A}$ such that $\mathbf{A}\mathbf{X} = \mathbf{0}$, which ensures that all terms other than ${\boldsymbol{\xi}}^\top\mathbf{U}^\top\mathbf{A}\mathbf{U}{\boldsymbol{\xi}}$ in the expansion of the quadratic form are zero.
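The invariance argument can be checked numerically. In this sketch, a symmetric matrix satisfying $\mathbf{A}\mathbf{X} = \mathbf{0}$ is built by sandwiching an arbitrary symmetric matrix between residual projectors (one convenient construction for illustration, not the MINQUE-optimal choice), and the quadratic form is verified to be unchanged by a translation of the fixed effects:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 8, 2
X = rng.normal(size=(n, m))
Y = rng.normal(size=n)

# Build a symmetric A with A X = 0 by sandwiching an arbitrary symmetric
# matrix M between copies of the residual projector I - P,
# where P = X (X^T X)^{-1} X^T.
P = X @ np.linalg.solve(X.T @ X, X.T)
M = rng.normal(size=(n, n))
M = M + M.T
A = (np.eye(n) - P) @ M @ (np.eye(n) - P)

# Because A X = 0, the quadratic form is invariant to translating the
# fixed effects: (Y - X b0)^T A (Y - X b0) == Y^T A Y for any b0.
b0 = rng.normal(size=m)
Z = Y - X @ b0
assert np.allclose(A @ X, 0)
assert np.allclose(Z @ A @ Z, Y @ A @ Y)
```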
Unbiased estimation

Suppose that we constrain $\mathbf{A}\mathbf{X} = \mathbf{0}$, as argued in the section above. Then, the MINQUE estimator has the following form:

$$\hat{\theta} = \mathbf{Y}^\top\mathbf{A}\mathbf{Y} = {\boldsymbol{\xi}}^\top\mathbf{U}^\top\mathbf{A}\mathbf{U}{\boldsymbol{\xi}}$$

To ensure that this estimator is unbiased, the expectation of the estimator $\mathbb{E}[\hat{\theta}]$ must equal the parameter of interest, $\theta$. Below, the expectation of the estimator is decomposed for each component, since the components are uncorrelated with each other. Furthermore, the cyclic property of the trace is used to evaluate the expectation with respect to ${\boldsymbol{\xi}}_i$:

$$\begin{aligned}\mathbb{E}[\hat{\theta}] &= \mathbb{E}[{\boldsymbol{\xi}}^\top\mathbf{U}^\top\mathbf{A}\mathbf{U}{\boldsymbol{\xi}}]\\&= \sum_{i=1}^k \mathbb{E}[{\boldsymbol{\xi}}_i^\top\mathbf{U}_i^\top\mathbf{A}\mathbf{U}_i{\boldsymbol{\xi}}_i]\\&= \sum_{i=1}^k \sigma_i^2\,\mathrm{Tr}[\mathbf{U}_i^\top\mathbf{A}\mathbf{U}_i]\end{aligned}$$

To ensure unbiasedness, Rao suggested setting $\sum_{i=1}^k \sigma_i^2\,\mathrm{Tr}[\mathbf{U}_i^\top\mathbf{A}\mathbf{U}_i] = \sum_{i=1}^k p_i\sigma_i^2$, which can be accomplished by constraining $\mathbf{A}$ such that $\mathrm{Tr}[\mathbf{U}_i^\top\mathbf{A}\mathbf{U}_i] = \mathrm{Tr}[\mathbf{A}\mathbf{V}_i] = p_i$ for all components.[3]
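The trace identity used in the last step can be verified directly for an arbitrary symmetric matrix and an illustrative design matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
n, c = 7, 3
Ui = rng.normal(size=(n, c))   # illustrative design for one component
A = rng.normal(size=(n, n))
A = A + A.T                    # arbitrary symmetric matrix
Vi = Ui @ Ui.T                 # V_i = U_i U_i^T

# Cyclic property of the trace: Tr[U_i^T A U_i] = Tr[A U_i U_i^T] = Tr[A V_i],
# the form in which the unbiasedness constraint Tr[A V_i] = p_i is stated.
assert np.isclose(np.trace(Ui.T @ A @ Ui), np.trace(A @ Vi))
```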
Minimum Norm

Rao argues that if ${\boldsymbol{\xi}}$ were observed, a "natural" estimator for $\theta$ would be the following,[2][3] since $\mathbb{E}[{\boldsymbol{\xi}}_i^\top{\boldsymbol{\xi}}_i] = c_i\sigma_i^2$. Here, ${\boldsymbol{\Delta}}$ is defined as a diagonal matrix:

$$\frac{p_1}{c_1}{\boldsymbol{\xi}}_1^\top{\boldsymbol{\xi}}_1 + \cdots + \frac{p_k}{c_k}{\boldsymbol{\xi}}_k^\top{\boldsymbol{\xi}}_k = {\boldsymbol{\xi}}^\top\left[\mathrm{diag}\left(\frac{p_1}{c_1}, \cdots, \frac{p_k}{c_k}\right)\right]{\boldsymbol{\xi}} \equiv {\boldsymbol{\xi}}^\top{\boldsymbol{\Delta}}{\boldsymbol{\xi}}$$

The difference between the proposed estimator and the natural estimator is ${\boldsymbol{\xi}}^\top(\mathbf{U}^\top\mathbf{A}\mathbf{U} - {\boldsymbol{\Delta}}){\boldsymbol{\xi}}$. This difference can be minimized by minimizing the norm of the matrix $\mathbf{U}^\top\mathbf{A}\mathbf{U} - {\boldsymbol{\Delta}}$.
Procedure

Given the constraints and optimization strategy derived from the optimal properties above, the MINQUE estimator $\hat{\theta}$ for $\theta = \sum_{i=1}^k p_i\sigma_i^2$ is derived by choosing a matrix $\mathbf{A}$ that minimizes $\lVert\mathbf{U}^\top\mathbf{A}\mathbf{U} - {\boldsymbol{\Delta}}\rVert$, subject to the constraints

- $\mathbf{A}\mathbf{X} = \mathbf{0}$, and
- $\mathrm{Tr}[\mathbf{A}\mathbf{V}_i] = p_i$ for all $i$.
Examples of Estimators

Standard Estimator for Homoscedastic Error

In the Gauss-Markov model with an $n \times m$ design matrix $\mathbf{X}$, the error variance $\sigma^2$ is estimated using the following, where $\hat{\boldsymbol{\beta}}$ is the ordinary least squares estimate of ${\boldsymbol{\beta}}$:

$$s^2 = \frac{1}{n - m}(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})^\top(\mathbf{Y} - \mathbf{X}\hat{\boldsymbol{\beta}})$$

This estimator is unbiased and can be shown to minimize the Euclidean norm of the form $\lVert\mathbf{U}^\top\mathbf{A}\mathbf{U} - {\boldsymbol{\Delta}}\rVert_2$.[1] Thus, the standard estimator for error variance in the Gauss-Markov model is a MINQUE estimator.
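A small numpy sketch (with an arbitrary simulated design) confirms that $s^2$ is the quadratic form $\mathbf{Y}^\top\mathbf{A}\mathbf{Y}$ with $\mathbf{A} = (\mathbf{I} - \mathbf{P})/(n - m)$, and that this $\mathbf{A}$ satisfies the two MINQUE constraints (here $k = 1$, $\mathbf{V}_1 = \mathbf{I}_n$, and $p_1 = 1$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 20, 3
X = rng.normal(size=(n, m))
Y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# OLS fit and the standard unbiased variance estimator
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta_hat
s2 = resid @ resid / (n - m)

# Equivalent quadratic-form expression Y^T A Y with A = (I - P)/(n - m),
# which satisfies the MINQUE constraints A X = 0 and Tr[A V_1] = 1.
P = X @ np.linalg.solve(X.T @ X, X.T)
A = (np.eye(n) - P) / (n - m)
assert np.allclose(A @ X, 0)
assert np.isclose(np.trace(A), 1.0)   # V_1 = I_n, so Tr[A V_1] = Tr[A]
assert np.isclose(Y @ A @ Y, s2)
```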
Random Variables with Common Mean and Heteroscedastic Error

For random variables $Y_1, \ldots, Y_n$ with a common mean and different variances $\sigma_1^2, \ldots, \sigma_n^2$, the MINQUE estimator for $\sigma_i^2$ is $\frac{n}{n-2}\left[(Y_i - \bar{Y})^2 - \frac{s^2}{n}\right]$, where $\bar{Y} = \frac{1}{n}\sum_{j=1}^n Y_j$ and $s^2 = \frac{1}{n-1}\sum_{j=1}^n (Y_j - \bar{Y})^2$.[1]
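A sketch of this estimator on simulated data; the assertion checks an algebraic consequence of the formula, namely that the $n$ estimates always sum to $n s^2$:

```python
import numpy as np

rng = np.random.default_rng(3)
# Simulated observations with a common mean and unequal variances
Y = rng.normal(loc=5.0, scale=[1.0, 2.0, 0.5, 3.0, 1.5], size=5)
n = Y.size

Ybar = Y.mean()
s2 = ((Y - Ybar) ** 2).sum() / (n - 1)   # sample variance, 1/(n-1) scaling

# MINQUE estimate of each observation's own variance sigma_i^2
sigma_hat = n / (n - 2) * ((Y - Ybar) ** 2 - s2 / n)

# Identity implied by the formula: sum_i sigma_hat_i = n * s^2
assert np.isclose(sigma_hat.sum(), n * s2)
```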
Estimator for Variance Components

Rao proposed a MINQUE estimator for the variance components model based on minimizing the Euclidean norm.[2] The Euclidean norm $\lVert\cdot\rVert_2$ is the square root of the sum of squares of all elements in the matrix. When evaluating this norm below, $\mathbf{V} = \mathbf{V}_1 + \cdots + \mathbf{V}_k = \mathbf{U}\mathbf{U}^\top$. Furthermore, using the cyclic property of traces and the constraint $\mathrm{Tr}[\mathbf{A}\mathbf{V}_i] = p_i$, we have $\mathrm{Tr}[\mathbf{U}^\top\mathbf{A}\mathbf{U}{\boldsymbol{\Delta}}] = \mathrm{Tr}[\mathbf{A}\mathbf{U}{\boldsymbol{\Delta}}\mathbf{U}^\top] = \mathrm{Tr}\left[\sum_{i=1}^k \frac{p_i}{c_i}\mathbf{A}\mathbf{V}_i\right] = \sum_{i=1}^k \frac{p_i^2}{c_i} = \mathrm{Tr}[{\boldsymbol{\Delta}}{\boldsymbol{\Delta}}]$.

$$\begin{aligned}\lVert\mathbf{U}^\top\mathbf{A}\mathbf{U} - {\boldsymbol{\Delta}}\rVert_2^2 &= \mathrm{Tr}\left[(\mathbf{U}^\top\mathbf{A}\mathbf{U} - {\boldsymbol{\Delta}})^\top(\mathbf{U}^\top\mathbf{A}\mathbf{U} - {\boldsymbol{\Delta}})\right]\\&= \mathrm{Tr}[\mathbf{U}^\top\mathbf{A}\mathbf{U}\mathbf{U}^\top\mathbf{A}\mathbf{U}] - 2\,\mathrm{Tr}[\mathbf{U}^\top\mathbf{A}\mathbf{U}{\boldsymbol{\Delta}}] + \mathrm{Tr}[{\boldsymbol{\Delta}}{\boldsymbol{\Delta}}]\\&= \mathrm{Tr}[\mathbf{A}\mathbf{V}\mathbf{A}\mathbf{V}] - \mathrm{Tr}[{\boldsymbol{\Delta}}{\boldsymbol{\Delta}}]\end{aligned}$$

Note that since $\mathrm{Tr}[{\boldsymbol{\Delta}}{\boldsymbol{\Delta}}]$ does not depend on $\mathbf{A}$, the MINQUE with the Euclidean norm is obtained by identifying the matrix $\mathbf{A}$ that minimizes $\mathrm{Tr}[\mathbf{A}\mathbf{V}\mathbf{A}\mathbf{V}]$, subject to the MINQUE constraints discussed above.
Rao showed that the matrix $\mathbf{A}$ that satisfies this optimization problem is

$$\mathbf{A}_\star = \sum_{i=1}^k \lambda_i\mathbf{R}\mathbf{V}_i\mathbf{R},$$

where $\mathbf{R} = \mathbf{V}^{-1}(\mathbf{I} - \mathbf{P})$, $\mathbf{P} = \mathbf{X}(\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-}\mathbf{X}^\top\mathbf{V}^{-1}$ is the projection matrix into the column space of $\mathbf{X}$, and $(\cdot)^{-}$ represents the generalized inverse of a matrix.
Therefore, the MINQUE estimator is the following, where the vectors ${\boldsymbol{\lambda}}$ and $\hat{\mathbf{q}}$ are defined based on the sum:

$$\hat{\theta} = \mathbf{Y}^\top\mathbf{A}_\star\mathbf{Y} = \sum_{i=1}^k \lambda_i\mathbf{Y}^\top\mathbf{R}\mathbf{V}_i\mathbf{R}\mathbf{Y} \equiv {\boldsymbol{\lambda}}^\top\hat{\mathbf{q}}$$

The vector ${\boldsymbol{\lambda}}$ is obtained by using the constraint $\mathrm{Tr}[\mathbf{A}_\star\mathbf{V}_i] = p_i$. That is, the vector represents the solution to the following system of equations for $j \in \{1, \ldots, k\}$:

$$\begin{aligned}\mathrm{Tr}[\mathbf{A}_\star\mathbf{V}_j] &= p_j\\\mathrm{Tr}\left[\sum_{i=1}^k \lambda_i\mathbf{R}\mathbf{V}_i\mathbf{R}\mathbf{V}_j\right] &= p_j\\\sum_{i=1}^k \lambda_i\,\mathrm{Tr}[\mathbf{R}\mathbf{V}_i\mathbf{R}\mathbf{V}_j] &= p_j\end{aligned}$$
This can be written as a matrix product $\mathbf{S}{\boldsymbol{\lambda}} = \mathbf{p}$, where $\mathbf{p} = [p_1\,\cdots\,p_k]^\top$ and $\mathbf{S}$ is the following:

$$\mathbf{S} = \begin{bmatrix}\mathrm{Tr}[\mathbf{R}\mathbf{V}_1\mathbf{R}\mathbf{V}_1] & \cdots & \mathrm{Tr}[\mathbf{R}\mathbf{V}_k\mathbf{R}\mathbf{V}_1]\\\vdots & \ddots & \vdots\\\mathrm{Tr}[\mathbf{R}\mathbf{V}_1\mathbf{R}\mathbf{V}_k] & \cdots & \mathrm{Tr}[\mathbf{R}\mathbf{V}_k\mathbf{R}\mathbf{V}_k]\end{bmatrix}$$
Then, ${\boldsymbol{\lambda}} = \mathbf{S}^{-}\mathbf{p}$. This implies that the MINQUE is $\hat{\theta} = {\boldsymbol{\lambda}}^\top\hat{\mathbf{q}} = \mathbf{p}^\top\mathbf{S}^{-}\hat{\mathbf{q}}$. Note that $\mathbb{E}[\hat{\mathbf{q}}] = \mathbf{S}{\boldsymbol{\sigma}}$, where ${\boldsymbol{\sigma}} = [\sigma_1^2\,\cdots\,\sigma_k^2]^\top$. Therefore, the estimator for the variance components is $\hat{\boldsymbol{\sigma}} = \mathbf{S}^{-}\hat{\mathbf{q}}$.
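The closed-form procedure above can be sketched end-to-end in numpy. The one-way layout below (a group effect plus an observation-level error) is made up for illustration; the assertions check that the optimal matrix $\mathbf{A}_\star$ satisfies both MINQUE constraints, which holds by construction:

```python
import numpy as np

def minque(Y, X, U_list):
    """Euclidean-norm MINQUE of the variance components, following the
    closed form above: R = V^{-1}(I - P), S_{ji} = Tr[R V_i R V_j],
    q_j = Y^T R V_j R Y, and sigma_hat = S^{-1} q."""
    n = len(Y)
    V_list = [Ui @ Ui.T for Ui in U_list]
    V = sum(V_list)                                      # V = V_1 + ... + V_k
    Vinv = np.linalg.inv(V)
    P = X @ np.linalg.pinv(X.T @ Vinv @ X) @ X.T @ Vinv  # projection onto col(X)
    R = Vinv @ (np.eye(n) - P)
    S = np.array([[np.trace(R @ Vi @ R @ Vj) for Vi in V_list]
                  for Vj in V_list])
    q = np.array([Y @ R @ Vj @ R @ Y for Vj in V_list])
    return np.linalg.solve(S, q), R, S, V_list

# Hypothetical one-way layout: 3 groups of 3, group effect + error.
rng = np.random.default_rng(4)
X = np.ones((9, 1))
U1 = np.kron(np.eye(3), np.ones((3, 1)))   # group indicators
U2 = np.eye(9)                             # observation-level error
Y = X @ np.array([2.0]) + U1 @ rng.normal(scale=1.5, size=3) \
    + rng.normal(size=9)
sigma_hat, R, S, V_list = minque(Y, X, [U1, U2])

# The optimal matrix A* = sum_i lambda_i R V_i R satisfies both MINQUE
# constraints: A* X = 0 and Tr[A* V_j] = p_j.
p = np.array([1.0, 0.0])                   # target theta = sigma_1^2
lam = np.linalg.solve(S, p)
A_star = sum(l * R @ Vi @ R for l, Vi in zip(lam, V_list))
assert np.allclose(A_star @ X, 0, atol=1e-8)
assert np.allclose([np.trace(A_star @ Vj) for Vj in V_list], p)
```

Note that `V` must be invertible here, which this layout guarantees by including the identity error component; with a singular `V`, Rao's formulation replaces the inverse with a generalized inverse.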
Extensions

MINQUE estimators can be obtained without the invariance criteria, in which case the estimator is only unbiased and minimizes the norm.[2] Such estimators have slightly different constraints on the minimization problem.

The model can be extended to estimate covariance components.[3] In such a model, the random effects of a component are assumed to have a common covariance structure $\mathbb{V}[{\boldsymbol{\xi}}_i] = {\boldsymbol{\Sigma}}$. A MINQUE estimator for a mixture of variance and covariance components was also proposed.[3] In this model, $\mathbb{V}[{\boldsymbol{\xi}}_i] = {\boldsymbol{\Sigma}}$ for some subset of the components and $\mathbb{V}[{\boldsymbol{\xi}}_i] = \sigma_i^2\mathbf{I}_{c_i}$ for the remaining components.
 References