Matrix t-distribution

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.
Matrix t
Notation: $\mathrm{T}_{n,p}(\nu, \mathbf{M}, \boldsymbol{\Sigma}, \boldsymbol{\Omega})$
Parameters:
$\mathbf{M}$: location (real $n \times p$ matrix)
$\boldsymbol{\Omega}$: scale (positive-definite real $p \times p$ matrix)
$\boldsymbol{\Sigma}$: scale (positive-definite real $n \times n$ matrix)
$\nu > 0$: degrees of freedom (real)
Support: $\mathbf{X} \in \mathbb{R}^{n \times p}$
PDF:
$$\frac{\Gamma_p\left(\frac{\nu+n+p-1}{2}\right)}{(\pi)^{\frac{np}{2}}\,\Gamma_p\left(\frac{\nu+p-1}{2}\right)}\,|\boldsymbol{\Omega}|^{-\frac{n}{2}}\,|\boldsymbol{\Sigma}|^{-\frac{p}{2}} \times \left|\mathbf{I}_n + \boldsymbol{\Sigma}^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol{\Omega}^{-1}(\mathbf{X}-\mathbf{M})^{\mathrm{T}}\right|^{-\frac{\nu+n+p-1}{2}}$$
CDF: No analytic expression
Mean: $\mathbf{M}$ if $\nu > 1$, else undefined
Mode: $\mathbf{M}$
Variance: $\operatorname{cov}(\operatorname{vec}(\mathbf{X})) = \frac{\boldsymbol{\Sigma} \otimes \boldsymbol{\Omega}}{\nu-2}$ if $\nu > 2$, else undefined
CF: see below

In statistics, the matrix t-distribution (or matrix variate t-distribution) is the generalization of the multivariate t-distribution from vectors to matrices.

The matrix t-distribution relates to the multivariate t-distribution in the same way that the matrix normal distribution relates to the multivariate normal distribution: if the matrix has only one row, or only one column, the distribution reduces to the corresponding (vector) multivariate distribution. The matrix t-distribution is the compound distribution that results from an infinite mixture of a matrix normal distribution with an inverse Wishart distribution placed over either of its covariance matrices, and the multivariate t-distribution can be generated in a similar way.
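The compound construction can be sketched as a two-stage sampler. This is a minimal illustration, not a reference implementation: the function name `sample_matrix_t` is hypothetical, and the specific convention (inverse Wishart with $\nu + p - 1$ degrees of freedom and scale $\boldsymbol{\Omega}$ placed on the column covariance) is an assumption chosen so that the result agrees with the density and variance given in this article. It also assumes $\nu \geq 1$, so that the inverse Wishart degrees of freedom are at least $p$.

```python
import numpy as np
from scipy.stats import invwishart


def sample_matrix_t(nu, M, Sigma, Omega, rng=None):
    """Draw one n x p sample by compounding a matrix normal with an inverse Wishart.

    Assumed convention (not stated in the text): the column covariance S is
    drawn from an inverse Wishart with nu + p - 1 degrees of freedom and
    scale Omega, then X | S is matrix normal with row covariance Sigma.
    """
    rng = np.random.default_rng(rng)
    M = np.asarray(M, dtype=float)
    n, p = M.shape
    # Stage 1: random p x p column covariance.
    S = invwishart.rvs(df=nu + p - 1, scale=np.atleast_2d(Omega), random_state=rng)
    S = np.atleast_2d(S)
    # Stage 2: matrix normal draw X = M + A Z B^T with A A^T = Sigma, B B^T = S.
    A = np.linalg.cholesky(np.atleast_2d(Sigma))
    B = np.linalg.cholesky(S)
    Z = rng.standard_normal((n, p))
    return M + A @ Z @ B.T


# Example: one draw from a 2 x 3 matrix t-distribution with nu = 5.
X = sample_matrix_t(5.0, np.zeros((2, 3)), np.eye(2), np.eye(3), rng=0)
print(X.shape)
```

Passing an integer or a NumPy `Generator` as `rng` makes the draw reproducible.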

In a Bayesian analysis of a multivariate linear regression model based on the matrix normal distribution, the matrix t-distribution is the posterior predictive distribution.

Definition

For a matrix t-distribution, the probability density function at the point $\mathbf{X}$ of an $n \times p$ space is

$$f(\mathbf{X}; \nu, \mathbf{M}, \boldsymbol{\Sigma}, \boldsymbol{\Omega}) = K \times \left|\mathbf{I}_n + \boldsymbol{\Sigma}^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol{\Omega}^{-1}(\mathbf{X}-\mathbf{M})^{\mathrm{T}}\right|^{-\frac{\nu+n+p-1}{2}},$$

where the normalizing constant K is given by

$$K = \frac{\Gamma_p\left(\frac{\nu+n+p-1}{2}\right)}{(\pi)^{\frac{np}{2}}\,\Gamma_p\left(\frac{\nu+p-1}{2}\right)}\,|\boldsymbol{\Omega}|^{-\frac{n}{2}}\,|\boldsymbol{\Sigma}|^{-\frac{p}{2}}.$$

Here $\Gamma_p$ is the multivariate gamma function.
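The density above can be evaluated numerically on the log scale, which avoids overflow in the gamma functions and determinants. The following is a minimal sketch; the function name `matrix_t_logpdf` is hypothetical, and the use of linear solves in place of explicit inverses is an implementation choice rather than part of the definition.

```python
import numpy as np
from scipy.special import multigammaln


def matrix_t_logpdf(X, nu, M, Sigma, Omega):
    """Log-density of the matrix t-distribution, following the formula above."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    M = np.atleast_2d(np.asarray(M, dtype=float))
    Sigma = np.atleast_2d(np.asarray(Sigma, dtype=float))
    Omega = np.atleast_2d(np.asarray(Omega, dtype=float))
    n, p = X.shape
    a = 0.5 * (nu + n + p - 1)
    D = X - M
    # log K: multivariate gamma ratio, pi term, and the two determinants.
    logdet_Sigma = np.linalg.slogdet(Sigma)[1]
    logdet_Omega = np.linalg.slogdet(Omega)[1]
    log_K = (multigammaln(a, p) - multigammaln(0.5 * (nu + p - 1), p)
             - 0.5 * n * p * np.log(np.pi)
             - 0.5 * n * logdet_Omega - 0.5 * p * logdet_Sigma)
    # Sigma^{-1} (X - M) Omega^{-1} (X - M)^T, computed with linear solves.
    inner = np.linalg.solve(Sigma, D) @ np.linalg.solve(Omega, D.T)
    logdet = np.linalg.slogdet(np.eye(n) + inner)[1]
    return log_K - a * logdet


# Example: at X = M the determinant term equals 1, so the log-density is log K.
M = np.zeros((2, 3))
print(matrix_t_logpdf(M, 5.0, M, np.eye(2), np.eye(3)))
```

With $n = p = 1$ the function reduces to the log-density of a scaled univariate Student t-distribution, which gives a quick sanity check.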

Properties

If $\mathbf{X} \sim \mathcal{T}_{n\times p}(\nu, \mathbf{M}, \boldsymbol{\Sigma}, \boldsymbol{\Omega})$, then we have the following properties:

Expected values

The mean, or expected value, exists if $\nu > 1$ and is:

$$E[\mathbf{X}] = \mathbf{M}$$

For $\nu > 2$, the second-order expectations are:

$$E[(\mathbf{X}-\mathbf{M})(\mathbf{X}-\mathbf{M})^{T}] = \frac{\boldsymbol{\Sigma}\operatorname{tr}(\boldsymbol{\Omega})}{\nu-2}$$
$$E[(\mathbf{X}-\mathbf{M})^{T}(\mathbf{X}-\mathbf{M})] = \frac{\boldsymbol{\Omega}\operatorname{tr}(\boldsymbol{\Sigma})}{\nu-2}$$

where $\operatorname{tr}$ denotes trace.
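The two second-order expectations can be checked by simulation. The sketch below is an illustration under stated assumptions: `sample_matrix_t_batch` is a hypothetical helper, it uses the inverse Wishart mixture construction with $\nu + p - 1$ degrees of freedom (an assumed convention consistent with the density in this article), and it requires an integer $\nu \geq 1$ because the Wishart draws are built from sums of Gaussian outer products.

```python
import numpy as np


def sample_matrix_t_batch(N, nu, M, Sigma, Omega, rng):
    """Draw N samples at once; nu must be a positive integer in this sketch."""
    n, p = M.shape
    df = nu + p - 1
    # Wishart(df, Omega^{-1}) as G^T G, where rows of G are N(0, Omega^{-1});
    # the inverse of that matrix is inverse Wishart(df, Omega).
    L_prec = np.linalg.cholesky(np.linalg.inv(Omega))
    G = rng.standard_normal((N, df, p)) @ L_prec.T
    S = np.linalg.inv(np.swapaxes(G, 1, 2) @ G)       # shape (N, p, p)
    A = np.linalg.cholesky(Sigma)                      # A A^T = Sigma
    B = np.linalg.cholesky(S)                          # B B^T = S, batched
    Z = rng.standard_normal((N, n, p))
    return M + A @ Z @ np.swapaxes(B, 1, 2)


rng = np.random.default_rng(0)
nu = 12
M = np.zeros((2, 3))
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
Omega = np.array([[1.0, 0.2, 0.0], [0.2, 1.5, 0.1], [0.0, 0.1, 0.8]])

X = sample_matrix_t_batch(100_000, nu, M, Sigma, Omega, rng)
D = X - M
row_moment = np.mean(D @ np.swapaxes(D, 1, 2), axis=0)  # estimates E[(X-M)(X-M)^T]
col_moment = np.mean(np.swapaxes(D, 1, 2) @ D, axis=0)  # estimates E[(X-M)^T(X-M)]
row_theory = Sigma * np.trace(Omega) / (nu - 2)
col_theory = Omega * np.trace(Sigma) / (nu - 2)

print("max abs deviation (rows):", np.abs(row_moment - row_theory).max())
print("max abs deviation (cols):", np.abs(col_moment - col_theory).max())
```

The deviations shrink roughly like $1/\sqrt{N}$; for $\nu$ close to 2 the estimates converge very slowly because the fourth moments are no longer finite.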

More generally, for appropriately dimensioned matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$:

$$\begin{aligned}E[(\mathbf{X}-\mathbf{M})\mathbf{A}(\mathbf{X}-\mathbf{M})^{T}] &= \frac{\boldsymbol{\Sigma}\operatorname{tr}(\mathbf{A}^{T}\boldsymbol{\Omega})}{\nu-2}\\E[(\mathbf{X}-\mathbf{M})^{T}\mathbf{B}(\mathbf{X}-\mathbf{M})] &= \frac{\boldsymbol{\Omega}\operatorname{tr}(\mathbf{B}^{T}\boldsymbol{\Sigma})}{\nu-2}\\E[(\mathbf{X}-\mathbf{M})\mathbf{C}(\mathbf{X}-\mathbf{M})] &= \frac{\boldsymbol{\Sigma}\mathbf{C}^{T}\boldsymbol{\Omega}}{\nu-2}\end{aligned}$$

Transformation

Transpose transform:

$$\mathbf{X}^{T} \sim \mathcal{T}_{p\times n}(\nu, \mathbf{M}^{T}, \boldsymbol{\Omega}, \boldsymbol{\Sigma})$$

Linear transform: let $\mathbf{A}$ ($r \times n$) have full rank $r \leq n$ and $\mathbf{B}$ ($p \times s$) have full rank $s \leq p$. Then:

$$\mathbf{AXB} \sim \mathcal{T}_{r\times s}(\nu, \mathbf{AMB}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{T}, \mathbf{B}^{T}\boldsymbol{\Omega}\mathbf{B})$$
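The linear transform rule amounts to a simple parameter update, sketched below. The helper name `linear_transform_params` is hypothetical, and the rank check is an added safeguard that mirrors the full-rank condition stated above.

```python
import numpy as np


def linear_transform_params(nu, M, Sigma, Omega, A, B):
    """Parameters of A X B when X follows a matrix t-distribution (nu, M, Sigma, Omega)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    r, n = A.shape
    p, s = B.shape
    # The rule requires A to have full row rank r and B full column rank s.
    if np.linalg.matrix_rank(A) != r or np.linalg.matrix_rank(B) != s:
        raise ValueError("A must have full row rank and B full column rank")
    # nu is unchanged; location and both scale matrices are transformed.
    return nu, A @ M @ B, A @ Sigma @ A.T, B.T @ Omega @ B


# Example: contrast of the two rows, and a 3 -> 2 column mixing.
M = np.arange(6.0).reshape(2, 3)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
Omega = np.diag([1.0, 2.0, 3.0])
A = np.array([[1.0, -1.0]])
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
nu_new, M_new, Sigma_new, Omega_new = linear_transform_params(4.0, M, Sigma, Omega, A, B)
print(M_new.shape, Sigma_new.shape, Omega_new.shape)
```

Taking $\mathbf{A}$ or $\mathbf{B}$ to be a selection matrix gives the marginal distribution of a subset of rows or columns.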

The characteristic function and various other properties can be derived from the re-parameterised formulation (see below).

Re-parameterized matrix t-distribution

Re-parameterized matrix t
Notation: $\mathrm{T}_{n,p}(\alpha, \beta, \mathbf{M}, \boldsymbol{\Sigma}, \boldsymbol{\Omega})$
Parameters:
$\mathbf{M}$: location (real $n \times p$ matrix)
$\boldsymbol{\Omega}$: scale (positive-definite real $p \times p$ matrix)
$\boldsymbol{\Sigma}$: scale (positive-definite real $n \times n$ matrix)
$\alpha > (p-1)/2$: shape parameter
$\beta > 0$: scale parameter
Support: $\mathbf{X} \in \mathbb{R}^{n \times p}$
PDF:
$$\frac{\Gamma_p(\alpha+n/2)}{(2\pi/\beta)^{\frac{np}{2}}\,\Gamma_p(\alpha)}\,|\boldsymbol{\Omega}|^{-\frac{n}{2}}\,|\boldsymbol{\Sigma}|^{-\frac{p}{2}} \times \left|\mathbf{I}_n + \frac{\beta}{2}\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol{\Omega}^{-1}(\mathbf{X}-\mathbf{M})^{\mathrm{T}}\right|^{-(\alpha+n/2)}$$
CDF: No analytic expression
Mean: $\mathbf{M}$ if $\alpha > p/2$, else undefined
Variance: $\frac{2(\boldsymbol{\Sigma} \otimes \boldsymbol{\Omega})}{\beta(2\alpha-p-1)}$ if $\alpha > (p+1)/2$, else undefined
CF: see below

An alternative parameterisation of the matrix t-distribution uses two parameters $\alpha$ and $\beta$ in place of $\nu$.

This formulation reduces to the standard matrix t-distribution with $\beta = 2$, $\alpha = \frac{\nu+p-1}{2}$.
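The reduction can be confirmed numerically by evaluating both log-densities at the same point. This is a sketch only: the function names `logpdf_alpha_beta` and `logpdf_standard` are hypothetical, and each one transcribes the corresponding PDF given in this article.

```python
import numpy as np
from scipy.special import multigammaln


def _logdet(A):
    # Log-determinant of a positive-definite matrix.
    return np.linalg.slogdet(A)[1]


def logpdf_alpha_beta(X, alpha, beta, M, Sigma, Omega):
    """Log-density in the (alpha, beta) parameterisation."""
    n, p = X.shape
    D = X - M
    inner = np.linalg.solve(Sigma, D) @ np.linalg.solve(Omega, D.T)
    log_norm = (multigammaln(alpha + 0.5 * n, p) - multigammaln(alpha, p)
                - 0.5 * n * p * np.log(2 * np.pi / beta)
                - 0.5 * n * _logdet(Omega) - 0.5 * p * _logdet(Sigma))
    return log_norm - (alpha + 0.5 * n) * _logdet(np.eye(n) + 0.5 * beta * inner)


def logpdf_standard(X, nu, M, Sigma, Omega):
    """Log-density in the standard (nu) parameterisation."""
    n, p = X.shape
    D = X - M
    a = 0.5 * (nu + n + p - 1)
    inner = np.linalg.solve(Sigma, D) @ np.linalg.solve(Omega, D.T)
    log_K = (multigammaln(a, p) - multigammaln(0.5 * (nu + p - 1), p)
             - 0.5 * n * p * np.log(np.pi)
             - 0.5 * n * _logdet(Omega) - 0.5 * p * _logdet(Sigma))
    return log_K - a * _logdet(np.eye(n) + inner)


rng = np.random.default_rng(1)
n, p, nu = 2, 3, 3.5
X = rng.standard_normal((n, p))
M = np.zeros((n, p))
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
Omega = np.diag([1.0, 2.0, 3.0])

lp_nu = logpdf_standard(X, nu, M, Sigma, Omega)
lp_ab = logpdf_alpha_beta(X, 0.5 * (nu + p - 1), 2.0, M, Sigma, Omega)
print(np.isclose(lp_nu, lp_ab))
```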

This formulation of the matrix t-distribution can be derived as the compound distribution that results from an infinite mixture of a matrix normal distribution with an inverse multivariate gamma distribution placed over either of its covariance matrices.

Properties

If $\mathbf{X} \sim \mathrm{T}_{n,p}(\alpha, \beta, \mathbf{M}, \boldsymbol{\Sigma}, \boldsymbol{\Omega})$ then

$$\mathbf{X}^{\mathrm{T}} \sim \mathrm{T}_{p,n}(\alpha, \beta, \mathbf{M}^{\mathrm{T}}, \boldsymbol{\Omega}, \boldsymbol{\Sigma}).$$

The property above comes from Sylvester's determinant theorem:

$$\det\left(\mathbf{I}_n + \frac{\beta}{2}\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\mathbf{M})\boldsymbol{\Omega}^{-1}(\mathbf{X}-\mathbf{M})^{\mathrm{T}}\right) = \det\left(\mathbf{I}_p + \frac{\beta}{2}\boldsymbol{\Omega}^{-1}(\mathbf{X}^{\mathrm{T}}-\mathbf{M}^{\mathrm{T}})\boldsymbol{\Sigma}^{-1}(\mathbf{X}^{\mathrm{T}}-\mathbf{M}^{\mathrm{T}})^{\mathrm{T}}\right).$$
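The determinant identity is easy to verify numerically for random inputs. The snippet below is a small check under arbitrary, assumed test values (the dimensions, $\beta$, and the helper `random_spd` are illustrative only).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, beta = 3, 5, 1.7


def random_spd(k):
    # Random symmetric positive-definite k x k matrix.
    G = rng.standard_normal((k, k))
    return G @ G.T + k * np.eye(k)


X = rng.standard_normal((n, p))
M = rng.standard_normal((n, p))
Sigma, Omega = random_spd(n), random_spd(p)
D = X - M

Sinv_D = np.linalg.solve(Sigma, D)       # Sigma^{-1} (X - M),   n x p
Oinv_Dt = np.linalg.solve(Omega, D.T)    # Omega^{-1} (X - M)^T, p x n

lhs = np.linalg.det(np.eye(n) + 0.5 * beta * Sinv_D @ Oinv_Dt)   # n x n determinant
rhs = np.linalg.det(np.eye(p) + 0.5 * beta * Oinv_Dt @ Sinv_D)   # p x p determinant
print(np.isclose(lhs, rhs))
```

In practice this means the determinant can always be computed on the smaller of the two dimensions.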

If $\mathbf{X} \sim \mathrm{T}_{n,p}(\alpha, \beta, \mathbf{M}, \boldsymbol{\Sigma}, \boldsymbol{\Omega})$ and $\mathbf{A}$ ($n \times n$) and $\mathbf{B}$ ($p \times p$) are nonsingular matrices then

$$\mathbf{AXB} \sim \mathrm{T}_{n,p}(\alpha, \beta, \mathbf{AMB}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{\mathrm{T}}, \mathbf{B}^{\mathrm{T}}\boldsymbol{\Omega}\mathbf{B}).$$

The characteristic function is

$$\phi_T(\mathbf{Z}) = \frac{\exp(\operatorname{tr}(i\mathbf{Z}'\mathbf{M}))\,|\boldsymbol{\Omega}|^{\alpha}}{\Gamma_p(\alpha)(2\beta)^{\alpha p}}\,|\mathbf{Z}'\boldsymbol{\Sigma}\mathbf{Z}|^{\alpha}\,B_{\alpha}\left(\frac{1}{2\beta}\mathbf{Z}'\boldsymbol{\Sigma}\mathbf{Z}\boldsymbol{\Omega}\right),$$

where

$$B_{\delta}(\mathbf{WZ}) = |\mathbf{W}|^{-\delta}\int_{\mathbf{S}>0}\exp\left(\operatorname{tr}(-\mathbf{SW}-\mathbf{S}^{-1}\mathbf{Z})\right)|\mathbf{S}|^{-\delta-\frac{1}{2}(p+1)}\,d\mathbf{S},$$

and $B_{\delta}$ is the type-two Bessel function of Herz of a matrix argument.

Notes

  1. ^ Zhu, Shenghuo and Kai Yu and Yihong Gong (2007). "Predictive Matrix-Variate t Models." In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, NIPS '07: Advances in Neural Information Processing Systems 20, pages 1721–1728. MIT Press, Cambridge, MA, 2008. The notation is changed a bit in this article for consistency with the matrix normal distribution article.
  2. ^ Gupta, Arjun K. and Nagar, Daya K. (1999). Matrix variate distributions. CRC Press. Chapter 4.
  3. ^ Iranmanesh, Anis, M. Arashi and S. M. M. Tabatabaey (2010). "On Conditional Applications of Matrix Variate Normal Distribution". Iranian Journal of Mathematical Sciences and Informatics, 5:2, pp. 33–43.
