Ball covariance

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.
Revision as of 11:17, 27 December 2024

Nonparametric independence test methods

Ball covariance is a statistical measure that can be used to test the independence of two random variables defined on metric spaces. Under the conditions listed under Properties, the ball covariance is zero if and only if the two random variables are independent, which makes it a useful measure of dependence. Its main contribution is an alternative measure of independence in metric spaces: distance covariance in metric spaces characterizes independence only when the metric space is of strong negative type, whereas ball covariance does not rely on that condition.

The p-value of the ball covariance test is computed with a permutation test. The ball covariance of the two samples is computed first, and this value is then compared with the values obtained from many permutations of the samples.
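The permutation procedure described above can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name `permutation_pvalue`, the default of 199 permutations, the "+1" correction in the p-value, and the stand-in statistic `abs_cov` are all assumptions made for the example. Any dependence statistic for which larger values indicate stronger dependence, such as a sample ball covariance, could be passed in instead.

```python
import numpy as np


def permutation_pvalue(statistic, x, y, n_perm=199, seed=0):
    """Permutation p-value for an independence statistic.

    `statistic(x, y)` must return a number that grows with dependence.
    Permuting y breaks any pairing between x and y, so the permuted
    values approximate the distribution of the statistic under independence.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    y = np.asarray(y)
    observed = statistic(x, y)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(y))
        if statistic(x, y[perm]) >= observed:
            exceed += 1
    # The +1 terms count the observed value as one of the permutations
    # (an assumption of this sketch; it keeps the p-value above zero).
    return (exceed + 1) / (n_perm + 1)


def abs_cov(a, b):
    """Stand-in statistic for illustration only: absolute sample covariance."""
    return abs(np.cov(a, b)[0, 1])


x = np.arange(30.0)
p_dependent = permutation_pvalue(abs_cov, x, x.copy(), n_perm=199, seed=1)
```

A small p-value means that few permuted samples produced a statistic as large as the observed one, which is evidence against independence.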

Background

Correlation, a fundamental notion of dependence in statistics, has been extensively developed in Hilbert spaces, as exemplified by the Pearson correlation coefficient, Spearman's correlation coefficient, and Hoeffding's dependence measure. However, many fields, such as medical imaging, computational biology, and computer vision, now require measuring dependence or independence between complex objects. Examples of complex objects include Grassmann manifolds, planar shapes, tree-structured data, matrix Lie groups, deformation fields, symmetric positive definite (SPD) matrices, and shape representations of cortical and subcortical structures. These objects mostly live in non-Hilbert spaces and are inherently nonlinear and high-dimensional (or even infinite-dimensional). Traditional statistical techniques developed in Hilbert spaces may not be directly applicable to them, so analyzing objects that may reside in non-Hilbert spaces poses significant mathematical and computational challenges.

An earlier landmark in independence testing on metric spaces was the distance covariance in metric spaces proposed by Lyons (2013). This statistic equals zero if and only if the random variables are independent, provided the metric space is of strong negative type. Testing the independence of random variables in spaces that do not satisfy the strong negative type condition therefore requires a different approach.

Definition

Ball covariance

We introduce ball covariance in detail, starting with the definition of a ball. Consider two Banach spaces $(\mathbf{X},\rho)$ and $(\mathbf{Y},\zeta)$, where the norms $\rho$ and $\zeta$ also denote their induced distances. Let $\theta$ be a Borel probability measure on $\mathbf{X}\times\mathbf{Y}$, let $\mu$ and $\nu$ be Borel probability measures on $\mathbf{X}$ and $\mathbf{Y}$ respectively, and let $(X,Y)$ be an $\mathbf{X}\times\mathbf{Y}$-valued random variable defined on a probability space such that $(X,Y)\sim\theta$, $X\sim\mu$, and $Y\sim\nu$. Denote the closed ball in $\mathbf{X}$ with center $x_1$ and radius $\rho(x_1,x_2)$ by $\bar{B}(x_1,\rho(x_1,x_2))$ or $\bar{B}_{\rho}(x_1,x_2)$, and the closed ball in $\mathbf{Y}$ with center $y_1$ and radius $\zeta(y_1,y_2)$ by $\bar{B}(y_1,\zeta(y_1,y_2))$ or $\bar{B}_{\zeta}(y_1,y_2)$. Let $\{W_i=(X_i,Y_i),\ i=1,2,\ldots\}$ be an infinite sequence of iid samples of $(X,Y)$, and let $\omega=(\omega_1,\omega_2)$ be a positive weight function on the support set of $\theta$. The population ball covariance is then defined as

$$\operatorname{BCov}_{\omega}^{2}(X,Y)=\int[\theta-\mu\otimes\nu]^{2}\left(\bar{B}_{\rho}(x_1,x_2)\times\bar{B}_{\zeta}(y_1,y_2)\right)\omega_1(x_1,x_2)\,\omega_2(y_1,y_2)\,\theta(dx_1,dy_1)\,\theta(dx_2,dy_2)$$

where $[\theta-\mu\otimes\nu]^{2}(A\times B):=[\theta(A\times B)-\mu(A)\nu(B)]^{2}$ for Borel sets $A\subset\mathbf{X}$ and $B\subset\mathbf{Y}$.

The population ball covariance has a second, equivalent form. Let $\delta_{ij,k}^{X}:=I\left(X_k\in\bar{B}_{\rho}(X_i,X_j)\right)$ indicate whether $X_k$ lies in the closed ball $\bar{B}_{\rho}(X_i,X_j)$. Let $\delta_{ij,kl}^{X}=\delta_{ij,k}^{X}\delta_{ij,l}^{X}$ indicate whether both $X_k$ and $X_l$ lie in $\bar{B}_{\rho}(X_i,X_j)$, and let $\xi_{ij,klst}^{X}=\left(\delta_{ij,kl}^{X}+\delta_{ij,st}^{X}-\delta_{ij,ks}^{X}-\delta_{ij,lt}^{X}\right)/2$. Define $\delta_{ij,k}^{Y}$, $\delta_{ij,kl}^{Y}$ and $\xi_{ij,klst}^{Y}$ analogously for $Y$. Let $(X_i,Y_i)$, $i=1,2,\dots,6$, be iid samples from $\theta$. The population ball covariance can then be written as

$$\operatorname{BCov}_{\omega}^{2}(X,Y)=E\left\{\xi_{12,3456}^{X}\,\xi_{12,3456}^{Y}\,\omega_1(X_1,X_2)\,\omega_2(Y_1,Y_2)\right\}$$

We can now state the sample ball covariance. Consider the random sample $(\mathbf{X},\mathbf{Y})=\{(X_k,Y_k),\ k=1,\ldots,n\}$. Let $\hat{\omega}_{1,n}$ and $\hat{\omega}_{2,n}$ be estimates of $\omega_1$ and $\omega_2$. Denote

$$\Delta_{ij,n}^{XY}=\frac{1}{n}\sum_{k=1}^{n}\delta_{ij,k}^{X}\delta_{ij,k}^{Y},\qquad \Delta_{ij,n}^{X}=\frac{1}{n}\sum_{k=1}^{n}\delta_{ij,k}^{X},\qquad \Delta_{ij,n}^{Y}=\frac{1}{n}\sum_{k=1}^{n}\delta_{ij,k}^{Y}.$$

The sample ball covariance is

$$\operatorname{BCov}_{\omega,n}^{2}(\mathbf{X},\mathbf{Y}):=\frac{1}{n^{2}}\sum_{i,j=1}^{n}\left(\Delta_{ij,n}^{XY}-\Delta_{ij,n}^{X}\Delta_{ij,n}^{Y}\right)^{2}\hat{\omega}_{1,n}(X_i,X_j)\,\hat{\omega}_{2,n}(Y_i,Y_j).$$
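The sample ball covariance above can be computed directly from two pairwise distance matrices. The sketch below is illustrative only and makes several assumptions that are not in the text: the weights are taken to be constant (both weight estimates equal to 1), the function name `sample_bcov2` is hypothetical, and the implementation stores an n × n × n indicator array, so it is only suitable for small samples.

```python
import numpy as np


def sample_bcov2(dx, dy):
    """Squared sample ball covariance with constant weights (assumed equal to 1).

    dx, dy: (n, n) arrays of pairwise distances for the X and Y samples.
    """
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    # delta_x[i, j, k] = 1 when X_k lies in the closed ball centred at X_i
    # with radius rho(X_i, X_j), i.e. when rho(X_i, X_k) <= rho(X_i, X_j).
    delta_x = (dx[:, None, :] <= dx[:, :, None]).astype(float)
    delta_y = (dy[:, None, :] <= dy[:, :, None]).astype(float)
    joint = (delta_x * delta_y).mean(axis=2)   # Delta^{XY}_{ij,n}
    marg_x = delta_x.mean(axis=2)              # Delta^{X}_{ij,n}
    marg_y = delta_y.mean(axis=2)              # Delta^{Y}_{ij,n}
    # Average of the squared differences over all pairs (i, j).
    return float(((joint - marg_x * marg_y) ** 2).mean())


# Usage on scalar data, with the absolute difference as the distance.
x = np.array([0.0, 1.0, 2.0, 3.0])
dist_x = np.abs(x[:, None] - x[None, :])
bcov2_xx = sample_bcov2(dist_x, dist_x)
```

Working with distance matrices rather than raw observations keeps the sketch independent of the space the data live in: only the distances enter the statistic.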

Ball correlation

Just like the relationship between the Pearson correlation coefficient and covariance, we can define the ball correlation coefficient through ball covariance. The ball correlation is defined as the square root of

$$\operatorname{BCor}_{\omega}^{2}(X,Y):=\operatorname{BCov}_{\omega}^{2}(X,Y)\Big/\sqrt{\operatorname{BCov}_{\omega}^{2}(X)\operatorname{BCov}_{\omega}^{2}(Y)},$$

where $\operatorname{BCov}_{\omega}^{2}(X)=\operatorname{BCov}_{\omega}^{2}(X,X)=E\left(\xi_{12,3456}^{X}\,\omega_1(X_1,X_2)\right)^{2}$ and $\operatorname{BCov}_{\omega}^{2}(Y)=\operatorname{BCov}_{\omega}^{2}(Y,Y)=E\left(\xi_{12,3456}^{Y}\,\omega_2(Y_1,Y_2)\right)^{2}$. The sample ball correlation is defined similarly:

$$\operatorname{BCor}_{\omega,n}^{2}(\mathbf{X},\mathbf{Y}):=\operatorname{BCov}_{\omega,n}^{2}(\mathbf{X},\mathbf{Y})\Big/\sqrt{\operatorname{BCov}_{\omega,n}^{2}(\mathbf{X})\operatorname{BCov}_{\omega,n}^{2}(\mathbf{Y})},$$

where $\operatorname{BCov}_{\omega,n}^{2}(\mathbf{X})=\operatorname{BCov}_{\omega,n}^{2}(\mathbf{X},\mathbf{X})$ and $\operatorname{BCov}_{\omega,n}^{2}(\mathbf{Y})=\operatorname{BCov}_{\omega,n}^{2}(\mathbf{Y},\mathbf{Y})$.
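As a hedged illustration of the sample ball correlation, the sketch below normalizes a constant-weight sample ball covariance by the two self-covariances. The function names `sample_bcov2` and `sample_bcor2`, the constant weights, and the convention of returning 0 when the denominator vanishes are assumptions made for this example, not part of the definition above.

```python
import numpy as np


def sample_bcov2(dx, dy):
    """Squared sample ball covariance with constant weights (assumed equal to 1)."""
    dx = np.asarray(dx, dtype=float)
    dy = np.asarray(dy, dtype=float)
    # delta[i, j, k] = 1 when point k lies in the closed ball centred at
    # point i whose radius is the distance from point i to point j.
    delta_x = (dx[:, None, :] <= dx[:, :, None]).astype(float)
    delta_y = (dy[:, None, :] <= dy[:, :, None]).astype(float)
    joint = (delta_x * delta_y).mean(axis=2)
    product = delta_x.mean(axis=2) * delta_y.mean(axis=2)
    return float(((joint - product) ** 2).mean())


def sample_bcor2(dx, dy):
    """Squared sample ball correlation: BCov^2(X, Y) / sqrt(BCov^2(X) BCov^2(Y))."""
    denom = np.sqrt(sample_bcov2(dx, dx) * sample_bcov2(dy, dy))
    if denom == 0.0:
        # Assumed convention for degenerate samples (e.g. a constant sample).
        return 0.0
    return sample_bcov2(dx, dy) / denom


# Usage: compare a sample with itself and with an unrelated ordering.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 0.0, 4.0, 1.0, 3.0])
dist_x = np.abs(x[:, None] - x[None, :])
dist_y = np.abs(y[:, None] - y[None, :])
bcor2_xx = sample_bcor2(dist_x, dist_x)
bcor2_xy = sample_bcor2(dist_x, dist_y)
```

The sample ball correlation itself is the square root of the returned value.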

Properties

1. Independence-zero equivalence property: Let $S_{\theta}$, $S_{\mu}$ and $S_{\nu}$ denote the support sets of $\theta$, $\mu$ and $\nu$, respectively. $\operatorname{BCov}_{\omega}(X,Y)=0$ implies $\theta=\mu\otimes\nu$ if one of the following conditions holds:

(a) $\mathbf{X}\times\mathbf{Y}$ is a finite-dimensional Banach space with $S_{\theta}=S_{\mu}\times S_{\nu}$.

(b) $\theta=a_1\theta_d+a_2\theta_a$, where $a_1$ and $a_2$ are positive constants, $\theta_d$ is a discrete measure, and $\theta_a$ is an absolutely continuous measure with a continuous Radon–Nikodym derivative with respect to the Gaussian measure.

2. Cauchy–Schwarz type inequality: $\operatorname{BCov}_{\omega}^{2}(X,Y)\leq\operatorname{BCov}_{\omega}(X)\operatorname{BCov}_{\omega}(Y)$

3. Consistency: If $\hat{\omega}_{1,n}$ and $\hat{\omega}_{2,n}$ converge uniformly to $\omega_1$ and $\omega_2$ respectively, with $E(\omega_1\omega_2)<\infty$, then $\operatorname{BCov}_{\omega,n}(\mathbf{X},\mathbf{Y})\xrightarrow[n\to\infty]{a.s.}\operatorname{BCov}_{\omega}(X,Y)$ and $\operatorname{BCor}_{\omega,n}(\mathbf{X},\mathbf{Y})\xrightarrow[n\to\infty]{a.s.}\operatorname{BCor}_{\omega}(X,Y)$.

4. Asymptotics: If $\hat{\omega}_{1,n}$ and $\hat{\omega}_{2,n}$ converge uniformly to $\omega_1$ and $\omega_2$ respectively, with $E(\omega_1\omega_2)<\infty$, then:

(a) Under the null hypothesis, $n\operatorname{BCov}_{\omega,n}^{2}(\mathbf{X},\mathbf{Y})\xrightarrow[n\to\infty]{d}\sum_{v=1}^{\infty}\lambda_v Z_v^{2}$, where the $Z_v$ are independent standard normal random variables.

(b) Under the alternative hypothesis, $\sqrt{n}\left(\operatorname{BCov}_{\omega,n}^{2}(\mathbf{X},\mathbf{Y})-\operatorname{BCov}_{\omega}^{2}(X,Y)\right)\xrightarrow[n\to\infty]{d}N(0,\Sigma)$.

References

  1. Pan, Wenliang; Wang, Xueqin; Zhang, Heping; Zhu, Hongtu; Zhu, Jin (2019-04-11). "Ball Covariance: A Generic Measure of Dependence in Banach Space". Journal of the American Statistical Association. 115 (529): 307–317. doi:10.1080/01621459.2018.1543600. ISSN 0162-1459. PMC 7720858. PMID 33299261.
  2. Lyons, Russell (2013-09-01). "Distance covariance in metric spaces". The Annals of Probability. 41 (5). arXiv:1106.5758. doi:10.1214/12-AOP803. ISSN 0091-1798.
  3. "VII. Note on regression and inheritance in the case of two parents". Proceedings of the Royal Society of London. 58 (347–352): 240–242. 1895-12-31. doi:10.1098/rspl.1895.0041. ISSN 0370-1662.
  4. Spearman, C. (January 1904). "The Proof and Measurement of Association between Two Things". American Journal of Psychology. 15 (1): 72–101. doi:10.2307/1412159. ISSN 0002-9556. JSTOR 1412159.
  5. Hoeffding, Wassily (December 1948). "A Non-Parametric Test of Independence". Annals of Mathematical Statistics. 19 (4): 546–557. doi:10.1214/AOMS/1177730150. ISSN 0003-4851. JSTOR 2236021. MR 0029139. Zbl 0032.42001.