From Statistics to the Uncertainty Principle


gauge

From Statistics to the Uncertainty Principle [Post type: Original]

The main purpose of statistics is to infer the distribution of a population from part of the data, that is, from a sample. Roughly speaking, the data always come from the density function of one and the same probability distribution, and our task is to infer this distribution from the data. In drawing such inferences we always make some assumptions: for example, the distribution of heights is assumed to be normal, particle decay counts are assumed to follow a Poisson distribution, and so on. For these special families it suffices to determine the distribution's characteristic quantities. For the normal distribution, for instance, one only needs to determine the mean and the variance; for the Poisson distribution, either the sample mean or the sample variance can be used to estimate the single unknown parameter. This kind of statistical inference is usually called point estimation of parameters.
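To make the setting concrete, here is a minimal simulation sketch (my own illustration, assuming only numpy): draw samples from a normal and from a Poisson distribution and recover the defining parameters from the sample.

import numpy as np

rng = np.random.default_rng(0)

# Normal family: the mean and the variance pin the distribution down completely.
x = rng.normal(loc=170.0, scale=8.0, size=10_000)      # e.g. body heights
print("estimated mean     :", x.mean())                # ~ 170
print("estimated variance :", x.var(ddof=1))           # ~ 64

# Poisson family: a single parameter lambda, equal to both the mean and the
# variance, so either sample quantity can serve as an estimate of it.
k = rng.poisson(lam=3.5, size=10_000)                  # e.g. decay counts
print("lambda from mean     :", k.mean())              # ~ 3.5
print("lambda from variance :", k.var(ddof=1))         # ~ 3.5 as well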

Among all point estimators there is a particularly important class, the unbiased estimators. The rough idea is this: regard the data as a random variable, which of course follows some unknown distribution, say one belonging to a parametric family. If the statistic is correct on average, that is, if its mean (its mathematical expectation) equals the parameter it is meant to estimate, then the statistic is called an unbiased estimator.
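A quick simulation of what "unbiased" means in practice (again my own sketch): averaged over many samples, the sample mean and the 1/(n-1) sample variance hit the true values, while the naive 1/n variance is systematically too small.

import numpy as np

rng = np.random.default_rng(1)
true_mu, true_var, n = 0.0, 4.0, 5               # small samples make the bias visible

means, var_n, var_nm1 = [], [], []
for _ in range(200_000):
    x = rng.normal(true_mu, np.sqrt(true_var), size=n)
    means.append(x.mean())
    var_n.append(x.var(ddof=0))                  # divide by n
    var_nm1.append(x.var(ddof=1))                # divide by n-1

print("E[sample mean]       ~", np.mean(means))      # ~ 0.0  (unbiased)
print("E[variance, 1/n]     ~", np.mean(var_n))      # ~ 3.2  (biased low: (n-1)/n * 4)
print("E[variance, 1/(n-1)] ~", np.mean(var_nm1))    # ~ 4.0  (unbiased)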

How to judge the quality of an estimator is an important question. There is no single universally accepted criterion, but unbiasedness is one important principle. The variance, on the other hand, is another extremely important quantity, so one may ask for the unbiased estimator of smallest variance. It is easy, however, to give examples of parametric families for which no unbiased estimator exists at all. For the variance of an unbiased estimator there is a general bound, known as the Cramer-Rao theorem: the variance of any unbiased estimator has a lower bound, of geometric origin, namely the reciprocal of the Fisher information of the parametric family. If we are dealing with the distribution of a random vector, the theorem says that the covariance matrix of any unbiased estimator is greater than or equal to the inverse of the Fisher information matrix, where the comparison of the two matrices means that their difference is positive semidefinite.
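For reference, the inequality in formulas (a standard textbook statement; the notation $f(x;\theta)$ for the family's density is my own):

$$I(\theta) = \mathbb{E}_\theta\!\left[\Big(\frac{\partial}{\partial\theta}\log f(X;\theta)\Big)^{2}\right], \qquad \operatorname{Var}_\theta(\hat\theta) \ \ge\ \frac{1}{I(\theta)} \quad\text{for every unbiased estimator } \hat\theta,$$

and in the vector case $\operatorname{Cov}_\theta(\hat\theta) \succeq I(\theta)^{-1}$, where $A \succeq B$ means that $A-B$ is positive semidefinite.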

For a quantum system, take the simplest case: the state of a single particle is described by a wave function u. As is well known, |u|^2 is a probability density. The Fisher information, or rather the Fisher information along the x direction, is then naturally defined as $\int |du/dx|^2\,dx\,dy\,dz$, which closely resembles the Fisher information of an ordinary probability density. In this setting the Cramer-Rao inequality becomes the Heisenberg uncertainty relation.
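Here is a sketch of how that identification works, in the normalization usual in statistics (the factor of 4 and the $\hbar$-units below are my conventions and differ from the expression above by a constant). Write $u = R\,e^{iS/\hbar}$ with $R \ge 0$ and consider the location family $p_a(x) = |u(x-a)|^2$. Its Fisher information about the shift $a$ is

$$I = \int \frac{(\partial_x p)^2}{p}\,dx = 4\int (\partial_x R)^2\,dx .$$

Since $\langle \hat p\,\rangle = \int R^2\,\partial_x S\,dx$ and $\langle \hat p^2\rangle = \hbar^2\!\int (\partial_x R)^2\,dx + \int R^2 (\partial_x S)^2\,dx$, the Cauchy-Schwarz inequality (with $\int R^2\,dx = 1$) gives $\operatorname{Var}(\hat p) \ge \hbar^2\!\int (\partial_x R)^2\,dx = \tfrac{\hbar^2}{4} I$. The Cramer-Rao bound for locating the packet, $\operatorname{Var}(x) \ge 1/I$, then yields $\operatorname{Var}(x)\operatorname{Var}(\hat p) \ge \hbar^2/4$, which is the Heisenberg relation.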

Viewed purely as an inequality this does not improve on the Heisenberg inequality, and we know that the Heisenberg inequality can be made more refined. But the nature of the Cramer-Rao inequality lets us understand the uncertainty principle somewhat better. As mentioned above, the Fisher information appearing in the Cramer-Rao inequality is a quantity with a geometric meaning, and the great advantage of geometry is that it readily produces quantities that do not depend on the coordinate system. For measurements on a quantum system, this geometric property can be interpreted as follows.

Imagine changing our measuring scale so that the measurement favours the regions where the particle is more likely to appear; that is, where the probability of finding the particle is large we make the graduations of the ruler finer. This concentrates the recorded values more tightly around their average and so reduces the measurement error. Of course, one expects the uncertainty in momentum to grow correspondingly, so that position and momentum still respect the Heisenberg inequality. We obtain this directly from the Cramer-Rao inequality, and one can also check that such a change of scale gives no way around the Heisenberg inequality. This does not mean that the Cramer-Rao inequality is useless for the uncertainty principle. On the contrary, the Cramer-Rao inequality tells us that no matter which recording scheme we use, it is impossible to beat the limit set by the Heisenberg inequality, and this cannot be obtained from the Heisenberg inequality itself, because the Heisenberg inequality considers only one recording scheme, namely taking the average of all the measured values of position or momentum.

Of course, many people already think of it this way, namely that the Heisenberg inequality gives an absolute lower limit, but there is in fact a subtle difference. Put a little more technically, the Heisenberg inequality is not invariant under coordinate transformations, whereas only quantities independent of the coordinate system are geometrically meaningful. The coordinate changes here may be curved, that is, not necessarily linear.
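To see the non-invariance concretely, here is a small numerical sketch (my own illustration, not from the post; it assumes numpy and scipy and sets $\hbar = 1$): for a Gaussian wave packet, reading position off a nonlinear ruler $y = \Phi(x/\sigma)$ shrinks the naive product of spreads below $\hbar/2$, while the Fisher information about a location shift, and with it the Cramer-Rao limit, is unchanged by any invertible re-labelling of the outcomes.

import numpy as np
from scipy.stats import norm

hbar = 1.0
sigma = 3.0                                    # position spread of the packet
x = np.linspace(-40.0, 40.0, 20001)
dx = x[1] - x[0]

# Position density |u(x)|^2 of a real Gaussian packet; its momentum spread is hbar/(2*sigma).
p = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
std_x = np.sqrt(np.sum(x**2 * p) * dx)
std_p = hbar / (2 * sigma)
print("std_x * std_p =", std_x * std_p)        # ~ hbar/2 (Heisenberg saturated)

# Record the same outcomes on a nonlinear ruler that is finer where p is large.
y = norm.cdf(x / sigma)
mean_y = np.sum(y * p) * dx
std_y = np.sqrt(np.sum((y - mean_y)**2 * p) * dx)
print("std_y * std_p =", std_y * std_p)        # ~ 0.05 << hbar/2: the naive product is not invariant

# Fisher information of the position data about a location shift a in |u(x-a)|^2.
# It is unchanged by any invertible re-labelling y = g(x) of the outcomes, so the
# Cramer-Rao limit 1/I on locating the packet survives the change of ruler.
I = np.sum(np.gradient(p, dx)**2 / p) * dx
print("Fisher information I =", I, "  Cramer-Rao limit 1/I =", 1.0 / I)   # 1/I = sigma^2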

The Fisher information mentioned above is in fact an infinitesimal form of Shannon information, or of entropy; to put it more precisely, the Fisher information is the infinitesimal form of the informational discrepancy between two probability distributions. The exact definitions can be found in any book on information theory, or of course via Google.
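A quick numerical check of that statement (my own example, for the normal location family $N(\theta, 1)$, whose Fisher information is $I = 1$):

import numpy as np

# Check that D( p_theta || p_{theta+eps} ) ~ (1/2) * I * eps^2 for N(theta, 1).
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

def normal_pdf(x, mu):
    return np.exp(-(x - mu)**2 / 2.0) / np.sqrt(2 * np.pi)

p = normal_pdf(x, 0.0)
for eps in [1.0, 0.3, 0.1, 0.03]:
    q = normal_pdf(x, eps)
    kl = np.sum(p * np.log(p / q)) * dx                    # D(p || q) by numerical integration
    print(f"eps={eps:5.2f}   KL={kl:.6f}   (1/2)*I*eps^2={0.5 * eps**2:.6f}")

The two columns agree, and both vanish quadratically as the two distributions approach each other.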

Posted: 2006-11-02, 11:20:10

Omni

Re: From Statistics to the Uncertainty Principle [Post type: Original]

You've touched on an interesting topic; I'm more interested in the statistics part than in the physics part. Here are some quick comments. I'm not very familiar with "Fisher information", so I'll do some reading this weekend to fully understand what you wrote. It's a pity that you didn't mention the Kullback-Leibler distance (divergence) and its mathematical relations to Shannon entropy and Fisher information.

>> the data always come from the density function of one and the same probability distribution

I know your discussion is limited to parametric statistics here; these statements don't apply to the branch of nonparametric statistics. Also, your statement is a probabilistic rather than a statistical one: it's not rigorous to say that "data come from a probability density function (p.d.f.)".

>> The Fisher information, or rather the Fisher information along the x direction, is then naturally defined as $\int |du/dx|^2\,dx\,dy\,dz$, which closely resembles the Fisher information of an ordinary probability density. In this setting the Cramer-Rao inequality becomes the Heisenberg uncertainty relation.

I never thought about this connection; it's great to learn it from you. For people without a statistics background, it's not easy to understand Fisher information without first knowing the concept of the "score":

http://en.wikipedia.org/wiki/Score_%28statistics%29

In statistics, the score is the partial derivative, with respect to some parameter set θ, of the logarithm (commonly the natural logarithm) of the likelihood function. Then the Fisher information is simply the variance of the score.
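A small worked example of those two definitions (my own choice of family, not from the post): for the normal model $N(\mu, \sigma^2)$ with $\sigma$ known,

$$\log L(\mu; x) = -\frac{(x-\mu)^2}{2\sigma^2} + \text{const}, \qquad s(\mu; x) = \frac{\partial}{\partial\mu}\log L = \frac{x-\mu}{\sigma^2}, \qquad I(\mu) = \operatorname{Var}_\mu\big(s(\mu; X)\big) = \frac{1}{\sigma^2},$$

and the Cramer-Rao bound $\operatorname{Var}(\hat\mu) \ge \sigma^2/n$ for a sample of size $n$ is attained by the sample mean.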

I didn't spend enough time on mathematical statistics before; your post gives me some motivation to dig a little deeper into the theoretical side of statistical estimation. The Cramer-Rao inequality could be an interesting entry point for me to jump in.

For people interested in the link between statistics and information theory, I highly recommend a careful reading of the Wikipedia entry on the "K-L distance":

http://en.wikipedia.org/wiki/Kullback-Leibler_divergence

Posted: 2006-11-02, 23:32:10

gauge

Re: From Statistics to the Uncertainty Principle [Post type: Original]

Omni, could you copy over the Wikipedia material on the Kullback-Leibler divergence?
In statistics and information theory there are no fewer than twenty different divergences. I would say the most natural among them is surely the Kullback-Leibler divergence; for instance, it behaves very well in statistics, and the same can be said of the Fisher information. As an example, a mathematical proof of the second law of thermodynamics can be stated as follows. First, the thermodynamic system can be regarded as a Markov chain or a Markov process. One can then show that the Kullback-Leibler divergence of the state at a given time from the equilibrium distribution never increases with time. Finally, from information theory, or from the properties of convex functions, one knows that the system has maximal entropy when it reaches the uniform state. The Kullback-Leibler divergence is a relative entropy, which differs a little from the absolute entropy; but when the reference distribution in the comparison is the uniform one, relative entropy and absolute entropy differ only by a constant and a sign, so a decreasing relative entropy is the same as an increasing entropy. Putting these together gives a proof of the second law of thermodynamics.
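A tiny numerical illustration of that argument (my own sketch; the 3-state doubly stochastic chain below is made up for the example):

import numpy as np

# Doubly stochastic transition matrix, so the uniform distribution is stationary.
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
uniform = np.ones(3) / 3.0

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def entropy(p):
    return float(-np.sum(p * np.log(p)))

p = np.array([0.98, 0.01, 0.01])               # start far from equilibrium
for t in range(6):
    print(t, "D(p_t || uniform) =", round(kl(p, uniform), 5),
             "H(p_t) =", round(entropy(p), 5))
    p = p @ T                                  # one step of the chain

# The relative entropy to the uniform distribution decreases monotonically while the
# Shannon entropy increases toward its maximum log(3): the second law of the toy model.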
Also, in research on statistical theory, I believe it is the Kullback-Leibler divergence that is used most heavily.

Posted: 2006-11-03, 21:11:03

星空浩淼

Re: From Statistics to the Uncertainty Principle [Post type: Original]

You can tell you were trained in mathematics: the probability and statistics discussed here goes far beyond the level of us science and engineering students.

One may view the world with the p-eye and one may view it with the q-eye but if one opens both eyes simultaneously then one gets crazy

Posted: 2006-11-03, 21:16:54

Omni

Re: From Statistics to the Uncertainty Principle [Post type: Mixed]

>> Omni, could you copy over the Wikipedia material on the Kullback-Leibler divergence?

Sure. There are too many formulae on that page embedded as PNG images, so I have to omit the parts with mathematical formulae, but I'm pretty sure you can fill in the math details yourself. :-)

http://en.wikipedia.org/wiki/Kullback-Leibler_divergence

Kullback–Leibler divergence

In probability theory and information theory, the Kullback–Leibler divergence (or information divergence, or information gain, or relative entropy) is a natural distance measure from a "true" probability distribution P to an arbitrary probability distribution Q. Typically P represents data, observations, or a precisely calculated probability distribution. The measure Q typically represents a theory, a model, a description or an approximation of P.

It can be interpreted as the expected extra message-length per datum that must be communicated if a code that is optimal for a given (wrong) distribution Q is used, compared to using a code based on the true distribution P.
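As a small aside (with made-up distributions, not from the Wikipedia text), here is that quantity computed in log base 2, so the answer reads as extra bits per symbol:

import numpy as np

# D_KL(P || Q) in bits: expected extra code length per symbol when a code
# optimized for Q is used on data that are really distributed as P.
P = np.array([0.5, 0.25, 0.25])
Q = np.array([0.25, 0.25, 0.5])

cross_entropy = -np.sum(P * np.log2(Q))        # average length with the Q-code
entropy_P     = -np.sum(P * np.log2(P))        # optimal average length for P
kl            =  np.sum(P * np.log2(P / Q))

print("H(P)       =", entropy_P)               # 1.50 bits
print("H(P, Q)    =", cross_entropy)           # 1.75 bits
print("D_KL(P||Q) =", kl)                      # 0.25 bits = the difference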

...

Motivation, properties and terminology

In information theory, the Kraft-McMillan theorem establishes that any directly-decodable coding scheme for coding a message to identify one value x_i out of a set of possibilities X can be seen as representing an implicit probability distribution q(x_i) = 2^(-l_i) over X, where l_i is the length of the code for x_i in bits.

...

Originally introduced by Solomon Kullback and Richard Leibler in 1951 as the directed divergence between two distributions, it is not the same as a divergence in calculus: the term "divergence" in the terminology should not be misinterpreted. One might be tempted to call it a "distance metric" on the space of probability distributions, but this would not be correct as the Kullback-Leibler divergence is not symmetric,

...

Following Renyi (1961), the term is sometimes also called the information gain about X achieved if P can be used instead of Q. It is also called the relative entropy, for using Q instead of P.

The Kullback–Leibler divergence remains well-defined for continuous distributions, and furthermore is invariant under parameter transformations. It can therefore be seen as in some ways a more fundamental quantity than some other properties in information theory (such as self-information or Shannon entropy), which can become undefined or negative for non-discrete probabilities.

Principle of minimum discrimination information

The idea of Kullback–Leibler divergence as discrimination information led Kullback to propose the Principle of Minimum Discrimination Information (MDI): given new facts, a new distribution f should be chosen which is as hard to discriminate from the original distribution f_0 as possible; so that the new data produces as small an information gain D_KL(f || f_0) as possible.

...

MDI can be seen as an extension of Laplace's Principle of Insufficient Reason, and the Principle of Maximum Entropy of E.T. Jaynes. In particular, it is the natural extension of the principle of maximum entropy from discrete to continuous distributions, for which Shannon entropy ceases to be so useful (see differential entropy), but the KL divergence continues to be just as relevant.

Other probability-distance measures

Other measures of probability distance are the histogram intersection, χ²-statistic, quadratic form distance, match distance, Kolmogorov-Smirnov distance, and earth mover's distance (Rubner et al. 2000).

See also

Akaike information criterion
Deviance information criterion
Bayesian information criterion
Quantum relative entropy

References

Fuglede, B., and Topsøe, F., 2004, Jensen-Shannon Divergence and Hilbert Space Embedding, IEEE Int. Symp. Information Theory.

Kullback, S., and Leibler, R. A., 1951, On information and sufficiency, Annals of Mathematical Statistics 22: 79-86.

Rubner, Y., Tomasi, C., and Guibas, L. J., 2000. The Earth Mover's distance as a metric for image retrieval. International Journal of Computer Vision, 40(2): 99-121.

Posted: 2006-11-04, 06:52:13

gauge

Re: From Statistics to the Uncertainty Principle [Post type: Original]

Information theory can be used to describe how much knowledge we have. So unless the Heisenberg uncertainty principle can also be interpreted in this way, it always remains somewhat unsatisfying; and it turns out that it can be. This convinces me that the Heisenberg principle cannot be violated. For instance, if we were faced with the choice of either giving up the constancy of the speed of light or modifying the Heisenberg principle, I would change the speed of light without hesitation. Of course, this choice is purely hypothetical.

Posted: 2006-11-12, 02:56:45