Misplaced Pages

Talk:Standard deviation: Difference between revisions

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.
Revision as of 22:53, 2 November 2006 by Coppertwig (Derivation: added suggestions for improving this derivation); revision as of 23:19, 2 November 2006 by Coppertwig (N-1 sentence)


Does anyone get the point of this: "The necessity of the N − 1 (instead of N) can be rationalized if one realizes that the vector lies in an N − 1 dimensional space."? It is true that these N points lie in an N-1 dimensional space, but what does that have to do with the denominator? The reason for using N-1 is that the estimator becomes unbiased for the variance, which is already stated. 01:49, 30 October 2006 (UTC)

You're right, that's not a good explanation. Also, I don't like the N-1 being referred to as a "convention" in the example with actual numbers: it's not just a "convention"; there's a good reason for it (as shown in the derivation on the discussion page). I suggest adding a plain-language explanation of the reason for using N-1 rather than N, something like this: "If you knew the actual mean of the population, you could estimate the standard deviation of the population by seeing how much the sample values deviate from that mean. But you don't know the mean of the population; you only know the sample mean. If you use that as an estimate of the population mean, it will tend to be slightly closer to each value in your sample, on average, since those are the values you calculated it from, so the standard deviation will seem to be smaller than it actually is. Using N-1 rather than N corrects for this by making your estimate of the standard deviation a little bigger again, and there is a proof that this is just the right amount, at least in the sense of making sure the estimate of variance is unbiased. For example, suppose the population mean is 100 and you take 3 samples which happen to be 99, 110 and 111. If you knew the population mean, you could estimate the standard deviation based on how these samples differ from the mean, and get an estimate of sqrt((1-squared plus 10-squared plus 11-squared)/3), or about 8.6. But you don't know the population mean, so you would estimate it by taking the mean of your samples, approx. 106.7. But this number is quite a bit closer to most of your samples; after all, it's calculated from them. So for the standard deviation you'd only get about 5.44 if you use N rather than N-1 in the formula. In other words, the sample values tend to be closer to the sample mean, on average, than they are to the population mean." Maybe that's too long; just take the last sentence? Silverbranch 2006 Nov 2 23:18UT
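The arithmetic in the suggested example can be checked with a short Python sketch. As in the example, it assumes the population mean (100) is known in the first case; the helper name `rms_deviation` is made up for illustration.

```python
import math


def rms_deviation(values, center, denominator):
    """Square root of (sum of squared deviations from `center`) / `denominator`."""
    return math.sqrt(sum((x - center) ** 2 for x in values) / denominator)


samples = [99, 110, 111]
population_mean = 100                      # assumed known in the first case
sample_mean = sum(samples) / len(samples)  # about 106.7
n = len(samples)

# Deviations from the true population mean, divided by N.
known_mean_estimate = rms_deviation(samples, population_mean, n)
# Deviations from the sample mean, divided by N: comes out smaller.
plug_in_n = rms_deviation(samples, sample_mean, n)
# Deviations from the sample mean, divided by N - 1: bumped back up.
plug_in_n_minus_1 = rms_deviation(samples, sample_mean, n - 1)

print(f"{known_mean_estimate:.2f} {plug_in_n:.2f} {plug_in_n_minus_1:.2f}")
# prints: 8.60 5.44 6.66
```

With only three values, N − 1 does not recover 8.6 exactly; the unbiasedness claim concerns the variance on average over many samples, not any single sample.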

Revision as of 23:19, 2 November 2006

History

This should talk about where the concept of standard deviation came from as well (astronomy)

Accuracy of this article?

This paragraph is incorrect:

For example, in the population {4, 8}, the mean is 6 and the standard deviation is 2. This may be written: {4, 8} ≈ 6±2. In this case 100% of the values in the population are within one standard deviation of the mean.

The variance is 2, but the standard deviation is the square root of 2. I would update it myself, but I don't know how to embed all those nifty symbols.


AAAAAAAAAHHHHHHHHHHHHH Seriously - this is way too complicated. I am trying to use SD in a school project and none of these formulas make sense. Sure, for a more advanced maths student or a mathematician, these formulas are great - but, as is a problem with many maths and science articles on Wikipedia, this is totally inaccessible to your average person. I suggest that in each page you have a simple explanation that is understandable. For example:
1 - Determine the mean of a set of scores
2 - Determine the difference between each score and the mean; this difference is called the deviation (Score - Mean)
3 - Square each deviation
4 - Calculate the mean of the squared deviations
5 - This result is called the variance
6 - Standard deviation equals the square root of the variance
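The six steps translate almost line for line into code. A minimal Python sketch; the function name and the example scores are hypothetical, and it divides by the number of scores (the population form), not by N − 1.

```python
import math


def standard_deviation_by_steps(scores):
    """Population standard deviation, following the six steps above."""
    mean = sum(scores) / len(scores)                 # 1 - mean of the scores
    deviations = [score - mean for score in scores]  # 2 - deviation = score - mean
    squared = [d * d for d in deviations]            # 3 - square each deviation
    variance = sum(squared) / len(squared)           # 4, 5 - mean of squares is the variance
    return math.sqrt(variance)                       # 6 - square root of the variance


print(standard_deviation_by_steps([2, 4, 4, 4, 5, 5, 7, 9]))  # prints 2.0
```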




The opening paragraph states "it is a measure of the average difference between the values of the data in the set." Shouldn't this be "it is a measure of the average difference between the values of the data in the set from the mean"?

Correct me if I'm wrong, but I always thought the std dev equation had (n-1) in the denominator, not n: where n is the number of data pts.

This is kind of explained in the article, but to put it differently: The standard deviation of a (finite or infinite) population is defined as the square root of the population variance. The variance of a finite population is defined as the sum of the squared mean deviations, divided by the population size. If one is trying to estimate the population variance based on a (small) sample, then it would be wrong in general to equate the sample with the population. In particular, estimating the population variance as the sum of the squared mean deviations divided by the sample size yields a biased estimator of the population variance. Dividing by n−1 instead of n yields an unbiased estimator. However, in the extreme case where n equals the population size (of a finite population), it would be wrong to divide by n−1, since one has seen the entire population, hence there is no uncertainty about its variance and one can simply calculate it instead of estimating it. It's the distinction between the variance of a finite population and its unbiased estimator that leads to the terms n and n−1, respectively, in the denominator (and to widespread confusion). --MarkSweep 07:23, 19 May 2005 (UTC)
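Python's standard `statistics` module keeps the two denominators apart, which makes the distinction MarkSweep describes easy to see: `pvariance`/`pstdev` divide by n (the data are the whole population), while `variance`/`stdev` divide by n − 1 (the data are a sample). The data values below are made up for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
n = len(data)

population_variance = statistics.pvariance(data)  # sum of squared deviations / n
sample_variance = statistics.variance(data)       # sum of squared deviations / (n - 1)

# The two differ exactly by the factor n / (n - 1).
assert abs(sample_variance - population_variance * n / (n - 1)) < 1e-12

print(statistics.pstdev(data), statistics.stdev(data))
```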
The std dev example and its answer in the article were incorrect, as was the equation. I edited it to reflect that the std dev equation should indeed have (n-1) in the denominator, so the equation, math and example answer are now correct. I'm not sure if the original creator of the example had it correct or not or whether it was edited. I'm new to using Wikipedia, and it's somewhat surprising that material can be so easily edited. Anyway, the example equation for std dev is currently correct (although I didn't take the time to go through the equation to derive std dev as the panel above the example did. I don't know whether it is correct, but I did edit it as well since its derived equation for std dev also did NOT include n-1 in the denominator). I'm no statistics GIANT LEMONS, but I know how to use Excel and a calculator. The answer was incorrect but please correct me if I'm wrong. --renius 9 Sept 2005

I thought this deserved clarification, so I put some in there --Pdbailey 01:43, 16 September 2005 (UTC)


Why don't you just use an online calculator??? :D It worked for me.

But what does the result of the standard deviation tell us?

Why perform the equation? What do the results say? Kingturtle 07:43, 6 Nov 2003 (UTC)
  • To say it differently.....this article needs a section explaining the application of standard deviation. When is it used? And what can the results tell us? Kingturtle 06:19, 10 Feb 2004 (UTC)

sigma level

When you read a quote like this "To see a very light Higgs (say, 115 GeV) at the 5 σ level will require a year of running", does that mean that the signal is within 5 standard deviations or something? I hear that usage a lot in experimental physics, and I am unsure what it means. Is a higher sigma confidence level good or bad? I once heard that things at the 3σ level are suspect, and one can't trust the result. So what's 5 σ? Can anyone explain this? I think it would be a nice addition to this article... Lethe

If I understand correctly, such quotes mean "if we assume that this data is the result of random statistical fluctuation rather than an actual signal, this is how many sigmas away from the mean our results are." In other words, the sigma level gives a measure of how likely it is that you are just seeing a fluke. In your Higgs example, the meaning is that if they ran the detector for a year, a false detection would require a statistical fluctuation of five sigmas away from the mean. Higher sigma means more confidence, because it means your signal would be an increasingly large fluke if it weren't real. Isomorphic 03:54, 12 Jul 2004 (UTC)
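To put numbers on this: if the background fluctuation is assumed to be normally distributed (an assumption; real analyses model it more carefully), the probability of an upward fluke of at least k sigma is the one-sided normal tail. A small Python sketch using only `math.erfc`; the one-sided convention is itself an assumption here.

```python
import math


def one_sided_tail(k):
    """P(Z >= k) for a standard normal Z."""
    return 0.5 * math.erfc(k / math.sqrt(2))


for k in (1, 3, 5):
    p = one_sided_tail(k)
    print(f"{k} sigma: p = {p:.3g} (about 1 in {1 / p:,.0f})")
```

The tail shrinks very fast with k, which is why a 5 σ result is treated as far stronger evidence than a 3 σ one.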

Thanks for the explanation dude. I'm going to try adding it to the article. -Lethe

What about underlying assumptions?

This is quite a short and incomplete treatment of standard deviation.

It should be mentioned that a Normal distribution about the true mean is assumed. The article doesn't differentiate between the true mean and the sample mean, which in turn explains the difference between the true standard deviation and the sample standard deviation (and why the normalisation constant is n-1 and not n, i.e. the sample mean takes away one degree of freedom as it is computed from the data set and only estimates the true mean).

Nothing is said about confidence intervals, which is what standard deviations are useful for: 1-sigma is the 67% interval, i.e. 67% of the data is within 1 standard deviation, 2-sigma is 90%, 3-sigma is 95% and so on.

Again, the standard deviation can only be interpreted if the data is distributed normally about the mean. Before I add more criticism, let me have a look at the article(s) for mean and normal distributions.

The standard deviation exists for any population, and always has meaning regardless of whether the distribution is normal. The confidence interval interpretation you're giving only works for normal distributions, but nothing about the standard deviation itself assumes normality. Isomorphic 15:01, 26 Oct 2004 (UTC)
It is nonsense to say "a Normal distribution about the true mean is assumed" or that confidence intervals are the only thing that standard deviations are useful for, or that "the standard deviation can only be interpreted if the data is distributed normally about the mean". If you're doing certain things with confidence intervals, then "the standard deviation can only be interpreted if the data is distributed normally about the mean"; if you're doing certain other things with confidence intervals, or many other things not involving confidence intervals but for which the standard deviation is useful, then normality should not be assumed. Michael Hardy 20:31, 26 Oct 2004 (UTC)
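For reference, if the data do follow a normal distribution (the case under discussion above), the fraction within k standard deviations of the mean is erf(k/√2). A short Python check using `math.erf`:

```python
import math


def normal_coverage(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sd: {normal_coverage(k):.2%}")
# prints 68.27%, 95.45% and 99.73% on successive lines
```

These percentages rely on normality; the standard deviation itself is defined for any distribution with a finite variance.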

Making it easier to read

26 Oct 2004 : FROM

Simply put, the standard deviation tells us how far a typical member of a sample or population is from the mean value of that sample or population.

TO

Simply put, the standard deviation tells us how far a typical member of the population (or sample) is from the mean value of that population (or sample).

Interpretation and application

An anon just changed the 5 to a 4 in the sentence "For example, the three samples (0, 0, 14, 14), (0, 6, 8, 14), and (6, 6, 8, 8) each have an average of 7. Their standard deviations are 7, 5 and 1, respectively" as his first edit. I'm paranoid against vandalism, so is he right? -- Kizor 10:46, 21 May 2005 (UTC)

No he is not - I have just reverted it.--Niels Ø 14:19, May 21, 2005 (UTC)
Thank you, my good fellow! -- Kizor 19:24, 21 May 2005 (UTC)
It is indeed 5 and I can imagine why the anon changed it to 4: he simply calculated the average difference to the mean. Unfortunately, this naive calculation coincidentally delivers the correct result for the first and the third data sets! Maybe you had better choose different sets. BTW: You write "For example, the three samples (0, 0, 14, 14), ..." but those are not samples but complete data sets. -- Steve Miller 31 Aug 2005
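Steve Miller's explanation is easy to check by computing both the standard deviation and the mean absolute deviation for the three data sets. A Python sketch using the standard `statistics` module (`fmean` needs Python 3.8 or later); `mean_abs_deviation` is a helper defined here, not a library function.

```python
import statistics


def mean_abs_deviation(values):
    """Average distance from the mean (what the anonymous editor computed)."""
    m = statistics.fmean(values)
    return statistics.fmean(abs(x - m) for x in values)


data_sets = [(0, 0, 14, 14), (0, 6, 8, 14), (6, 6, 8, 8)]
for data in data_sets:
    print(data, statistics.pstdev(data), mean_abs_deviation(data))
# The two measures agree for the first and third sets (7 and 1) but not the
# second: standard deviation 5.0 versus mean absolute deviation 4.0.
```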

Always begin with the simplest example

The mean of two numbers, A and B, is (A+B)/2. The standard deviation is |A-B|/2. The mean is center and the standard deviation is radius. The clue for generalizing this definition to more than just two numbers is symmetry. The definitions are invariant against switching A and B: (A+B)/2=(B+A)/2 and |A-B|/2=|B-A|/2. Now, any symmetric function can be written in terms of the sums of powers: u=A^0+B^0, v=A^1+B^1, w=A^2+B^2, &c. The mean is simply (A+B)/2=v/u. The standard deviation is (sqrt(uw-v^2))/u, because |A-B|=sqrt((A-B)^2)=sqrt(A^2+B^2-2AB)=sqrt(2(A^2+B^2)-(A+B)^2)=sqrt(uw-v^2). The generalized definitions, mean=v/u, standard deviation =(sqrt(uw-v^2))/u, satisfy nice conditions: symmetry to permutation of the numbers by construction, homogeneity to multiplying all numbers by the same factor. Bo Jacoby 09:50, 9 September 2005 (UTC)
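Bo Jacoby's power-sum definitions can be sketched in Python and compared with the ordinary population standard deviation. This illustrates the formulas mean = v/u and sd = √(uw − v²)/u; it is not a recommended numerical method, since for floating-point data with a large mean, uw − v² can lose precision through cancellation.

```python
import math
import statistics


def mean_and_sd_from_power_sums(values):
    u = sum(x ** 0 for x in values)  # A^0 + B^0 + ... (the count)
    v = sum(x ** 1 for x in values)  # A^1 + B^1 + ... (the sum)
    w = sum(x ** 2 for x in values)  # A^2 + B^2 + ... (the sum of squares)
    return v / u, math.sqrt(u * w - v * v) / u


# Two numbers: mean (A+B)/2 and standard deviation |A-B|/2.
print(mean_and_sd_from_power_sums([3, 11]))  # prints (7.0, 4.0)

# More numbers: agrees with the population standard deviation.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(mean_and_sd_from_power_sums(data), statistics.pstdev(data))
```

Swapping the inputs leaves the result unchanged, and multiplying every number by 3 multiplies both mean and standard deviation by 3, matching the symmetry and homogeneity properties described above.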

Unbiased SD estimate

The following comment was added to the article by 149.169.176.38 (talk · contribs). --MarkSweep 13:18, 14 September 2005 (UTC)

Comment: I think the formula for unbiased estimate of the standard deviation is:

$$s = \sqrt{\frac{1}{n-1.5}\sum_{i=1}^{n}(x_i-\overline{x})^2}.$$

the (n-1) is unbiased for variance, I don't know how to derive this, but a statistician should be able to do so.
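Whether n − 1.5 works can be checked numerically, assuming normally distributed data (the n − 1.5 rule is an approximation for that case, not an exact result). For normal samples the usual n − 1 estimator satisfies E[s] = c4(n)·σ with c4(n) = √(2/(n−1))·Γ(n/2)/Γ((n−1)/2); switching the denominator to n − 1.5 multiplies s by √((n−1)/(n−1.5)). The Python sketch below compares the two expected ratios E[s]/σ.

```python
import math


def c4(n):
    """E[s] / sigma for the n - 1 sample standard deviation of n i.i.d. normal values."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2 / (n - 1)) * math.exp(log_ratio)


def expected_ratio_n_minus_1_5(n):
    """E[s'] / sigma when the denominator n - 1 is replaced by n - 1.5."""
    return c4(n) * math.sqrt((n - 1) / (n - 1.5))


for n in (3, 5, 10, 30):
    print(n, round(c4(n), 4), round(expected_ratio_n_minus_1_5(n), 4))
```

Under these assumptions the n − 1 version underestimates σ noticeably for small n, while the n − 1.5 version is within about 0.5% from n = 5 on; the exact unbiased fix for normal data is to divide s by c4(n).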

motivation for std deviation

There is a motivation for using this measure of dispersion instead of the mean deviation. Can somebody add this? --Pdbailey 01:41, 16 September 2005 (UTC)

I've wondered this myself, anyone considering adding this should check this link out: . Intangir 06:01, 24 November 2005 (UTC)

Thanks to Intangir for that link. It leads to a lengthy discussion, some of it is nonsense, much of it repetition, but it also contains some good points, of which I rephrase two below. So this is how I'd answer the question without becoming too technical:
The reason we use the standard deviation (square root of the mean of the squared deviations) rather than the mean deviation (mean of the absolute values of the deviations) is that the ensuing math is simpler and more useful if dispersion is measured by standard deviation (or by variance, the mean of the squared deviations, i.e. the square of the standard deviation), instead of by mean deviation. The square function involved in the variance has nice mathematical properties, as compared to the absolute value function involved in the mean deviation.
A key example is this: Let $A$ and $B$ be independent random variates with unknown (or arbitrary) distributions, but with known dispersions. E.g., $A$ is the weight of a man, and $B$ is the weight of a woman. What is the dispersion of $A+B$, i.e., of the total weight of a couple (assuming independence)?
  • If dispersion is measured by mean deviation, this question cannot be answered in general.
  • If measured by variance, the answer is simple: $\mathrm{Var}(A+B)=\mathrm{Var}(A)+\mathrm{Var}(B)$.
  • If measured by standard deviation, the answer is still fairly simple: $\sigma_{A+B}=\sqrt{\sigma_A^2+\sigma_B^2}$.
Similarly, if $X_1, X_2, X_3, \ldots, X_n$ are independent variates, what is the dispersion of the mean of all the $X_i$'s, $\overline{x}=\frac{X_1+X_2+X_3+\ldots+X_n}{n}$?
  • Mean deviation: Cannot be answered in general.
  • Variance: $\mathrm{Var}(\overline{x})=\frac{\mathrm{Var}(X_1)+\mathrm{Var}(X_2)+\mathrm{Var}(X_3)+\ldots+\mathrm{Var}(X_n)}{n}=\overline{\mathrm{Var}(x)}$.
The above is incorrect. You would need to divide by n. Michael Hardy 01:59, 21 March 2006 (UTC)
  • Standard deviation: $\sigma_{\overline{x}}=\sqrt{\frac{\sigma_1^2+\sigma_2^2+\sigma_3^2+\ldots+\sigma_n^2}{n}}$.
...and the above is incorrect for the same reason. Michael Hardy 02:00, 21 March 2006 (UTC)
Here follows another, more technical, reason. We have hitherto assumed that deviations always are measured from the mean value, but if we allow ourselves to measure them from any other suggested central value, different values for our measures of dispersion will follow. The mean of the squared deviations is minimized by taking the central value to be the mean; the same is not the case for the mean deviation. Thus, the standard deviation is a natural companion to the mean. The mean deviation may be more naturally associated with the median.
--Niels Ø 13:03, 28 November 2005 (UTC)
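Niels's point can be illustrated with small, made-up discrete distributions, each written as a list of equally likely outcomes. In both cases below A and B have mean deviation 1, yet A + B has mean deviation 1.0 in one case and 1.5 in the other, while variances add in both. The last lines also check Michael Hardy's correction: the variance of the mean of two independent variates is the sum of their variances divided by 2², not by 2. A Python sketch:

```python
from itertools import product
from statistics import fmean


def variance(outcomes):
    m = fmean(outcomes)
    return fmean((x - m) ** 2 for x in outcomes)


def mean_deviation(outcomes):
    m = fmean(outcomes)
    return fmean(abs(x - m) for x in outcomes)


def independent_sum(a, b):
    """Outcomes of A + B when A and B are independent and all outcomes equally likely."""
    return [x + y for x, y in product(a, b)]


A = [-1, 1]         # mean deviation 1, variance 1
B1 = [-1, 1]        # mean deviation 1, variance 1
B2 = [-2, 0, 0, 2]  # mean deviation 1, variance 2

for B in (B1, B2):
    S = independent_sum(A, B)
    print("MD:", mean_deviation(A), mean_deviation(B), "->", mean_deviation(S))
    print("Var:", variance(A), "+", variance(B), "=", variance(S))

# Variance of the mean (A + B) / 2: divide the summed variances by n**2 = 4.
M = [s / 2 for s in independent_sum(A, B2)]
print(variance(M), (variance(A) + variance(B2)) / 4)  # both 0.75
```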

Hello 140.110.227.89

What are you trying to do ? Your edits seem senseless to me, and some of them are wrong. (You cannot sum from 1 to N+1 when there are only N members in the set). Please explain your intentions here at the discussion page before editing the article. Otherwise it seems to be vandalism. Bo Jacoby 06:49, 20 October 2005 (UTC)

How to estimate the probability to be away from the mean.

We have Chebyshev's inequality if the distribution is unknown. In the article there is also info about that issue for the normal distribution.

I ask: how can we do the same for other distribution? Like Poisson distribution, binomial distribution, and so on... --Yochai Twitto 15:19, 31 December 2005 (UTC)
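For a specific, fully known distribution, one answer is to compute the probability directly from its probability function rather than bounding it. A Python sketch for the Poisson case, compared with Chebyshev's bound 1/k²; the choice λ = 4, k = 2 is arbitrary, and the same approach works for the binomial distribution with its own probability function.

```python
import math


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), computed in log space; requires lam > 0."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def poisson_prob_far_from_mean(lam, k):
    """P(|X - lam| >= k * sigma) for X ~ Poisson(lam), where sigma = sqrt(lam)."""
    sigma = math.sqrt(lam)
    upper = math.ceil(lam + k * sigma) + 1
    # Sum the probability of the values strictly within k sigma, then take the complement.
    inside = sum(poisson_pmf(x, lam) for x in range(upper) if abs(x - lam) < k * sigma)
    return 1 - inside


lam, k = 4, 2
exact = poisson_prob_far_from_mean(lam, k)
chebyshev_bound = 1 / k ** 2
print(f"exact: {exact:.4f}, Chebyshev bound: {chebyshev_bound:.4f}")
```

The exact probability (about 0.069 here) is much smaller than the distribution-free bound of 0.25, which is the price Chebyshev's inequality pays for assuming nothing about the shape.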

A mis-type mistake and N versus N-1

On the "Standard Deviation" article, I note that the last two equations in the first section are identical where, as I read it, the second is supposed to be a simplification of the first. I'm not sure what simplification the author intended, but I doubt it's right the way it is.

The discussion commenters seem to have confusion between standard deviation and standard *error*. The former is the tendency of a random process to deviate from its mean (average) while the latter is an *estimate* of the former from observed data. The standard deviation of a collection of observations is

             sqrt(sum of squared deviations/N) 

where N is the number in the entire set while the standard error is

             sqrt(sum of squared deviations/(N-1))

where N is the number of observations. In the standard-error case, N is presumed much smaller than the number of occurrences or potential occurrences (this last is where I'm on soft ground, but a real statistician could clear it up). In some mathematical-statistical sense, the other one of the N-1 explains the difference between the observed sample mean and the true distribution mean, if that helps.

The Adam 20:53, 27 January 2006 (UTC)

important detail

In order to obtain the results shown on the page, there is one detail missing: in the set, the seven is missing; the length of the set should be five, not four. I don't know how to use formulas, but I am pretty sure of this. Try doing it with the seven and see the result you obtain; if you do it the way it is, the deviation will be sqrt(10/3) instead of sqrt(10/4).

Shortcut?

Regarding the first heading, "Definition and shortcut calculation of standard deviation"... What does the shortcut refer to? This is unclear.

Also, the equation $s=\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i-\overline{x})^2}$ is repeated at the end of that section. Is it supposed to be different the second time?

And now the simple way without the formulas

Standard deviation can be calculated in five easy steps

1. Calculate the average value
2. Calculate the difference between each value and the average value
3. Calculate the square for each value you got
4. Add them together and then calculate the average of that number
5. Now get the square root out of that number and you are done

Simple enough


~ Booyabazooka 05:03, 27 February 2006 (UTC)

You are perfectly right. Go ahead and make improvements. Bo Jacoby 07:27, 27 February 2006 (UTC)
I applaud the "simple way" entry. I feel that Wikipedia has a problem with mathematics. If it is possible to throw formulas into a page, the math guys will do everything they can to replace all the English text with math formulas. In my opinion, it is a purposeful attempt to hide information from people who don't know how to read the formulas. At least in this article, a person with barely a high school math background can read the "simple way" and calculate a standard deviation. Without this section, they would be left wondering what the hell those formulas mean and how to enter them into their spreadsheet. --Kainaw 15:40, 7 March 2006 (UTC)
Careful. Assume good faith. I find it highly doubtful that people are deliberately trying to 'hide' knowledge by rendering it in formulas. More likely, mathematically literate people are trying to make articles more clear, concise and specific by rendering long explanations as simple (to them) formulae. For example, the 'simple way' given above is vague and unclear because it simply says "average" without specifying which one (mean, mode, median). Also, there are some points (such as possibly averages) where the language conveys different mathematical concepts to different people. By contrast, the maths means only one thing. And if you wish to remember how to calculate the SD, and do it repeatedly, it is much easier and quicker with the formulae. So by all means include simple explanations and methods, but remember great care is needed. It isn't as black-and-white as all that. 84.43.94.121 17:37, 15 March 2006 (UTC)
I assume good faith until I am told "take more math you idiot" when asking for an English description of a formula. For example, in Zipf's law, I was treated as adding a simple description in bad faith for regurgitating Zipf's own description of his law as a simple 1/f series (without tossing in a complicated formula). I understand the warm and fuzzy feeling a person gets when they can wrap up three or four paragraphs in a single formula. I won a $100 bet by writing a bubble sort in Java with one loop (instead of the 2 loops expected by the definition). But there is no valid excuse, in my mind, for restricting access to knowledge by expecting a Wikipedia reader to be able to read any formula beyond high school math. --Kainaw 00:37, 21 March 2006 (UTC)
I understand that the so-called "simple" way will be easier to understand for those who do not know the mathematical notation. But I don't think it's any simpler than the other way, nor is it a different way; it's just expressed in a different language. For those who do know the notation, the form involving the notation is easier to read, since you don't have to plow through that vast and complicated collection of words. The notation is simpler, but not easier for those who don't know it. Michael Hardy 02:13, 21 March 2006 (UTC)
OK, I can see how "take more math you idiot" when asking for an English description of a formula would rile you up, and it wasn't appropriate behaviour of the other wikipedian. I do understand the need to include easily understandable explanations and methods for these things, but I really don't think it's true that people are trying to make it harder for others to access the information; more likely, they don't feel the average reader of the article knows as little maths as you feel they do. Whoever is right, this article now contains a simpler form of the method. Yay! Skittle 12:47, 31 March 2006 (UTC)

Re-inserted the simple part at the end. I took care of the vagueness which motivated the last edit. I argue that this section is not redundant, but merely explains, in non-technical English, the procedure for obtaining the standard deviation of a variable (albeit only one of the ways, though one of the most popular for small populations).

A comment

Lord love us and save us! I've just put in a link to this article from the one on the Pyrometric cone, so I thought I'd better have a look at it. Gentlemen, please do try to write something that a reasonably bright non-statistician might understand (the statisticians already know all about this stuff; think about your intended audience). Regards, Nick. Nick 12:24, 29 March 2006 (UTC)

Question: Please include a simple explanation that lets me know the importance of the standard deviation in regards to its relationship to zero. The article states that a number of more than zero increases the variance. How high does the number have to go before the variance is significant?

It depends on the data you're looking at; you can only really compare standard deviations of similar data sets. For example, if you are looking at the age of school children in a single class, a standard deviation of a year is usually very large. If you're looking at the population of a country, a standard deviation of a year is very small. I'll try and add something to the article if it isn't there. Skittle 12:44, 31 March 2006 (UTC)

Formulas in 'human' readable form

Not being a TeX user, I can't parse the formulas given in my head. It would be a great help if there were bitmaps, SVGs or some other rendering of the formulas in addition to the nice ones already there.

Date Error

Presumably, regarding Leonardo da Vinci, the date should read 1494 not 1894?

Derivation

Could someone kindly add a section about the derivation of the SD formulae? I'm particularly interested in the meaning of N-1. Thank you.

Here is a proof. I will assume $E(X)=0$ for convenience; the proof becomes somewhat more involved without this, but is essentially unchanged. Suppose that the ith and jth draws are uncorrelated (but not necessarily independent), or more formally $\mathrm{Cov}(x_i,x_j)=\delta(i,j)\,\sigma^2$. Then:


$$E\left(\frac{1}{n-1}\sum_{i}^{n}(x_i-\bar{x})^2\right) = \frac{1}{n-1}\sum_{i}^{n}E\left((x_i-\bar{x})^2\right)$$
$$= \frac{1}{n-1}\sum_{i}^{n}E(x_i^2) - 2E(x_i\bar{x}) + E(\bar{x}^2)$$
$$= \frac{1}{n-1}\sum_{i}^{n}\sigma^2 - \frac{2}{n}E\left(\sum_{j}^{n}x_i x_j\right) + \frac{1}{n^2}E\left(\sum_{j}^{n}\sum_{k}^{n}x_j x_k\right)$$
$$= \frac{1}{n-1}\sum_{i}^{n}\sigma^2 - \frac{2}{n}E(x_i^2) + \frac{1}{n^2}\sum_{j}^{n}E(x_j^2)$$
$$= \frac{1}{n-1}\sum_{i}^{n}\sigma^2 - \frac{2}{n}\sigma^2 + \frac{1}{n^2}\sum_{j}^{n}\sigma^2$$
$$= \frac{1}{n-1}\sum_{i}^{n}\sigma^2 - \frac{2}{n}\sigma^2 + \frac{1}{n}\sigma^2$$
$$= \frac{1}{n-1}\sum_{i}^{n}\sigma^2 - \frac{1}{n}\sigma^2$$
$$= \frac{n\sigma^2 - \sigma^2}{n-1}$$
$$= \frac{n-1}{n-1}\sigma^2$$
$$= \sigma^2$$

Many of these steps used $E(x_i x_j) = E(x_i x_j) - E(x_i)E(x_j) = \mathrm{Cov}(x_i, x_j) = 0$ for $i \neq j$. I would include this on the main page, but there isn't much of a precedent for including long proofs; perhaps at the end? And it would need to look nicer too. Note that this theorem was suggested to me by Michael Hardy on my talk page. Pdbailey 03:25, 1 June 2006 (UTC)
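The result can also be checked exactly for a small case. The Python sketch below draws n values independently (with replacement) from a made-up population of four equally likely numbers, which satisfies the uncorrelated-draws assumption, enumerates every possible sample, and averages the two estimators using exact fractions.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3, 4]  # hypothetical, equally likely values
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # 5/4


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)


def expected_estimates(n):
    """Exact E[SS / (n - 1)] and E[SS / n] over all ordered samples of size n."""
    samples = list(product(population, repeat=n))
    total = sum(sum_sq_dev(s) for s in samples)
    return total / (len(samples) * (n - 1)), total / (len(samples) * n)


for n in (2, 3, 5):
    unbiased, biased = expected_estimates(n)
    print(n, sigma2, unbiased, biased)
```

The n − 1 version averages to σ² = 5/4 for every n, while dividing by n gives (n − 1)/n · σ².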

I would like to see this proof included. You would need to state clearly what you are proving. I think it needs some fixing up: in line 2, I think you need parentheses to indicate that the summation symbol applies to all three terms (I think); there may be other similar changes needed. The last few lines of the proof could possibly be shortened, but some of the earlier lines could be expanded to show more clearly how you're using that Cov=0 thingy you mention at the bottom; maybe a few English words to break up the derivation and explain things a bit, and/or including a term that works out to zero and then on the next line putting an actual zero where that term was. Where summation symbols are used, I prefer to see either parentheses that tell you that all the following terms are included in the summation, or parentheses that tell you they are not. Silverbranch 2006 Nov 2 22:47UT

reason for article for standard deviation and one for variance

Does anybody know a good reason to have separate articles for standard deviation and variance? Pdbailey 04:44, 14 June 2006 (UTC)

No good reason. Merging the articles is a good idea. Bo Jacoby 08:02, 11 July 2006 (UTC)

An axiomatic approach

I find this section pretty useless. Does anyone like it? McKay 16:26, 5 July 2006 (UTC)

There is otherwise no motivation for the complex formula defining the standard deviation. It is a nice fact that the mean value μ and the standard deviation σ are completely characterized by the simple algebraic properties a+(μ±σ) = (a+μ)±σ and a(μ±σ) = aμ±aσ, together with the symmetry condition and the initial condition (+1,−1) ≈ ±1. But the section can certainly be improved. Bo Jacoby 07:50, 11 July 2006 (UTC)

notation confusion?

For the equations on this page the standard deviation is notated as S, while on other wiki pages ( http://en.wikipedia.org/Bias_%28statistics%29 ) it is notated as S^2. I don't want to make any edits myself because I am not sure what the general notation for this is, but I believe that one of these pages may need to be changed. Also, as a side note, a more in-depth explanation of how the standard deviation relates to the Gaussian curve (i.e. 68.27% within +/- 1 stdev) would be useful. 65.89.12.2 19:49, 11 July 2006 (UTC)

Statistics is for historical reasons a very messy branch of mathematics, unlike geometry which for historical reasons is a very clean branch of mathematics. Euclid was a greater mathematician than Ronald Fisher. That cannot be helped by a minor edit in Wikipedia. The square σ² of the standard deviation σ is called the variance. A statistical population is a multiset of numbers, and a statistical sample is a submultiset of the population. The mean and standard deviation of the population are denoted by the Greek letters μ and σ, and the mean and standard deviation of the sample are then often denoted by the corresponding Latin letters M and S. Deriving sample information from the population is called deductive reasoning, and deriving population information from the sample is called inductive reasoning or inferential statistics. The standard deviation of the Gaussian curve is described in the normal distribution article. Bo Jacoby 08:20, 12 July 2006 (UTC)

Why This Methodology?

I have always wondered this about the standard deviation (I assume the derivation would answer this question): Why not use a formula that takes the average of the (absolute value of (the differences between the samples and the mean)). Taking the square root of a sum of squares does not "undo" the original squares - it introduces some factor of difference. (unsigned comment by 205.228.12.194)

You could use any norm really to measure dispersion. There is the convenience aspects of squaring, but only with this definition can we use the standard form . Pdbailey 13:58, 31 August 2006 (UTC)
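One concrete difference between the two choices, mentioned earlier on this page, is which centre each one favours: the mean of squared deviations is smallest when measured from the mean, while the mean absolute deviation is smallest when measured from the median. A brute-force Python sketch on a deliberately skewed, made-up data set:

```python
import statistics

data = [1, 2, 3, 4, 100]  # hypothetical, with one large outlier


def mean_squared_deviation(center):
    return statistics.fmean((x - center) ** 2 for x in data)


def mean_absolute_deviation(center):
    return statistics.fmean(abs(x - center) for x in data)


# Try every centre on a grid from 0.0 to 100.0 in steps of 0.1.
candidates = [i / 10 for i in range(1001)]
best_for_squares = min(candidates, key=mean_squared_deviation)
best_for_absolute = min(candidates, key=mean_absolute_deviation)

print(best_for_squares, statistics.mean(data))     # 22.0 22
print(best_for_absolute, statistics.median(data))  # 3.0 3
```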

Error?

In the first example, (4,8) is the population, with mean 6. The standard deviation is 2. Isn't 100% of this population within one standard deviation of the mean, not two standard deviations? The range is 6 ± z × (std. dev.), where z = the number of standard deviations.


There is a more glaring error in this section. For the population (4,8), the mean is indeed 6 but the standard deviation is root(2) and not 2.

2 is the variance of the distribution (4,8) and the standard deviation is defined as being the square root of the variance.

Is there an error in my calculation?
Var = 1/2 ((4 - 6)^2 + (8 - 6)^2) = 2^2
--Pdbailey 04:22, 22 September 2006 (UTC)

no error

The standard deviation of (4,8) is 2, as stated. The variance is 2^2 = 4, as computed above, and the standard deviation is the square root of the variance, which is 2. It is also correct that 100% of this population {4,8} is within one standard deviation of the mean value. This extreme case does not apply to every population. Bo Jacoby 16:04, 23 September 2006 (UTC)
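The arithmetic for the population {4,8} can be checked with Python's standard `statistics` module, whose `pvariance` and `pstdev` use the population divisor N. This is a quick check, not part of the original discussion.

```python
from statistics import fmean, pstdev, pvariance

population = [4, 8]

mu = fmean(population)          # 6.0
var = pvariance(population)     # 4  = ((4 - 6)**2 + (8 - 6)**2) / 2
sigma = pstdev(population)      # 2.0 = sqrt(4)

# Fraction of the population within one standard deviation of the mean
within_one_sd = sum(abs(x - mu) <= sigma for x in population) / len(population)
print(within_one_sd)            # 1.0
```

Both values lie exactly one standard deviation from the mean, which is why all of this population falls within ±1 standard deviation.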

World Record

I do not know why there is a reference to world record values in the Interpretation and application section; this comment does not seem appropriate. If removing it creates a problem with the example, I would propose changing the text from "distances traveled by four athletes in 2 minutes" to "distances traveled by four athletes in 3 minutes" and dropping the reference to world records (I don't recall a 1000 m event; a 1500 m, yes). Dcorrin 14:58, 6 October 2006 (UTC)

Example from larger population

Just above the heading "Interpretation and application" there is a comment about generalizing to the entire population by changing N to 3 in the example. That seems simple enough, but the upper limit of the sum is also N, and the set has 4 elements, so which 3 values should be taken from the set? By the strict formula we would exclude x4, yet nothing was stated about the ordering of the set, which just happens to be in ascending order. I now see that there are two formulas, which I didn't notice on first reading, so I would propose changing the text from "convention would replace the N (or 4) here with N−1 (or 3)." to "convention would replace the 1/N (or 1/4) with 1/(N-1) (or 1/3) giving a result of 1.8257." Dcorrin 15:03, 6 October 2006 (UTC)
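A sketch of the proposed wording: only the divisor changes, and the sum still runs over all N values. The data below are hypothetical, chosen so that the squared deviations from the mean sum to 10, which is what the 1.8257 figure implies (√(10/3) ≈ 1.8257). Python's `statistics.pstdev` uses divisor N and `statistics.stdev` uses divisor N − 1.

```python
from statistics import pstdev, stdev

data = [5, 6, 8, 9]  # hypothetical values: mean 7, squared deviations sum to 10

sd_n = pstdev(data)          # divisor N = 4:     sqrt(10 / 4)
sd_n_minus_1 = stdev(data)   # divisor N - 1 = 3: sqrt(10 / 3)

print(f"{sd_n:.4f}")          # 1.5811
print(f"{sd_n_minus_1:.4f}")  # 1.8257
```

No value is dropped from the set; all four deviations contribute to the sum in both versions.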

N-1 sentence

Does anyone get the point of this: "The necessity of the N − 1 (instead of N) can be rationalized if one realizes that the vector lies in an N − 1 dimensional space."? It is true that these N points lie in an N-1 dimensional space, but what does that have to do with the denominator? The reason for using N-1 is that the estimator becomes unbiased for the variance, which is already stated. McKay 01:49, 30 October 2006 (UTC)
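The unbiasedness mentioned here can be checked exactly on a toy case. The sketch below assumes a hypothetical population {1, 2, 6} and samples of size 2 drawn with replacement (independent draws). It enumerates every equally likely ordered sample with exact fractions, so the averages are exact rather than simulated. It illustrates the property; it is not a proof.

```python
from fractions import Fraction
from itertools import product

def variance(xs, ddof):
    """Sum of squared deviations from the mean of xs, divided by len(xs) - ddof."""
    n = len(xs)
    m = Fraction(sum(xs), n)
    return sum((x - m) ** 2 for x in xs) / (n - ddof)

population = [1, 2, 6]                  # hypothetical population
sigma2 = variance(population, ddof=0)   # true population variance: 14/3

n = 2                                   # sample size, drawn with replacement
samples = list(product(population, repeat=n))  # all 9 equally likely ordered samples

mean_divisor_n = sum(variance(s, 0) for s in samples) / len(samples)
mean_divisor_n_minus_1 = sum(variance(s, 1) for s in samples) / len(samples)

print(sigma2, mean_divisor_n_minus_1, mean_divisor_n)  # 14/3 14/3 7/3
```

On average, the N − 1 version hits the population variance exactly, while the N version comes out low by the factor (N − 1)/N (here 1/2).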

You're right, that's not a good explanation. Also, I don't like the N-1 being referred to as a "convention" in the example with actual numbers: it's not just a "convention"; there's a good reason for it (as shown in the derivation on the discussion page). I suggest adding a plain-language explanation of the reason for using N-1 rather than N, something like this: "If you knew the actual mean of the population, you could estimate the standard deviation of the population by seeing how much the sample values deviate from that mean. But you don't know the mean of the population; you only know the sample mean. If you use that as an estimate of the population mean, it will tend to be slightly closer to each value in your sample, on average, since those are the values you calculated it from, so the standard deviation will seem smaller than it actually is. Using N-1 rather than N corrects for this by making your estimate of the standard deviation a little bigger again, and there is a proof that this is just the right amount, at least in the sense of making the estimate of the variance unbiased. For example, suppose the population mean is 100 and you take 3 samples which happen to be 99, 110 and 111. If you knew the population mean, you could estimate the standard deviation from how these samples differ from the mean, and get an estimate of sqrt((1-squared plus 10-squared plus 11-squared)/3), or about 8.6. But you don't know the population mean, so you would estimate it by taking the mean of your samples, approx. 106.7. This number is quite a bit closer to most of your samples; after all, it's calculated from them. So for the standard deviation you'd get only about 5.44 if you use N rather than N-1 in the formula. In other words, the sample values tend to be closer to the sample mean, on average, than they are to the population mean." Maybe that's too long; just take the last sentence? Silverbranch 2006 Nov 2 23:18UT
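The numbers in the proposed example can be reproduced with a short Python sketch. It assumes, as the example does, that the population mean 100 is known, and it also shows the N-1 version for comparison (that last figure is not in the original text).

```python
import math
from statistics import fmean

samples = [99, 110, 111]
population_mean = 100  # assumed known, as in the example
n = len(samples)

# Spread measured around the known population mean (divisor N)
sd_known_mean = math.sqrt(sum((x - population_mean) ** 2 for x in samples) / n)

# Spread measured around the sample mean, which is computed from the same values
sample_mean = fmean(samples)
ss = sum((x - sample_mean) ** 2 for x in samples)
sd_divisor_n = math.sqrt(ss / n)
sd_divisor_n_minus_1 = math.sqrt(ss / (n - 1))

print(f"{sd_known_mean:.2f}")         # 8.60
print(f"{sample_mean:.2f}")           # 106.67
print(f"{sd_divisor_n:.2f}")          # 5.44
print(f"{sd_divisor_n_minus_1:.2f}")  # 6.66
```

The N-1 divisor pulls the estimate back up toward the known-mean figure. For a single sample it need not land on it exactly; the correction is exact only on average, and only for the variance.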
