Talk:Statistics theory

From Citizendium
Revision as of 07:20, 9 December 2007 by Michael J. Formica (Readability: new section)
Definition: A branch of mathematics that specializes in enumeration, or counted, data and their relation to measured data.

Definition of a statistic

The modified sentence:

"More generally, a statistic can be any measure within a data sample. This would be some quantification of a random variable, or variables, of interest, such as a height, weight, polling results, test performance, and so on"

does not have the same meaning as the original

"More generally, a statistic can be any measurable function of the data samples, the latter being realizations of the random variables which are of interest such as the height of people, polling results, students' performance on a test, and so on."

In particular, a measure and a measurable function are not the same thing, and the new sentence obfuscates the definition of a statistic. The point is that mathematical statistics has a precise definition of a statistic, based on measure-theoretic probability theory, and I have provided a reference for it. An intuitive definition, as given in the second paragraph of the article, is fine as a gentle introduction, but it should be complemented by a more rigorous mathematical definition.

I agree that my original sentence may not have been very readable, so to strike a compromise I combined the good parts of both sentences and produced what now appears in the article. Cheers, --Hendra I. Nurdin 17:25, 10 November 2007 (CST)

Outstanding edit! --Michael J. Formica 19:17, 10 November 2007 (CST)
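
For readers following this thread, the distinction drawn above between a measure and a measurable function can be written out explicitly. This is only a sketch in standard measure-theoretic notation; the symbols (the sample space $\mathcal{X}$ with σ-algebra $\mathcal{A}$, the target space $\mathcal{T}$ with σ-algebra $\mathcal{B}$) are assumptions chosen for illustration and are not taken from the article or its reference.

```latex
% A measure assigns a size to sets of outcomes:
\mu : \mathcal{A} \to [0, \infty], \qquad
\mu(\emptyset) = 0, \qquad
\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mu(A_i)
\quad \text{for pairwise disjoint } A_i \in \mathcal{A}.

% A statistic is a measurable function of the data, i.e. it acts on outcomes:
T : (\mathcal{X}, \mathcal{A}) \to (\mathcal{T}, \mathcal{B}), \qquad
T^{-1}(B) \in \mathcal{A} \quad \text{for every } B \in \mathcal{B}.
```

The sample mean of n real observations, $T(x_1, \dots, x_n) = \frac{1}{n} \sum_{i=1}^{n} x_i$, is one familiar example of such a $T$.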


"A data sample is regarded as instances of a random variable of interest..."
I think referring to "random variable" here narrows the focus a little too much.
Statistics is largely about extracting concise information from large piles of data. Sometimes the data set is best described without reference to a numerical random variable. For instance, the fact that the most common first name in this or that town is "Billy" is a perfectly good statistic, as is the fact that "I" is the most commonly used word in English.
Ragnar Schroder 18:09, 8 December 2007 (CST)
It should be noted that a random variable need not be numerical, although numerical values are of course important for quantitative analysis. For example, one can have a random variable X take values in the discrete set {'Billy', 'James', 'Agnes', 'Jill'} endowed with the discrete topology, and then take the Borel σ-algebra to be the one generated by the open sets of that topology. Ultimately, though, this set can be mapped to numerical values, e.g., by the one-to-one assignment 'Billy'->0, 'James'->1, 'Agnes'->2, 'Jill'->3.
I really have no idea how you would manage to extricate statistics from random variables and, more generally, probability theory, for what would then be the theoretical basis (if any) for explaining your data and justifying your methods? Are there examples of notions in statistics that cannot be given a firm footing with mathematical statistics? Hendra I. Nurdin 00:56, 9 December 2007
Not 'extricate', but rather 'deemphasize'. Random variables are just an ad hoc artifact of the mathematical model of the situation at hand - after all, not even coin-flipping has a unique, a priori given random variable associated with it.
Like in your example above, there's an infinity of functions to choose from, with no formal reason to prefer one to the other.
Sometimes, like when the statistic in question is the population mode, they're not really called upon.
Of course, your point that one ultimately can't live without them is well taken.
By the way, thanks for informing me that random variables need not be numbers; I didn't realize that. I appreciate the enlightenment.
Ragnar Schroder 19:57, 8 December 2007 (CST)
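
The two points made in this thread (that a categorical sample can be given a numerical encoding, and that the choice of encoding is arbitrary when the statistic is the mode) can be illustrated with a short sketch. The sample of names, the second encoding, and the helper names `sample_mode` and `mode_via_encoding` are all made up for illustration; only the first encoding ('Billy'->0, 'James'->1, 'Agnes'->2, 'Jill'->3) comes from the discussion above.

```python
from collections import Counter

# Hypothetical sample of first names (invented for this illustration).
names = ["Billy", "James", "Billy", "Agnes", "Jill", "Billy", "James"]


def sample_mode(sample):
    """Return the most frequent value in the sample."""
    return Counter(sample).most_common(1)[0][0]


def mode_via_encoding(sample, encoding):
    """Encode the sample as numbers, take the mode, and decode the result."""
    decoding = {code: name for name, code in encoding.items()}
    encoded = [encoding[name] for name in sample]
    return decoding[sample_mode(encoded)]


# The one-to-one assignment given in the discussion above.
encoding = {"Billy": 0, "James": 1, "Agnes": 2, "Jill": 3}
# A second, equally valid one-to-one assignment (an arbitrary choice).
alt_encoding = {"Billy": 3, "James": 2, "Agnes": 1, "Jill": 0}

mode_direct = sample_mode(names)
mode_encoded = mode_via_encoding(names, encoding)
mode_alt = mode_via_encoding(names, alt_encoding)

print(mode_direct, mode_encoded, mode_alt)
```

All three computations agree on this sample because the mode depends only on how often each value occurs, not on which numbers are used to label the values. A statistic such as the mean of the encoded values would, by contrast, change with the encoding.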

Readability

Ragnar, Hendra: I am reading your discussion about random variables with much interest. I have a concern about the readability of the article, and I am wondering if we could address it. I have a master's degree in statistics, and yet I am struggling with the language that we are using to present the initial concepts here. Both the New York Times and the London Times are written at a 5th-grade (by American standards) reading level. Do you think we could tone the article down to be more readable? Blessings... --Michael J. Formica 06:20, 9 December 2007 (CST)