Percentile: Difference between revisions

From Citizendium
imported>David E. Volk
(fixed 100(1-p)% to (100-p)%)
imported>Peter Schmitt
(added quintiles)
 
(22 intermediate revisions by 4 users not shown)
Line 1 (revision by David E. Volk):
{{subpages}}

==Definition==
In [[descriptive statistics]], using the '''percentile''' is a way of providing estimation of proportions of the data that should fall above and below a given value. The <math>p</math>th percentile is a value such that at most <math>(100p)%</math> of the observations are less than this value and that at most <math>(100-p)%</math> are greater.

Thus:
* The 1st percentile cuts off lowest 1% of data
* The 98th percentile cuts off lowest 98% of data

The 25th percentile is the first [[quartile]]; the 50th percentile is the [[median]].

One definition is that the pth percentile of n ordered values is obtained by first calculating the rank <math> k = \frac{p(n+1)}{100} </math>, rounded to the nearest integer and then taking the value that corresponds to that rank.<ref>{{cite web
|url=http://www.mis.coventry.ac.uk/~nhunt/pottel.pdf
|title=Statistical flaws in Excel
|last=Pottel
|first=Hans
|accessdate=2006-03-22}}</ref> This formula only holds for large ''n'' (i.e. assuming that we are looking for the 95th percentile over 15 datas, we will have to find the 15,2th...).
One alternative method, used in many application, is to use a [[linear interpolation]] between the two nearest ranks instead of the rounding.

Linked with the percentile function, there is also a weighted percentile, where the percentage in the total weight is counted instead of the total number. In most spreadsheet applications there is no standard function for a weighted percentile.

==Examples==

Educational institutions (i.e. universities, schools...) frequently report admission test scores in terms of percentiles. For instance, assume that a candidate obtained 85 on her verbal test. The question is how did this student compared to all others students? If she is told that her score correspond to the 80th percentile, we know that approximately 80% of the other candidates scored lower than him and that approximately 20% of the students had higher score than her.

When [[Internet Service Provider|ISPs]] bill [[Burstable billing |"Burstable" Internet bandwidth]], the 95th or 98th percentile usually cuts off the top 5% or 2% of bandwidth peaks in each month, and then bills at the nearest rate. In this way infrequent peaks are ignored, and the customer is charged in a fairer way.

Physicians will often use infant and children's [[weight and height percentile]] as a gauge of relative health.

==See also==

*[[Quantile]]
*[[Quartile]]
*[[Summary statistics]]

The Persian equivalent is صدك
Compare with decile: دهك

==References==
<references />

== External links ==
* [http://www.wessa.net/perc.wasp Free Online Software (Calculator)] computes Percentiles for any dataset according to 8 different percentile definitions.

Line 1 (latest revision by Peter Schmitt):
{{subpages}}

'''Percentiles''' are statistical parameters which describe the distribution
of a (real) value in a population (or a sample).
Roughly speaking, the ''k''-th percentile separates the smallest ''k'' percent
of values from the largest (100-''k'') percent.

Special percentiles are the [[median]] (50th percentile),
the quartiles (25th and 75th percentile),
the quintiles (20th, 40th, 60th and 80th percentile),
and the deciles (the ''k''-th decile is the (10''k'')-th percentile).
Percentiles are special cases of [[quantile]]s:
The ''k''-th percentile is the same as the (''k''/100)-quantile.

== Definition ==

The value ''x'' is ''k''-th percentile (for a given ''k'' = 1,2,...,99) if
:    <math> P(\omega\le x) \ge {k\over100}    \textrm{\ \ and \ \ }
            P(\omega\ge x) \ge 1-{k\over100} \, \textrm{.} </math>

In this definition, ''P'' is a probability distribution on the real numbers.
It may be obtained either
* from a (theoretical) probability measure (such as the [[normal distribution|normal]] or [[Poisson distribution]]), or
* from a finite population where it expresses the probability of a random element to have the property,<br>i.e., it is the relative frequency of elements with this property (number of elements with the property divided by the size of the population),or
* from a sample of size ''N'' where it also is the relative frequency which is used to estimate the corresponding percentile for the population from which it was taken.

== Special cases ==

For most standard continuous distributions (like the [[normal distribution]]) the
''k''-th percentile ''x'' is uniquely determined by
:    <math> P(\omega\le x) = {k\over100}    \textrm{\ \ and \ \ }
            P(\omega\ge x) = 1-{k\over100}  </math>

In the general case (e.g., for discrete distributions, or for finite samples)
it may happen that the separating value has positive probability:
:    <math> P(\omega = x) > 0 \Rightarrow
            P(\omega\le x) > {k\over100}    \textrm{\ \ or \ \ }
            P(\omega\ge x) > 1-{k\over100}  </math>
or that there is a gap in the range of the variable such that, for two distinct
<math> x_1 < x_2 </math>, equality holds:
:    <math> P(\omega\le x_1) = {k\over100}    \textrm{\ \ and \ \ }
            P(\omega\ge x_2) = 1-{k\over100}  </math>
Then every value in the ([[closed interval|closed]]) interval between the smallest and the largest such value
: <math> \left [ \min \left\{ x \Bigl\vert P(\omega\le x) = {k\over100} \right\},
                \max \left\{ x \Bigl\vert P(\omega\ge x) = 1-{k\over100} \right\} \right]</math>
is a ''k''-th percentile.

== Examples ==

The following examples illustrate this:

* Take a sample of 101 values, ordered according to their size:
::  <math> x_1 \le x_2 \le \dots \le x_{100} \le x_{101} </math>.
: Then the unique ''k''-th percentile is <math>x_{k+1}</math>.

* If there are only 100 values
::  <math> x_1 \le x_2 \le \dots \le x_{99} \le x_{100} </math>.
: Then any value between <math>x_k</math> and <math>x_{k+1}</math> is a ''k''-th percentile.

'''Example from the praxis:'''
<br>
Educational institutions (i.e. universities, schools...) frequently report admission test scores in terms of percentiles.
For instance, assume that a candidate obtained 85 on her verbal test.
The question is: How did this student compared to all other students?
If she is told that her score correspond to the 80th percentile,
we know that approximately 80% of the other candidates scored lower than she
and that approximately 20% of the students had a higher score than she had.

Latest revision as of 08:41, 21 January 2010

This article is developing and not approved.

Percentiles are statistical parameters which describe the distribution of a (real) value in a population (or a sample). Roughly speaking, the k-th percentile separates the smallest k percent of values from the largest (100-k) percent.

Special percentiles are the median (50th percentile), the quartiles (25th and 75th percentile), the quintiles (20th, 40th, 60th and 80th percentile), and the deciles (the k-th decile is the (10k)-th percentile). Percentiles are special cases of quantiles: The k-th percentile is the same as the (k/100)-quantile.

Definition

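The value x is a k-th percentile (for a given k = 1, 2, ..., 99) if

    <math> P(\omega\le x) \ge {k\over100} \textrm{\ \ and \ \ } P(\omega\ge x) \ge 1-{k\over100} \, \textrm{.} </math>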

In this definition, P is a probability distribution on the real numbers. It may be obtained either

  • from a (theoretical) probability measure (such as the normal or Poisson distribution), or
  • from a finite population where it expresses the probability that a random element has the property,
    i.e., it is the relative frequency of elements with this property (number of elements with the property divided by the size of the population), or
  • from a sample of size N where it also is the relative frequency which is used to estimate the corresponding percentile for the population from which it was taken.
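
For a finite sample, the definition can be tested directly by counting. The following is a minimal sketch, assuming P is the relative frequency in the sample; the function name `is_kth_percentile` is a hypothetical helper, not part of any library.

```python
def is_kth_percentile(x, sample, k):
    """Check both inequalities of the definition for a candidate value x,
    taking P to be the relative frequency in `sample` (an assumption)."""
    if not 1 <= k <= 99:
        raise ValueError("k must be an integer from 1 to 99")
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    at_most = sum(1 for v in sample if v <= x)   # n * P(omega <= x)
    at_least = sum(1 for v in sample if v >= x)  # n * P(omega >= x)
    # Both inequalities are multiplied through by 100 * n so that only
    # integers are compared and no rounding error can occur.
    return 100 * at_most >= k * n and 100 * at_least >= (100 - k) * n
```

Because only counts are compared, the check also works for candidate values that do not occur in the sample.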

Special cases

For most standard continuous distributions (like the normal distribution) the k-th percentile x is uniquely determined by

    <math> P(\omega\le x) = {k\over100} \textrm{\ \ and \ \ } P(\omega\ge x) = 1-{k\over100} </math>
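
When the inverse distribution function is available, the condition P(ω ≤ x) = k/100 can be solved for x directly. The following is a minimal sketch for the normal case, assuming Python's standard-library `statistics.NormalDist`; the helper name `normal_percentile` and its default parameters are illustrative assumptions.

```python
from statistics import NormalDist


def normal_percentile(k, mu=0.0, sigma=1.0):
    """Return the k-th percentile of a normal distribution, i.e. the
    unique x with P(omega <= x) = k/100."""
    if not 1 <= k <= 99:
        raise ValueError("k must be from 1 to 99")
    return NormalDist(mu, sigma).inv_cdf(k / 100)
```

For the standard normal distribution the 50th percentile computed this way is the median, 0.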

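In the general case (e.g., for discrete distributions, or for finite samples) it may happen that the separating value has positive probability:

    <math> P(\omega = x) > 0 \Rightarrow P(\omega\le x) > {k\over100} \textrm{\ \ or \ \ } P(\omega\ge x) > 1-{k\over100} </math>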

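or that there is a gap in the range of the variable such that, for two distinct <math> x_1 < x_2 </math>, equality holds:

    <math> P(\omega\le x_1) = {k\over100} \textrm{\ \ and \ \ } P(\omega\ge x_2) = 1-{k\over100} </math>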

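Then every value in the (closed) interval between the smallest and the largest such value

    <math> \left[ \min \left\{ x \Bigl\vert P(\omega\le x) = {k\over100} \right\}, \max \left\{ x \Bigl\vert P(\omega\ge x) = 1-{k\over100} \right\} \right] </math>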

is a k-th percentile.
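
For a finite sample, the closed interval of all k-th percentiles can be read off from the sorted values. The following is a minimal sketch, assuming P is the relative frequency in the sample; `percentile_interval` and `ceil_div` are hypothetical helper names. The lower endpoint is the smallest value x with P(ω ≤ x) ≥ k/100, the upper endpoint is the largest value x with P(ω ≥ x) ≥ 1 - k/100.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def percentile_interval(sample, k):
    """Return (low, high) such that exactly the values x with
    low <= x <= high are k-th percentiles of `sample`."""
    if not 1 <= k <= 99:
        raise ValueError("k must be an integer from 1 to 99")
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must not be empty")
    # Smallest 1-based rank r with r / n >= k / 100.
    low_rank = ceil_div(k * n, 100)
    # Largest 1-based rank r with (n - r + 1) / n >= 1 - k / 100.
    high_rank = n - ceil_div((100 - k) * n, 100) + 1
    return xs[low_rank - 1], xs[high_rank - 1]
```

When k·n/100 is not an integer the two ranks coincide and the percentile is unique; when it is an integer the interval spans two neighbouring order statistics.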

Examples

The following examples illustrate this:

  • Take a sample of 101 values, ordered according to their size:
    <math> x_1 \le x_2 \le \dots \le x_{100} \le x_{101} </math>.
    Then the unique k-th percentile is <math>x_{k+1}</math>.
  • If there are only 100 values
    <math> x_1 \le x_2 \le \dots \le x_{99} \le x_{100} </math>.
    Then any value between <math>x_k</math> and <math>x_{k+1}</math> is a k-th percentile.
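
Both examples can be checked by counting, under the simplifying assumption (not stated in the article) that all sample values are distinct. For 101 ordered values, the (k+1)-th value has k+1 values at or below it and 101-k values at or above it, so both conditions of the definition hold:

    <math> {k+1\over101} \ge {k\over100} \textrm{\ \ and \ \ } {101-k\over101} \ge 1-{k\over100} </math>

Its neighbours fail: for the k-th value the first fraction becomes k/101 < k/100, and for the (k+2)-th value the second becomes (100-k)/101 < 1 - k/100. For 100 ordered values, any x from the k-th value up to the (k+1)-th value has at least k values at or below it and at least 100-k values at or above it, so both conditions hold:

    <math> {k\over100} \ge {k\over100} \textrm{\ \ and \ \ } {100-k\over100} \ge 1-{k\over100} </math>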

Example from practice:
Educational institutions (e.g., universities and schools) frequently report admission test scores in terms of percentiles. For instance, assume that a candidate obtained 85 on her verbal test. The question is: how did this student compare to all other students? If she is told that her score corresponds to the 80th percentile, we know that approximately 80% of the other candidates scored lower than she did and that approximately 20% of the candidates scored higher.
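
The reading of a score as a percentile can be sketched as follows. The score list is invented purely for illustration, and `percent_below_and_above` is a hypothetical helper name.

```python
def percent_below_and_above(score, all_scores):
    """Return the percentages of scores strictly below and strictly
    above `score` within `all_scores`."""
    n = len(all_scores)
    if n == 0:
        raise ValueError("all_scores must not be empty")
    below = sum(1 for s in all_scores if s < score)
    above = sum(1 for s in all_scores if s > score)
    return 100 * below / n, 100 * above / n


# Hypothetical verbal test scores of ten candidates, including the 85.
scores = [55, 60, 65, 70, 72, 78, 80, 82, 85, 90]
below, above = percent_below_and_above(85, scores)
```

With this made-up data, 80% of the scores lie below 85 and 10% lie above it; the remaining 10% is the candidate's own score.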