Citizendium - a community developing a quality, comprehensive compendium of knowledge, online and free.Click here to join and contribute |

CZ thanks our previous donors. Donate here. Treasurer's Financial Report |

# Recursive partitioning

In statistics, **recursive partitioning** is a method for the multivariable analysis of diagnostic tests.^{[1]}^{[2]}^{[3]}^{[4]}. Recursive partitioning creates a decision tree that strives to correctly classify members of the population based on a dichotomous dependent variable. Compared to other multivariable:

- Advantages are:
- Generates clinically more intuitive models that do not require the user to perform calculations.
^{[5]} - Allows "adjustable misclassification penalties" or misclassification costs in order to create a decision rule that has more sensitivity or specificity. This has also been called the "diversity index" which is the sum of the false-negatives and false-positives. Either the false-negatives and false-positives can be weighted in order to preferentially reduce their occurrence.
^{[6]}^{[7]}

- Generates clinically more intuitive models that do not require the user to perform calculations.

- Disadvantages are:
- Does not work well for continuous variables
^{[8]} - May overfit data.

- Does not work well for continuous variables

Studies comparing the predictive accuracy of recursive partitioning with logistic regression have reported no difference^{[5]}, recursive partitioning to be more accurate^{[9]} and logistic regression to be more accurate.^{[10]}

A variation is 'Cox linear recursive partitioning'.^{[6]}

Examples are available of using recursive partitioning in research of diagnostic tests.^{[10]}^{[11]}^{[12]}^{[13]}^{[14]}^{[15]} Goldman used recursive partitioning to prioritize sensitivity in the diagnosis of myocardial infarction among patients with chest pain in the emergency room.^{[14]}

Additional alternatives to recursive partitioning include latent class analysis.^{[16]}^{[17]}

## Calculations

### Unweighted analysis

An 'Initial misclassification rate' is calculated by assigning all cases to a single class. Using the example in the CART Manual, in trying to diagnose 59 patients who have a disease of interest among a sample of 189 patients, arbitrarily classifying all patients as normal yields:^{[18]}

**For the top node, also called node 1:**

**For the next nodes:**
The next best independent, classifier variable creates two nodes:

- Left node: 159 patients; 118 have disease and 41 are false positives who do not have the disease
- Right node: 30 patients; 12 have disease and false negatives

### Weighted analysis

In this analysis, the sensitivity is preferenced by weighted it by 2.2, which is the ratio of patients without disease over patients with disease.

**For the top node:**

**For the next nodes:**
The next best independent, classifier variable creates two nodes:

- Left node: 159 patients; 118 have disease and 41 are false positives who do not have the disease
- Right node: 30 patients; 12 have disease and false negatives

### Stopping rules

Various methods have been proposed to avoid overfitting of data.

- One method is to grow the tree as far as possible, then to 'prune' the tree. Pruning is based on a balance between misclassification costs and tree complexity.
^{[2]}^{[19]}

Another method is to grow the tree "until p<0.05 or until one or more of the marginal counts in the two-by-two table is less than 10."^{[5]}

## Software for recursive partitioning

Software available for recursive partitioning includes:

- CART
- R programming language:
- rpart package
- HSAUR2 interactive package with a chapter containing sample demonstrations, "Recursive Partitioning: Predicting Body Fat and Glaucoma Diagnosis" in "A Handbook of Statistical Analyses Using R".
^{[20]}

## References

- ↑ Breiman, Leo (1984).
*Classification and Regression Trees*. Boca Raton: Chapman & Hall/CRC. ISBN 0-412-04841-8. - ↑
^{2.0}^{2.1}Lewis RJ (2000). An introduction to classification and regression tree (CART) analysis. Accessed March 12, 2008 - ↑ Strobl C, Malley J, Tutz G (2009). "An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.".
*Psychol Methods***14**(4): 323-48. DOI:10.1037/a0016973. PMID 19968396. PMC PMC2927982. Research Blogging. - ↑ Yohannes Y, Hoddinott J (1999). Classification and regression trees: an introduction. Accessed March 12, 2008
- ↑
^{5.0}^{5.1}^{5.2}James KE, White RF, Kraemer HC (2005). "Repeated split sample validation to assess logistic regression and recursive partitioning: an application to the prediction of cognitive impairment".*Statistics in medicine***24**(19): 3019-35. DOI:10.1002/sim.2154. PMID 16149128. Research Blogging. - ↑
^{6.0}^{6.1}Cook EF, Goldman L (1984). "Empiric comparison of multivariate analytic techniques: advantages and disadvantages of recursive partitioning analysis".*Journal of chronic diseases***37**(9-10): 721-31. PMID 6501544.^{[e]} - ↑ Nelson LM, Bloch DA, Longstreth WT, Shi H (March 1998). "Recursive partitioning for the identification of disease risk subgroups: a case-control study of subarachnoid hemorrhage".
*J Clin Epidemiol***51**(3): 199–209. PMID 9495685.^{[e]} - ↑ Lee JW, Um SH, Lee JB, Mun J, Cho H (2006). "Scoring and staging systems using cox linear regression modeling and recursive partitioning".
*Methods of information in medicine***45**(1): 37-43. PMID 16482368.^{[e]} - ↑ Kattan MW, Hess KR, Beck JR (1998). "Experiments to determine whether recursive partitioning (CART) or an artificial neural network overcomes theoretical limitations of Cox proportional hazards regression".
*Comput. Biomed. Res.***31**(5): 363-73. PMID 9790741.^{[e]} - ↑
^{10.0}^{10.1}Fonarow GC, Adams KF, Abraham WT, Yancy CW, Boscardin WJ (2005). "Risk stratification for in-hospital mortality in acutely decompensated heart failure: classification and regression tree analysis".*JAMA***293**(5): 572-80. DOI:10.1001/jama.293.5.572. PMID 15687312. Research Blogging. - ↑ Stiell IG, Wells GA, Vandemheen KL,
*et al*(2001). "The Canadian C-spine rule for radiography in alert and stable trauma patients".*JAMA***286**(15): 1841-8. PMID 11597285.^{[e]} - ↑ Haydel MJ, Preston CA, Mills TJ, Luber S, Blaudeau E, DeBlieux PM (2000). "Indications for computed tomography in patients with minor head injury".
*N. Engl. J. Med.***343**(2): 100-5. PMID 10891517.^{[e]} - ↑ Stiell IG, Greenberg GH, Wells GA,
*et al*(1996). "Prospective validation of a decision rule for the use of radiography in acute knee injuries".*JAMA***275**(8): 611-5. PMID 8594242.^{[e]} - ↑
^{14.0}^{14.1}Goldman L, Weinberg M, Weisberg M,*et al*(1982). "A computer-derived protocol to aid in the diagnosis of emergency room patients with acute chest pain".*N. Engl. J. Med.***307**(10): 588-96. PMID 7110205.^{[e]} - ↑ Heikes KE, Eddy DM, Arondekar B, Schlessinger L (May 2008). "Diabetes Risk Calculator: a simple tool for detecting undiagnosed diabetes and pre-diabetes".
*Diabetes Care***31**(5): 1040–5. DOI:10.2337/dc07-1150. PMID 18070993. Research Blogging. - ↑ McCutcheon AL. Latent Class Analysis. Beverly Hills, CA: Sage Publications; 1987.
- ↑ Schur EA, Afari N, Furberg H,
*et al*(2007). "Feeling bad in more ways than one: comorbidity patterns of medically unexplained and psychiatric conditions".*Journal of general internal medicine : official journal of the Society for Research and Education in Primary Care Internal Medicine***22**(6): 818–21. DOI:10.1007/s11606-007-0140-5. PMID 17503107. Research Blogging. - ↑ Steinberg, Dan and Phillip Colla. (1995).
*CART: Tree-Structured Non- Parametric Data Analysis*. San Diego, CA: Salford Systems, 39. - ↑ Yohannes Y, Hoddinott J (1999). Classification and regression trees: an introduction. Accessed March 12, 2008
- ↑ Torsten Hothorn; Everitt, Brian. CRAN - Package HSAUR, 2nd ed.