Hierarchical clustering

From Citizendium, the Citizens' Compendium
Jump to: navigation, search
This article is a stub and thus not approved.
Main Article
Talk
Related Articles  [?]
Bibliography  [?]
External Links  [?]
Citable Version  [?]
 
This editable Main Article is under development and not meant to be cited; by editing it you can help to improve it towards a future approved, citable version. These unapproved articles are subject to a disclaimer.

Hierarchical clustering (also known as numerical taxonomy) is a branch of cluster analysis[1] which treats clusters hierarchically, i.e., as a set of levels. The construction of the hierarchy can be performed using two major approaches, or combinations thereof: In agglomerative hierarchical clustering (a bottom-up approach), existing clusters are merged iteratively, while divisive hierarchical clustering (a top-down approach) starts out with all data in one cluster that is then split iteratively. At each step of the process, a mathematical measure of distance or similarity between (agglomerative) or within clusters (divisive) is being computed to determine how to split or merge.

Several different distance and similarity measures can be used, which generally result in different hierarchies (especially for agglomerative ones which start out based on local information only), thus complicating their interpretation. Nonetheless, hierarchical clustering is more intuitively understandable than flat clustering, and so it enjoys considerable popularity for multivariate analysis of data, e.g. of gene or protein sequences.

References and notes

  1. Hierarchical Clustering: Cluster Analysis.
    • "Cluster Analysis, also called data segmentation, has a variety of goals. All relate to grouping or segmenting a collection of objects (also called observations, individuals, cases, or data rows) into subsets or "clusters", such that those within each cluster are more closely related to one another than objects assigned to different clusters. Central to all of the goals of cluster analysis is the notion of degree of similarity (or dissimilarity) between the individual objects being clustered."