Hierarchical clustering

From Citizendium

Hierarchical clustering (also known as numerical taxonomy) is a branch of cluster analysis[1] that organizes clusters into a hierarchy, i.e., a set of nested levels. The hierarchy can be constructed using two major approaches, or combinations thereof. In agglomerative hierarchical clustering (a bottom-up approach), existing clusters are merged iteratively, whereas divisive hierarchical clustering (a top-down approach) starts with all data in one cluster, which is then split iteratively. At each step, a mathematical measure of distance or similarity is computed, between clusters in the agglomerative case or within clusters in the divisive case, to determine which clusters to merge or how to split.
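The bottom-up procedure can be illustrated with a minimal sketch. It assumes one-dimensional data points and uses the smallest pairwise distance between two clusters ("single linkage") as the distance measure. The function names `single_linkage` and `agglomerate` are hypothetical, chosen for this illustration only, and the sketch favors readability over efficiency.

```python
from itertools import combinations

def single_linkage(a, b):
    """Distance between two clusters: the smallest pairwise point distance."""
    return min(abs(x - y) for x in a for y in b)

def agglomerate(points, linkage=single_linkage):
    """Bottom-up clustering of 1-D points; returns the merges in order."""
    clusters = [(p,) for p in points]  # every point starts as its own cluster
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        dist = linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters by their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

merges = agglomerate([1.0, 2.0, 6.0, 7.0, 15.0])
for left, right, dist in merges:
    print(left, "+", right, "at distance", dist)
```

Each entry of `merges` records one level of the hierarchy: the two clusters that were joined and the distance at which the merge happened. A divisive method would run in the opposite direction, starting from the full data set and splitting it.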

Several different distance and similarity measures can be used, and they generally produce different hierarchies (especially in agglomerative clustering, which starts out from local information only), which complicates the interpretation of the results. Nonetheless, hierarchical clustering is often considered more intuitive than flat clustering, and it enjoys considerable popularity for the multivariate analysis of data, e.g. of gene or protein sequences.
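The dependence on the chosen measure can be seen in a small sketch, which assumes four distinct one-dimensional points and compares two common between-cluster distances: the smallest pairwise distance ("single linkage") and the largest pairwise distance ("complete linkage"). The helper names `single`, `complete`, and `hierarchy` are hypothetical and serve only this illustration.

```python
from itertools import combinations

def single(a, b):
    """Single linkage: smallest distance between any two members."""
    return min(abs(x - y) for x in a for y in b)

def complete(a, b):
    """Complete linkage: largest distance between any two members."""
    return max(abs(x - y) for x in a for y in b)

def hierarchy(points, linkage):
    """Return the clusters formed by successive merges (points must be distinct)."""
    clusters = [(p,) for p in points]
    formed = []
    while len(clusters) > 1:
        # Pick the closest pair of clusters under the given linkage.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)]
        merged = tuple(sorted(a + b))
        clusters.append(merged)
        formed.append(merged)
    return formed

points = [0.0, 1.0, 2.5, 4.5]
single_tree = hierarchy(points, single)
complete_tree = hierarchy(points, complete)
print("single:  ", single_tree)
print("complete:", complete_tree)
```

Both runs first merge 0.0 and 1.0, but they disagree on the second step: single linkage attaches 2.5 to the existing cluster, while complete linkage pairs 2.5 with 4.5. The same data therefore yields two different hierarchies.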

References and notes

  1. Hierarchical Clustering: Cluster Analysis.
    • "Cluster Analysis, also called data segmentation, has a variety of goals. All relate to grouping or segmenting a collection of objects (also called observations, individuals, cases, or data rows) into subsets or "clusters", such that those within each cluster are more closely related to one another than objects assigned to different clusters. Central to all of the goals of cluster analysis is the notion of degree of similarity (or dissimilarity) between the individual objects being clustered."