Euclidean(欧几里得度量)
欧几里得度量(euclidean metric)(也称欧氏距离)是一个通常采用的距离定义,指在m维空间中两个点之间的真实距离,或者向量的自然长度(即该点到原点的距离)。在二维和三维空间中的欧氏距离就是两点之间的实际距离。
n维空间的公式
闵氏距离又叫做闵可夫斯基距离,是欧氏空间中的一种测度,被看做是欧氏距离的一种推广,欧氏距离是闵可夫斯基距离的一种特殊情况。
定义式:
闵可夫斯基距离公式中,当 时,即为欧氏距离;当p=1时,即为曼哈顿距离;当
时,即为切比雪夫距离。
余弦相似度
余弦相似性通过测量两个向量的夹角的余弦值来度量它们之间的相似性。
卡方距离(Chi-square measure):
The chi squared distance d(x,y) is, as you already know, a distance between two histograms x=[x_1,..,x_n] and y=[y_1,...,y_n] having n bins both. Moreover, both histograms are normalized, i.e. their entries sum up to one.
The distance measure d is usually defined (although alternative definitions exist) as d(x,y) = sum( (xi-yi)^2 / (xi+yi) ) / 2 . It is often used in computer vision to compute distances between some bag-of-visual-word representations of images.
The name of the distance is derived from Pearson's chi squared test statistic X²(x,y) = sum( (xi-yi)^2 / xi) for comparing discrete probability distributions (i.e histograms). However, unlike the test statistic, d(x,y) is symmetric wrt. x and y, which is often useful in practice, e.g., when you want to construct a kernel out of the histogram distances.
利用列联表分析的方法得到一个卡方统计量来衡量两个体之间的差异性
- Euclidean: Arguably the most well known and must used distance metric. The euclidean distance is normally described as the distance between two points “as the crow flies”.
- Manhattan: Also called “Cityblock” distance. Imagine yourself in a taxicab taking turns along the city blocks until you reach your destination.
- Chebyshev: The maximum distance between points in any single dimension.
- Cosine: We won’t be using this similarity function as much until we get into the vector space model, tf-idf weighting, and high dimensional positive spaces, but the Cosine similarity function is extremely important. It is worth noting that the Cosine similarity function is not a proper distance metric — it violates both the triangle inequality and the coincidence axiom.
- Hamming: Given two (normally binary) vectors, the Hamming distance measures the number of “disagreements” between the two vectors. Two identical vectors would have zero disagreements, and thus perfect similarity.