gower_matrix

gower_matrix(data: ndarray, cat_features: array) → ndarray

Calculate distance matrix for a dataset using Gower’s method.

Gower's distance measures the distance between two samples as the average of their per-feature distances. For a numeric feature, the per-feature distance is the absolute difference divided by the range of that feature. For a categorical feature, it is an indicator of whether the two values differ: 0 if they are the same, 1 otherwise. See https://www.jstor.org/stable/2528823 for further details. The method can also deal with missing values. Note that it is expensive in memory: it requires holding a matrix of size n_samples × n_samples, where n_samples is the number of rows in data.

Parameters
data: numpy.ndarray

Dataset matrix.

cat_features: numpy.array

Boolean array indicating which of the columns are categorical features.

Returns
numpy.ndarray

The distance matrix.
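Example

The per-feature averaging described above can be sketched in plain NumPy. This is an illustrative sketch only, not this library's implementation: the name `gower_matrix_sketch` is hypothetical, categorical values are assumed to be already encoded as numbers, and a feature missing (NaN) in either sample is assumed to be skipped when averaging, which may differ from how the library treats missing values.

```python
import numpy as np


def gower_matrix_sketch(data, cat_features):
    """Illustrative Gower distance matrix (hypothetical helper, not the library code)."""
    data = np.asarray(data, dtype=float)
    cat_features = np.asarray(cat_features, dtype=bool)
    n_samples, n_features = data.shape

    # Range of each numeric feature, ignoring NaN; fall back to 1 for
    # constant or all-NaN columns to avoid dividing by zero.
    ranges = np.ones(n_features)
    for j in np.where(~cat_features)[0]:
        col = data[:, j]
        if not np.all(np.isnan(col)):
            r = np.nanmax(col) - np.nanmin(col)
            if r > 0:
                ranges[j] = r

    result = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for k in range(i + 1, n_samples):
            total, count = 0.0, 0
            for j in range(n_features):
                a, b = data[i, j], data[k, j]
                if np.isnan(a) or np.isnan(b):
                    continue  # assumption: skip features missing in either sample
                if cat_features[j]:
                    total += 0.0 if a == b else 1.0  # categorical: mismatch indicator
                else:
                    total += abs(a - b) / ranges[j]  # numeric: range-normalized difference
                count += 1
            # Average over the features that were actually compared.
            dist = total / count if count else np.nan
            result[i, k] = result[k, i] = dist
    return result


# Column 0 is numeric (range 4), column 1 is categorical (encoded as 0/1).
samples = np.array([[1.0, 0.0], [3.0, 1.0], [5.0, 0.0]])
is_categorical = np.array([False, True])
distances = gower_matrix_sketch(samples, is_categorical)
```

Here the distance between the first two samples is (2/4 + 1) / 2 = 0.75, and between the first and third it is (4/4 + 0) / 2 = 0.5. The result always has shape (n_samples, n_samples), so assuming 64-bit floats, 10,000 samples already need about 800 MB for the matrix alone.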