gower_matrix
- gower_matrix(data: ndarray, cat_features: array) → ndarray
Calculate the distance matrix for a dataset using Gower’s method.
Gower’s distance measures the dissimilarity between two samples as the average of their per-feature distances. For a numeric feature, the per-feature distance is the absolute difference between the two values divided by the range of that feature. For a categorical feature, it indicates whether the values differ: 0 if they are equal and 1 otherwise. See https://www.jstor.org/stable/2528823 for further details. In addition, the method handles missing values. Note that it is memory-intensive: it keeps an n × n matrix in memory, where n is the number of samples (rows) in data.
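The description above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the library’s implementation: the name `gower_matrix_sketch` is hypothetical, categorical columns are assumed to be numerically encoded, missing values are assumed to be NaN and are skipped per feature (the average is taken only over features observed in both samples), and a numeric feature with zero range is assumed to contribute a distance of 0.

```python
import numpy as np


def gower_matrix_sketch(data: np.ndarray, cat_features: np.ndarray) -> np.ndarray:
    """Pairwise Gower distances for an (n_samples, n_features) array.

    Hypothetical helper for illustration. Assumes categorical columns are
    numerically encoded and that NaN marks a missing value.
    """
    data = np.asarray(data, dtype=float)
    cat_features = np.asarray(cat_features, dtype=bool)
    n_samples, n_features = data.shape

    dist_sum = np.zeros((n_samples, n_samples))  # sum of per-feature distances
    counts = np.zeros((n_samples, n_samples))    # features observed in both samples

    for j in range(n_features):
        col = data[:, j]
        present = ~np.isnan(col)
        # Assumption: a feature is skipped for a pair if either value is missing.
        valid = present[:, None] & present[None, :]
        if cat_features[j]:
            # Categorical: 0 if the values are equal, 1 otherwise.
            d = (col[:, None] != col[None, :]).astype(float)
        else:
            # Numeric: absolute difference divided by the feature's range.
            diff = np.abs(col[:, None] - col[None, :])
            col_range = np.nanmax(col) - np.nanmin(col) if present.any() else 0.0
            # Assumption: a zero-range feature contributes a distance of 0.
            d = diff / col_range if col_range > 0 else np.zeros_like(diff)
        dist_sum += np.where(valid, d, 0.0)
        counts += valid

    # Average over observed features; pairs sharing no observed feature become NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return dist_sum / counts


# Worked example: column 0 is numeric (range 60 - 20 = 40); columns 1 and 2 are categorical.
data = np.array([[20.0, 0.0, 1.0],
                 [40.0, 0.0, 2.0],
                 [60.0, 1.0, np.nan]])
cat_features = np.array([False, True, True])
m = gower_matrix_sketch(data, cat_features)
# m[0, 1] = (20/40 + 0 + 1) / 3 = 0.5
# m[1, 2] = (20/40 + 1) / 2 = 0.75, because column 2 is missing for row index 2
```

Looping over features and broadcasting each column against itself keeps the sketch short; like the documented function, it still materializes the full n × n result.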
- Parameters
- data: numpy.ndarray
Dataset matrix, with one row per sample and one column per feature.
- cat_features: numpy.array
Boolean array indicating which columns are categorical features (one entry per column of data).
- Returns
- numpy.ndarray
The distance matrix, of shape (n, n), where n is the number of rows in data.
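To make the parameters and return value concrete, here is a small sketch that builds the `cat_features` mask from a list of column names and estimates how much memory the returned n × n matrix needs. The column names, the `categorical` set, and the helper `distance_matrix_bytes` are hypothetical, and the estimate assumes the matrix is stored as float64 (8 bytes per entry).

```python
import numpy as np

# Hypothetical column layout; "city_code" holds a numerically encoded category.
columns = ["age", "income", "city_code"]
categorical = {"city_code"}

data = np.array([[25.0, 50_000.0, 0.0],
                 [45.0, 72_000.0, 2.0],
                 [31.0, np.nan, 1.0]])  # NaN marks a missing value

# One boolean per column of data: True marks a categorical feature.
cat_features = np.array([name in categorical for name in columns])
assert cat_features.shape == (data.shape[1],)


def distance_matrix_bytes(n_samples: int, dtype=np.float64) -> int:
    """Bytes needed for an n_samples x n_samples matrix of the given dtype."""
    return n_samples * n_samples * np.dtype(dtype).itemsize


print(distance_matrix_bytes(data.shape[0]))  # 72 bytes for 3 samples
print(distance_matrix_bytes(100_000) / 1e9)  # 80.0 (GB) for 100,000 samples
```

The quadratic growth is why the memory note matters: doubling the number of samples quadruples the size of the matrix.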