python - How to give sns.clustermap a precomputed distance matrix?


Usually when I make dendrograms and heatmaps, I use a distance matrix and a bunch of SciPy stuff. I want to try out Seaborn, but Seaborn wants my data in rectangular form (rows=samples, cols=attributes, not a distance matrix).

I essentially want to use Seaborn as the backend to compute my dendrogram and tack it onto my heatmap. Is this possible? If not, could it be a feature in the future?

Maybe there are parameters I can adjust so that it takes a distance matrix instead of a rectangular matrix?

Here's the usage:

    seaborn.clustermap(data, pivot_kws=None, method='average', metric='euclidean',
                       z_score=None, standard_scale=None, figsize=None, cbar_kws=None,
                       row_cluster=True, col_cluster=True, row_linkage=None,
                       col_linkage=None, row_colors=None, col_colors=None,
                       mask=None, **kwargs)

My code is below:

    import pandas as pd
    from sklearn.datasets import load_iris

    iris = load_iris()
    X, y = iris.data, iris.target
    # One row per sample, labelled iris_0, iris_1, ...
    df = pd.DataFrame(X, index=["iris_%d" % i for i in range(X.shape[0])],
                      columns=iris.feature_names)


I don't think my method below is correct, because I'm giving it a precomputed distance matrix and not the rectangular data matrix it requests. There are no examples of how to use a correlation/distance matrix with clustermap. There is https://stanford.edu/~mwaskom/software/seaborn/examples/network_correlations.html, but the ordering there is not clustered, since it uses the plain sns.heatmap function.

    import seaborn as sns

    df_corr = df.T.corr()      # sample-by-sample correlation
    df_dism = 1 - df_corr      # dissimilarity (distance) matrix
    sns.clustermap(df_dism)


You can turn the precomputed distance matrix into a linkage and pass that to clustermap():

    import pandas as pd, seaborn as sns
    import scipy.spatial as sp, scipy.cluster.hierarchy as hc
    from sklearn.datasets import load_iris
    sns.set(font="monospace")

    iris = load_iris()
    X, y = iris.data, iris.target
    df = pd.DataFrame(X, index=["iris_%d" % i for i in range(X.shape[0])],
                      columns=iris.feature_names)

    df_corr = df.T.corr()
    df_dism = 1 - df_corr   # distance matrix
    # squareform() converts the square matrix to the condensed form linkage() expects
    linkage = hc.linkage(sp.distance.squareform(df_dism), method='average')
    sns.clustermap(df_dism, row_linkage=linkage, col_linkage=linkage)
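The squareform() step matters because hc.linkage() takes a condensed (1-D) distance vector; a 2-D array is treated as a set of observations instead. A minimal sketch of the conversion, using a made-up 3x3 distance matrix (the values are purely illustrative):

```python
import numpy as np
import scipy.spatial.distance as ssd

# Hypothetical symmetric distance matrix with a zero diagonal.
square = np.array([[0.0, 0.2, 0.7],
                   [0.2, 0.0, 0.5],
                   [0.7, 0.5, 0.0]])

# Condensed form: the upper triangle read row by row (d01, d02, d12).
condensed = ssd.squareform(square)
print(condensed)            # [0.2 0.7 0.5]

# squareform() is its own inverse: a condensed vector maps back to the square matrix.
restored = ssd.squareform(condensed)
print(np.array_equal(restored, square))   # True
```

Note that squareform() validates its input by default, so a matrix that is not exactly symmetric or whose diagonal is not exactly zero (for example because of floating-point noise in a correlation) may need cleaning up first.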

For clustermap(distance_matrix) (i.e., without a linkage passed), the linkage is calculated internally from the pairwise distances between the rows and between the columns of the distance matrix (see the note below for full details), instead of using the elements of the distance matrix directly (the correct solution). As a result, the output is somewhat different from the one in the question.
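The difference can be checked with SciPy alone. The sketch below uses random toy data in place of the iris frame (an assumption made only to keep it self-contained) and compares a linkage built directly from the distances with one built from Euclidean distances between the rows of the distance matrix, which is what this answer describes as the default behaviour:

```python
import numpy as np
import scipy.cluster.hierarchy as hc
import scipy.spatial.distance as ssd

# Hypothetical toy data: 6 samples, 4 features.
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 4))

# Correlation distance between samples, analogous to 1 - df.T.corr().
dism = 1.0 - np.corrcoef(data)
dism = (dism + dism.T) / 2.0   # force exact symmetry for squareform()
np.fill_diagonal(dism, 0.0)    # force an exact zero diagonal

# (a) Linkage built directly from the precomputed distances.
direct = hc.linkage(ssd.squareform(dism), method="average")

# (b) Rows of the distance matrix treated as observations:
#     Euclidean distances between rows are computed first, then clustered.
indirect = hc.linkage(ssd.pdist(dism, metric="euclidean"), method="average")

# The merge heights (third column) are on different scales in the two cases.
heights_differ = not np.allclose(direct[:, 2], indirect[:, 2])
print(heights_differ)
```

In (a) the first merge joins the pair with the smallest entry of the distance matrix; in (b) it joins the pair of rows that look most alike, which is not necessarily the same pair.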

Note: if no row_linkage is passed to clustermap(), the row linkage is determined internally by treating each row as a "point" (observation) and calculating the pairwise distances between the points. So the row dendrogram reflects row similarity. The same holds for col_linkage, where each column is treated as a point. This explanation should probably be added to the docs. Here is the first example from the docs, modified to make the internal linkage calculation explicit:

    import seaborn as sns; sns.set()
    import scipy.spatial as sp, scipy.cluster.hierarchy as hc
    flights = sns.load_dataset("flights")
    # Keyword arguments: recent pandas versions no longer accept them positionally
    flights = flights.pivot(index="month", columns="year", values="passengers")
    row_linkage, col_linkage = (hc.linkage(sp.distance.pdist(x), method='average')
      for x in (flights.values, flights.values.T))
    g = sns.clustermap(flights, row_linkage=row_linkage, col_linkage=col_linkage)
      # note: this produces the same plot as "sns.clustermap(flights)", where
      #  clustermap() calculates the row and column linkages internally
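The "each row is a point" behaviour is also how SciPy's own linkage() treats a 2-D array, so the equivalence can be checked without plotting. A small sketch with random toy data (hypothetical, for illustration only):

```python
import numpy as np
import scipy.cluster.hierarchy as hc
import scipy.spatial.distance as ssd

# Hypothetical data: 5 observations (rows), 3 features (columns).
rng = np.random.default_rng(1)
points = rng.normal(size=(5, 3))

# Passing the observations: linkage() computes pairwise distances itself.
from_points = hc.linkage(points, method="average", metric="euclidean")

# Passing the condensed pairwise distances explicitly.
from_pdist = hc.linkage(ssd.pdist(points, metric="euclidean"), method="average")

same = np.allclose(from_points, from_pdist)
print(same)   # True
```

This is why a square distance matrix handed over as plain data gets clustered on the distances between its rows rather than on its entries.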
