Usually when I make dendrograms and heatmaps, I start from a distance matrix and use a bunch of SciPy functions. I want to try out Seaborn, but Seaborn seems to want my data in rectangular form (rows = samples, columns = attributes), not as a distance matrix.

I essentially want to use Seaborn as the backend to compute my dendrogram and tack it onto my heatmap. Is this possible? If not, could this be a feature in the future?

Maybe there are parameters I can adjust so that it takes a distance matrix instead of a rectangular matrix?

Here's the usage:
    seaborn.clustermap(data, pivot_kws=None, method='average', metric='euclidean', z_score=None, standard_scale=None, figsize=None, cbar_kws=None, row_cluster=True, col_cluster=True, row_linkage=None, col_linkage=None, row_colors=None, col_colors=None, mask=None, **kwargs)
My code is below:

    import pandas as pd
    from sklearn.datasets import load_iris

    iris = load_iris()
    X, y = iris.data, iris.target
    # One row per sample, one column per feature
    df = pd.DataFrame(X, index=["iris_%d" % i for i in range(X.shape[0])], columns=iris.feature_names)
I don't think my method below is correct, because I'm giving it a precomputed distance matrix and not the rectangular data matrix it requests. There are no examples of how to use a correlation/distance matrix with clustermap. There is this example, https://stanford.edu/~mwaskom/software/seaborn/examples/network_correlations.html, but the ordering there is not clustered; it uses the plain sns.heatmap function.
    import seaborn as sns

    df_corr = df.T.corr()   # sample-by-sample correlation
    df_dism = 1 - df_corr   # dissimilarity matrix
    sns.clustermap(df_dism)
You can pass the precomputed distance matrix as a linkage to clustermap():
    import pandas as pd, seaborn as sns
    import scipy.spatial as sp, scipy.cluster.hierarchy as hc
    from sklearn.datasets import load_iris
    sns.set(font="monospace")

    iris = load_iris()
    X, y = iris.data, iris.target
    df = pd.DataFrame(X, index=["iris_%d" % i for i in range(X.shape[0])], columns=iris.feature_names)

    df_corr = df.T.corr()
    df_dism = 1 - df_corr   # distance matrix
    # squareform() converts the square matrix to the condensed form linkage() expects
    linkage = hc.linkage(sp.distance.squareform(df_dism), method='average')
    sns.clustermap(df_dism, row_linkage=linkage, col_linkage=linkage)
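One practical caveat with the squareform() step: by default it validates its input, and a matrix derived from floating-point correlations may not be exactly symmetric or have an exactly zero diagonal, in which case it may be rejected. Below is a minimal sketch of a cleanup step, assuming the deviations are only rounding noise. The helper name `linkage_from_square` and the small `noisy` matrix are made up for illustration and are not part of SciPy or Seaborn.

```python
import numpy as np
import scipy.spatial as sp
import scipy.cluster.hierarchy as hc


def linkage_from_square(dist, method="average"):
    """Build a SciPy linkage from a square distance matrix (hypothetical helper)."""
    d = np.asarray(dist, dtype=float)
    d = (d + d.T) / 2.0          # remove tiny asymmetries caused by rounding
    np.fill_diagonal(d, 0.0)     # self-distance must be exactly zero
    d = np.clip(d, 0.0, None)    # guard against small negative values
    condensed = sp.distance.squareform(d, checks=True)
    return hc.linkage(condensed, method=method)


# Made-up example: three items, with rounding noise on the diagonal.
noisy = np.array([
    [1e-16, 0.2, 0.9],
    [0.2, -1e-16, 0.8],
    [0.9, 0.8, 0.0],
])
Z = linkage_from_square(noisy)
print(Z)
```

The resulting `Z` can be passed as `row_linkage` and `col_linkage` in the same way as `linkage` above. If the matrix is far from symmetric, that points to a problem upstream rather than something to paper over here.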
For clustermap(distance_matrix) (i.e., without a linkage passed), the linkage is calculated internally based on the pairwise distances of the rows and columns in the distance matrix (see the note below for full details) instead of using the elements of the distance matrix directly (the correct solution). As a result, the output differs from the one in the question.
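The difference can be seen without plotting anything. The following is a small sketch using SciPy only; the 4x4 matrix `D` is made up for illustration, and the second linkage imitates the row-as-point behaviour described here rather than calling Seaborn itself.

```python
import numpy as np
import scipy.spatial as sp
import scipy.cluster.hierarchy as hc

# Made-up symmetric distance matrix for four items.
D = np.array([
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.6],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.6, 0.2, 0.0],
])

# Use the entries of D directly as the pairwise distances.
direct = hc.linkage(sp.distance.squareform(D), method="average")

# Treat each row of D as a point and recompute Euclidean distances
# between rows, which is what happens when no linkage is supplied.
recomputed = hc.linkage(sp.distance.pdist(D, metric="euclidean"), method="average")

print(direct[:, 2])       # merge heights taken from D itself
print(recomputed[:, 2])   # merge heights from distances between rows of D
```

The merge heights in the two linkages differ, so the dendrograms are not the same even when the leaf ordering happens to agree.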
Note: if no row_linkage is passed to clustermap(), the row linkage is determined internally by considering each row a "point" (observation) and calculating the pairwise distances between the points. So the row dendrogram reflects row similarity. The same applies to col_linkage, where each column is considered a point. This explanation should probably be added to the docs. Here is the docs' first example, modified to make the internal linkage calculation explicit:
    import seaborn as sns; sns.set()
    import scipy.spatial as sp, scipy.cluster.hierarchy as hc

    flights = sns.load_dataset("flights")
    # Keyword arguments are required by recent pandas versions
    flights = flights.pivot(index="month", columns="year", values="passengers")
    row_linkage, col_linkage = (hc.linkage(sp.distance.pdist(x), method='average')
                                for x in (flights.values, flights.values.T))
    g = sns.clustermap(flights, row_linkage=row_linkage, col_linkage=col_linkage)
    # Note: this produces the same plot as "sns.clustermap(flights)",
    # where clustermap() calculates the row and column linkages internally.