machine learning - Python scikit-learn multi-class multi-label performance metrics?


I ran a random forest classifier on a multi-class, multi-output target variable and got the output below.

My y_test values:

        degree  nature
762721       1       7
548912       0       6
727126       1      12
14880        1      12
189505       1      12
657486       1      12
461004       1       0
31548        0       6
296674       1       7
121330       0      17

Predicted output:

[[  1.   7.]
 [  0.   6.]
 [  1.  12.]
 [  1.  12.]
 [  1.  12.]
 [  1.  12.]
 [  1.   0.]
 [  0.   6.]
 [  1.   7.]
 [  0.  17.]]

Now I want to check the performance of the classifier. I found that for multiclass / multilabel problems the "hamming loss or jaccard_similarity_score" metrics are suggested, but when I tried to calculate them I got a ValueError.

Error:

ValueError: multiclass-multioutput is not supported

Below are the lines I tried:

print hamming_loss(y_test, rf_predicted)
print jaccard_similarity_score(y_test, rf_predicted)

thanks,

To calculate the unsupported Hamming loss for the multiclass / multioutput case, you could:

import numpy as np

y_true = np.array([[1, 1], [2, 3]])
y_pred = np.array([[0, 1], [1, 2]])

np.sum(np.not_equal(y_true, y_pred)) / float(y_true.size)
# 0.75
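If you want to reuse this, here is a minimal sketch of a small helper (the name hamming_loss_multioutput is mine, not part of scikit-learn) that applies the same element-wise comparison to any pair of equally shaped label arrays:

import numpy as np

def hamming_loss_multioutput(y_true, y_pred):
    # Fraction of label positions where prediction and truth disagree.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return np.sum(np.not_equal(y_true, y_pred)) / float(y_true.size)

print(hamming_loss_multioutput(np.array([[1, 1], [2, 3]]),
                               np.array([[0, 1], [1, 2]])))
# 0.75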

You can also build a confusion_matrix for each of the 2 output labels like so:

from sklearn.metrics import confusion_matrix, precision_score

np.random.seed(42)

y_true = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T
# [[0 4]
#  [1 4]
#  [0 4]
#  [0 4]
#  [0 2]
#  [1 4]
#  [0 3]
#  [0 2]
#  [0 3]
#  [1 3]]

y_pred = np.vstack((np.random.randint(0, 2, 10), np.random.randint(2, 5, 10))).T
# [[1 2]
#  [1 2]
#  [1 4]
#  [1 4]
#  [0 4]
#  [0 3]
#  [1 4]
#  [1 3]
#  [1 3]
#  [0 4]]

confusion_matrix(y_true[:, 0], y_pred[:, 0])
# [[1 6]
#  [2 1]]

confusion_matrix(y_true[:, 1], y_pred[:, 1])
# [[0 1 1]
#  [0 1 2]
#  [2 1 2]]
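If there were more output columns, you could loop over them instead of writing one call per column; a minimal sketch, assuming y_true and y_pred are the stacked arrays from above:

for col in range(y_true.shape[1]):
    # One confusion matrix per output column
    print(confusion_matrix(y_true[:, col], y_pred[:, col]))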

You can calculate precision_score (or recall_score) in a similar way:

precision_score(y_true[:, 0], y_pred[:, 0])
# 0.142857142857
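Note that the first column here is binary, so the call above works as-is, but the second column has more than two classes, so for it precision_score / recall_score need an explicit averaging scheme (e.g. average='macro'). A short sketch:

from sklearn.metrics import precision_score, recall_score

# Binary column: default (binary) averaging is fine.
recall_score(y_true[:, 0], y_pred[:, 0])

# Multiclass column: pick an averaging scheme such as 'macro'
# (unweighted mean of the per-class scores).
precision_score(y_true[:, 1], y_pred[:, 1], average='macro')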
