IBM Bluemix - Visual Recognition. Why low scores?


I'm using the Visual Recognition service on IBM Bluemix.

I have created some classifiers, in particular these two, with the following objectives:

  • First: a “generic” classifier that returns a confidence score for the recognition of a particular object in an image. I trained it with 50 positive examples of the object and 50 negative examples of similar objects (details of it, its components, look-alike images, etc.).
  • Second: a more specific classifier that recognizes the particular type of the object identified before, used only if the score of the first classification is quite high. This new classifier was trained like the first one: 50 positive examples of type A objects and 50 negative examples of type B objects. This second categorization should be more specific than the first one, because the images are more detailed and more similar to one another. (A rough sketch of both training calls is shown right after this list.)

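For concreteness, a minimal sketch of what the two training calls might look like with the Watson Developer Cloud Python SDK. The zip file names, class names, classifier names, API key, and version date are placeholders rather than details from the post, and the exact create_classifier signature varies between SDK releases:

    from watson_developer_cloud import VisualRecognitionV3

    # Placeholder API key and version date -- not taken from the original post.
    visual_recognition = VisualRecognitionV3('2016-05-20', api_key='YOUR_API_KEY')

    # First, "generic" classifier: a zip of 50 positive examples of the object
    # and a zip of 50 negative examples of similar objects.
    with open('object_positive.zip', 'rb') as pos, \
            open('similar_negative.zip', 'rb') as neg:
        generic = visual_recognition.create_classifier(
            'generic_object',
            object_positive_examples=pos,   # the keyword prefix becomes the class name
            negative_examples=neg)

    # Second, more specific classifier: type A as the positive class,
    # type B as the negative examples.
    with open('type_a_positive.zip', 'rb') as pos_a, \
            open('type_b_negative.zip', 'rb') as neg_b:
        specific = visual_recognition.create_classifier(
            'object_type_a',
            type_a_positive_examples=pos_a,
            negative_examples=neg_b)

    print(generic['classifier_id'], specific['classifier_id'])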
The result is that the two classifiers work well, and the expected results on a particular set of images correspond to the truth in most cases, which should mean that both have been trained properly.

But there is one thing I don't understand.

In both classifiers, if I try to classify one of the images that was used in the positive training set, my expectation is that the confidence score should be near 90-100%. Instead, I obtain a score in the range between 0.50 and 0.55. The same thing happens when I try an image similar to one from the positive training set (scaled, reflected, cropped, etc.): the confidence never goes above roughly 0.55.
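For reference, the classification call that produces those scores looks roughly like this. It is only a sketch: the image file name, classifier ID, and credentials are placeholders, and the classify signature differs between SDK versions:

    from watson_developer_cloud import VisualRecognitionV3

    visual_recognition = VisualRecognitionV3('2016-05-20', api_key='YOUR_API_KEY')

    # Re-classify one of the images that was in the positive training set.
    with open('positive_training_image_01.jpg', 'rb') as img:
        result = visual_recognition.classify(
            images_file=img,
            classifier_ids=['generic_object_1234567890'],  # placeholder classifier ID
            threshold=0.0)  # ask for every class score, even low ones

    for image in result['images']:
        for clf in image['classifiers']:
            for cls in clf['classes']:
                # Expectation was ~0.90+, but the observed scores stay around 0.50-0.55.
                print(cls['class'], cls['score'])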

I've tried creating a similar classifier with 100 positive images and 100 negative images, but the final result never changes.

The question is: why is the confidence score so low? Why isn't it near 90-100% for images that were used in the positive training set?

The scores from Visual Recognition custom classifiers range from 0.0 to 1.0; they are unitless and are not percentages or probabilities. (They do not add up to 100% or 1.0.)

When the service creates a classifier from your examples, it is trying to figure out what distinguishes the features of one class of positive_examples from the other classes of positive_examples (and from the negative_examples, if given). The scores are based on the distance to the decision boundary between the positive examples for a class and everything else in the classifier. It attempts to calibrate the score output for each class so that 0.5 is a decent decision threshold for whether an image belongs to that class.
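As a purely illustrative toy model (not the service's actual, unpublished calibration), you can picture the score as a squashed signed distance from that decision boundary, so examples sitting close to the boundary land near 0.5 no matter which side they fall on:

    import math

    def toy_score(margin, scale=1.0):
        """Map a signed distance from the decision boundary to a 0..1 score.

        Purely illustrative: the real service's calibration is not public.
        Positive margins (inside the class region) push the score above 0.5,
        negative margins push it below; small margins stay near 0.5.
        """
        return 1.0 / (1.0 + math.exp(-scale * margin))

    for margin in (-2.0, -0.2, 0.0, 0.2, 2.0):
        print(f"margin {margin:+.1f} -> score {toy_score(margin):.2f}")
    # margin +0.2 -> score ~0.55: close to the boundary, hence the 0.50-0.55 range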

However, given the cost-benefit balance of false alarms vs. missed detections in your application, you may want to use a higher or lower threshold for deciding whether an image belongs to the class.
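For example, you could filter the returned classes with your own cutoff instead of relying on the default 0.5; the 0.7 below is just an arbitrary, stricter illustration applied to the JSON returned by a classify call:

    CUSTOM_THRESHOLD = 0.7  # arbitrary example: stricter, fewer false alarms

    def accepted_classes(classify_result, threshold=CUSTOM_THRESHOLD):
        """Keep only the classes whose score clears the chosen threshold."""
        hits = []
        for image in classify_result['images']:
            for clf in image['classifiers']:
                for cls in clf['classes']:
                    if cls['score'] >= threshold:
                        hits.append((clf['classifier_id'], cls['class'], cls['score']))
        return hits

    # Usage: accepted_classes(result) on the result dict from the classify sketch above.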

Without knowing the specifics of your class examples, I might guess that there is a significant amount of similarity between the classes, so that in feature space the examples do not form distinct clusters, and the scores reflect their closeness to the boundary.

