In this example of training a LogisticRegression model, the input to the fit() method is an RDD[LabeledPoint], and the example's comment reads: "// We use LabeledPoint, which is a case class. Spark SQL can convert RDDs of case classes // into SchemaRDDs, where it uses the case class metadata to infer the schema."
Where is that conversion happening? When I try this code:
val sqlContext = new SQLContext(sc)
import sqlContext._
val model = lr.fit(training)
where training is of type RDD[LabeledPoint], it gives a compilation error stating that fit() expects a DataFrame. When I convert the RDD to a DataFrame, I get this exception:
An exception occured while executing the Java class. null: InvocationTargetException: requirement failed: Column features must be of type org.apache.spark.mllib.linalg.VectorUDT@f71b0bce StructType(StructField(label,DoubleType,false), StructField(features,org.apache.spark.mllib.linalg.VectorUDT@f71b0bce,true))
This confuses me, though. Why does it expect a vector? It needs labels as well. I am wondering what the correct format is.
The reason I am using ml.LogisticRegression and not mllib.LogisticRegressionWithLBFGS is that I want the elasticNet implementation.
The exception says the DataFrame is expected to follow this structure:

StructType(StructField(label,DoubleType,false), StructField(features,org.apache.spark.mllib.linalg.VectorUDT@f71b0bce,true))
So prepare the training data as a list of (label, features) tuples, like this:
import org.apache.spark.mllib.linalg.Vectors

val training = sqlContext.createDataFrame(Seq(
  (1.0, Vectors.dense(0.0, 1.1, 0.1)),
  (0.0, Vectors.dense(2.0, 1.0, -1.0)),
  (0.0, Vectors.dense(2.0, 1.3, 1.0)),
  (1.0, Vectors.dense(0.0, 1.2, -0.5))
)).toDF("label", "features")
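With the DataFrame in that shape, the ml API fits directly. Here is a minimal sketch of the elasticNet setup mentioned in the question, assuming Spark 1.4+; the parameter values are placeholders for illustration, not recommendations:

import org.apache.spark.ml.classification.LogisticRegression

// elasticNetParam mixes the penalties: 0.0 = pure L2, 1.0 = pure L1.
// maxIter and regParam values below are illustrative only.
val lr = new LogisticRegression()
  .setMaxIter(10)
  .setRegParam(0.01)
  .setElasticNetParam(0.8)

val model = lr.fit(training)  // training is the DataFrame built above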
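As for where the case-class conversion happens: it is driven by the implicits on SQLContext, so an existing RDD[LabeledPoint] can be converted with toDF(). A minimal sketch, assuming Spark 1.3+ and a SparkContext sc as in the shell:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._  // enables rdd.toDF() for RDDs of case classes

// LabeledPoint is a case class, so Spark SQL infers the schema
// (label: Double, features: VectorUDT) from its fields.
val rdd = sc.parallelize(Seq(
  LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
  LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0))
))
val trainingDF = rdd.toDF()  // columns: "label", "features"

Passing trainingDF (rather than the raw RDD) to fit() also resolves the compilation error from the question.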