Bootstrapping spark-avro jar to Amazon EMR cluster


I want to read Avro files located in Amazon S3 from a Zeppelin notebook. I understand Databricks has a wonderful package for this, spark-avro. What steps do I need to take in order to bootstrap the jar file to my cluster and make it work?

When I write the following in the notebook:

    val df = sqlContext.read.avro("s3n://path_to_avro_files_in_one_bucket/")

I get the error below:

    <console>:34: error: value avro is not a member of org.apache.spark.sql.DataFrameReader

I have had a look at this. I guess the solution posted there does not work with the latest version of Amazon EMR.

If someone could give me pointers, that would really help.

Here is how I associate the spark-avro dependencies. This method works for associating any other dependencies with Spark as well.

  1. Make sure your Spark version is compatible with spark-avro; you'll find details of the dependencies here. (For reference, the spark-avro 3.x releases target Spark 2.x, while the 2.0.x releases target Spark 1.3 through 1.6.)

  2. I put the spark-avro jar file in an S3 bucket. You can use HDFS or any other store instead.

  3. While launching the EMR cluster, add the following JSON in the configuration:

       [
         {
           "classification": "spark-defaults",
           "properties": {
             "spark.files": "/path_to_spark-avro_jar_file",
             "spark.jars": "/path_to_spark-avro_jar_file"
           },
           "configurations": []
         }
       ]
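Once the cluster is up, a quick sanity check from the notebook is to confirm that spark-defaults actually picked the jar up. This is just a sketch; the values printed will be whatever path you supplied in the JSON above:

    // Verify that the spark-avro jar was registered via spark-defaults.
    // Both keys should echo the path from the configuration JSON.
    println(sc.getConf.get("spark.jars", "<not set>"))
    println(sc.getConf.get("spark.files", "<not set>"))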

This is not the only way of doing it; please refer to this link for more details.
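As a usage example, once the jar is on the classpath, the notebook code from the question should work, provided the spark-avro implicits are imported. A minimal sketch (the S3 path is the placeholder from the question):

    // The import adds the .avro shorthand to DataFrameReader; without it
    // you get the "value avro is not a member" error from the question.
    import com.databricks.spark.avro._

    val df = sqlContext.read.avro("s3n://path_to_avro_files_in_one_bucket/")

    // Equivalent call without the implicit import:
    val df2 = sqlContext.read.format("com.databricks.spark.avro")
      .load("s3n://path_to_avro_files_in_one_bucket/")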

