Bootstrapping spark-avro jar to Amazon EMR cluster


I want to read Avro files located in Amazon S3 from a Zeppelin notebook. I understand Databricks has a wonderful package for this, spark-avro. What steps do I need to take in order to bootstrap the jar file to my cluster and make it work?

When I write the following in the notebook:

    val df = sqlContext.read.avro("s3n://path_to_avro_files_in_one_bucket/")

I get the error below:

    <console>:34: error: value avro is not a member of org.apache.spark.sql.DataFrameReader

I have had a look at this. I guess the solution posted there does not work with the latest version of Amazon EMR.

If someone could give me pointers, that would really help.

Here is how I associate the spark-avro dependencies. This method works for associating any other dependencies with Spark as well.

  1. Make sure your Spark version is compatible with spark-avro; you'll find details of the dependencies here. (For reference, the spark-avro 3.x releases target Spark 2.x, while the 2.0.x releases target Spark 1.3 through 1.6.)

  2. I put the spark-avro jar file in an S3 bucket. You can use HDFS or any other store instead.

  3. While launching the EMR cluster, add the following JSON in the configuration:

       [
         {
           "classification": "spark-defaults",
           "properties": {
             "spark.files": "/path_to_spark-avro_jar_file",
             "spark.jars": "/path_to_spark-avro_jar_file"
           },
           "configurations": []
         }
       ]
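Once the cluster is up, a quick sanity check from the notebook is to confirm that spark-defaults actually picked the jar up. This is just a sketch; the values printed will be whatever path you supplied in the JSON above:

    // Verify that the spark-avro jar was registered via spark-defaults.
    // Both keys should echo the path from the configuration JSON.
    println(sc.getConf.get("spark.jars", "<not set>"))
    println(sc.getConf.get("spark.files", "<not set>"))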

This is not the only way of doing it; please refer to this link for more details.
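As a usage example, once the jar is on the classpath, the notebook code from the question should work, provided the spark-avro implicits are imported. A minimal sketch (the S3 path is the placeholder from the question):

    // The import adds the .avro shorthand to DataFrameReader; without it
    // you get the "value avro is not a member" error from the question.
    import com.databricks.spark.avro._

    val df = sqlContext.read.avro("s3n://path_to_avro_files_in_one_bucket/")

    // Equivalent call without the implicit import:
    val df2 = sqlContext.read.format("com.databricks.spark.avro")
      .load("s3n://path_to_avro_files_in_one_bucket/")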

