I want to read Avro files located in Amazon S3 from a Zeppelin notebook. I understand Databricks has a wonderful package for this, spark-avro. What steps do I need to take in order to bootstrap the jar file onto the cluster and make it work?
When I write the following in the notebook:

    val df = sqlContext.read.avro("s3n://path_to_avro_files_in_one_bucket/")

I get the error below:

    <console>:34: error: value avro is not a member of org.apache.spark.sql.DataFrameReader
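For reference, the usage I am following is the pattern documented in the spark-avro README (the bucket path is a placeholder):

    // The implicit import is what adds the .avro shortcut to DataFrameReader;
    // without the spark-avro jar on the classpath, this import fails as well.
    import com.databricks.spark.avro._

    val df = sqlContext.read.avro("s3n://path_to_avro_files_in_one_bucket/")

    // Equivalent long form that works without the implicit import:
    val df2 = sqlContext.read
      .format("com.databricks.spark.avro")
      .load("s3n://path_to_avro_files_in_one_bucket/")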
I have already had a look at this; I guess the solution posted there does not work with the latest version of Amazon EMR. If anyone can give me some pointers, that would really help.
Here is how I associate the spark-avro dependencies. The same method works for associating any other dependencies with Spark.

Make sure your Spark version is compatible with spark-avro. You'll find the details of the dependencies here.
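For example, if the cluster runs Spark 1.x on Scala 2.10, the matching artifact would be declared roughly like this in sbt (the version number here is an assumption; verify it against the compatibility table):

    // build.sbt sketch; "2.0.1" is an assumed spark-avro release for Spark 1.x.
    // Check the spark-avro README for the version matching your Spark build.
    libraryDependencies += "com.databricks" %% "spark-avro" % "2.0.1"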
I put the spark-avro jar file in an S3 bucket. You can use HDFS or any other store.

While launching the EMR cluster, add the following JSON in the configuration:
[{"classification":"spark-defaults", "properties":{"spark.files":"/path_to_spark-avro_jar_file", "spark.jars":"/path_to_spark-avro_jar_file"}, "configurations":[]}]
This is not the only way to do this; please refer to this link for more details.