scala - Classify data using Apache Spark


I have the following dataset:

    |-- created_at: timestamp (nullable = true)
    |-- channel_source_id: integer (nullable = true)
    |-- movie_id: integer (nullable = true)

I want to classify each movie_id based on the following conditions (a rough DataFrame sketch of these aggregations follows the list):

  • the number of times it has been played;

        # count occurrences per movie id
        SELECT COUNT(created_at) FROM logs GROUP BY movie_id
  • the time range (created_at) in which the movie has been played;

        # return the distinct movie_ids
        SELECT DISTINCT(movie_id) FROM logs
        # for each movie_id, retrieve the hours at which it has been played;
        # once I have the result, apply a filter to the df to extract the intervals
        SELECT created_at FROM logs WHERE movie_id = ?
  • the number of different channel_source_id values that have played the movie;

        # count the number of channels that have played the movie
        SELECT COUNT(DISTINCT(channel_source_id)) FROM logs WHERE movie_id = ? GROUP BY movie_id
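Something like the following is the DataFrame sketch I have in mind for these aggregations, assuming the logs are loaded into a DataFrame named df with the schema above (the stats name is just illustrative):

    import org.apache.spark.sql.functions._

    // sketch: all three aggregations in one pass over the logs,
    // grouped per movie_id
    val stats = df.groupBy("movie_id").agg(
      count("created_at").as("play_count"),                    // times played
      min("created_at").as("first_played"),                    // range start
      max("created_at").as("last_played"),                     // range end
      countDistinct("channel_source_id").as("channel_count")   // distinct channels
    )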

I've written a simple table to guide me on the classification:

    played 1-5 times,  range between 00:00:00 - 03:59:59, 1-3 different channels >> movie type A
    played 6-10 times, range between 04:00:00 - 07:59:59, 4-5 different channels >> movie type B
    etc.
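My rough idea for applying this table is a chain of when/otherwise conditions over the aggregates (stats, first_played, and channel_count are the hypothetical names from the sketch above, and taking the hour from the first play time is just my assumption about what "range" means):

    import org.apache.spark.sql.functions._

    // sketch: map the classification table onto the aggregated stats;
    // the boundaries mirror the table rows above
    val classified = stats.withColumn("movie_type",
      when(col("play_count").between(1, 5) &&
           hour(col("first_played")).between(0, 3) &&
           col("channel_count").between(1, 3), "A")
      .when(col("play_count").between(6, 10) &&
            hour(col("first_played")).between(4, 7) &&
            col("channel_count").between(4, 5), "B")
      .otherwise("unclassified"))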

I'm using Spark to import the file, but I'm lost on how to actually put this classification together. Could someone give me a hand on where I should start?

    import org.apache.spark.sql.SQLContext

    def run() = {
      val sqlContext = new SQLContext(sc)
      val df = sqlContext.read
        .format("com.databricks.spark.csv")
        .options(Map("header" -> "true", "inferSchema" -> "true"))
        .load("/home/plc/Desktop/movies.csv")
      df.printSchema()
    }
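In case it matters, I know I can also register the DataFrame as a temporary table and run the SQL sketches above directly (registerTempTable is the Spark 1.x API matching the SQLContext usage here; the playCounts name is just illustrative):

    // sketch: run the raw SQL from above against the loaded DataFrame
    df.registerTempTable("logs")
    val playCounts = sqlContext.sql(
      "SELECT movie_id, COUNT(created_at) AS play_count FROM logs GROUP BY movie_id")
    playCounts.show()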

