I have 3 columns in a DataFrame:
user_id, product_category_1, and the corresponding purchase amount.
I am trying to group by user_id and product_category_1 and select the average purchase amount,
so the output DataFrame should have: user_id, product_category_1, avg_purchase.
This is not working for me:
x = train_bk.groupby(["user_id", "product_category_1"], as_index=False)['purchase'].transform('mean')
This gives me a Series with the group mean of purchase repeated for each row. I need to keep only one row per unique user_id and product_category_1 combination.
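A small sketch of why transform behaves this way, using made-up sample data (the values below are hypothetical, only the column names come from the question). transform returns a result aligned to the original rows, so every row gets the mean of its own group:

```python
import pandas as pd

# Hypothetical sample data; only the column names match the question.
train_bk = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "product_category_1": ["a", "a", "b", "a"],
    "purchase": [10.0, 20.0, 30.0, 40.0],
})

# transform keeps the original shape: one value per input row,
# each equal to the mean of that row's (user_id, product_category_1) group.
per_row = train_bk.groupby(["user_id", "product_category_1"])["purchase"].transform("mean")

print(len(train_bk), len(per_row))
print(per_row.tolist())
```

This is handy when the goal is to attach the group mean as a new column on the original rows, but it is not the right tool when one row per group is wanted.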
I also tried a SQL package:
x1 = train_bk.select(average(train_bk.user_id), train_bk.product_category_1, group_by=(train_bk.user_id,train_bk.product_category_1))
It throws an error: "name 'average' is not defined". Is there a package in Python that has SQL syntax similar to Teradata or MySQL?
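One possible way to get plain SQL syntax, as a sketch only: load the DataFrame into an in-memory SQLite database with the standard-library sqlite3 module and query it through pandas. This is SQLite's dialect, not Teradata or MySQL, and the sample data and the table name train_bk are assumptions for illustration:

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; only the column names match the question.
train_bk = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "product_category_1": ["a", "a", "b", "a"],
    "purchase": [10.0, 20.0, 30.0, 40.0],
})

# Copy the DataFrame into an in-memory SQLite table (table name is an assumption).
conn = sqlite3.connect(":memory:")
train_bk.to_sql("train_bk", conn, index=False)

query = """
SELECT user_id, product_category_1, AVG(purchase) AS avg_purchase
FROM train_bk
GROUP BY user_id, product_category_1
ORDER BY user_id, product_category_1
"""

# Run the query and get the result back as a DataFrame.
x1 = pd.read_sql_query(query, conn)
conn.close()

print(x1)
```

Note that the aggregate is AVG applied to purchase, not to user_id as in the attempt above.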
OK, this seems to be working:
x = train_bk.groupby(["user_id", "product_category_1"], as_index=False)['purchase'].mean()
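If the result column should be called avg_purchase, as in the desired output, one option is pandas named aggregation. A minimal sketch with made-up sample data (the values are hypothetical, the column names come from the question):

```python
import pandas as pd

# Hypothetical sample data; only the column names match the question.
train_bk = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "product_category_1": ["a", "a", "b", "a"],
    "purchase": [10.0, 20.0, 30.0, 40.0],
})

# One row per (user_id, product_category_1) pair; as_index=False keeps
# the group keys as ordinary columns instead of moving them into the index.
x = (
    train_bk
    .groupby(["user_id", "product_category_1"], as_index=False)
    .agg(avg_purchase=("purchase", "mean"))
)

print(x)
```

Renaming afterwards with x.rename(columns={"purchase": "avg_purchase"}) on the plain .mean() result should give the same table.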