I have a dataframe like this, which contains passenger id, date, and origin location.
id  date        origin
1   01/01/2012  a
1   01/01/2012  b
1   01/01/2012  c
1   01/02/2012  a
1   01/02/2012  b
1   01/02/2012  c
1   01/03/2012  a
1   01/03/2012  b
1   01/08/2012  a
2   01/01/2012  d
2   01/01/2012  c
2   01/01/2012  b
2   01/04/2012  d
2   01/04/2012  c
2   01/06/2012  d
3   01/03/2012  f
3   01/03/2012  g
3   01/09/2012  f
3   01/09/2012  g
I want to create a 'daily first boarding record' using the dataframe shown above:
id  date        origin
1   01/01/2012  a
1   01/02/2012  a
1   01/03/2012  a
1   01/08/2012  a
2   01/01/2012  d
2   01/04/2012  d
2   01/06/2012  d
3   01/03/2012  f
3   01/09/2012  f
That is, group by id and date, taking the first value of origin in each group.
Currently, I'm using this code:
daily_first_record = aggregate(origin ~ id + date, data=df, FUN='[', i=1)
However, the code takes a long time to run because the original dataset is quite large (about 1 GB as a CSV). Is there an easy way to do the same job?
We can use dplyr:
library(dplyr)
df1 %>%
  group_by(id, date) %>%
  summarise(origin = first(origin))
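For anyone who wants to try this on a small sample first, here is a minimal sketch. The data below is a shortened copy of the table in the question (the origin "a" for passenger 1 is an assumption, since that value is missing from the post), and the `.groups = "drop"` argument assumes dplyr 1.0.0 or later. The second approach, using base R's `duplicated()`, is an alternative not mentioned in the answer; it should give the same rows, provided the data is already in boarding order.

```r
library(dplyr)

# Sample data modelled on the question.
# Assumption: passenger 1's first origin is "a" (missing in the original post).
df1 <- data.frame(
  id     = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  date   = c("01/01/2012", "01/01/2012", "01/01/2012",
             "01/02/2012", "01/02/2012", "01/02/2012",
             "01/01/2012", "01/01/2012", "01/01/2012", "01/04/2012"),
  origin = c("a", "b", "c", "a", "b", "c", "d", "c", "b", "d"),
  stringsAsFactors = FALSE
)

# dplyr: one row per (id, date), keeping the first origin in row order.
daily_first_record <- df1 %>%
  group_by(id, date) %>%
  summarise(origin = first(origin), .groups = "drop")

# Base R alternative: keep the first occurrence of each (id, date) pair.
daily_first_base <- df1[!duplicated(df1[c("id", "date")]), ]

print(daily_first_record)
print(daily_first_base)
```

Both approaches take "first" to mean first in row order, so if the real file is not sorted by boarding time within each day, it would need sorting before either step.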