I have a data frame (dtetags.df) whose date column has many duplicate dates:

dtetags.df$date
"2016-07-22" "2016-07-22" "2016-07-21" "2016-07-21" "2016-07-20" "2016-07-20" "2016-07-19" "2016-07-19" "2016-07-18" "2016-07-18" "2016-07-15" "2016-07-15" "2016-07-15" "2016-07-14" "2016-07-14" "2016-07-13" "2016-07-13" "2016-07-13" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-08" "2016-07-08" "2016-07-08" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-06" "2016-07-06" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-01" "2016-07-01" "2016-06-30" "2016-06-30" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-28" "2016-06-28" "2016-06-28" "2016-06-27" "2016-06-27" "2016-06-27" "2016-06-24" "2016-06-24" "2016-06-23" "2016-06-23" "2016-06-22" "2016-06-22" "2016-06-21" "2016-06-21" "2016-06-20" "2016-06-20" "2016-06-17" "2016-06-17" "2016-06-16" "2016-06-16" "2016-06-15" "2016-06-15" "2016-06-14" "2016-06-13" "2016-06-13" "2016-06-10" "2016-06-10" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-08" "2016-06-08" "2016-06-07" "2016-06-07" "2016-06-06" "2016-06-06" "2016-06-06" "2016-06-01" "2016-06-01" "2016-05-29" "2016-05-29" "2016-05-27" "2016-05-27" "2016-05-26" "2016-05-26" "2016-05-25" "2016-05-25" "2016-05-24" "2016-05-23" "2016-05-23" "2016-05-20"

It also has a number of binary tag columns that show whether a post was made with that tag on that date, for example:

dtetags.df$technology
"0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "1" "1" "0" "1" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"

I am trying to use ddply(dtetags.df, "date", numcolwise(sum)), based on this question, but it returns the error message <0 rows> (or 0-length row.names). I have tried a number of different ways to format the ddply command, but I cannot get it to work.
The ideal output would look like this:

         date technology
1  2016-07-22          0
2  2016-07-21          0
3  2016-07-20          0
4  2016-07-19          0
5  2016-07-18          0
6  2016-07-15          0
7  2016-07-14          0
8  2016-07-13          0
9  2016-07-12          0
10 2016-07-11          0
11 2016-07-08          0
12 2016-07-07          0
13 2016-07-06          1
14 2016-07-05          0
15 2016-07-01          2
16 2016-06-30          1
17 2016-06-29          1
18 2016-06-28          0
19 2016-06-27          0
20 2016-06-24          1
21 2016-06-23          0
22 2016-06-22          0
23 2016-06-21          0
24 2016-06-20          0
25 2016-06-17          0
26 2016-06-16          0
27 2016-06-15          0
28 2016-06-14          1
29 2016-06-13          0
30 2016-06-10          0
31 2016-06-09          0
32 2016-06-08          0
33 2016-06-07          0
34 2016-06-06          0
35 2016-06-01          0
36 2016-05-29          0
37 2016-05-27          0
38 2016-05-26          0
39 2016-05-25          0
40 2016-05-24          0
41 2016-05-23          0
42 2016-05-20          0

Is there something obvious that I am doing wrong?
Conversion from factor to numeric

I removed the date column, applied data.frame(apply(dtetags.df, 2, function(x) as.numeric(as.character(x)))) to the rest of the data frame, and then prepended the date column back in.
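The conversion described above can also be done without removing and re-attaching the date column. The following is only a minimal sketch in base R, not the code the question actually used: the data frame `df` is a hypothetical three-row stand-in for dtetags.df, and `tag_cols` is a helper name introduced here for illustration.

```r
# Hypothetical stand-in for dtetags.df: tag values stored as a factor
df <- data.frame(
  date = c("2016-07-01", "2016-07-01", "2016-06-30"),
  technology = factor(c("1", "1", "0")),
  stringsAsFactors = FALSE
)

# Every column except date is treated as a tag column (an assumption)
tag_cols <- setdiff(names(df), "date")

# Go through as.character() first: as.numeric() applied directly to a
# factor returns the internal level codes, not the labels
df[tag_cols] <- lapply(df[tag_cols], function(x) as.numeric(as.character(x)))

str(df)
```

Assigning into `df[tag_cols]` leaves the date column untouched, so nothing has to be prepended afterwards.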
dput(dtetags.df)
structure(list(date = c("2016-07-22", "2016-07-22", "2016-07-21", "2016-07-21", "2016-07-20", "2016-07-20", "2016-07-19", "2016-07-19", "2016-07-18", "2016-07-18", "2016-07-15", "2016-07-15", "2016-07-15", "2016-07-14", "2016-07-14", "2016-07-13", "2016-07-13", "2016-07-13", "2016-07-12", "2016-07-12", "2016-07-12", "2016-07-12", "2016-07-11", "2016-07-11", "2016-07-11", "2016-07-11", "2016-07-08", "2016-07-08", "2016-07-08", "2016-07-07", "2016-07-07", "2016-07-07", "2016-07-07", "2016-07-06", "2016-07-06", "2016-07-05", "2016-07-05", "2016-07-05", "2016-07-05", "2016-07-01", "2016-07-01", "2016-06-30", "2016-06-30", "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-28", "2016-06-28", "2016-06-28", "2016-06-27", "2016-06-27", "2016-06-27", "2016-06-24", "2016-06-24", "2016-06-23", "2016-06-23", "2016-06-22", "2016-06-22", "2016-06-21", "2016-06-21", "2016-06-20", "2016-06-20", "2016-06-17", "2016-06-17", "2016-06-16", "2016-06-16", "2016-06-15", "2016-06-15", "2016-06-14", "2016-06-13", "2016-06-13", "2016-06-10", "2016-06-10", "2016-06-09", "2016-06-09", "2016-06-09", "2016-06-09", "2016-06-08", "2016-06-08", "2016-06-07", "2016-06-07", "2016-06-06", "2016-06-06", "2016-06-06", "2016-06-01", "2016-06-01", "2016-05-29", "2016-05-29", "2016-05-27", "2016-05-27", "2016-05-26", "2016-05-26", "2016-05-25", "2016-05-25", "2016-05-24", "2016-05-23", "2016-05-23", "2016-05-20"), `technology` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .names = c("date", "technology"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -100L))
To accomplish what you want, you can use the dplyr package:

library(dplyr)
out <- dtetags.df %>%
  group_by(date) %>%
  summarise_each(funs(sum)) %>%
  arrange(desc(date))

Notes:
- group_by date, which means that each subsequent operation is applied to each group of rows with the same date.
- Use the sum function to summarize each column (other than date).
- Use arrange to sort the results in descending order of date.
Given your input data, the output is as expected:

print(out)
# A tibble: 42 x 2
         date technology
        <chr>      <dbl>
1  2016-07-22          0
2  2016-07-21          0
3  2016-07-20          0
4  2016-07-19          0
5  2016-07-18          0
6  2016-07-15          0
7  2016-07-14          0
8  2016-07-13          0
9  2016-07-12          0
10 2016-07-11          0
11 2016-07-08          0
12 2016-07-07          0
13 2016-07-06          1
14 2016-07-05          0
15 2016-07-01          2
16 2016-06-30          1
17 2016-06-29          1
18 2016-06-28          0
19 2016-06-27          0
20 2016-06-24          1
21 2016-06-23          0
22 2016-06-22          0
23 2016-06-21          0
24 2016-06-20          0
25 2016-06-17          0
26 2016-06-16          0
27 2016-06-15          0
28 2016-06-14          1
29 2016-06-13          0
30 2016-06-10          0
31 2016-06-09          0
32 2016-06-08          0
33 2016-06-07          0
34 2016-06-06          0
35 2016-06-01          0
36 2016-05-29          0
37 2016-05-27          0
38 2016-05-26          0
39 2016-05-25          0
40 2016-05-24          0
41 2016-05-23          0
42 2016-05-20          0

Caveats: This requires that all columns other than date in dtetags.df are numeric. If they are not, they should be converted prior to applying this code. This can be done using the answer found here.
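For readers on a newer dplyr, where summarise_each() and funs() are deprecated, the same pipeline can be written with across(). The sketch below assumes dplyr 1.0.0 or later and folds the numeric conversion from the caveat into the pipeline; the tibble `toy` is a hypothetical four-row stand-in for dtetags.df with the tag column stored as character.

```r
library(dplyr)

# Hypothetical stand-in for dtetags.df: tag values stored as character
toy <- tibble(
  date = c("2016-07-01", "2016-07-01", "2016-06-30", "2016-06-30"),
  technology = c("1", "1", "0", "1")
)

out <- toy %>%
  # Convert every column except date to numeric
  mutate(across(-date, ~ as.numeric(as.character(.x)))) %>%
  group_by(date) %>%
  # across(everything(), ...) skips the grouping column, so only tags are summed
  summarise(across(everything(), sum)) %>%
  arrange(desc(date))

print(out)
```

With this toy input, `out` has one row per date: 2016-07-01 with a technology count of 2, then 2016-06-30 with a count of 1.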
Hope this helps.