I have a data frame (dtetags.df) whose date column has many duplicate dates:

    dtetags.df$date
    "2016-07-22" "2016-07-22" "2016-07-21" "2016-07-21" "2016-07-20" "2016-07-20" "2016-07-19" "2016-07-19" "2016-07-18" "2016-07-18" "2016-07-15" "2016-07-15" "2016-07-15" "2016-07-14" "2016-07-14" "2016-07-13" "2016-07-13" "2016-07-13" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-08" "2016-07-08" "2016-07-08" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-06" "2016-07-06" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-01" "2016-07-01" "2016-06-30" "2016-06-30" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-28" "2016-06-28" "2016-06-28" "2016-06-27" "2016-06-27" "2016-06-27" "2016-06-24" "2016-06-24" "2016-06-23" "2016-06-23" "2016-06-22" "2016-06-22" "2016-06-21" "2016-06-21" "2016-06-20" "2016-06-20" "2016-06-17" "2016-06-17" "2016-06-16" "2016-06-16" "2016-06-15" "2016-06-15" "2016-06-14" "2016-06-13" "2016-06-13" "2016-06-10" "2016-06-10" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-08" "2016-06-08" "2016-06-07" "2016-06-07" "2016-06-06" "2016-06-06" "2016-06-06" "2016-06-01" "2016-06-01" "2016-05-29" "2016-05-29" "2016-05-27" "2016-05-27" "2016-05-26" "2016-05-26" "2016-05-25" "2016-05-25" "2016-05-24" "2016-05-23" "2016-05-23" "2016-05-20"

and a number of binary tag columns that show whether a post made on that date carried the tag, for example:

    dtetags.df$technology
    "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "1" "1" "0" "1" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"
I am trying to use ddply(dtetags.df, "date", numcolwise(sum)), based on this question, but it returns <0 rows> (or 0-length row.names). I have tried a number of different ways of formatting the ddply command, but cannot get it to work.
The ideal output would look like this:

             date technology
    1  2016-07-22          0
    2  2016-07-21          0
    3  2016-07-20          0
    4  2016-07-19          0
    5  2016-07-18          0
    6  2016-07-15          0
    7  2016-07-14          0
    8  2016-07-13          0
    9  2016-07-12          0
    10 2016-07-11          0
    11 2016-07-08          0
    12 2016-07-07          0
    13 2016-07-06          1
    14 2016-07-05          0
    15 2016-07-01          2
    16 2016-06-30          1
    17 2016-06-29          1
    18 2016-06-28          0
    19 2016-06-27          0
    20 2016-06-24          1
    21 2016-06-23          0
    22 2016-06-22          0
    23 2016-06-21          0
    24 2016-06-20          0
    25 2016-06-17          0
    26 2016-06-16          0
    27 2016-06-15          0
    28 2016-06-14          1
    29 2016-06-13          0
    30 2016-06-10          0
    31 2016-06-09          0
    32 2016-06-08          0
    33 2016-06-07          0
    34 2016-06-06          0
    35 2016-06-01          0
    36 2016-05-29          0
    37 2016-05-27          0
    38 2016-05-26          0
    39 2016-05-25          0
    40 2016-05-24          0
    41 2016-05-23          0
    42 2016-05-20          0
Is there something obvious I am doing wrong?
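A likely reason ddply() comes back empty is that numcolwise() only applies its function to columns for which is.numeric() is TRUE, while the tag values shown above are character strings ("0", "1"). The base R sketch below uses a small hypothetical data frame df that mimics that layout and shows that no column would qualify:

```r
# Hypothetical miniature of dtetags.df: the tag column holds character strings
df <- data.frame(
  date       = c("2016-07-01", "2016-07-01", "2016-06-30"),
  technology = c("1", "1", "0"),
  stringsAsFactors = FALSE
)

# numcolwise(f) only works on the columns where is.numeric() is TRUE
is_num <- vapply(df, is.numeric, logical(1))
print(is_num)             # both FALSE: "date" and "technology" are character
print(names(df)[is_num])  # character(0): nothing is left to sum
```

Once the tag columns are converted to numeric, the same check returns TRUE for them and numcolwise(sum) has columns to work on.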
Conversion from factor to numeric:

I removed the date column, applied data.frame(apply(dtetags.df, 2, function(x) as.numeric(as.character(x)))) to the rest of the data frame, and then prepended the date column back in.
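The conversion can also be done in place, without removing and re-attaching date. Below is a minimal base R sketch, assuming (hypothetically) that every column except date is a tag column. Going through as.character() first matters when the tags are factors, because as.numeric() on a factor returns its internal level codes rather than its labels:

```r
# Hypothetical miniature with the tag column stored as a factor
df <- data.frame(
  date       = c("2016-07-01", "2016-07-01", "2016-06-30"),
  technology = factor(c("1", "1", "0")),
  stringsAsFactors = FALSE
)

# Assumption: every column other than `date` is a 0/1 tag column
tag_cols <- setdiff(names(df), "date")

# Convert each tag column via its labels, not its factor codes
df[tag_cols] <- lapply(df[tag_cols], function(x) as.numeric(as.character(x)))

str(df)
```

Unlike apply(), which first coerces the whole data frame to a matrix, lapply() works column by column and leaves date untouched in its original position.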
    dput(dtetags.df)
    structure(list(date = c("2016-07-22", "2016-07-22", "2016-07-21", "2016-07-21", "2016-07-20", "2016-07-20", "2016-07-19", "2016-07-19", "2016-07-18", "2016-07-18", "2016-07-15", "2016-07-15", "2016-07-15", "2016-07-14", "2016-07-14", "2016-07-13", "2016-07-13", "2016-07-13", "2016-07-12", "2016-07-12", "2016-07-12", "2016-07-12", "2016-07-11", "2016-07-11", "2016-07-11", "2016-07-11", "2016-07-08", "2016-07-08", "2016-07-08", "2016-07-07", "2016-07-07", "2016-07-07", "2016-07-07", "2016-07-06", "2016-07-06", "2016-07-05", "2016-07-05", "2016-07-05", "2016-07-05", "2016-07-01", "2016-07-01", "2016-06-30", "2016-06-30", "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-28", "2016-06-28", "2016-06-28", "2016-06-27", "2016-06-27", "2016-06-27", "2016-06-24", "2016-06-24", "2016-06-23", "2016-06-23", "2016-06-22", "2016-06-22", "2016-06-21", "2016-06-21", "2016-06-20", "2016-06-20", "2016-06-17", "2016-06-17", "2016-06-16", "2016-06-16", "2016-06-15", "2016-06-15", "2016-06-14", "2016-06-13", "2016-06-13", "2016-06-10", "2016-06-10", "2016-06-09", "2016-06-09", "2016-06-09", "2016-06-09", "2016-06-08", "2016-06-08", "2016-06-07", "2016-06-07", "2016-06-06", "2016-06-06", "2016-06-06", "2016-06-01", "2016-06-01", "2016-05-29", "2016-05-29", "2016-05-27", "2016-05-27", "2016-05-26", "2016-05-26", "2016-05-25", "2016-05-25", "2016-05-24", "2016-05-23", "2016-05-23", "2016-05-20"), `technology` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("date", "technology"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -100L))
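To accomplish what you want, you can use the dplyr package:

    library(dplyr)

    out <- dtetags.df %>%
      group_by(date) %>%                        # one group per distinct date
      summarise(across(everything(), sum)) %>%  # sum every non-grouping column
      arrange(desc(date))                       # newest date first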
Notes:

- We group_by date, which means subsequent operations act on each group of rows with the same date.
- We use the sum function to summarize each column (other than date).
- We use arrange to sort the results in descending order of date.
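Sorting with arrange(desc(date)) works even though date is a character column (<chr> in the output) because zero-padded "YYYY-MM-DD" strings sort in the same order as the dates they represent. A small base R check of that assumption, using hypothetical values; non-padded strings such as "2016-7-1" break it:

```r
dates <- c("2016-06-29", "2016-07-01", "2016-05-20", "2016-06-30")

# Sort as plain strings, and as real Date objects formatted back to strings
by_string <- sort(dates, decreasing = TRUE)
by_date   <- format(sort(as.Date(dates), decreasing = TRUE))
print(identical(by_string, by_date))  # TRUE

# Without zero padding, string order no longer matches date order
bad <- sort(c("2016-7-1", "2016-10-1"), decreasing = TRUE)
print(bad)  # "2016-7-1" comes first, although October is later
```

If the dates could arrive in another format, converting the column with as.Date() before grouping keeps the ordering safe.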
Given your input data, the output is as expected:

    print(out, n = Inf)
    # A tibble: 42 x 2
       date       technology
       <chr>           <dbl>
     1 2016-07-22          0
     2 2016-07-21          0
     3 2016-07-20          0
     4 2016-07-19          0
     5 2016-07-18          0
     6 2016-07-15          0
     7 2016-07-14          0
     8 2016-07-13          0
     9 2016-07-12          0
    10 2016-07-11          0
    11 2016-07-08          0
    12 2016-07-07          0
    13 2016-07-06          1
    14 2016-07-05          0
    15 2016-07-01          2
    16 2016-06-30          1
    17 2016-06-29          1
    18 2016-06-28          0
    19 2016-06-27          0
    20 2016-06-24          1
    21 2016-06-23          0
    22 2016-06-22          0
    23 2016-06-21          0
    24 2016-06-20          0
    25 2016-06-17          0
    26 2016-06-16          0
    27 2016-06-15          0
    28 2016-06-14          1
    29 2016-06-13          0
    30 2016-06-10          0
    31 2016-06-09          0
    32 2016-06-08          0
    33 2016-06-07          0
    34 2016-06-06          0
    35 2016-06-01          0
    36 2016-05-29          0
    37 2016-05-27          0
    38 2016-05-26          0
    39 2016-05-25          0
    40 2016-05-24          0
    41 2016-05-23          0
    42 2016-05-20          0
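For comparison, the same grouped sum can be sketched in base R with aggregate(), which avoids the dplyr dependency. This is a swapped-in alternative, not the method used above; df and the second tag column politics are hypothetical:

```r
# Hypothetical data: numeric 0/1 tags with duplicate dates
df <- data.frame(
  date       = c("2016-07-01", "2016-07-01", "2016-06-30", "2016-06-29", "2016-06-29"),
  technology = c(1, 1, 1, 0, 1),
  politics   = c(0, 1, 0, 0, 0),  # hypothetical second tag column
  stringsAsFactors = FALSE
)

# ". ~ date": sum every other column within each distinct date
out <- aggregate(. ~ date, data = df, FUN = sum)

# Newest date first; zero-padded ISO strings sort chronologically
out <- out[order(out$date, decreasing = TRUE), ]
rownames(out) <- NULL
print(out)
```

One difference to keep in mind: the formula interface of aggregate() drops rows containing NA by default (na.action = na.omit), whereas sum() in the dplyr version would return NA for any date group containing one.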
Caveats: this requires all columns other than date in dtetags.df to be numeric. If they are not, they should be converted before applying the code; this can be done using the answer found here.

Hope this helps.