r - Error when consolidating like rows with plyr - what am I doing wrong?


I have a data frame (dtetags.df) whose date column has many duplicate dates:

dtetags.df$date  "2016-07-22" "2016-07-22" "2016-07-21" "2016-07-21" "2016-07-20" "2016-07-20" "2016-07-19" "2016-07-19" "2016-07-18" "2016-07-18" "2016-07-15" "2016-07-15" "2016-07-15" "2016-07-14"  "2016-07-14" "2016-07-13" "2016-07-13" "2016-07-13" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-12" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-11" "2016-07-08" "2016-07-08"  "2016-07-08" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-07" "2016-07-06" "2016-07-06" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-05" "2016-07-01" "2016-07-01" "2016-06-30"  "2016-06-30" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-29" "2016-06-28" "2016-06-28" "2016-06-28" "2016-06-27" "2016-06-27" "2016-06-27" "2016-06-24" "2016-06-24"  "2016-06-23" "2016-06-23" "2016-06-22" "2016-06-22" "2016-06-21" "2016-06-21" "2016-06-20" "2016-06-20" "2016-06-17" "2016-06-17" "2016-06-16" "2016-06-16" "2016-06-15" "2016-06-15"  "2016-06-14" "2016-06-13" "2016-06-13" "2016-06-10" "2016-06-10" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-09" "2016-06-08" "2016-06-08" "2016-06-07" "2016-06-07" "2016-06-06"  "2016-06-06" "2016-06-06" "2016-06-01" "2016-06-01" "2016-05-29" "2016-05-29" "2016-05-27" "2016-05-27" "2016-05-26" "2016-05-26" "2016-05-25" "2016-05-25" "2016-05-24" "2016-05-23"  "2016-05-23" "2016-05-20" 

and a number of binary tag columns that show whether a post made on that date had that tag, for example:

dtetags.df$technology  "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "1" "1" "0" "1" "0" "1"  "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0"  "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" 

I am trying to use ddply(dtetags.df, "date", numcolwise(sum)), based on this question, but it returns the message <0 rows> (or 0-length row.names). I have tried a number of different ways to format the ddply command but cannot get it to work.
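One plausible cause, offered as a guess rather than a diagnosis: plyr's numcolwise() applies the function only to numeric columns, so if the tag columns are stored as character or factor (the values quoted above are printed in quotes), there is nothing left to sum. The sketch below uses a small hypothetical data frame, `toy`, in place of dtetags.df, and converts the tag column before calling ddply.

```r
library(plyr)

# Hypothetical toy data mirroring the question: the tag column is stored as character
toy <- data.frame(
  date = c("2016-07-01", "2016-07-01", "2016-06-30", "2016-06-29", "2016-06-29"),
  technology = c("1", "1", "1", "0", "1"),
  stringsAsFactors = FALSE
)

# numcolwise() keeps only numeric columns; here none qualify
print(sapply(toy, is.numeric))

# Convert the tag column to numeric; as.character() first also handles factors
toy$technology <- as.numeric(as.character(toy$technology))

# One row per distinct date, with every numeric column summed
by_date <- ddply(toy, "date", numcolwise(sum))
print(by_date)
```

After the conversion, each distinct date appears once and its technology value is the number of tagged posts on that date.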

The ideal output would look like:

       date       technology
    1  2016-07-22          0
    2  2016-07-21          0
    3  2016-07-20          0
    4  2016-07-19          0
    5  2016-07-18          0
    6  2016-07-15          0
    7  2016-07-14          0
    8  2016-07-13          0
    9  2016-07-12          0
    10 2016-07-11          0
    11 2016-07-08          0
    12 2016-07-07          0
    13 2016-07-06          1
    14 2016-07-05          0
    15 2016-07-01          2
    16 2016-06-30          1
    17 2016-06-29          1
    18 2016-06-28          0
    19 2016-06-27          0
    20 2016-06-24          1
    21 2016-06-23          0
    22 2016-06-22          0
    23 2016-06-21          0
    24 2016-06-20          0
    25 2016-06-17          0
    26 2016-06-16          0
    27 2016-06-15          0
    28 2016-06-14          1
    29 2016-06-13          0
    30 2016-06-10          0
    31 2016-06-09          0
    32 2016-06-08          0
    33 2016-06-07          0
    34 2016-06-06          0
    35 2016-06-01          0
    36 2016-05-29          0
    37 2016-05-27          0
    38 2016-05-26          0
    39 2016-05-25          0
    40 2016-05-24          0
    41 2016-05-23          0
    42 2016-05-20          0

Is there something obvious I am doing wrong?

Edit: conversion from factor to numeric

I removed the date column, applied data.frame(apply(dtetags.df, 2, function(x) as.numeric(as.character(x)))) to the rest of the data frame, and then prepended the date column back in.
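A sketch of the same conversion done in place, so the date column never has to be removed and prepended again. It assumes a hypothetical data frame `dtetags` whose tag columns were read in as factors; `science` is a made-up second tag column for illustration.

```r
# Hypothetical toy data: tag columns stored as factors, date as character
dtetags <- data.frame(
  date = c("2016-07-01", "2016-07-01", "2016-06-30"),
  technology = factor(c("1", "1", "0")),
  science = factor(c("0", "1", "0")),  # hypothetical second tag column
  stringsAsFactors = FALSE
)

# Every column except `date` is treated as a tag column
tag_cols <- setdiff(names(dtetags), "date")

# lapply() works column by column; as.character() first so factor labels,
# not their internal integer codes, are converted to numbers
dtetags[tag_cols] <- lapply(dtetags[tag_cols], function(x) as.numeric(as.character(x)))

str(dtetags)
```

The design choice here is lapply() rather than apply(): apply() first coerces the whole data frame to a matrix, so if the character date column were included, every column would become character.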

    dput(dtetags.df)
    structure(list(date = c("2016-07-22", "2016-07-22", "2016-07-21", 
    "2016-07-21", "2016-07-20", "2016-07-20", "2016-07-19", "2016-07-19", 
    "2016-07-18", "2016-07-18", "2016-07-15", "2016-07-15", "2016-07-15", 
    "2016-07-14", "2016-07-14", "2016-07-13", "2016-07-13", "2016-07-13", 
    "2016-07-12", "2016-07-12", "2016-07-12", "2016-07-12", "2016-07-11", 
    "2016-07-11", "2016-07-11", "2016-07-11", "2016-07-08", "2016-07-08", 
    "2016-07-08", "2016-07-07", "2016-07-07", "2016-07-07", "2016-07-07", 
    "2016-07-06", "2016-07-06", "2016-07-05", "2016-07-05", "2016-07-05", 
    "2016-07-05", "2016-07-01", "2016-07-01", "2016-06-30", "2016-06-30", 
    "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29", "2016-06-29", 
    "2016-06-28", "2016-06-28", "2016-06-28", "2016-06-27", "2016-06-27", 
    "2016-06-27", "2016-06-24", "2016-06-24", "2016-06-23", "2016-06-23", 
    "2016-06-22", "2016-06-22", "2016-06-21", "2016-06-21", "2016-06-20", 
    "2016-06-20", "2016-06-17", "2016-06-17", "2016-06-16", "2016-06-16", 
    "2016-06-15", "2016-06-15", "2016-06-14", "2016-06-13", "2016-06-13", 
    "2016-06-10", "2016-06-10", "2016-06-09", "2016-06-09", "2016-06-09", 
    "2016-06-09", "2016-06-08", "2016-06-08", "2016-06-07", "2016-06-07", 
    "2016-06-06", "2016-06-06", "2016-06-06", "2016-06-01", "2016-06-01", 
    "2016-05-29", "2016-05-29", "2016-05-27", "2016-05-27", "2016-05-26", 
    "2016-05-26", "2016-05-25", "2016-05-25", "2016-05-24", "2016-05-23", 
    "2016-05-23", "2016-05-20"), `technology` = c(0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 
    1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("date", 
    "technology"), class = c("tbl_df", "tbl", "data.frame"
    ), row.names = c(NA, -100L))

To accomplish what you want, you can use the dplyr package:

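    library(dplyr)

    # Note: summarise_each() and funs() are deprecated in later dplyr releases
    out <- dtetags.df %>%
      group_by(date) %>%              # one group per distinct date
      summarise_each(funs(sum)) %>%   # sum every non-grouping column
      arrange(desc(date))             # newest date first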

Notes:

  1. Use group_by on date, which means each subsequent operation acts on a group of rows with the same date.
  2. Use the sum function to summarize each column (other than date).
  3. Use arrange to sort the results in descending order of date.
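The three steps above can also be written with across(), which replaces the deprecated summarise_each()/funs() pair in dplyr 1.0.0 and later. This is a sketch on a hypothetical toy data frame, `toy`, whose tag column is already numeric.

```r
library(dplyr)

# Hypothetical toy data; the tag column must already be numeric
toy <- data.frame(
  date = c("2016-07-01", "2016-07-01", "2016-06-30", "2016-06-29", "2016-06-29"),
  technology = c(1, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

out <- toy %>%
  group_by(date) %>%                        # 1. one group per distinct date
  summarise(across(everything(), sum)) %>%  # 2. sum every non-grouping column
  arrange(desc(date))                       # 3. newest date first

print(out)
```

Inside summarise(), across(everything()) skips the grouping column, so date is not summed.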

Given your input data, the output is as expected:

    print(out)
    # A tibble: 42 x 2
       date       technology
       <chr>           <dbl>
     1 2016-07-22          0
     2 2016-07-21          0
     3 2016-07-20          0
     4 2016-07-19          0
     5 2016-07-18          0
     6 2016-07-15          0
     7 2016-07-14          0
     8 2016-07-13          0
     9 2016-07-12          0
    10 2016-07-11          0
    11 2016-07-08          0
    12 2016-07-07          0
    13 2016-07-06          1
    14 2016-07-05          0
    15 2016-07-01          2
    16 2016-06-30          1
    17 2016-06-29          1
    18 2016-06-28          0
    19 2016-06-27          0
    20 2016-06-24          1
    21 2016-06-23          0
    22 2016-06-22          0
    23 2016-06-21          0
    24 2016-06-20          0
    25 2016-06-17          0
    26 2016-06-16          0
    27 2016-06-15          0
    28 2016-06-14          1
    29 2016-06-13          0
    30 2016-06-10          0
    31 2016-06-09          0
    32 2016-06-08          0
    33 2016-06-07          0
    34 2016-06-06          0
    35 2016-06-01          0
    36 2016-05-29          0
    37 2016-05-27          0
    38 2016-05-26          0
    39 2016-05-25          0
    40 2016-05-24          0
    41 2016-05-23          0
    42 2016-05-20          0

Caveat: this requires all columns other than date in dtetags.df to be numeric. If they are not, they should be converted before applying this code. This can be done using the answer found here.
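If the tag columns arrive as factors or character, one option is to do the conversion inside the same pipeline. This is a sketch under the assumption that every column except date is a tag column; `toy` is a hypothetical stand-in for dtetags.df.

```r
library(dplyr)

# Hypothetical toy data: the tag column is stored as a factor
toy <- data.frame(
  date = c("2016-07-01", "2016-07-01", "2016-06-30"),
  technology = factor(c("1", "1", "0")),
  stringsAsFactors = FALSE
)

out <- toy %>%
  # convert every column except date; as.character() avoids using factor codes
  mutate(across(-date, ~ as.numeric(as.character(.x)))) %>%
  group_by(date) %>%
  summarise(across(everything(), sum)) %>%
  arrange(desc(date))

print(out)
```

Converting through as.character() matters for factors: as.numeric() applied directly to a factor returns the internal level codes, not the printed values.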

Hope this helps.

