R - Collapse by the maximum


I have a data frame called conv2. Here is its tail:

    8464   208394_x_at   esm1       -1.035878e-01
    8468   200858_s_at   snord55    -1.034971e-01
    8469   200858_s_at   snord38b   -1.034971e-01
    8467   200858_s_at   rps8       -1.034971e-01
    8472     207381_at   rps8       -1.034510e-01
    8477   211197_s_at   icoslg     -1.033752e-01

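What I want is this: whenever a name is repeated in the second column (such as rps8), remove the rows containing that name, except the one with the highest absolute value in the third column. In this example, row 8472 should be removed, because |-1.034510e-01| is smaller than |-1.034971e-01| in row 8467.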
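As a check on the rule, here is a minimal sketch that rebuilds the six rows shown above and flags the ones the rule removes. The column names probe, symbol and value are assumptions (the question only names conv2 and, in the loop below, a symbol column). ave() is used here to compute each name's largest absolute value.

```r
# Hypothetical reconstruction of the tail of conv2 shown above;
# the column names probe, symbol and value are assumptions.
conv2 <- data.frame(
  probe  = c("208394_x_at", "200858_s_at", "200858_s_at", "200858_s_at",
             "207381_at", "211197_s_at"),
  symbol = c("esm1", "snord55", "snord38b", "rps8", "rps8", "icoslg"),
  value  = c(-1.035878e-01, -1.034971e-01, -1.034971e-01, -1.034971e-01,
             -1.034510e-01, -1.033752e-01),
  row.names = c("8464", "8468", "8469", "8467", "8472", "8477"),
  stringsAsFactors = FALSE
)

# For every row, the largest absolute value among rows with the same symbol
grp_max <- ave(abs(conv2$value), conv2$symbol, FUN = max)

# A row is kept only if it reaches its symbol's maximum
# (exact ties within one symbol would all be kept)
keep <- abs(conv2$value) == grp_max

# Row names of the rows the rule removes
rownames(conv2)[!keep]
```

For the data above this flags only row 8472, since the rps8 row with the larger absolute value is 8467.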

I have done it this way:

    for (d in dup) {
      # rows of conv whose symbol is d
      rows <- conv[which(conv$symbol == d), ]
      # append the row with the largest absolute value in column 3
      conv2 <- rbind(conv2, rows[which.max(abs(rows[, 3])), ])
    }
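The loop relies on objects the question does not show. Here is a runnable sketch of it, assuming (hypothetically) that dup holds the symbols occurring more than once and that conv2 starts out with the rows whose symbol is unique. The column names probe, symbol and value are also assumptions.

```r
# Hypothetical input with the six rows from the question
conv <- data.frame(
  probe  = c("208394_x_at", "200858_s_at", "200858_s_at", "200858_s_at",
             "207381_at", "211197_s_at"),
  symbol = c("esm1", "snord55", "snord38b", "rps8", "rps8", "icoslg"),
  value  = c(-1.035878e-01, -1.034971e-01, -1.034971e-01, -1.034971e-01,
             -1.034510e-01, -1.033752e-01),
  row.names = c("8464", "8468", "8469", "8467", "8472", "8477"),
  stringsAsFactors = FALSE
)

# Assumed setup: the symbols that appear more than once ...
dup <- unique(conv$symbol[duplicated(conv$symbol)])
# ... and a starting conv2 holding only the rows with a unique symbol
conv2 <- conv[!conv$symbol %in% dup, ]

# The loop from the question: one rbind per duplicated symbol
for (d in dup) {
  rows  <- conv[which(conv$symbol == d), ]
  conv2 <- rbind(conv2, rows[which.max(abs(rows[, 3])), ])
}
```

Because rbind copies the whole of conv2 on every iteration, this pattern tends to slow down as the number of duplicated symbols grows.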

Is there a better, faster way of doing this?

Here is a base R solution that uses the "split-apply-combine" approach (col2 and col3 stand for the names of your second and third columns):

    # split the data.frame by the names in column 2
    mylist <- split(conv2, conv2$col2)
    # for each piece, keep the row with the largest absolute value in column 3,
    # then rbind the pieces back into a single data.frame
    dfnew <- do.call(rbind, lapply(mylist, function(i) i[which.max(abs(i$col3)), ]))
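A minimal sketch that runs the split-apply-combine code on the six sample rows, assuming the columns are named col1, col2 and col3. It also shows a different, commonly used base R idiom that is not part of the answer above: sort by decreasing absolute value, then keep the first occurrence of each name with duplicated().

```r
# Hypothetical reconstruction of the sample rows; col1/col2/col3 are assumed names
conv2 <- data.frame(
  col1 = c("208394_x_at", "200858_s_at", "200858_s_at", "200858_s_at",
           "207381_at", "211197_s_at"),
  col2 = c("esm1", "snord55", "snord38b", "rps8", "rps8", "icoslg"),
  col3 = c(-1.035878e-01, -1.034971e-01, -1.034971e-01, -1.034971e-01,
           -1.034510e-01, -1.033752e-01),
  row.names = c("8464", "8468", "8469", "8467", "8472", "8477"),
  stringsAsFactors = FALSE
)

# Split-apply-combine, as in the answer
mylist <- split(conv2, conv2$col2)
dfnew  <- do.call(rbind, lapply(mylist, function(i) i[which.max(abs(i$col3)), ]))

# Alternative: order rows by decreasing |col3|, so the first row seen for
# each name is the one with the largest absolute value, then drop the rest
ord    <- conv2[order(abs(conv2$col3), decreasing = TRUE), ]
dfnew2 <- ord[!duplicated(ord$col2), ]
```

The second version sorts once and subsets once instead of building one small data.frame per name, which typically helps when there are many distinct names. Note that the row order and row names of dfnew and dfnew2 differ: dfnew takes its row names from the split groups, while dfnew2 keeps the original ones (for example 8467 for rps8).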
