R + dplyr: select the first and last row of each group of rows with identical starttime
This question already has an answer here:
- Select first and last row from grouped data (5 answers)
I have a data frame like this:
       starttime      sx      sy          time
           <chr>   <chr>   <chr>         <chr>
 1 1416924247145  667.75  824.25 1416924247145
 2 1416924247145 667.875  824.25 1416924247158
 3 1416924247145   668.5   824.5 1416924247198
 4 1416924257557  231.25  602.25 1416924257557
 5 1416924257557 230.625  602.25 1416924257570
 6 1416924257557 229.625 601.875 1416924257597
 7 1416924257557  228.75  601.25 1416924257610
 8 1416924257557   227.5   600.0 1416924257623
 9 1416924257557 216.875  587.75 1416924257717
10 1416924257557 207.125 572.625 1416924257797
11 1416924257600 525.425 525.636 1416924259999
I want a subset of the data frame containing the rows that hold the first and the last element of each group of equal starttime values. In this example, these are rows 1, 3, 4, 10 and 11. It is important that the first and last rows are included. I tried the dplyr package, because it looks suitable for this, and made use of the group_by(), filter(), first() and last() functions, but couldn't get the result I wanted. This is how the result should look:
       starttime      sx      sy          time
           <chr>   <chr>   <chr>         <chr>
 1 1416924247145  667.75  824.25 1416924247145
 3 1416924247145   668.5   824.5 1416924247198
 4 1416924257557  231.25  602.25 1416924257557
10 1416924257557 207.125 572.625 1416924257797
11 1416924257600 525.425 525.636 1416924259999
One way, using dplyr:
library(dplyr)
df %>%
  group_by(starttime) %>%
  # keep row 1 and row n() of each group; unique() stops a one-row group from being kept twice
  slice(unique(c(1, n())))
# Source: local data frame [5 x 4]
# Groups: starttime [3]
#
#      starttime      sx      sy         time
#          <dbl>   <dbl>   <dbl>        <dbl>
# 1 1.416924e+12 667.750 824.250 1.416924e+12
# 2 1.416924e+12 668.500 824.500 1.416924e+12
# 3 1.416924e+12 231.250 602.250 1.416924e+12
# 4 1.416924e+12 207.125 572.625 1.416924e+12
# 5 1.416924e+12 525.425 525.636 1.416924e+12
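The key piece of the dplyr call is the index vector passed to slice(). As a minimal base-R illustration (the helper name first_last_idx is hypothetical and not part of dplyr), this is the vector that unique(c(1, n())) produces for groups of 3, 7 and 1 rows, which are the group sizes in the example data:

```r
# Hypothetical helper: the row indices slice(unique(c(1, n()))) keeps
# for a group with n rows.
first_last_idx <- function(n) unique(c(1, n))

first_last_idx(3)  # [1] 1 3  (starttime 1416924247145 -> rows 1 and 3)
first_last_idx(7)  # [1] 1 7  (starttime 1416924257557 -> rows 4 and 10)
first_last_idx(1)  # [1] 1    (starttime 1416924257600 -> row 11, kept once)
```

Without unique(), the single-row group would produce c(1, 1), indexing the same row twice, so it could show up twice in the result.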
Or, using data.table:
library(data.table)
# .SD is the subset of rows in each group and .N is its row count
setDT(df)[, .SD[unique(c(1, .N))], by = starttime]
Data:
df <- structure(list(
    starttime = c(1416924247145, 1416924247145, 1416924247145, 1416924257557,
                  1416924257557, 1416924257557, 1416924257557, 1416924257557,
                  1416924257557, 1416924257557, 1416924257600),
    sx = c(667.75, 667.875, 668.5, 231.25, 230.625, 229.625, 228.75, 227.5,
           216.875, 207.125, 525.425),
    sy = c(824.25, 824.25, 824.5, 602.25, 602.25, 601.875, 601.25, 600,
           587.75, 572.625, 525.636),
    time = c(1416924247145, 1416924247158, 1416924247198, 1416924257557,
             1416924257570, 1416924257597, 1416924257610, 1416924257623,
             1416924257717, 1416924257797, 1416924259999)),
  .Names = c("starttime", "sx", "sy", "time"),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))
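As a package-free cross-check, base R's duplicated() marks repeated values; negating it gives TRUE for the first row of each starttime, and with fromLast = TRUE it gives TRUE for the last row. OR-ing the two masks keeps the first and last row per value. This is a different technique from the group_by()/slice() answer, shown here as a sketch on a trimmed copy of the example data (starttime and sx only):

```r
# Trimmed copy of the example data: starttime and sx columns only
df <- data.frame(
  starttime = c(1416924247145, 1416924247145, 1416924247145,
                1416924257557, 1416924257557, 1416924257557, 1416924257557,
                1416924257557, 1416924257557, 1416924257557, 1416924257600),
  sx = c(667.75, 667.875, 668.5, 231.25, 230.625, 229.625, 228.75,
         227.5, 216.875, 207.125, 525.425)
)

# TRUE for the first row of each starttime value
is_first <- !duplicated(df$starttime)
# TRUE for the last row of each starttime value
is_last <- !duplicated(df$starttime, fromLast = TRUE)

# A one-row group is both first and last, so it is kept exactly once
keep <- is_first | is_last
which(keep)  # [1]  1  3  4 10 11
df[keep, ]
```

Like group_by(), this groups by value rather than by contiguous runs: if the same starttime reappeared further down, both approaches would treat all its rows as one group.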