I'm looking for a quick, efficient solution to expand a dictionary (df1)

                     pattern cat1 cat2
    1          i want [food]    a    b
    2 i'm [amplifier] [pos].    a    b

    df1 <- data.frame(pattern=c("i want [food]", "i'm [amplifier] [pos]"),
                      cat1=c("a", "c"), cat2=c("b", "d"),
                      stringsAsFactors=FALSE)

that has string patterns with categories enclosed within square brackets []. These indicate categories that appear in an additional data frame in dictionary format (df2).

         pattern  category
    1      pizza      food
    2    hot dog      food
    3      chips      food
    4       very amplifier
    5  very much amplifier
    6      happy       pos
    7 optimistic       pos

    df2 <- structure(list(pattern = c("pizza", "hot dog", "chips", "very",
                                      "very much", "happy", "optimistic"),
                          category = c("food", "food", "food", "amplifier",
                                       "amplifier", "pos", "pos")),
                     .Names = c("pattern", "category"),
                     row.names = c(NA, -7L), class = "data.frame")

I want to create an extended data.frame that takes df1 and expands it with df2, so that it looks like this:

                  pattern cat1 cat2
    1        i want pizza    a    b
    2       i want hotdog    a    b
    3        i want chips    a    b
    4           i'm happy    c    d
    5      i'm more happy    c    d
    6      i'm optimistic    c    d
    7 i'm more optimistic    c    d

    output <- structure(list(pattern = c("i want pizza", "i want hotdog",
                                         "i want chips", "i'm happy",
                                         "i'm more happy", "i'm optimistic",
                                         "i'm more optimistic"),
                             cat1 = c("a", "a", "a", "c", "c", "c", "c"),
                             cat2 = c("b", "b", "b", "d", "d", "d", "d")),
                        .Names = c("pattern", "cat1", "cat2"),
                        row.names = c(NA, -7L), class = "data.frame")

Here's what I'd do:

    library(stringi)
    library(data.table)
    setDT(df1)
    setDT(df2)

    # matches a bracketed category and captures its name
    capture_patt = "\\[(\\w+)\\]"

    df1[, {
      # category names found in this row's pattern
      cats     = stri_match_all(pattern, regex = capture_patt)[[1]][, 2]
      # the pattern with each [category] replaced by %s
      new_patt = gsub(capture_patt, "%s", pattern)
      # every combination of df2 entries, one column per category
      subs     = do.call(CJ, lapply(cats, function(cat)
        df2[.(category = cat), on="category", pattern]
      ))
      .(res = do.call(sprintf, c(.(fmt = new_patt), subs)))
    }, by=names(df1)]

    #                   pattern cat1 cat2                       res
    # 1:          i want [food]    a    b              i want chips
    # 2:          i want [food]    a    b            i want hot dog
    # 3:          i want [food]    a    b              i want pizza
    # 4: i'm [amplifier] [pos].    a    b           i'm very happy.
    # 5: i'm [amplifier] [pos].    a    b      i'm very optimistic.
    # 6: i'm [amplifier] [pos].    a    b      i'm very much happy.
    # 7: i'm [amplifier] [pos].    a    b i'm very much optimistic.

How it works.
The objects are...
- cats: the categories we need to grab
- new_patt: an sprintf-ready version of pattern
- subs: a table of the substitutions that must be made
- res: the new column
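
To see what the first two objects hold, here is a minimal sketch for a single pattern. It swaps in base R (`regmatches` and `gregexpr`) for `stri_match_all` so that it runs without extra packages. The variable `m` is only an illustrative helper and is not part of the answer's code.

```r
# One pattern from df1, handled outside of data.table
pattern      <- "i'm [amplifier] [pos]"
capture_patt <- "\\[(\\w+)\\]"

# m: the bracketed pieces exactly as they appear in the pattern
m <- regmatches(pattern, gregexpr(capture_patt, pattern))[[1]]

# cats: the category names with the square brackets stripped
cats <- gsub("\\[|\\]", "", m)

# new_patt: each bracketed category replaced by an sprintf placeholder
new_patt <- gsub(capture_patt, "%s", pattern)

print(cats)
print(new_patt)
```

Here `cats` holds one name per bracketed category, and `new_patt` has one `%s` for each of them, which is why the two can later be paired up by `sprintf`.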
The trickier functions are...
- CJ takes the cross product, like expand.grid in MrFlick's answer.
- do.call(f, list_o_args) passes a list of args to a function.
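
The two functions can be tried on their own with a small sketch like the one below. The `choices` list is a hypothetical stand-in for what the `lapply` over `cats` would return for the second pattern; it is written out by hand here rather than looked up in df2.

```r
library(data.table)

# Hypothetical inputs: the candidate substitutions for the two
# categories in "i'm [amplifier] [pos]", typed in by hand
choices  <- list(c("very", "very much"), c("happy", "optimistic"))
new_patt <- "i'm %s %s"

# do.call(CJ, choices) is the same call as CJ(choices[[1]], choices[[2]]):
# one row per combination, so 2 x 2 = 4 rows here
subs <- do.call(CJ, choices)

# do.call again: fmt plus each column of subs become the arguments of
# sprintf, which is vectorised and fills the template once per row
res <- do.call(sprintf, c(list(fmt = new_patt), subs))
print(res)
```

Because `CJ` sorts its inputs by default, the rows come out in sorted order rather than in the order the values appear in df2, which matches the ordering seen in the answer's output.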