regex - Splitting a line with escaped separators in Python

tl;dr:

line = "one|two|three\|four\|five"
fields = line.split(whatever)

For what value of whatever does:

fields == ['one', 'two', 'three\|four\|five']

I have a file delimited by pipe characters. Some of the fields in this file also include pipes, escaped by a leading backslash.

I have no control over this file. I cannot preprocess the file. I have to do it in a single split.

I need to split each row of this file into separate fields, but that leading backslash is proving to be all sorts of trouble. I tried using a negative look-ahead, but there's some sort of arcana surrounding Python strings and double-escaped characters that I don't understand, and this is stopping me from figuring it out.

An explanation of the solution would be appreciated but is optional.

You can use a regex like

re.split(r'([^|]+[^\\])\|', line)

which uses a character group to specify that a | is used to split only when the character before it is anything except a \.

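Because the pattern contains a capturing group, that will give you empty strings in the list (one at the beginning and one between adjacent matches), but you can work around that by filtering them out, like

[f for f in re.split(r'([^|]+[^\\])\|', line) if f]

Note that the filter also drops any fields that really are empty.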

This is still subject to the parsing issues Wiktor raised, though, of course.
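As an alternative to the character group above, here is a minimal sketch that uses a negative lookbehind instead of a capturing group. It is not part of the original answer. It assumes a backslash only ever appears in order to escape a pipe; if the format also allows escaped backslashes (for example a field ending in \\ right before a separator), this simple pattern would not split there.

```python
import re

# Raw string literal, so the backslashes reach the string unchanged.
line = r"one|two|three\|four\|five"

# (?<!\\) is a negative lookbehind: a pipe counts as a separator only
# when the character immediately before it is not a backslash.
# There is no capturing group, so no extra entries are added to the result.
fields = re.split(r'(?<!\\)\|', line)

print(fields)  # ['one', 'two', 'three\\|four\\|five']
```

The doubled backslashes in the printed list are just how Python displays a single backslash in a string's repr; each field still contains one backslash before each escaped pipe.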
