regex - Splitting a line with escaped separators in Python


tl;dr:

line = r"one|two|three\|four\|five"
fields = re.split(whatever, line)

For what value of whatever does the following hold:

fields == ['one', 'two', r'three\|four\|five']

I have a file delimited by pipe characters. Some of the fields in the file include pipes, which are escaped with a leading backslash.

For example, a single row of data might have the array representation ['one', 'two', 'three\|four\|five'], which is represented in the file as one|two|three\|four\|five
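If unescaped field values are acceptable, one way to read this format is the standard library csv module with delimiter="|", escapechar="\\" and quoting=csv.QUOTE_NONE. This is a swapped-in technique, not something the question or the answer uses, and the helper name read_rows is hypothetical. Note that csv removes the escape characters, so the third field comes back as three|four|five rather than three\|four\|five.

```python
import csv
import io


def read_rows(lines):
    """Hypothetical helper: parse pipe-delimited lines where a backslash
    escapes the next character. csv drops the backslashes, so an escaped
    pipe comes back as a plain "|" inside the field."""
    reader = csv.reader(
        lines,
        delimiter="|",           # field separator
        escapechar="\\",         # a single backslash escapes the next char
        quoting=csv.QUOTE_NONE,  # treat '"' as an ordinary character
    )
    return list(reader)


# Simulate a one-line file containing the example row.
data = io.StringIO("one|two|three\\|four\\|five\n")
rows = read_rows(data)
print(rows)  # [['one', 'two', 'three|four|five']]
```

This only fits if the backslashes are not needed afterwards; the question asks for them to be kept.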

I have no control over the file, and I cannot preprocess it. I have to do this in a single split.

I need to split each row of the file into separate fields, but the leading backslash is proving to be all sorts of trouble. I tried using a negative look-ahead, but there's some sort of arcana surrounding Python strings and double-escaped characters that I don't understand, and it's stopping me from figuring it out.
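The double-escaping confusion comes from two layers of escapes: Python's string literal rules and the regex engine's own syntax. A short sketch of how they interact, and of why a look-behind rather than a look-ahead is the tool for "a | not preceded by \". The sample line is the one from the question; the variable names are illustrative.

```python
import re

# A raw string keeps backslashes as typed: r"\\" is the two characters \\,
# which the regex engine reads as ONE literal backslash.
assert r"\\" == "\\\\"
# Likewise r"\|" is the two characters \| (regex: one literal pipe).
assert r"\|" == "\\|"

line = r"one|two|three\|four\|five"

# Every pipe, escaped or not.
all_pipes = re.findall(r"\|", line)

# A negative look-BEHIND checks the character *before* the pipe, which is
# where the escaping backslash sits; a look-ahead only sees what follows.
unescaped_pipes = re.findall(r"(?<!\\)\|", line)

print(len(all_pipes), len(unescaped_pipes))  # 4 2
```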

An explanation of the solution would be appreciated, but is optional.

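You can use a regex like

re.split(r'(?<!\\)\|', line)

which uses a negative look-behind, (?<!\\), so that only a | that is not immediately preceded by a \ is used to split. Because the pattern is a raw string (r'...'), the \\ reaches the regex engine as written and matches one literal backslash, while \| matches one literal pipe.

For the example line this returns exactly the three fields ['one', 'two', 'three\|four\|five'].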
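A minimal runnable sketch of splitting with the negative look-behind (?<!\\)\|, assuming each line is already a str (the helper name split_fields is hypothetical). It also shows two edge cases: empty fields are preserved, and a pipe preceded by an escaped backslash (\\|) is not split, because the look-behind only inspects the single preceding character.

```python
import re

# Split on "|" unless the character immediately before it is a backslash.
UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def split_fields(line: str) -> list[str]:
    """Hypothetical helper: split one line on unescaped pipes."""
    return UNESCAPED_PIPE.split(line)


print(split_fields(r"one|two|three\|four\|five"))
# ['one', 'two', 'three\\|four\\|five']

# Empty fields (including a trailing one) are kept.
print(split_fields("a||b|"))
# ['a', '', 'b', '']

# Limitation: in a\\|b the backslash before "|" is itself escaped, so the
# pipe is a real separator, but the look-behind still refuses to split there.
print(split_fields(r"a\\|b"))
# ['a\\\\|b']
```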

Of course, this is still subject to the parsing issues Wiktor raised.
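One way around the escaped-backslash case, sketched as a swapped-in technique (re.findall rather than re.split, still a single call): match each field as a run of escape pairs (a backslash plus any character) or ordinary characters, starting at the beginning of the line or just after a separator. The helper name split_escaped is hypothetical, and the fields keep their backslashes, as the question asks.

```python
import re

# Each match is: start of string or a "|" separator, then one field made of
#   \\.      a backslash plus the character it escapes (kept verbatim), or
#   [^\\|]   any character that is neither a backslash nor a pipe,
# optionally ending in a lone trailing backslash (\\?).
# DOTALL lets "." consume an escaped newline as well.
FIELD = re.compile(r"(?:^|\|)((?:\\.|[^\\|])*\\?)", re.DOTALL)


def split_escaped(line: str) -> list[str]:
    """Hypothetical helper: split on pipes that are not escaped."""
    return FIELD.findall(line)


print(split_escaped(r"one|two|three\|four\|five"))
# ['one', 'two', 'three\\|four\\|five']

print(split_escaped(r"a\\|b"))
# ['a\\\\', 'b']
```

Consuming each backslash together with the character after it is what lets \\| parse as an escaped backslash followed by a real separator. A look-behind cannot do this, since Python's re module only supports fixed-width look-behind and so cannot count an arbitrary run of backslashes.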

