tl;dr:
line = "one|two|three\|four\|five"
fields = line.split(whatever)
For what value of whatever does:
fields == ['one', 'two', 'three\|four\|five']
I have a file delimited by pipe characters. Some of the fields in the file also include pipes, escaped by a leading backslash.
For example, a single row of data in this file might have an array representation of ['one', 'two', 'three\|four\|five'],
and this would be represented in the file as one|two|three\|four\|five
I have no control over the file. I cannot preprocess the file. It has to be done in a single split.
I need to split each row of this file into separate fields, but that leading backslash is proving to be all sorts of trouble. I tried using a negative look-ahead, but there's some sort of arcana surrounding Python strings and double-escaped characters that I don't understand, and it's stopping me from figuring this out.
An explanation of the solution is appreciated but optional.
You can use a regex like
re.split(r'([^|]+[^\\])\|', line)
which uses a character group to specify anything except a \
followed by a |,
and that is what gets used for the split.
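That leaves empty strings in the result (one before each captured field, because the capturing group makes re.split return the matched fields themselves), but you can work around that by filtering them out, like
[field for field in re.split(r'([^|]+[^\\])\|', line) if field]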
This is still subject to the parsing issues Wiktor raised, though, of course.
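As an alternative to the capturing-group pattern, here is a minimal sketch using a negative lookbehind, which is a different technique from the one in the answer. It assumes a pipe counts as escaped whenever the character immediately before it is a backslash. The name UNESCAPED_PIPE is made up for illustration. Writing the pattern as a raw string (the r prefix) means each backslash reaches the regex engine as typed, so \\ is the regex for one literal backslash and no extra layer of string escaping is needed.

```python
import re

# Sample row from the question; the raw string keeps each backslash literal.
line = r"one|two|three\|four\|five"

# (?<!\\) is a negative lookbehind: the pipe only matches when the
# character immediately before it is not a backslash.
# UNESCAPED_PIPE is a hypothetical name used for this sketch.
UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")

fields = UNESCAPED_PIPE.split(line)
print(fields)  # ['one', 'two', 'three\\|four\\|five']
```

The printed repr doubles each backslash, but every field holds a single backslash before each escaped pipe. Because the pattern has no capturing group, the result has no extra empty strings, and single-character or empty fields are split as well. The sketch does not handle a backslash that is itself escaped right before a delimiter (for example a\\|b), and it leaves the escaping backslashes in the fields.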