If a particular word is not followed by one of a set of ending words, skip it. Here is my string:
x = 'john got shot dead. john .... ? , john got killed or died in 1990. john wife dead or died'
I want to print and count the words between john and dead, death, or died. If a john is not followed by died, dead, or death before the next john, skip it and start again from the next john.
My code:
    import re
    x = re.sub(r'[^\w]', ' ', x)  # remove dots, commas, and special symbols
    for i in re.findall(r'(?<=john)' + '(.*?)' + '(?=dead|died|death)', x):
        print(i)
        print(len([word for word in i.split()]))
My output:
    got shot
    2
    john got killed or
    6
    wife
    3
The output I want:
    got shot
    2
    got killed or
    3
    wife
    3
I don't know where I am making the mistake. This is just a sample input; I have to check 20,000 inputs at a time.
You can use this negative lookahead regex:
    >>> for i in re.findall(r'(?<=john)(?:(?!john).)*?(?=dead|died|death)', x):
    ...     print(i.strip())
    ...     print(len([word for word in i.split()]))
    ...
    got shot
    2
    got killed or
    3
    wife
    3
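Instead of .*?, this regex uses (?:(?!john).)*?, which lazily matches 0 or more characters only as long as john does not appear inside the match. A john that has no dead, died, or death before the next john therefore produces no match, and matching starts again from the next john.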
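To make the difference concrete, here is a small sketch that runs both patterns on a shortened, made-up string. The `sample` text and the variable names `plain` and `tempered` are assumptions for illustration only, not the asker's data.

```python
import re

# Hypothetical sample: the first john has no ending word before the next john.
sample = 'john was here john got killed or died'

# Plain lazy match: free to run across the second john.
plain = re.findall(r'(?<=john)(.*?)(?=dead|died|death)', sample)

# Tempered match: each character is consumed only if john does not start there.
tempered = re.findall(r'(?<=john)(?:(?!john).)*?(?=dead|died|death)', sample)

print(plain)     # [' was here john got killed or ']
print(tempered)  # [' got killed or ']
```

The plain pattern starts at the first john and swallows the second one, which is how the extra words end up in the count. The tempered pattern fails at the first john and only succeeds from the second.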
I also suggest using word boundaries so that only complete words are matched:
re.findall(r'(?<=\bjohn\b)(?:(?!\bjohn\b).)*?(?=\b(?:dead|died|death)\b)', x)
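Since there are many inputs to check, one possible way to package this is to compile the pattern once and reuse it. The following is only a sketch: the helper name `words_between`, the constant `PATTERN`, and the sample string are hypothetical and not part of the original question.

```python
import re

# Compiled once so it can be reused across many input strings.
PATTERN = re.compile(
    r'(?<=\bjohn\b)(?:(?!\bjohn\b).)*?(?=\b(?:dead|died|death)\b)'
)

def words_between(text):
    """Return (phrase, word_count) pairs for every match in text."""
    return [(m.strip(), len(m.split())) for m in PATTERN.findall(text)]

# Hypothetical sample: "johnson" and "deadline" merely contain the keywords.
sample = 'johnson met john who missed deadline then died'
print(words_between(sample))  # [('who missed deadline then', 4)]
```

With the word boundaries in place, johnson is not treated as john and deadline is not treated as dead, so the match runs from the standalone john to died.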