Python - print words between two particular words in a given string


If a particular start word is not followed by a particular end word, skip it. Here is the string:

x = 'john got shot dead. john .... ? , john got killed or died in 1990. john wife dead or died' 

I want to print and count the words between john and dead, death or died. If a john is not followed by died, dead or death, skip it and start again from the next john.

My code:

import re

x = re.sub(r'[^\w]', ' ', x)  # removed dots, commas, special symbols
for i in re.findall(r'(?<=john)' + '(.*?)' + '(?=dead|died|death)', x):
    print(i)
    print(len([word for word in i.split()]))

My output:

got shot
2
john got killed or
6
wife
3
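A quick way to see where the extra words come from is to print each match together with a flag saying whether it contains another john. This is a minimal diagnostic sketch in Python 3 using the question's sample string and the question's pattern; the variable name matches is only illustrative.

```python
import re

x = 'john got shot dead. john .... ? , john got killed or died in 1990. john wife dead or died'
x = re.sub(r'[^\w]', ' ', x)  # same cleanup as in the question

# Same pattern as in the question: .*? is lazy, so it stops at the FIRST end word,
# but nothing stops it from running across another "john" on the way there.
matches = re.findall(r'(?<=john)(.*?)(?=dead|died|death)', x)

for m in matches:
    # Show the stripped match and whether it swallowed a later "john".
    print(repr(m.strip()), 'john' in m)
```

The second match starts after the john that is never followed by an end word and keeps going past the next john until it reaches died.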

The output I want:

got shot
2
got killed or
3
wife
3

I don't know what mistake I'm making. This is only a sample input; I have to check 20,000 inputs at a time.

You can use a negative lookahead in the regex:

>>> for i in re.findall(r'(?<=john)(?:(?!john).)*?(?=dead|died|death)', x):
...     print(i.strip())
...     print(len([word for word in i.split()]))
...
got shot
2
got killed or
3
wife
3

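Instead of .*?, this regex uses (?:(?!john).)*?, which lazily matches zero or more characters, but only as long as john does not appear in the match. Each character is consumed only if john does not start at that position, so a match can never run across a second john.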
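The same pattern can be wrapped in a small Python 3 function that returns each phrase with its word count, which is convenient when it has to be applied to many inputs. This is a sketch, assuming the question's cleanup step (replacing non-word characters with spaces) is still wanted; the names PATTERN and words_between are hypothetical.

```python
import re

# Tempered token from the answer: a character is consumed only if "john"
# does not start at that position, so a match cannot cross a later "john".
PATTERN = re.compile(r'(?<=john)(?:(?!john).)*?(?=dead|died|death)')


def words_between(text):
    """Return (phrase, word_count) pairs found between 'john' and dead/died/death."""
    cleaned = re.sub(r'[^\w]', ' ', text)  # cleanup step taken from the question
    results = []
    for match in PATTERN.findall(cleaned):
        phrase = match.strip()
        results.append((phrase, len(phrase.split())))
    return results


if __name__ == '__main__':
    x = 'john got shot dead. john .... ? , john got killed or died in 1990. john wife dead or died'
    for phrase, count in words_between(x):
        print(phrase, count)
```

A john that is never followed by an end word simply produces no match, because the tempered token refuses to step over the next john and the lookahead fails.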

I also suggest using word boundaries so that only complete words are matched:

re.findall(r'(?<=\bjohn\b)(?:(?!\bjohn\b).)*?(?=\b(?:dead|died|death)\b)', x) 
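To illustrate what the word boundaries change, here is a comparison on a made-up sentence containing johnny and deadline (the sample text and the helper names PLAIN, BOUNDED, phrases and count_all are hypothetical). Each pattern is compiled once with re.compile and reused for every input; for batches such as the 20,000 inputs mentioned in the question this mostly keeps the code tidy, since Python's re module also caches recently used patterns.

```python
import re

# Pattern from the answer without word boundaries, for comparison.
PLAIN = re.compile(r'(?<=john)(?:(?!john).)*?(?=dead|died|death)')
# Word-bounded version: "john" and the end word must be complete words.
BOUNDED = re.compile(r'(?<=\bjohn\b)(?:(?!\bjohn\b).)*?(?=\b(?:dead|died|death)\b)')


def phrases(pattern, text):
    """Return the stripped phrases the compiled pattern finds in text."""
    return [m.strip() for m in pattern.findall(text)]


def count_all(texts):
    """Apply the precompiled BOUNDED pattern to many inputs; return word counts per input."""
    return [[len(p.split()) for p in phrases(BOUNDED, t)] for t in texts]


if __name__ == '__main__':
    sample = 'johnny was shot dead. john had a deadline and later died'
    print(phrases(PLAIN, sample))    # matches inside "johnny" and stops at "deadline"
    print(phrases(BOUNDED, sample))  # skips "johnny" and runs on to the whole word "died"
```

Without boundaries the lookbehind fires inside johnny and the lookahead fires on the first four letters of deadline; with them, only the standalone john and the standalone died delimit the match.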


