python - Obfuscate data in a .csv file using a word list from a .txt file


I want to obfuscate words that occur in a column of a .csv file, based on a list of data to remove that is stored in a different .txt file.

Ideally I would be able to ignore the case of the data and, in the .csv file, replace each word that matches the "to remove" file with '*' characters. I am not sure of the best method to replace words in the .csv file while ignoring case. What I have so far isn't working, and I am open to other solutions.

Example data file:

This is a line of text in a .csv column that I want to remove a word or data such as 123 from.

My .txt file with the list of data to remove:

want remove 123

The output should be:

This is a line of text in a .csv column that I **** to ****** a word or data such as *** from.

My code:

    import csv

    with open('myfilename.csv', 'rb') as csvfile, open('datatoremove.txt', 'r') as removetxtfile:
        reader = csv.reader(csvfile)
        reader.next()
        for row in reader:
            csv_words = row[3].split(" ")  # gets the words from the 4th column of the .csv file
            for line in removetxtfile:
                for wordtoremove in line.split():
                    if csv_words.lower() == wordtoremove.lower():
                        csv_words = csv_words.replace(wordtoremove.lower(), '*' * len(csv_words))

I would start by constructing a set of censor words. The input is assumed to be a plain text file of newline-separated words. If your text file is laid out differently, you might need to parse it separately.
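As a rough illustration of that first step, here is a minimal sketch of building the set. The helper name `load_censor_words` is made up for this example, and `io.StringIO` stands in for the real `datatoremove.txt` so the snippet runs on its own. It assumes words are separated by whitespace or newlines.

```python
import io


def load_censor_words(lines):
    """Return a set of lower-cased words read from an iterable of lines."""
    words = set()
    for line in lines:
        # str.split() with no argument splits on any whitespace and
        # never yields empty strings, so blank lines are skipped.
        for word in line.split():
            words.add(word.lower())
    return words


# Stand-in for: with open('datatoremove.txt') as removetxtfile: ...
sample = io.StringIO("want\nremove\n123\n")
censor_words = load_censor_words(sample)
print(sorted(censor_words))  # -> ['123', 'remove', 'want']
```

Lower-casing the words once, when the set is built, means each later lookup only has to lower-case the word taken from the .csv column.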

Other thoughts:

Create a separate censored file as output instead of trying to overwrite your input file. That way, if you screw up your algorithm, you don't lose your data.

You do a .split(" ") on the 4th column, which is only necessary if there are multiple space-separated words in that column. If that is not the case, you can skip the for w in csv_words loop, which loops over the words in the 4th column.
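The difference between the two cases can be sketched as follows. `censor_cell` and its `multi_word` flag are hypothetical names used only for this illustration; the masking rule (one `*` per character, case-insensitive match) follows the example in the question.

```python
def censor_cell(cell, censor_words, multi_word=True):
    """Replace censored words in one cell with asterisks of the same length."""

    def mask(word):
        # Compare in lower case; censor_words is assumed to be lower-cased.
        return '*' * len(word) if word.lower() in censor_words else word

    if not multi_word:
        # The whole cell is a single word, so no split and no loop.
        return mask(cell)
    # Several space-separated words: mask each one and rejoin.
    return ' '.join(mask(w) for w in cell.split(' '))


censor_words = {'want', 'remove', '123'}
print(censor_cell('I Want to remove 123 from here', censor_words))
# -> I **** to ****** *** from here
print(censor_cell('REMOVE', censor_words, multi_word=False))
# -> ******
```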

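    import csv
    import re
    import string

    # Split on runs of whitespace and/or punctuation
    punctuation_split_regex = re.compile(r'[\s{}]+'.format(re.escape(string.punctuation)))

    # Construct the set of words to censor, lower-cased for case-insensitive lookup
    censor_words = set()
    with open('datatoremove.txt', 'r') as removetxtfile:
        for l in removetxtfile:
            words = punctuation_split_regex.split(l)
            for w in words:
                if w.strip():  # skip empty strings produced by the split
                    censor_words.add(w.strip().lower())

    # Python 2 file modes, as in the question; on Python 3 use 'r' and 'w' with newline=''
    with open('myfilename.csv', 'rb') as csvfile, open('censoredfilename.csv', 'wb') as f:
        reader = csv.reader(csvfile)
        writer = csv.writer(f)
        # reader.next()  # uncomment to skip a header row
        for row in reader:
            csv_words = row[3].split(' ')  # gets the words from the 4th column of the .csv file
            new_column = []
            for w in csv_words:
                if w.lower() in censor_words:
                    new_column.append('*' * len(w))
                else:
                    new_column.append(w)
            row[3] = ' '.join(new_column)
            writer.writerow(row)  # write a proper CSV row rather than space-joined text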
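For Python 3, one possible variation is sketched below. It is not the code above but a swapped-in technique: instead of splitting on spaces, it uses a single case-insensitive regular expression with `\b` word boundaries, so a word followed by punctuation (such as `123,`) is still masked. The function name `censor_rows` and the in-memory `io.StringIO` objects are assumptions made for the example, and the censor words are assumed to consist of letters, digits, and underscores only, since `\b` does not behave usefully around other characters.

```python
import csv
import io
import re


def censor_rows(rows, censor_words, column=3):
    """Yield rows with censored words in `column` replaced by asterisks."""
    escaped = [re.escape(w) for w in censor_words if w]
    if not escaped:
        # Nothing to censor: pass rows through unchanged.
        yield from rows
        return
    pattern = re.compile(r'\b(?:' + '|'.join(escaped) + r')\b', re.IGNORECASE)
    for row in rows:
        row = list(row)
        if len(row) > column:  # leave short rows untouched
            row[column] = pattern.sub(lambda m: '*' * len(m.group()), row[column])
        yield row


# In-memory stand-ins for the input and output .csv files.
source = io.StringIO('a,b,c,"I Want to remove 123, please"\n')
target = io.StringIO()
writer = csv.writer(target, lineterminator='\n')
writer.writerows(censor_rows(csv.reader(source), {'want', 'remove', '123'}))
print(target.getvalue())
```

With real files, the same function could be fed `csv.reader(open('myfilename.csv', newline=''))` and the writer pointed at a separate output file opened with `newline=''`, which keeps the original data intact.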
