I have an extremely large CSV file with more than 500 million rows.
But I only need a few thousand rows, based on a certain condition. At the moment I am using:
    import csv

    with open('/home/documents/1681.csv', 'rb') as f:
        reader = csv.DictReader(f)
        rows = [row for row in reader if row['flag_central'] == 1]
Here the condition is that if flag_central == 1, I need the row.
However, since the file is extremely large, I am not able to run the above code. I believe it is because of the for loop I am using, which is causing the trouble.
Is there any way I can extract these rows from the CSV file based on the above condition?
If this is a one-time task, I would suggest using Unix commands first and then processing the extract:

    cat file | awk -F , '{ if ($5 == "1") print $0 }' > extract.csv

where -F specifies the column delimiter and 5 is the column number. Figure out the column number first with:

    cat file | head -n 1 | tr ',' '\n' | nl | grep flag_central
    =>
         5  flag_central
         ^ field number ($5)

This way you will not incur the cost of converting the CSV file into Python objects first. Depending on your use case, YMMV.
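If the filtering has to stay in Python, a streaming version along the following lines may be enough. This is only a sketch under some assumptions: it targets Python 3 (the 'rb' mode in the question suggests Python 2), the function name extract_matching_rows and the output path are made up for illustration, and the file is assumed to have a header row containing flag_central. Note that the csv module yields every field as a string, so a comparison against the integer 1, as in the question's snippet, would never match; the sketch compares against the string "1" instead.

```python
import csv


def extract_matching_rows(src_path, dst_path, column="flag_central", wanted="1"):
    """Copy rows whose `column` equals `wanted` from src_path to dst_path.

    Hypothetical helper: rows are read one at a time, so only the current
    row is held in memory. Returns the number of rows written.
    """
    matched = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # Reuse the input header so the extract has the same columns.
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Field values are strings, so compare with "1", not 1.
            if row[column] == wanted:
                writer.writerow(row)
                matched += 1
    return matched
```

It could be called as extract_matching_rows('/home/documents/1681.csv', 'extract.csv'). It still parses all 500 million rows in Python, so the awk pipeline above is likely to be faster for a one-off extract.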