python - Extracting rows from an extremely large (48GB) CSV file based on condition


I have an extremely large CSV file with more than 500 million rows.

But I only need a few thousand of those rows, based on a condition. At the moment I'm using:

import csv

with open('/home/documents/1681.csv') as f:
    reader = csv.DictReader(f)
    # note: csv gives every field back as a string, so compare against '1'
    rows = [row for row in reader if row['flag_central'] == '1']

The condition here is: if flag_central == 1, I need that row.

However, since the file is extremely huge, I am not able to run the above code. I believe the for loop over the whole file is what's causing the trouble.

Is there any way I can extract these rows from the CSV file based on the above condition?

If this is a one-time task, I suggest using Unix commands to do the extraction first:

cat file | awk -F, '{ if ($5 == "1") print $0 }' > extract.csv

Here `-F` specifies the column delimiter and `5` is the column number. You can figure that out first with:

cat file | head -n 1 | tr ',' '\n' | nl | grep flag_central
=>   5   flag_central
     ^ this is the field number ($5)

This way you do not incur the cost of converting the CSV file into Python objects first. Depending on your use case, YMMV.
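If you'd rather stay in Python, the key is to stream the file row by row and write the matches straight out to a new CSV instead of collecting them in a list, so memory use stays constant regardless of file size. A minimal sketch (the function name and paths are made up for illustration):

```python
import csv

def extract_matching_rows(in_path, out_path, column="flag_central", value="1"):
    """Copy only the rows where row[column] == value from in_path to out_path.

    Rows are processed one at a time, so the full file is never held in memory.
    """
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # csv.DictReader yields strings, so compare against the string "1"
            if row[column] == value:
                writer.writerow(row)

# hypothetical paths for illustration:
# extract_matching_rows('/home/documents/1681.csv', '/home/documents/extract.csv')
```

This is still a full scan of the 48 GB file, so it won't beat the awk pipeline on speed, but it keeps the header row and stays in one language.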
