Map a million strings to ints in C++11


I have a million ASCII strings, without duplicates, each about 7 bytes long. I need to map each string to a positive integer. The largest of these ints should not be much more than a million. Although initialization may be slow, lookup should be fast: given a string, return the corresponding int (or -1 if not found). How can one implement this in C++11?

One solution: accumulate the strings into a std::unordered_map<string,int>, then iterate over the map, assigning ints from an incrementing counter. For lookup, unordered_map::find("foo")->second. But it smells like some other container would be faster and have less overhead (with the indices built in, rather than hand-coded). Maybe an unordered_set and pointer arithmetic?
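For concreteness, the unordered_map approach might look like the sketch below. The class name StringIndex and its lookup member are made up for illustration and are not from any library. It numbers the keys 1..N in input order as they are inserted, rather than in a second pass over the map, and it checks the result of find against end(), since dereferencing find("foo") directly is undefined behaviour when the key is missing.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (name invented for this sketch): gives each distinct
// string the next integer in 1..N, in the order the keys are supplied.
class StringIndex {
public:
    explicit StringIndex(const std::vector<std::string>& keys) {
        ids_.reserve(keys.size());  // size the table once, avoiding rehashes
        int next = 1;
        for (const std::string& k : keys) {
            // emplace leaves an existing entry untouched, so a repeated key
            // keeps its first id and does not consume a new one.
            if (ids_.emplace(k, next).second) {
                ++next;
            }
        }
    }

    // Returns the id of key, or -1 if the key was never inserted.
    int lookup(const std::string& key) const {
        auto it = ids_.find(key);
        return it == ids_.end() ? -1 : it->second;
    }

private:
    std::unordered_map<std::string, int> ids_;
};
```

With keys {"alpha", "bravo", "charlie"}, lookup("alpha") returns 1, lookup("charlie") returns 3, and lookup("missing") returns -1. The largest id equals the number of distinct keys, so the range restriction holds by construction.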

The range restriction seems to make a perfect hash difficult.

(The ints' range is restricted because they are indexes into a feature vector passed to svm_light. That software doesn't use sparse storage, so vectors with trillions of (mostly zero) elements make it run out of memory. This string-to-int preprocessing sort of implements a sparse data structure.)

What you describe looks like perfect hashing.

There are C++ libraries that implement perfect hashing, for example the Tiny Perfect Hash Library for C, C++, and Lua.
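If pulling in a library is not an option, a much simpler stand-in also keeps the ints within 1..N: sort the keys once and use each key's position in the sorted array as its id. To be clear, this is plain binary search, not perfect hashing, so lookup is O(log n) (about 20 string comparisons for a million keys) rather than O(1). The class name SortedIndex is invented for this sketch.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (name invented for this sketch): the id of a key is
// its 1-based position in the sorted key array, so ids fall in 1..N.
class SortedIndex {
public:
    explicit SortedIndex(std::vector<std::string> keys) : keys_(std::move(keys)) {
        std::sort(keys_.begin(), keys_.end());
        // Drop duplicates defensively; the question says there are none.
        keys_.erase(std::unique(keys_.begin(), keys_.end()), keys_.end());
    }

    // Returns the 1-based rank of key, or -1 if the key is absent.
    int lookup(const std::string& key) const {
        auto it = std::lower_bound(keys_.begin(), keys_.end(), key);
        if (it == keys_.end() || *it != key) {
            return -1;
        }
        return static_cast<int>(it - keys_.begin()) + 1;  // iterator arithmetic gives the index
    }

private:
    std::vector<std::string> keys_;
};
```

With keys {"delta", "alpha", "charlie"}, the sorted order is alpha, charlie, delta, so lookup("alpha") returns 1 and lookup("delta") returns 3. The only storage is the strings themselves; no separate int per key is kept.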

