python - Pandas: Set multiple MultiColumns as MultiIndex


I generate an empty DataFrame as follows:

    import pandas as pd

    topfields = ['desc', 'desc', 'price', 'price', 'units', 'units']
    bottomfields = ['foo', 'bar', 'mean', 'mom_2', 'mean', 'mom_2']
    resultsdf = pd.DataFrame(columns=pd.MultiIndex.from_arrays([topfields, bottomfields]))

Now I want to set the first two columns (those with desc as the top-level value) as the index (and, as a more general challenge, all columns with desc as the top-level value). I've tried several ways, but none of them work.

Here's the intuitive attempt (a failure):

    >>> test = resultsdf.set_index('desc')
    >>> test
    Out[4]:
    Empty DataFrame
    Columns: [(price, mean), (price, mom_2), (units, mean), (units, mom_2)]
    Index: []
    >>> test.index
    Out[5]: Index([], dtype='object', name='desc')

Pandas correctly removes both desc columns (from "Columns"), but neither of them appears in the index. Instead, I have only one field in the index. When I try to create a row based on a MultiIndex, I get an error:

    >>> test.loc[pd.IndexSlice[0, 0], :] = 1
    Traceback (most recent call last):
    [...]
    KeyError: '[0 0] not in index'

It looks like you need to pass a tuple to set_index:

    test = resultsdf.set_index(('desc', 'foo'))
    print (test)
    Empty DataFrame
    Columns: [(desc, bar), (price, mean), (price, mom_2), (units, mean), (units, mom_2)]
    Index: []

    print (test.index)
    Index([], dtype='object', name=('desc', 'foo'))

Or maybe, to get both desc columns into the index, a list of tuples:

    test = resultsdf.set_index([('desc', 'foo'), ('desc', 'bar')])
    print (test)
    Columns: [(price, mean), (price, mom_2), (units, mean), (units, mom_2)]
    Index: []

    print (test.index)
    MultiIndex(levels=[[], []],
               labels=[[], []],
               names=[('desc', 'foo'), ('desc', 'bar')])
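
For the more general case in the question (all columns whose top-level value is desc), the list of tuples does not have to be typed by hand. Here is a minimal sketch, assuming the same resultsdf as in the question; the names desc_cols and the renaming step at the end are my own additions for illustration, not something the question asked for:

```python
import pandas as pd

topfields = ['desc', 'desc', 'price', 'price', 'units', 'units']
bottomfields = ['foo', 'bar', 'mean', 'mom_2', 'mean', 'mom_2']
resultsdf = pd.DataFrame(columns=pd.MultiIndex.from_arrays([topfields, bottomfields]))

# Collect every column tuple whose top level is 'desc'
# (desc_cols is a hypothetical helper name).
desc_cols = [col for col in resultsdf.columns if col[0] == 'desc']

# Passing a list of tuples moves each of those columns into the index.
test = resultsdf.set_index(desc_cols)

# Optional: the index level names are the full column tuples by default;
# this shortens them to the bottom-level labels.
test.index = test.index.set_names([bottom for _, bottom in desc_cols])

print(test.index.nlevels)
print(list(test.index.names))
print(list(test.columns))
```

The key point is the same as above: each column of a MultiIndex header is addressed by its full tuple, so set_index needs a list of tuples rather than the bare top-level label.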
