python - Convert pandas dataframe column with xml data to normalised columns?


I have a DataFrame in pandas, and one of the columns is an XML string. I want to create one column for each of the XML nodes, with the column names in normalised form. For example,

    id    xmlcolumn
    1     <main attr1='abc' attr2='xyz'><item><prop1>text1</prop1><prop2>text2</prop2></item></main>
    2     <main ........</main>

I want to convert this into a data frame like so:

    id   main.attr1  main.attr2  main.item.prop1  main.item.prop2
    1    abc         xyz         text1            text2
    2    .....

How can I do that, while still keeping the existing columns in the DataFrame?

The first step that needs to be done is to convert the XML string to a pandas Series (under the assumption that there will always be the same number of columns in the end). So you need a function like:

    def convert_xml(raw):
        # etree xml mangling goes here
        ...

This can be achieved, e.g., using the etree package in Python. The returned Series must have an index, where each entry in the index is a new column name that should appear, e.g. for your example:

    pd.Series(['abc', 'xyz'], index=['main.attr1', 'main.attr2'])
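As a rough illustration of what the etree part could look like, here is a minimal sketch using `xml.etree.ElementTree` from the standard library. It is only one possible way to do the flattening, and it makes some assumptions that are not in the question: the inner helper `walk` is a hypothetical name, attributes are stored as `path.attribute`, only leaf elements contribute their text, and repeated sibling tags are not handled (a later one would overwrite an earlier one).

```python
import xml.etree.ElementTree as ET

import pandas as pd


def convert_xml(raw):
    """Flatten one XML string into a Series keyed by dotted paths."""
    values = {}

    def walk(node, path):
        # Attributes become "<path>.<attribute name>".
        for name, value in node.attrib.items():
            values[f"{path}.{name}"] = value
        children = list(node)
        # A leaf element contributes its (non-empty) text under its own path.
        if not children and node.text and node.text.strip():
            values[path] = node.text.strip()
        # Assumption: sibling tags are unique, otherwise later ones overwrite.
        for child in children:
            walk(child, f"{path}.{child.tag}")

    root = ET.fromstring(raw)
    walk(root, root.tag)
    return pd.Series(values)


raw = ("<main attr1='abc' attr2='xyz'>"
       "<item><prop1>text1</prop1><prop2>text2</prop2></item></main>")
row = convert_xml(raw)
print(row)
```

For the sample row from the question, the index of `row` contains `main.attr1`, `main.attr2`, `main.item.prop1` and `main.item.prop2`, which are the column names asked for.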

Given this function, you can do the following with pandas (mocking away the XML mangling):

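    frame = pd.DataFrame({'keep': [42], 'xml': '<foo></foo>'})
    # Each returned Series becomes one row; its index becomes the new columns.
    temp = frame['xml'].apply(convert_xml)
    # Drop the raw XML column and attach the expanded columns next to the rest.
    frame = frame.drop('xml', axis=1)
    frame = pd.concat([frame, temp], axis=1)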
