python - Convert pandas dataframe column with xml data to normalised columns?

I have a DataFrame in pandas, and one of its columns contains an XML string. I want to create one column for each of the XML nodes, with the column names in normalised form. For example:

    id    xmlcolumn
    1     <main attr1='abc' attr2='xyz'><item><prop1>text1</prop1><prop2>text2</prop2></item></main>
    2     <main ........</main>

I want to convert the data frame like so:

    id    main.attr1    main.attr2    main.item.prop1    main.item.prop2
    1     abc           xyz           text1              text2
    2     .....

How can I do that while still keeping the existing columns in the DataFrame?

The first step that needs to be done is to convert the XML string into a pandas Series (under the assumption that every row ends up with the same set of columns). That is, you need a function like:

    def convert_xml(raw):
        # etree XML mangling goes here
        ...

This can be achieved, for example, with the etree package in Python. The returned Series must have an index in which each entry is the name of a new column to appear, e.g. for your example:

    pd.Series(['abc', 'xyz'], index=['main.attr1', 'main.attr2'])
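As a rough sketch of what such a function could look like for the XML in the question, the following uses the standard library's `xml.etree.ElementTree`. The inner helper `walk` and the naming scheme are assumptions made for illustration: attributes become `path.attribute`, leaf elements contribute their text under their dotted path, and repeated sibling tags are assumed not to occur (a later one would overwrite an earlier one).

```python
import xml.etree.ElementTree as ET

import pandas as pd


def convert_xml(raw):
    """Flatten one XML string into a Series keyed by dotted node paths."""
    root = ET.fromstring(raw)
    values = {}

    def walk(node, path):
        # Hypothetical helper: attributes become "<path>.<attribute name>".
        for name, value in node.attrib.items():
            values[f"{path}.{name}"] = value
        children = list(node)
        # A leaf element contributes its text under its own path.
        if not children and node.text and node.text.strip():
            values[path] = node.text.strip()
        # Assumes sibling tags are unique; duplicates would overwrite.
        for child in children:
            walk(child, f"{path}.{child.tag}")

    walk(root, root.tag)
    return pd.Series(values)


raw = ("<main attr1='abc' attr2='xyz'><item><prop1>text1</prop1>"
       "<prop2>text2</prop2></item></main>")
result = convert_xml(raw)
print(result)
```

For the sample row this should give a Series indexed by `main.attr1`, `main.attr2`, `main.item.prop1` and `main.item.prop2`.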

Given this function, you can do the following with pandas (mocking away the XML mangling):

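    frame = pd.DataFrame({'keep': [42], 'xml': '<foo></foo>'})
    # Applying a function that returns a Series yields a DataFrame
    temp = frame['xml'].apply(convert_xml)
    # Drop the raw XML column and attach the new columns
    frame = frame.drop('xml', axis=1)
    frame = pd.concat([frame, temp], axis=1)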
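To make the reshaping step runnable on its own, here is a small sketch with a stand-in `convert_xml` that ignores its input and always returns the same two values. The stand-in, the second row and the `result` name are assumptions for illustration only; the point is the shape of `temp` and of the final frame.

```python
import pandas as pd


def convert_xml(raw):
    # Hypothetical stand-in: ignores `raw` and returns fixed values,
    # so only the pandas reshaping is demonstrated here.
    return pd.Series(['abc', 'xyz'], index=['main.attr1', 'main.attr2'])


frame = pd.DataFrame({'keep': [42, 43],
                      'xml': ['<foo></foo>', '<bar></bar>']})

# apply() with a Series-returning function gives a DataFrame:
# one row per input row, one column per index label.
temp = frame['xml'].apply(convert_xml)

# Swap the raw XML column for the expanded columns.
result = pd.concat([frame.drop('xml', axis=1), temp], axis=1)
print(result)
```

Because `temp` keeps the row index of `frame`, `pd.concat` with `axis=1` lines the new columns up with the existing ones.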
