Spark XML processing in a CSV file


I have a CSV or tab-separated file like the one below:

1001,2016-02-23,req,<xmlstring><user><name>name1</name><addr>address1</addr></user></xmlstring>,20.0
1002,2016-02-24,req,<xmlstring><user><name>name2</name><addr>address1</addr></user></xmlstring>,30.0

I want to read the file and convert each line into plain CSV (flattening the XML fields) so I can load it into Hive.

Like this:

1001,2016-02-23,req,name1,address1,20.0

1002,2016-02-24,req,name2,address1,30.0

How can I do this in Spark? How do I read each row, process the XML part, and generate the output above?

Thanks

Using rdd = sc.textFile(...) you can read the file so that each row becomes an element of the RDD. To process each line: rdd.map(lambda row: row.split(',')).map(lambda row: yourFilterFunction(row))
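As a minimal sketch of that idea (assuming PySpark, a comma-delimited input at a placeholder path data.csv, and the XML fragment always in the fourth field), the embedded XML can be parsed with Python's xml.etree.ElementTree inside the map function:

import xml.etree.ElementTree as ET
from pyspark import SparkContext

sc = SparkContext(appName="csv-xml-flatten")

def flatten_line(line):
    # Split the CSV fields: id, date, type, xml, amount
    rec_id, date, rec_type, xml, amount = line.split(',')
    # Parse the embedded XML fragment and pull out name and addr
    root = ET.fromstring(xml)
    name = root.findtext('./user/name')
    addr = root.findtext('./user/addr')
    # Re-emit a flat CSV line that Hive can consume directly
    return ','.join([rec_id, date, rec_type, name, addr, amount])

rdd = sc.textFile("data.csv")      # input path is a placeholder
flat = rdd.map(flatten_line)
flat.saveAsTextFile("output")      # or point this at a Hive-backed location

Note the naive split(',') assumes the XML itself contains no commas; with a tab-separated file you would split on '\t' instead.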

With the newer Spark APIs you can read CSV files and infer the schema, but I think you would still need to hard-code this kind of parsing for the XML column.
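For example, a rough sketch with the Spark 2.x DataFrame API (the column names, table name, and the parse_user UDF are my own assumptions, not part of the question):

import xml.etree.ElementTree as ET
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StructType, StructField, StringType

spark = (SparkSession.builder
         .appName("csv-xml-df")
         .enableHiveSupport()
         .getOrCreate())

# Read the raw CSV; column names here are assumed
df = spark.read.csv("data.csv").toDF("id", "date", "type", "xml", "amount")

# Schema for the parsed user struct
user_schema = StructType([
    StructField("name", StringType()),
    StructField("addr", StringType()),
])

@udf(user_schema)
def parse_user(xml):
    # Parse the embedded XML and return (name, addr)
    root = ET.fromstring(xml)
    return (root.findtext("./user/name"), root.findtext("./user/addr"))

flat = (df.withColumn("user", parse_user(col("xml")))
          .select("id", "date", "type", "user.name", "user.addr", "amount"))

flat.write.saveAsTable("my_hive_table")   # table name is a placeholder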

You should learn to read the documentation before asking a question that has been asked thousands of times...

