Spark: processing XML inside a CSV file
I have a CSV (or tab-separated) file like the one below:
1001,2016-02-23,req,<xmlstring><user><name>name1</name><addr>address1</addr></user></xmlstring>,20.0
1002,2016-02-24,req,<xmlstring><user><name>name2</name><addr>address1</addr></user></xmlstring>,30.0
I want to read the file and convert each line into a plain CSV line (including the fields inside the XML) so that I can load it into Hive, like this:
1001,2016-02-23,req,name1,address1,20.0
1002,2016-02-24,req,name2,address1,30.0
How can I do this in Spark? How do I read each row, process the XML part, and generate the output?
Thanks.
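Using rdd = sc.textFile(...) you read the file with each line as one element (a string) of the RDD. Then process each line:
rdd.map(lambda row: row.split(',')).map(lambda row: yourfilterfunction(row))
where yourfilterfunction is your own function that parses the XML field and builds the output row. Note that a plain split(',') only works as long as the XML itself contains no commas.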
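A minimal sketch of what such a function could look like for the sample data. It assumes exactly three plain fields before the XML and one after it, and the <xmlstring><user><name>...</name><addr>...</addr></user></xmlstring> layout shown in the question. The name flatten_line is hypothetical. It takes the raw line and does the splitting itself (so it replaces both map steps), and it uses Python's standard xml.etree.ElementTree rather than any Spark-specific XML support. Instead of a plain split(',') it splits three fields off the left and one off the right, so the XML stays in one piece.

```python
import xml.etree.ElementTree as ET


def flatten_line(line, sep=","):
    """Hypothetical helper: turn 'id,date,type,<xmlstring>..</xmlstring>,amount'
    into the flat CSV line 'id,date,type,name,addr,amount'.

    Assumes three plain fields before the XML, one after it, and the
    <xmlstring><user><name/><addr/></user></xmlstring> layout from the question.
    Use sep="\t" for the tab-separated variant of the input.
    """
    # Split three fields off the left and the amount off the right,
    # so separators inside the XML would not break the input split.
    rec_id, date, req_type, rest = line.split(sep, 3)
    xml_part, amount = rest.rsplit(sep, 1)

    root = ET.fromstring(xml_part)  # root element is <xmlstring>
    name = root.findtext("user/name", default="")
    addr = root.findtext("user/addr", default="")

    # Output is always comma-separated, as asked in the question.
    return ",".join([rec_id, date, req_type, name, addr, amount])


if __name__ == "__main__":
    sample = [
        "1001,2016-02-23,req,<xmlstring><user><name>name1</name>"
        "<addr>address1</addr></user></xmlstring>,20.0",
        "1002,2016-02-24,req,<xmlstring><user><name>name2</name>"
        "<addr>address1</addr></user></xmlstring>,30.0",
    ]
    for line in sample:
        print(flatten_line(line))

    # In PySpark the same function plugs into the RDD pipeline, e.g.
    # (paths are placeholders):
    #   sc.textFile("input_path").map(flatten_line).saveAsTextFile("output_path")
```

Because the function only works on plain strings, it can be tested locally without a cluster before being passed to map.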
With the newer Spark DataFrame API you can read CSV files directly and infer the schema, but I think you will still have to hardcode this kind of code for the XML column.
You should learn the basics and read the documentation before asking a question that has been asked thousands of times...