Scala - Key/Value pair RDD


I have a question about key/value pair RDDs.

I have 5 files in the c:/download/input folder, each containing the dialogue of a film. The files are as follows:

    movie_horror_conjuring.txt
    movie_comedy_eurotrip.txt
    movie_horror_insidious.txt
    movie_sci-fi_interstellar.txt
    movie_horror_evildead.txt

I am trying to read the files in the input folder using sc.wholeTextFiles(), which gives key/value pairs as follows:

(c:/download/input/movie_horror_conjuring.txt,values) 

I am trying to group the input files by genre using groupByKey(), so that I get the values of all horror movies together, all comedy movies together, and so on.

Is there a way to generate the key/value pairs as (horror, values) instead of (c:/download/input/movie_horror_conjuring.txt,values)?

    val ipfile = sc.wholeTextFiles("c:/download/input")
    val output = ipfile.groupByKey().map(t => (t._1, t._2))

The above code gives me the following output:

    (c:/download/input/movie_horror_conjuring.txt,values)
    (c:/download/input/movie_comedy_eurotrip.txt,values)
    (c:/download/input/movie_horror_insidious.txt,values)
    (c:/download/input/movie_sci-fi_interstellar.txt,values)
    (c:/download/input/movie_horror_evildead.txt,values)

whereas I need the output as follows:

    (horror, (values1, values2, values3))
    (comedy, (values1))
    (sci-fi, (values1))

I tried map and split operations to strip the folder path from the key and keep only the file name, but I'm not able to attach the corresponding values to the files.

I would also like to know how I can get the line count of values1, values2, values3, etc.

My final output should look like:

(horror, 100)

where 100 is the sum of the line counts: values1 = 40 lines, values2 = 30 lines, values3 = 30 lines, and so on.

Try this:

    val output = ipfile.map { case (k, v) => (k.split("_")(1), v) }.groupByKey()
    output.collect

Let me know if this works for you!
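
Note that `k.split("_")(1)` only picks out the genre if the first underscore in the whole key is the one in the file name. As a rough sketch, assuming the file names always follow a `movie_<genre>_<title>.txt` pattern, a small helper could strip the directory first so that an underscore in a folder name does not shift the index. The names `GenreKey` and `genreOf`, and the `c:/my_files` path, are made up here for illustration:

```scala
object GenreKey {
  // Hypothetical helper: extracts the genre from a path whose file name
  // is assumed to look like movie_<genre>_<title>.txt.
  def genreOf(path: String): String = {
    // Keep only the part after the last '/' (the file name itself).
    val fileName = path.substring(path.lastIndexOf('/') + 1)
    // "movie_horror_conjuring.txt" -> Array("movie", "horror", "conjuring.txt")
    fileName.split("_")(1)
  }

  def main(args: Array[String]): Unit = {
    println(genreOf("c:/download/input/movie_horror_conjuring.txt")) // prints horror
    println(genreOf("c:/my_files/movie_sci-fi_interstellar.txt"))    // prints sci-fi
  }
}
```

In the Spark job this could be plugged in as `ipfile.map { case (k, v) => (GenreKey.genreOf(k), v) }.groupByKey()`.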

Update:

To get the output in the format (horror, 100):

    val output = ipfile.map { case (k, v) => (k.split("_")(1), v.count(_ == '\n')) }.reduceByKey(_ + _)
    output.collect
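
One thing to watch: `v.count(_ == '\n')` counts newline characters, so a file whose last line has no trailing newline comes out one short. The sketch below uses plain Scala collections instead of Spark so that it can run on its own. It shows an alternative count based on `linesIterator`, followed by the same sum-per-genre step. The object name `LineCounts`, the helper names, and the sample data are invented for illustration:

```scala
object LineCounts {
  // Counts lines, treating a final line without a trailing newline as a line.
  def countLines(text: String): Int = text.linesIterator.size

  // Local stand-in for map + reduceByKey(_ + _): sums line counts per genre.
  def totals(files: Seq[(String, String)]): Map[String, Int] =
    files
      .map { case (path, text) => (path.split("_")(1), countLines(text)) }
      .groupBy(_._1)
      .map { case (genre, pairs) => (genre, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Made-up sample data standing in for the (path, content) pairs.
    val files = Seq(
      ("c:/download/input/movie_horror_conjuring.txt", "a\nb\nc"),
      ("c:/download/input/movie_horror_insidious.txt", "d\ne\n"),
      ("c:/download/input/movie_comedy_eurotrip.txt", "f")
    )
    println(totals(files)("horror")) // prints 5
    println(totals(files)("comedy")) // prints 1
  }
}
```

In the Spark version, the same function could replace the character count: `ipfile.map { case (k, v) => (k.split("_")(1), LineCounts.countLines(v)) }.reduceByKey(_ + _)`.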
