Scala - Key/Value pair RDD

I have a question about key/value pair RDDs.

I have 5 files in the c:/download/input folder containing dialogue from films. The files are:

movie_horror_conjuring.txt
movie_comedy_eurotrip.txt
movie_horror_insidious.txt
movie_sci-fi_interstellar.txt
movie_horror_evildead.txt

I am reading the files in the input folder with sc.wholeTextFiles(), which gives me key/value pairs like this:

(c:/download/input/movie_horror_conjuring.txt,values)

I am trying to group the input files by genre using groupByKey(), so that I get the values of all horror movies together, all comedy movies together, and so on.

Is there a way to generate key/value pairs of the form (horror, values) instead of (c:/download/input/movie_horror_conjuring.txt, values)?

val ipfile = sc.wholeTextFiles("c:/download/input")
val output = ipfile.groupByKey().map(t => (t._1, t._2))

The above code gives me the following output:

(c:/download/input/movie_horror_conjuring.txt, values)
(c:/download/input/movie_comedy_eurotrip.txt, values)
(c:/download/input/movie_horror_insidious.txt, values)
(c:/download/input/movie_sci-fi_interstellar.txt, values)
(c:/download/input/movie_horror_evildead.txt, values)

whereas I need output like this:

(horror, (values1, values2, values3))
(comedy, (values1))
(sci-fi, (values1))

I tried map and split operations to remove the folder path from the file-name key, but then I'm not able to attach the corresponding file contents as values.
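A minimal sketch of stripping the folder path, assuming every file name follows the movie_<genre>_<title>.txt pattern listed above and that paths use forward slashes as in the question. The helper name genreOf is hypothetical. It cuts the path at the last '/' before splitting on '_', so an underscore in a directory name does not shift the index:

```scala
// Hypothetical helper: derive the genre from a path such as
// "file:/c:/download/input/movie_horror_conjuring.txt".
// Assumes file names follow the pattern movie_<genre>_<title>.txt.
def genreOf(path: String): String = {
  // Keep only the file name so underscores in directory names are ignored.
  val fileName = path.substring(path.lastIndexOf('/') + 1)
  // "movie_horror_conjuring.txt" -> Array("movie", "horror", "conjuring.txt")
  fileName.split("_")(1)
}

println(genreOf("file:/c:/download/input/movie_horror_conjuring.txt")) // horror
println(genreOf("c:/my_data/input/movie_sci-fi_interstellar.txt"))     // sci-fi
```

For comparison, splitting the whole path, as in "c:/my_data/input/movie_sci-fi_interstellar.txt".split("_")(1), returns "data/input/movie" rather than "sci-fi".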

I would also like to know how to count the lines in values1, values2, values3, etc.

My final output should be:

(horror, 100)

where 100 is the sum of the line counts: values1 = 40 lines, values2 = 30 lines, values3 = 30 lines, and so on.
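For the line counts, one hedged option is a small helper (the name lineCount is hypothetical) that counts newline characters and adds one for a final line that has no trailing newline. Counting '\n' alone undercounts such files by one:

```scala
// Hypothetical helper: count the lines in one file's contents
// (the value part of a (path, content) pair).
// A last line without a trailing newline is still counted.
def lineCount(text: String): Int =
  if (text.isEmpty) 0
  else text.count(_ == '\n') + (if (text.endsWith("\n")) 0 else 1)

println(lineCount("line one\nline two\nline three"))        // 3
println(lineCount("line one\nline two\nline three\n"))      // 3
println("line one\nline two\nline three".count(_ == '\n')) // 2
```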

Try this:

val output = ipfile.map { case (k, v) => (k.split("_")(1), v) }.groupByKey()
output.collect

Let me know if it works for you!
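To check the re-keying logic without a Spark cluster, here is a sketch on a plain Scala collection. The file contents are made up, and Scala's groupBy stands in for the RDD's groupByKey; unlike the answer, it extracts the genre from the file name rather than the full path:

```scala
// Local stand-in for the (path, content) pairs that sc.wholeTextFiles returns;
// the contents here are made up for illustration.
val files: Seq[(String, String)] = Seq(
  "file:/c:/download/input/movie_horror_conjuring.txt"    -> "conjuring dialogue",
  "file:/c:/download/input/movie_comedy_eurotrip.txt"     -> "eurotrip dialogue",
  "file:/c:/download/input/movie_horror_insidious.txt"    -> "insidious dialogue",
  "file:/c:/download/input/movie_sci-fi_interstellar.txt" -> "interstellar dialogue",
  "file:/c:/download/input/movie_horror_evildead.txt"     -> "evildead dialogue"
)

// Same shape as ipfile.map { case (k, v) => (genre, v) }.groupByKey():
// re-key each pair by genre, then collect all contents per genre.
val byGenre: Map[String, Seq[String]] =
  files
    .map { case (path, text) =>
      val fileName = path.substring(path.lastIndexOf('/') + 1)
      (fileName.split("_")(1), text)
    }
    .groupBy(_._1)
    .map { case (genre, pairs) => genre -> pairs.map(_._2) }

byGenre.foreach { case (genre, texts) => println(s"($genre, $texts)") }
```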

Update:

To get output in the format (horror, 100):

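// v.count(_ == '\n') counts newline characters, so a last line without a trailing newline is not counted
val output = ipfile.map { case (k, v) => (k.split("_")(1), v.count(_ == '\n')) }.reduceByKey(_ + _)
output.collect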
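The same idea for the (horror, 100) format, sketched on a local collection with made-up contents and hypothetical helpers genreOf and lineCount. A foldLeft into a Map plays the role of reduceByKey(_ + _) here; on the RDD itself you would keep reduceByKey as in the answer:

```scala
// Made-up (path, content) pairs standing in for the wholeTextFiles output.
val files: Seq[(String, String)] = Seq(
  "file:/c:/download/input/movie_horror_conjuring.txt" -> "a\nb\nc\n",
  "file:/c:/download/input/movie_horror_insidious.txt" -> "d\ne",
  "file:/c:/download/input/movie_comedy_eurotrip.txt"  -> "f\n"
)

// Hypothetical helper: genre from the file name (movie_<genre>_<title>.txt).
def genreOf(path: String): String =
  path.substring(path.lastIndexOf('/') + 1).split("_")(1)

// Hypothetical helper: line count that includes a last line with no trailing newline.
def lineCount(text: String): Int =
  if (text.isEmpty) 0
  else text.count(_ == '\n') + (if (text.endsWith("\n")) 0 else 1)

// Same shape as ipfile.map { ... }.reduceByKey(_ + _): emit (genre, lines)
// per file, then add up the counts that share a genre.
val linesPerGenre: Map[String, Int] =
  files
    .map { case (path, text) => (genreOf(path), lineCount(text)) }
    .foldLeft(Map.empty[String, Int]) { case (acc, (genre, n)) =>
      acc.updated(genre, acc.getOrElse(genre, 0) + n)
    }

linesPerGenre.foreach { case (genre, total) => println(s"($genre, $total)") }
```

Here horror totals 5 lines (3 + 2); counting '\n' alone would give 4, because the insidious sample has no trailing newline.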
