Scala - Remove first row of Spark DataFrame -
I know DataFrames are supposed to be immutable, and I know it's not a great idea to try to change them. However, the file I'm receiving has a useless header row of 4 columns (the whole file has 50+ columns). So, I'm trying to get rid of that top row because it throws everything off.
I've tried a number of different solutions (mostly found on here) using .filter() and map replacements, but haven't gotten any of them to work.
Here's an example of how the data looks:

h | 300 | 23098234 | n
d | 399 | 54598755 | y | 09983 | 09823 | 02983 | ... | 0987098
d | 654 | 65465465 | y | 09983 | 09823 | 02983 | ... | 0987098
d | 198 | 02982093 | y | 09983 | 09823 | 02983 | ... | 0987098
Any ideas?
The cleanest way I've seen so far is along the lines of filtering out the first row:
val csvRows = sc.textFile("path_to_csv")
val skippableFirstRow = csvRows.first()
val usefulCsvRows = csvRows.filter(row => row != skippableFirstRow)

Note that this filters out every line equal to the first one, not just the first occurrence; with a distinctive header line that's usually what you want.
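The same filter-out-the-first-line idea can be exercised without a cluster. A minimal sketch on a plain Scala collection standing in for the RDD (the `dropHeader` helper and the sample lines are assumptions for illustration, not part of the original):

```scala
object SkipHeader {
  // Drop the first element (the header line) from a sequence of rows.
  // Like the RDD filter above, this removes *every* line equal to the
  // header, not just the first occurrence.
  def dropHeader(rows: Seq[String]): Seq[String] =
    rows match {
      case first +: _ => rows.filter(row => row != first)
      case _          => rows // empty input: nothing to drop
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines, as they might come out of sc.textFile
    val csvRows = Seq(
      "h|300|23098234|n",
      "d|399|54598755|y",
      "d|654|65465465|y",
      "d|198|02982093|y"
    )
    dropHeader(csvRows).foreach(println)
  }
}
```

On an actual RDD the equivalent is the `first()` + `filter` pair shown above; a more robust alternative when the header might collide with a data line is `rdd.mapPartitionsWithIndex((idx, iter) => if (idx == 0) iter.drop(1) else iter)`, which drops only the first line of the first partition.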