Scala - Remove first row of Spark DataFrame -
I know DataFrames are supposed to be immutable, and I know it's not a great idea to try to change them. However, the file I'm receiving has a useless header row of 4 columns (the whole file has 50+ columns). So, I'm trying to get rid of that top row because it throws everything off.
I've tried a number of different solutions (mostly found on here) using .filter() and map replacements, but haven't gotten any of them to work.
Here's an example of how the data looks:

h | 300 | 23098234 | n
d | 399 | 54598755 | y | 09983 | 09823 | 02983 | ... | 0987098
d | 654 | 65465465 | y | 09983 | 09823 | 02983 | ... | 0987098
d | 198 | 02982093 | y | 09983 | 09823 | 02983 | ... | 0987098
Any ideas?
The cleanest way I've seen so far is along the lines of filtering out the first row:
val csvRows = sc.textFile("path_to_csv")
val skippableFirstRow = csvRows.first()
val usefulCsvRows = csvRows.filter(row => row != skippableFirstRow)

Note that this filters out every line equal to the first one, not just the first occurrence; with a distinctive header line that's usually what you want.
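The same filter-out-the-first-line idea can be exercised without a cluster. A minimal sketch on a plain Scala collection standing in for the RDD (the `dropHeader` helper and the sample lines are assumptions for illustration, not part of the original):

```scala
object SkipHeader {
  // Drop the first element (the header line) from a sequence of rows.
  // Like the RDD filter above, this removes *every* line equal to the
  // header, not just the first occurrence.
  def dropHeader(rows: Seq[String]): Seq[String] =
    rows match {
      case first +: _ => rows.filter(row => row != first)
      case _          => rows // empty input: nothing to drop
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines, as they might come out of sc.textFile
    val csvRows = Seq(
      "h|300|23098234|n",
      "d|399|54598755|y",
      "d|654|65465465|y",
      "d|198|02982093|y"
    )
    dropHeader(csvRows).foreach(println)
  }
}
```

On an actual RDD the equivalent is the `first()` + `filter` pair shown above; a more robust alternative when the header might collide with a data line is `rdd.mapPartitionsWithIndex((idx, iter) => if (idx == 0) iter.drop(1) else iter)`, which drops only the first line of the first partition.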