Apache Spark - DecimalType issue - scala.MatchError of class java.lang.String


I am using Spark 1.6.1 with the built-in Scala 2.10.5. I am examining weather data that contains decimal values. Here is the code:

    val sqlcontext = new org.apache.spark.sql.SQLContext(sc)
    import sqlcontext.implicits._
    import org.apache.spark.sql.Row
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql._
    import org.apache.spark.sql.types._
    import org.apache.spark.sql.SQLContext

    val rawdata = sc.textFile("example_weather.csv").map(_.split(","))
    val header = rawdata.first
    val rawdatanoheader = rawdata.filter(_(0) != header(0))
    rawdatanoheader.first

    object schema {
      val weatherdata = StructType(Seq(
        StructField("date", StringType, true),
        StructField("region", StringType, true),
        StructField("temperature", DecimalType(32,16), true),
        StructField("solar", IntegerType, true),
        StructField("rainfall", DecimalType(32,16), true),
        StructField("windspeed", DecimalType(32,16), true))
      )
    }

    val datadf = sqlcontext.createDataFrame(rawdatanoheader.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5))), schema.weatherdata)
    datadf.registerTempTable("weatherdatasql")
    val datasql = sqlcontext.sql("select * from weatherdatasql")
    datasql.collect().foreach(println)

When I run the code, I get the expected schema and SQLContext:

    scala> object schema {
         | val weatherdata = StructType(Seq(
         | StructField("date", StringType, true),
         | StructField("region", StringType, true),
         | StructField("temperature", DecimalType(32,16), true),
         | StructField("solar", IntegerType, true),
         | StructField("rainfall", DecimalType(32,16), true),
         | StructField("windspeed", DecimalType(32,16), true))
         | )
         | }
    16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:56288 in memory (size: 4.6 KB, free: 511.1 MB)
    16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:39349 in memory (size: 4.6 KB, free: 2.7 GB)
    16/09/24 09:40:58 INFO ContextCleaner: Cleaned accumulator 2
    16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost in memory (size: 1964.0 B, free: 511.1 MB)
    16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:41412 in memory (size: 1964.0 B, free: 2.7 GB)
    16/09/24 09:40:58 INFO ContextCleaner: Cleaned accumulator 1
    defined module schema

    scala> val datadf = sqlcontext.createDataFrame(rawdatanoheader.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5))), schema.weatherdata)
    datadf: org.apache.spark.sql.DataFrame = [date: string, region: string, temperature: decimal(32,16), solar: int, rainfall: decimal(32,16), windspeed: decimal(32,16)]

However, the last line of code gives me the following:

    16/09/24 09:41:03 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): scala.MatchError: 20.21666667 (of class java.lang.String)

The number 20.21666667 is indeed the first temperature observed for a specific geographical region. I thought I had specified temperature as DecimalType(32,16). Is there a problem with my code, or with the SQLContext I am calling it on?

As recommended, I changed datadf as follows:

    val datadf = sqlcontext.createDataFrame(rawdatanoheader.map(p => Row(p(0), p(1), BigDecimal(p(2)), p(3), BigDecimal(p(4)), BigDecimal(p(5)))), schema.weatherdata)

Unfortunately, I now get a casting problem:

    16/09/24 10:31:35 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer

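The code in your first edit is correct; only p(3) still has to be converted with toInt. When createDataFrame is given an explicit schema, it does not cast the values for you: each field of the Row must already have the type the schema declares, so a String in an IntegerType column fails at runtime.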

I created a sample CSV file without headers:

    2016,a,201.222,12,12.1,5.0
    2016,b,200.222,13,12.3,6.0
    2014,b,200.111,14,12.3,7.0

The results:

    val datadf = sqlcontext.createDataFrame(rawdata.map(p => Row(p(0), p(1), BigDecimal(p(2)), p(3).toInt, BigDecimal(p(4)), BigDecimal(p(5)))), schema.weatherdata)

    datadf.show
    +----+------+--------------------+-----+-------------------+------------------+
    |date|region|         temperature|solar|           rainfall|         windspeed|
    +----+------+--------------------+-----+-------------------+------------------+
    |2016|     a|201.2220000000000000|   12|12.1000000000000000|5.0000000000000000|
    |2016|     b|200.2220000000000000|   13|12.3000000000000000|6.0000000000000000|
    |2014|     b|200.1110000000000000|   14|12.3000000000000000|7.0000000000000000|
    +----+------+--------------------+-----+-------------------+------------------+
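If the real file may contain malformed rows, the per-field conversion can be pulled out into a small helper and checked without Spark. The following is only a sketch: `WeatherParsing` and `parseLine` are hypothetical names that do not appear in the original post, and returning `None` for bad rows is an assumption about how you would want to handle them.

```scala
import scala.util.Try

// Hypothetical helper (not from the original post): turns one CSV line into
// values whose JVM types match the column types declared in the schema.
object WeatherParsing {
  def parseLine(line: String): Option[Seq[Any]] = {
    val p = line.split(",").map(_.trim)
    if (p.length != 6) None              // wrong number of columns
    else Try(Seq[Any](
      p(0),                              // date        -> StringType
      p(1),                              // region      -> StringType
      BigDecimal(p(2)),                  // temperature -> DecimalType(32,16)
      p(3).toInt,                        // solar       -> IntegerType
      BigDecimal(p(4)),                  // rainfall    -> DecimalType(32,16)
      BigDecimal(p(5))                   // windspeed   -> DecimalType(32,16)
    )).toOption                          // None if any number fails to parse
  }
}
```

In a Spark job, each successfully parsed sequence could then be wrapped with Row.fromSeq before being passed to createDataFrame together with schema.weatherdata.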
