Apache Spark - DecimalType issue - scala.MatchError of class java.lang.String
I am using Spark 1.6.1 with Scala 2.10.5 built in. I am examining weather data that has decimal values. Here is the code:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import org.apache.spark.sql.Row
import org.apache.spark.rdd.RDD
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.SQLContext

val rawData = sc.textFile("example_weather.csv").map(_.split(","))
val header = rawData.first
val rawDataNoHeader = rawData.filter(_(0) != header(0))
rawDataNoHeader.first

object schema {
  val weatherData = StructType(Seq(
    StructField("date", StringType, true),
    StructField("region", StringType, true),
    StructField("temperature", DecimalType(32, 16), true),
    StructField("solar", IntegerType, true),
    StructField("rainfall", DecimalType(32, 16), true),
    StructField("windspeed", DecimalType(32, 16), true))
  )
}

val dataDF = sqlContext.createDataFrame(
  rawDataNoHeader.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5))),
  schema.weatherData)
dataDF.registerTempTable("weatherDataSQL")
val dataSQL = sqlContext.sql("select * from weatherDataSQL")
dataSQL.collect().foreach(println)
When running the code, I get the expected schema and DataFrame:
scala> object schema {
     |   val weatherData = StructType(Seq(
     |     StructField("date", StringType, true),
     |     StructField("region", StringType, true),
     |     StructField("temperature", DecimalType(32, 16), true),
     |     StructField("solar", IntegerType, true),
     |     StructField("rainfall", DecimalType(32, 16), true),
     |     StructField("windspeed", DecimalType(32, 16), true))
     |   )
     | }
16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:56288 in memory (size: 4.6 KB, free: 511.1 MB)
16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:39349 in memory (size: 4.6 KB, free: 2.7 GB)
16/09/24 09:40:58 INFO ContextCleaner: Cleaned accumulator 2
16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost in memory (size: 1964.0 B, free: 511.1 MB)
16/09/24 09:40:58 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:41412 in memory (size: 1964.0 B, free: 2.7 GB)
16/09/24 09:40:58 INFO ContextCleaner: Cleaned accumulator 1
defined module schema

scala> val dataDF = sqlContext.createDataFrame(rawDataNoHeader.map(p => Row(p(0), p(1), p(2), p(3), p(4), p(5))), schema.weatherData)
dataDF: org.apache.spark.sql.DataFrame = [date: string, region: string, temperature: decimal(32,16), solar: int, rainfall: decimal(32,16), windspeed: decimal(32,16)]
However, the last line of code gives me the following:
16/09/24 09:41:03 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): scala.MatchError: 20.21666667 (of class java.lang.String)
The number 20.21666667 is indeed the first temperature observed for a specific geographical region. I thought I had specified temperature as DecimalType(32,16). Is there a problem with my code, or with the sqlContext I am calling it on?
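The cause, it turned out, is that createDataFrame(rowRDD, schema) does not cast values: each Row field must already hold the runtime type the schema declares. A minimal sketch of the mismatch (illustrative literals, assuming this is Spark 1.6's Catalyst conversion behaviour):

import org.apache.spark.sql.Row

// The schema declares DecimalType(32,16), but the Row still carries a raw
// String, which the Decimal conversion has no match case for - hence the
// scala.MatchError above.
Row("2016", "a", "20.21666667")             // what the code above builds
Row("2016", "a", BigDecimal("20.21666667")) // what DecimalType(32,16) expects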
EDIT 1: As recommended, I changed dataDF as follows:
val dataDF = sqlContext.createDataFrame(
  rawDataNoHeader.map(p => Row(p(0), p(1), BigDecimal(p(2)), p(3), BigDecimal(p(4)), BigDecimal(p(5)))),
  schema.weatherData)
Unfortunately, I now get a casting problem:
16/09/24 10:31:35 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
EDIT 2: The code in the first edit was correct once p(3) was also converted with toInt, as sketched below.
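A sketch of the full conversion, with toRow as a hypothetical helper name; each schema type maps to the corresponding Scala type:

import org.apache.spark.sql.Row

// Hypothetical helper gathering every conversion the schema requires:
// StringType -> String, DecimalType(32,16) -> BigDecimal, IntegerType -> Int.
def toRow(p: Array[String]): Row =
  Row(p(0), p(1), BigDecimal(p(2)), p(3).toInt, BigDecimal(p(4)), BigDecimal(p(5)))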
I created a sample CSV file without headers:
2016,a,201.222,12,12.1,5.0
2016,b,200.222,13,12.3,6.0
2014,b,200.111,14,12.3,7.0
The results:
val dataDF = sqlContext.createDataFrame(
  rawData.map(p => Row(p(0), p(1), BigDecimal(p(2)), p(3).toInt, BigDecimal(p(4)), BigDecimal(p(5)))),
  schema.weatherData)
dataDF.show

+----+------+--------------------+-----+-------------------+------------------+
|date|region|         temperature|solar|           rainfall|         windspeed|
+----+------+--------------------+-----+-------------------+------------------+
|2016|     a|201.2220000000000000|   12|12.1000000000000000|5.0000000000000000|
|2016|     b|200.2220000000000000|   13|12.3000000000000000|6.0000000000000000|
|2014|     b|200.1110000000000000|   14|12.3000000000000000|7.0000000000000000|
+----+------+--------------------+-----+-------------------+------------------+
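As a sanity check (a hypothetical follow-up, not part of the original run), printSchema and a simple aggregate confirm that the runtime types now line up with the declared schema:

// Hypothetical follow-up: verify the declared types and that SQL
// aggregates work on the decimal columns.
dataDF.printSchema
dataDF.registerTempTable("weatherDataSQL")
sqlContext.sql("select region, avg(temperature) from weatherDataSQL group by region").show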
Thank you!!!!