Category Archives: Scala

How to import spark.implicits._ in Spark 2.2: error “value toDS is not a member of org.apache.spark.rdd.RDD”

I wrote about how to import implicits in spark 1.6 more than 2 years ago. But things have changed in Spark 2.2: the first thing you need to do when coding in Spark 2.2 is to set up an SparkSession object. SparkSession is the entry point to programming Spark with DataSet and DataFrame.

Like Spark 1.6, spark.implicits are required to be able to use Spark’s API for DataSets and DataFrames in version 2.2. And like version 1.6, an instance of SparkContext is needed in Spark 2.2 before being able to import spark.implicits. Since each instance of SparkSession comes with and an instance of SparkContext associated with it, all you have to do is to create an object of SparkSession and you’re set.

I have seen other posts that mention bits and pieces of how to do it. Here I give you the full code that works just fine and you can tweek it based on your requirements:

import org.apache.spark.sql._

import org.apache.log4j._

object sparkSQLWithCaseClass {

case class Person (ID: Int, name: String)

def mapper(l: String): Person = {

val fields = l.split(‘,’)

val person: Person = Person(fields(0).toInt, fields(1))

return person

}

 

def main(args: Array[String]){

Logger.getLogger(“org”).setLevel(Level.ERROR)

val spark = SparkSession.builder.appName(“Spark SQL”).getOrCreate()

 

val lines = spark.sparkContext.textFile(“../../people.csv”)

val people = lines.map(mapper)

 

import spark.implicits._

val schemaPeople = people.toDS()

schemaPeople.printSchema()

schemaPeople.createOrReplaceTempView(“people”)

 

val t = spark.sql(“select * from people where age >= 13”)

val res = t.collect()

res.foreach(println)

spark.stop()

}

}

Advertisements

How to import org.apache.spark.sql.SQLContext.implicits in Spark 1.6: error “value toDF is not a member of org.apache.spark.rdd.RDD”

I am doing a mini project for my company using Spark/Scala and have been stuck with the error mentioned in the title for a couple of days. Googling that error suggested to import org.apache.spark.sql.SQLContext.implicits, and that’s what I did:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext.implicits

import org.apache.spark.SparkConf
object TestSQLContext {
[…..]
def main(args:Array[String]) {
[…..]
}
}

And that was the start of the problem: my application started to give a new error:

object SQLContext is not a member of package org.apache.spark.sql
[error] Note: class SQLContext exists, but it has no companion object.

The problem is, none of those online posts mention that we need to create an instance of org.apache.spark.sql.SQLContext before being able to use its members and methods. This is the right way to do it:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.sql._
import org.apache.spark.SparkConf
object Hi {

case class DimC(ID:Int, Name:String, City:String, EffectiveFrom:Int, EffectiveTo:Int)

def main(args:Array[String]) {
val conf = new SparkConf().setAppName(“LoadDW”)
val sc = new SparkContext(conf)
val sqlContext= new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

val fDimCustomer = sc.textFile(“DimCustomer.txt”)

var dimCustomer1 =   fDimCustomer.map(_.split(‘,’)).map(r=>DimC(r(0).toInt,r(1),r(2),r(3).toInt,r(4).toInt)).toDF

dimCustomer1.registerTempTable(“Cust_1”)

val customers = sqlContext.sql(“select * from Cust_1”)

customers.show()
}
}

Hope this post helps and please do not hesitate to ask your questions in comments section.

Cheers.