Define Case Classes For Spark Schema

In Spark SQL, the best way to create a SchemaRDD is with a Scala case class: Spark uses Java's reflection API to read the case class's fields and build the schema from them. There are several cases where you would not want to do this, though; one of them is the case class limitation of supporting at most 22 fields. Let's look at an alternative approach, specifying the schema programmatically. We will use a simple file, person.txt, which has three fields: firstName, lastName, and age. Start the Spark shell and create a SQLContext:

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
scala> import sqlContext._
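For comparison, here is a minimal sketch of the case-class (reflection) approach described above. It assumes a pre-1.3 Spark shell (where import sqlContext._ brings in the implicit conversion from an RDD of case classes to a SchemaRDD), a hypothetical Person case class, and that the person file contains comma-separated lines such as John,Doe,35:

scala> // Spark infers the schema from the Person fields via reflection
scala> case class Person(firstName: String, lastName: String, age: Int)
scala> val people = sc.textFile("hdfs://localhost:9000/user/hduser/person").map(_.split(",")).map(p => Person(p(0), p(1), p(2).trim.toInt))
scala> people.registerTempTable("people")
scala> sql("select firstName, age from people")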

Load the person file as an RDD of lines, import the Spark SQL types, and define the schema programmatically with StructType and StructField:

scala> val person = sc.textFile("hdfs://localhost:9000/user/hduser/person")
scala> import org.apache.spark.sql._
scala> val schema = StructType(Array(StructField("firstName", StringType, true), StructField("lastName", StringType, true), StructField("age", IntegerType, true)))

Convert each line into a Row, apply the schema to get a SchemaRDD, and register it as a temporary table so it can be queried with SQL:

scala> val rowRDD = person.map(_.split(",")).map(p => Row(p(0), p(1), p(2).toInt))
scala> val personSchemaRDD = sqlContext.applySchema(rowRDD, schema)
scala> personSchemaRDD.registerTempTable("person")
scala> sql("select * from person")
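To inspect the query result in the shell, collect it and print each Row. As a side note, from Spark 1.3 onward the schema types live in org.apache.spark.sql.types and applySchema is superseded by createDataFrame; a rough sketch of the equivalent steps, assuming the same rowRDD and schema as above, would be:

scala> sql("select * from person").collect().foreach(println)

scala> // Spark 1.3+ sketch: build a DataFrame instead of a SchemaRDD
scala> import org.apache.spark.sql.types._
scala> val personDF = sqlContext.createDataFrame(rowRDD, schema)
scala> personDF.registerTempTable("person")
scala> sqlContext.sql("select * from person").show()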
