Thursday, January 11, 2018

Working with a custom schema when loading a CSV file in Spark

Here is a complete demo of loading a CSV file into Spark with a custom schema.
Unix shell code:
cat > game.csv <<EOF
Slingo, iOS
Slingo, Android
EOF
Scala code (run in spark-shell, where spark and sql are predefined):
import org.apache.spark.sql.types._

val customSchema = StructType(Array(
  StructField("game_id", StringType, true),
  StructField("os_id", StringType, true)
))

// Read the CSV with the explicit schema instead of letting Spark infer one
val csv_df = spark.read.format("csv").schema(customSchema).load("game.csv")

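// Sort with the DataFrame API
csv_df.orderBy(asc("game_id"), desc("os_id")).show

// Same sort in SQL; the view has to be registered before it can be queried
csv_df.createOrReplaceTempView("game_view")
val sort_df = sql("select * from game_view order by game_id, os_id desc")
sort_df.show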
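
One thing to watch with the sample data: there is whitespace around the os_id values (a space after each comma), and by default the CSV reader keeps it, so you end up with " iOS" rather than "iOS". Here is a minimal sketch of one way to handle that with the same custom schema, using the CSV reader options ignoreLeadingWhiteSpace and ignoreTrailingWhiteSpace. The temp file, the app name, and the trimmed_df name are my own assumptions for illustration, and it assumes Spark runs in local mode so the local file path is readable.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

// Assumption: local mode. Inside spark-shell, getOrCreate() returns the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("custom-schema-trim")
  .getOrCreate()

// Assumption: write the sample rows to a temp file so the sketch runs on its own
val path = Files.createTempFile("game", ".csv")
Files.write(path, "Slingo, iOS \nSlingo, Android\n".getBytes("UTF-8"))

// Same schema as in the demo above
val customSchema = StructType(Array(
  StructField("game_id", StringType, true),
  StructField("os_id", StringType, true)
))

// Apply the custom schema and ask the CSV reader to drop whitespace around values
val trimmed_df = spark.read.format("csv")
  .schema(customSchema)
  .option("ignoreLeadingWhiteSpace", "true")
  .option("ignoreTrailingWhiteSpace", "true")
  .load(path.toString)

// Confirm the columns carry the names and types from customSchema
trimmed_df.printSchema()
trimmed_df.show()
```

Without the two options the values keep their surrounding spaces, which matters as soon as you filter or join on os_id.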