1. Creating DataFrames from case classes in Spark SQL (via reflection)
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

object TestDataFrame1 {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("RDDToDataFrame").setMaster("local")
    val sc = new SparkContext(conf)
    val context = new SQLContext(sc)

    // Read people.txt (comma-separated "name, age" lines) and map each line
    // onto the People case class, producing an RDD[People]
    val peopleRDD = sc.textFile("D:\\Users\\shashahu\\Desktop\\work\\spark-2.4.4\\examples\\src\\main\\resources\\people.txt")
      .map(line => People(line.split(",")(0), line.split(",")(1).trim.toInt))

    // Bring the implicit conversions (toDF, etc.) into scope
    import context.implicits._

    // Convert the RDD to a DataFrame; the schema is inferred by reflection
    // from the fields of the People case class
    val df = peopleRDD.toDF()

    // Register the DataFrame as a temporary view so it can be queried with SQL
    df.createOrReplaceTempView("people")

    // Run a SQL query against the view and print the result
    context.sql("select * from people").show()
  }

  case class People(var name: String, var age: Int)
}
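For reference, the same query can also be run through the DataFrame API without registering a temporary view. This is a minimal sketch (not part of the original post) that assumes the df value created in main above:

  // Equivalent to "select * from people", using the DataFrame DSL
  df.select("name", "age").show()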
/** Console output from running the Spark SQL example above against people.txt:
 *
* 19/11/12 20:07:13 INFO FileInputFormat: Total input paths to process : 1
* 19/11/12 20:07:13 INFO SparkContext: Starting job: show at TestDataFrame1.scala:21
* 19/11/12 20:07:13 INFO DAGScheduler: Got job 0 (show at TestDataFrame1.scala:21) with 1 output partitions
* 19/11/12 20:07:13 INFO DAGScheduler: Final stage: ResultStage 0 (show at TestDataFrame1.scala:21)
* 19/11/12 20:07:13 INFO DAGScheduler: Parents of final stage: List()
* 19/11/12 20:07:13 INFO DAGScheduler: Missing parents: List()
* 19/11/12 20:07:13 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[6] at show at TestDataFrame1.scala:21), which has no missing parents
* 19/11/12 20:07:13 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 10.0 KB, free 4.1 GB)
* 19/11/12 20:07:13 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.8 KB, free 4.1 GB)
* 19/11/12 20:07:13 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on dst001825.cn1.global.ctrip.com:53625 (size: 4.8 KB, free: 4.1 GB)
* 19/11/12 20:07:13 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1161
* 19/11/12 20:07:13 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[6] at show at TestDataFrame1.scala:21) (first 15 tasks are for partitions Vector(0))
* 19/11/12 20:07:13 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
* 19/11/12 20:07:13 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 7947 bytes)
* 19/11/12 20:07:13 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
* 19/11/12 20:07:13 INFO HadoopRDD: Input split: file:/D:/Users/shashahu/Desktop/work/spark-2.4.4/examples/src/main/resources/people.txt:0+32
* 19/11/12 20:07:13 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1264 bytes result sent to driver
* 19/11/12 20:07:13 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 123 ms on localhost (executor driver) (1/1)
* 19/11/12 20:07:13 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
* 19/11/12 20:07:13 INFO DAGScheduler: ResultStage 0 (show at TestDataFrame1.scala:21) finished in 0.222 s
* 19/11/12 20:07:13 INFO DAGScheduler: Job 0 finished: show at TestDataFrame1.scala:21, took 0.260352 s
* +-------+---+
* | name|age|
* +-------+---+
* |Michael| 29|
* | Andy| 30|
* | Justin| 19|
* +-------+---+
*
* 19/11/12 20:07:13 INFO SparkContext: Invoking stop() from shutdown hook
*/
The people.txt file shipped with the official Spark distribution is read and converted into a table; the relevant code and the resulting output are shown above.
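SQLContext is a legacy entry point; since Spark 2.0 the same reflection-based conversion is usually written through SparkSession. The following is a minimal sketch (an alternative not shown in the original post) that assumes the same people.txt path and the same People case class:

import org.apache.spark.sql.SparkSession

object TestDataFrame1WithSession {
  case class People(var name: String, var age: Int)

  def main(args: Array[String]): Unit = {
    // SparkSession replaces SparkContext + SQLContext as the single entry point
    val spark = SparkSession.builder()
      .appName("RDDToDataFrame")
      .master("local")
      .getOrCreate()

    import spark.implicits._

    // Same reflection-based RDD-to-DataFrame conversion, driven through SparkSession
    val df = spark.sparkContext
      .textFile("D:\\Users\\shashahu\\Desktop\\work\\spark-2.4.4\\examples\\src\\main\\resources\\people.txt")
      .map(line => People(line.split(",")(0), line.split(",")(1).trim.toInt))
      .toDF()

    df.createOrReplaceTempView("people")
    spark.sql("select * from people").show()

    spark.stop()
  }
}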