Building a Spark 1.2.0 Standalone Environment
1. Prepare two files
scala-2.11.4.tgz
spark-1.2.0-bin-hadoop1.tgz
Download Spark: wget http://d3kbcqa49mib13.cloudfront.net/spark-1.2.0-bin-hadoop1.tgz
Download Scala: wget http://downloads.typesafe.com/scala/2.11.4/scala-2.11.4.tgz?_ga=1.254444288.920772718.1430024679
2. Extract
tar -zvxf scala-2.11.4.tgz
tar -zvxf spark-1.2.0-bin-hadoop1.tgz
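If the archives were not extracted directly into the install directory used below, move them there first; the target path is simply the one this guide's environment variables point to:
mkdir -p /usr/local/cdh/spark
mv scala-2.11.4 spark-1.2.0-bin-hadoop1 /usr/local/cdh/spark/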
3. Configure the Scala environment variables (append to /etc/profile)
export SCALA_HOME=/usr/local/cdh/spark/scala-2.11.4
export PATH=$PATH:$SCALA_HOME/bin
After editing, save and exit, then run source /etc/profile to apply the changes.
Verification:
[root@localhost scala-2.11.4]# scala -version
Scala code runner version 2.11.4 -- Copyright 2002-2013, LAMP/EPFL
[root@localhost scala-2.11.4]# scala
Welcome to Scala version 2.11.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67).
Type in expressions to have them evaluated.
Type :help for more information.
scala>
If you see the output above, Scala has been installed successfully.
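Exit the REPL with :quit (or Ctrl+D) to continue with the setup.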
4. Configure the Spark environment variables (append to /etc/profile)
export SPARK_HOME=/usr/local/cdh/spark/spark-1.2.0-bin-hadoop1
export PATH=$PATH:$SPARK_HOME/bin
After editing, save and exit, then run source /etc/profile to apply the changes.
5. Configure the Spark configuration file
My Spark is extracted at /usr/local/cdh/spark/spark-1.2.0-bin-hadoop1
[root@localhost spark-1.2.0-bin-hadoop1]# cd conf/
[root@localhost conf]# ls
fairscheduler.xml.template log4j.properties.template metrics.properties.template slaves.template spark-defaults.conf.template spark-env.sh.template
[root@localhost conf]# cp spark-env.sh.template spark-env.sh
[root@localhost conf]# vi spark-env.sh
Add the following lines:
export SCALA_HOME=/usr/local/cdh/spark/scala-2.11.4
export SPARK_MASTER_IP=martin
export SPARK_WORKER_MEMORY=2G
export JAVA_HOME=/usr/local/cdh/jdk1.7
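A note on the worker list: sbin/start-slaves.sh starts one worker per host listed in conf/slaves, and falls back to localhost when that file is absent (which is what happens in the log below). For a multi-node setup you would create it from the template, one worker hostname per line; the hostnames here are placeholders:
cp slaves.template slaves
vi slaves   # e.g. worker1, worker2 ...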
6. Start the master and worker
Start the master:
[root@localhost spark-1.2.0-bin-hadoop1]# sbin/start-master.sh
Check the master web UI at http://martin:8080
Start the worker:
[root@localhost spark-1.2.0-bin-hadoop1]# sbin/start-slaves.sh spark://martin:7077
localhost: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/cdh/spark/spark-1.2.0-bin-hadoop1/sbin/../logs/spark-root-org.apache.spark.deploy.worker.Worker-1-localhost.localdomain.out
7. Enter interactive mode
MASTER=spark://martin:7077 ./bin/spark-shell
[root@localhost spark-1.2.0-bin-hadoop1]# master=spark://martin:7077 ./bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/04/25 21:26:54 INFO SecurityManager: Changing view acls to: root
15/04/25 21:26:54 INFO SecurityManager: Changing modify acls to: root
15/04/25 21:26:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/04/25 21:26:54 INFO HttpServer: Starting HTTP Server
15/04/25 21:26:55 INFO Utils: Successfully started service 'HTTP class server' on port 39275.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.2.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
15/04/25 21:27:04 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 192.168.41.190 instead (on interface eth1)
15/04/25 21:27:04 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/04/25 21:27:04 INFO SecurityManager: Changing view acls to: root
15/04/25 21:27:04 INFO SecurityManager: Changing modify acls to: root
15/04/25 21:27:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/04/25 21:27:05 INFO Slf4jLogger: Slf4jLogger started
15/04/25 21:27:05 INFO Remoting: Starting remoting
15/04/25 21:27:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@martin:35821]
15/04/25 21:27:06 INFO Utils: Successfully started service 'sparkDriver' on port 35821.
15/04/25 21:27:06 INFO SparkEnv: Registering MapOutputTracker
15/04/25 21:27:06 INFO SparkEnv: Registering BlockManagerMaster
15/04/25 21:27:06 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150425212706-43cd
15/04/25 21:27:06 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/04/25 21:27:07 INFO HttpFileServer: HTTP File server directory is /tmp/spark-005e80bd-70fe-4be9-88c5-a60de6233c68
15/04/25 21:27:07 INFO HttpServer: Starting HTTP Server
15/04/25 21:27:07 INFO Utils: Successfully started service 'HTTP file server' on port 54723.
15/04/25 21:27:07 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/04/25 21:27:07 INFO SparkUI: Started SparkUI at http://martin:4040
15/04/25 21:27:08 INFO Executor: Using REPL class URI: http://192.168.41.190:39275
15/04/25 21:27:08 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@martin:35821/user/HeartbeatReceiver
15/04/25 21:27:08 INFO NettyBlockTransferService: Server created on 57860
15/04/25 21:27:08 INFO BlockManagerMaster: Trying to register BlockManager
15/04/25 21:27:08 INFO BlockManagerMasterActor: Registering block manager localhost:57860 with 267.3 MB RAM, BlockManagerId(, localhost, 57860)
15/04/25 21:27:08 INFO BlockManagerMaster: Registered BlockManager
15/04/25 21:27:09 INFO SparkILoop: Created spark context..
Spark context available as sc.
scala>
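Before the word-count test, a quick sanity check can be run directly in the shell; this is just a sketch and the numbers are arbitrary:
sc.parallelize(1 to 1000).count()   // should return 1000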
8. Word count test
Create a file named file.txt and place it in a directory of your choice, then load it and count the words (flatMap splits each line into words, map pairs each word with 1, and reduceByKey sums the per-word counts):
val file=sc.textFile("/usr/temp/file.txt")
val count=file.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)
count.collect()
15/04/25 21:52:40 INFO SparkContext: Starting job: collect at :17
15/04/25 21:52:40 INFO DAGScheduler: Registering RDD 7 (map at :14)
15/04/25 21:52:40 INFO DAGScheduler: Got job 0 (collect at :17) with 1 output partitions (allowLocal=false)
15/04/25 21:52:40 INFO DAGScheduler: Final stage: Stage 1(collect at :17)
15/04/25 21:52:40 INFO DAGScheduler: Parents of final stage: List(Stage 0)
15/04/25 21:52:40 INFO DAGScheduler: Missing parents: List(Stage 0)
15/04/25 21:52:40 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[7] at map at :14), which has no missing parents
15/04/25 21:52:40 INFO MemoryStore: ensureFreeSpace(3544) called with curMem=75526, maxMem=280248975
15/04/25 21:52:40 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.5 KB, free 267.2 MB)
15/04/25 21:52:40 INFO MemoryStore: ensureFreeSpace(2501) called with curMem=79070, maxMem=280248975
15/04/25 21:52:40 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.4 KB, free 267.2 MB)
15/04/25 21:52:40 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:51130 (size: 2.4 KB, free: 267.3 MB)
15/04/25 21:52:40 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/04/25 21:52:40 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/04/25 21:52:40 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[7] at map at :14)
15/04/25 21:52:40 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/04/25 21:52:40 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1320 bytes)
15/04/25 21:52:40 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/04/25 21:52:40 INFO HadoopRDD: Input split: file:/usr/local/cdh/spark/spark-1.2.0-bin-hadoop1/Desktop/file.txt:0+61
15/04/25 21:52:41 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1895 bytes result sent to driver
15/04/25 21:52:41 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 355 ms on localhost (1/1)
15/04/25 21:52:41 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/04/25 21:52:41 INFO DAGScheduler: Stage 0 (map at :14) finished in 0.652 s
15/04/25 21:52:41 INFO DAGScheduler: looking for newly runnable stages
15/04/25 21:52:41 INFO DAGScheduler: running: Set()
15/04/25 21:52:41 INFO DAGScheduler: waiting: Set(Stage 1)
15/04/25 21:52:41 INFO DAGScheduler: failed: Set()
15/04/25 21:52:41 INFO DAGScheduler: Missing parents for Stage 1: List()
15/04/25 21:52:41 INFO DAGScheduler: Submitting Stage 1 (ShuffledRDD[8] at reduceByKey at :14), which is now runnable
15/04/25 21:52:41 INFO MemoryStore: ensureFreeSpace(2112) called with curMem=81571, maxMem=280248975
15/04/25 21:52:41 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 2.1 KB, free 267.2 MB)
15/04/25 21:52:41 INFO MemoryStore: ensureFreeSpace(1544) called with curMem=83683, maxMem=280248975
15/04/25 21:52:41 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 1544.0 B, free 267.2 MB)
15/04/25 21:52:41 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on localhost:51130 (size: 1544.0 B, free: 267.3 MB)
15/04/25 21:52:41 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/04/25 21:52:41 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:838
15/04/25 21:52:41 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (ShuffledRDD[8] at reduceByKey at :14)
15/04/25 21:52:41 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/04/25 21:52:41 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1056 bytes)
15/04/25 21:52:41 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/04/25 21:52:41 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
15/04/25 21:52:41 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 4 ms
15/04/25 21:52:41 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1168 bytes result sent to driver
15/04/25 21:52:41 INFO DAGScheduler: Stage 1 (collect at :17) finished in 0.126 s
15/04/25 21:52:41 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 131 ms on localhost (1/1)
15/04/25 21:52:41 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/04/25 21:52:41 INFO DAGScheduler: Job 0 finished: collect at :17, took 1.103095 s
res0: Array[(String, Int)] = Array((this,1), (is,1), (Hello,1), (haoop,1), (home,1), (book;,1), ("",1), (World,1), (j2ee,1), (JAVA,1), (HADOOP,1), (my,1))
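For reference, the same word count can also be packaged as a standalone application rather than typed into the shell. Below is a minimal sketch; the object name, jar name, and input path are assumptions, the master URL is the one used above, and the jar should be built against Scala 2.10 because the prebuilt Spark 1.2.0 binary ships with Scala 2.10.4:
import org.apache.spark.{SparkConf, SparkContext}

// Minimal standalone word count (sketch; app name, master URL and input path as assumed above)
object WordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("WordCount").setMaster("spark://martin:7077")
    val sc = new SparkContext(conf)
    val counts = sc.textFile("/usr/temp/file.txt")
      .flatMap(_.split(" "))        // split each line into words
      .map(word => (word, 1))       // pair each word with a count of 1
      .reduceByKey(_ + _)           // sum the counts per word
    counts.collect().foreach(println)
    sc.stop()
  }
}
Package it into a jar and submit it with something like: bin/spark-submit --class WordCount --master spark://martin:7077 wordcount.jar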
9. Stop the service: ./sbin/stop-master.sh
[root@localhost spark-1.2.0-bin-hadoop1]# sbin/stop-master.sh
stopping org.apache.spark.deploy.master.Master
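The worker started in step 6 is stopped in the same way with sbin/stop-slaves.sh, and sbin/stop-all.sh stops the master and all workers at once.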