Spark Setup with Scala
Notes for future reference.
【OS】
This time, CentOS 6.6 x86_64 is used. See the following for details:
http://centos.server-manual.com/
Preliminary Setup
The packages required for the setup must be configured in advance. Complete all of the following.
Since these steps change the system, administrator privileges are required; su to root first.
【YUM Package Management】
yum -y install yum-plugin-fastestmirror
yum -y update
yum -y groupinstall "Base" "Development tools" "Japanese Support"
[Add the RPMforge repository]
rpm --import http://apt.sw.be/RPM-GPG-KEY.dag.txt
rpm -ivh http://apt.sw.be/redhat/el6/en/x86_64/rpmforge/RPMS/rpmforge-release-0.5.3-1.el6.rf.x86_64.rpm
[Add the EPEL repository]
rpm --import http://dl.fedoraproject.org/pub/epel/RPM-GPG-KEY-EPEL-6
rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
[Add the ELRepo repository]
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
[Add the Remi repository]
rpm --import http://rpms.famillecollet.com/RPM-GPG-KEY-remi
rpm -ivh http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
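To confirm that the repositories above were registered, the enabled repositories can be listed (an optional check):
yum repolist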
【Disable SELinux】
getenforce
Enforcing ← SELinux is enforcing
setenforce 0
getenforce
Permissive ← SELinux is no longer enforcing (until reboot)
vi /etc/sysconfig/selinux
SELINUX=enforcing
SELINUX=disabled ← change this line (disables SELinux at boot)
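As a non-interactive alternative to vi, the same edit can be made with sed (a one-liner sketch; --follow-symlinks keeps /etc/sysconfig/selinux as a symlink to /etc/selinux/config):
sed -i --follow-symlinks 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/sysconfig/selinux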
【Allow HTTP in iptables】
vi /etc/sysconfig/iptables
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT ← add this line
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
Restart iptables:
service iptables restart
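To confirm that port 80 is now accepted, the active INPUT chain can be listed (an optional check):
iptables -L INPUT -n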
【JAVA】
Uninstall the Java version that was installed by default when CentOS was built:
yum erase java*
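To confirm that no old Java packages remain (an optional check):
rpm -qa | grep -i -E 'java|jdk'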
Download the latest version (RPM package) from the web and install it:
rpm -ivh jdk-8u45-linux-x64.rpm
Check the version:
java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
■Set JAVA_HOME
vi /etc/profile
export JAVA_HOME=/usr/java/default
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
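Reload the profile and confirm the variable is set (an optional check):
source /etc/profile
echo $JAVA_HOME
/usr/java/default ← the path set above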
【Prerequisites】
Since Spark will run in standalone mode, we will not build it from source this time; a prebuilt binary is used.
【Scala】
cd /usr/local/src
wget http://www.scala-lang.org/files/archive/scala-2.11.7.tgz
tar -zxvf scala-2.11.7.tgz
chown -R root:root scala-2.11.7
mv scala-2.11.7 ../scala
【Spark】
wget http://ftp.riken.jp/net/apache/spark/spark-1.4.0/spark-1.4.0-bin-cdh4.tgz
tar -zxvf spark-1.4.0-bin-cdh4.tgz
chown -R root:root spark-1.4.0-bin-cdh4
mv spark-1.4.0-bin-cdh4 ../spark
Append the environment variables:
vi /etc/profile
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export PATH=$SCALA_HOME/bin:$PATH
source /etc/profile
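Confirm that the scala command is now on the PATH (an optional check); it should report version 2.11.7:
scala -version
Note: spark-shell below prints "Using Scala version 2.10.4" because the prebuilt Spark binary bundles its own Scala 2.10 runtime, independent of the Scala 2.11.7 installed here.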
Verify:
cd $SPARK_HOME
./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/10/01 05:53:08 INFO SecurityManager: Changing view acls to: hdspark,
14/10/01 05:53:08 INFO SecurityManager: Changing modify acls to: hdspark,
14/10/01 05:53:08 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdspark, ); users with modify permissions: Set(hdspark, )
14/10/01 05:53:08 INFO HttpServer: Starting HTTP Server
14/10/01 05:53:09 INFO Utils: Successfully started service 'HTTP class server' on port 33066.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
14/10/01 05:53:22 INFO SecurityManager: Changing view acls to: hdspark,
14/10/01 05:53:22 INFO SecurityManager: Changing modify acls to: hdspark,
14/10/01 05:53:22 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdspark, ); users with modify permissions: Set(hdspark, )
14/10/01 05:53:24 INFO Slf4jLogger: Slf4jLogger started
14/10/01 05:53:24 INFO Remoting: Starting remoting
14/10/01 05:53:25 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@localhost:36288]
14/10/01 05:53:25 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@localhost:36288]
14/10/01 05:53:25 INFO Utils: Successfully started service 'sparkDriver' on port 36288.
14/10/01 05:53:25 INFO SparkEnv: Registering MapOutputTracker
14/10/01 05:53:25 INFO SparkEnv: Registering BlockManagerMaster
14/10/01 05:53:25 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20141001055325-22ac
14/10/01 05:53:26 INFO Utils: Successfully started service 'Connection manager for block manager' on port 56196.
14/10/01 05:53:26 INFO ConnectionManager: Bound socket to port 56196 with id = ConnectionManagerId(localhost,56196)
14/10/01 05:53:26 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
14/10/01 05:53:26 INFO BlockManagerMaster: Trying to register BlockManager
14/10/01 05:53:26 INFO BlockManagerMasterActor: Registering block manager localhost:56196 with 267.3 MB RAM
14/10/01 05:53:26 INFO BlockManagerMaster: Registered BlockManager
14/10/01 05:53:26 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a33f43d9-37da-4c9e-a0b8-71b117b37012
14/10/01 05:53:26 INFO HttpServer: Starting HTTP Server
14/10/01 05:53:26 INFO Utils: Successfully started service 'HTTP file server' on port 54714.
14/10/01 05:53:27 INFO Utils: Successfully started service 'SparkUI' on port 4040.
14/10/01 05:53:27 INFO SparkUI: Started SparkUI at http://localhost:4040
14/10/01 05:53:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/01 05:53:29 INFO Executor: Using REPL class URI: http://localhost:33066
14/10/01 05:53:29 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@localhost:36288/user/HeartbeatReceiver
14/10/01 05:53:30 INFO SparkILoop: Created spark context..
Spark context available as sc.
scala>
// Let's run a simple line count
scala> val txtFile = sc.textFile("README.md")
14/10/01 05:56:17 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
14/10/01 05:56:17 INFO MemoryStore: ensureFreeSpace(156973) called with curMem=0, maxMem=280248975
14/10/01 05:56:17 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 153.3 KB, free 267.1 MB)
txtFile: org.apache.spark.rdd.RDD[String] = ../README.md MappedRDD[1] at textFile at <console>:12
scala> txtFile.count()
14/10/01 05:56:29 INFO FileInputFormat: Total input paths to process : 1
14/10/01 05:56:29 INFO SparkContext: Starting job: count at <console>:15
14/10/01 05:56:29 INFO DAGScheduler: Got job 0 (count at <console>:15) with 1 output partitions (allowLocal=false)
14/10/01 05:56:29 INFO DAGScheduler: Final stage: Stage 0(count at <console>:15)
14/10/01 05:56:29 INFO DAGScheduler: Parents of final stage: List()
14/10/01 05:56:29 INFO DAGScheduler: Missing parents: List()
14/10/01 05:56:29 INFO DAGScheduler: Submitting Stage 0 (../README.md MappedRDD[1] at textFile at <console>:12), which has no missing parents
14/10/01 05:56:29 INFO MemoryStore: ensureFreeSpace(2384) called with curMem=156973, maxMem=280248975
14/10/01 05:56:29 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.3 KB, free 267.1 MB)
14/10/01 05:56:29 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (../README.md MappedRDD[1] at textFile at <console>:12)
14/10/01 05:56:29 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
14/10/01 05:56:29 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1207 bytes)
14/10/01 05:56:29 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
14/10/01 05:56:29 INFO HadoopRDD: Input split: file:/usr/local/spark/README.md:0+4811
14/10/01 05:56:29 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/10/01 05:56:29 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
14/10/01 05:56:29 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
14/10/01 05:56:29 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
14/10/01 05:56:29 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
14/10/01 05:56:30 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1731 bytes result sent to driver
14/10/01 05:56:30 INFO DAGScheduler: Stage 0 (count at <console>:15) finished in 0.462 s
14/10/01 05:56:30 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 423 ms on localhost (1/1)
14/10/01 05:56:30 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/10/01 05:56:30 INFO SparkContext: Job finished: count at <console>:15, took 0.828128221 s
res0: Long = 141
// Success!
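As a further exercise, here is a minimal sketch of filtering the same RDD at the scala> prompt (the variable name sparkLines is arbitrary, and the resulting count depends on the README contents):
scala> val sparkLines = txtFile.filter(line => line.contains("Spark"))
scala> sparkLines.count()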
Author and Source
This article (Spark Setup with Scala) was originally published at https://qiita.com/nagomu1985/items/82547328becd8badd7a7. Attribution and copyright belong to the original author.