Steps to set up an environment on a Mac for trying out Apache Spark


Installation

Install with brew

brew install apache-spark

Add Spark to the PATH

  • Check the install location
brew info apache-spark

(Result)
apache-spark: stable 1.6.1, HEAD
Engine for large-scale data processing
https://spark.apache.org/
/usr/local/Cellar/apache-spark/1.6.1 (842 files, 312.4M) *
  Built from source on 2016-07-29 at 23:32:40
From: https://github.com/Homebrew/homebrew-core/blob/master/Formula/apache-spark.rb
  • Append to .bash_profile
echo 'export SPARK_HOME=/usr/local/Cellar/apache-spark/1.6.1' >> ~/.bash_profile
echo 'export PATH=${PATH}:${SPARK_HOME}/bin'                  >> ~/.bash_profile
source ~/.bash_profile

Verify

which spark-shell

(Result)
/usr/local/bin/spark-shell

Try launching it

spark-shell

  • Run spark-shell
spark-shell

(Result)
/usr/local/Cellar/apache-spark/1.6.1/bin/load-spark-env.sh: line 2: /usr/local/Cellar/apache-spark/1.6.1/libexec/bin/load-spark-env.sh: Permission denied
/usr/local/Cellar/apache-spark/1.6.1/bin/load-spark-env.sh: line 2: exec: /usr/local/Cellar/apache-spark/1.6.1/libexec/bin/load-spark-env.sh: cannot execute: Undefined error: 0
  • Got an error...
  • Following this article, I tried the command below. (It appears that with SPARK_HOME pointing at the Cellar root, the Homebrew wrapper scripts under bin/ try to exec the non-executable copies under libexec/; unsetting SPARK_HOME lets spark-shell locate its own files under libexec instead.)
unset SPARK_HOME && spark-submit
  • Run spark-shell again
spark-shell

(Result)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_40)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc.
16/07/30 00:00:20 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/07/30 00:00:20 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/07/30 00:00:24 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/07/30 00:00:24 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/07/30 00:00:26 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/07/30 00:00:26 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
SQL context available as sqlContext.

scala>
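  • (Optional; not part of the original session) Before quitting, a quick sanity check against sc confirms the context can actually run a job. A minimal sketch at the scala> prompt:
scala> sc.parallelize(1 to 100).reduce(_ + _)      // sums 1..100; should evaluate to 5050
scala> sc.parallelize(Seq("a", "b", "c")).count()  // counts the elements; should evaluate to 3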
  • Confirmed it works, so exit
scala> :quit

Stopping spark context.
  • Add the "unset SPARK_HOME && spark-submit" command run earlier to ~/.bash_profile, with its output redirected to /dev/null
echo 'unset SPARK_HOME && spark-submit > /dev/null 2>&1'   >> ~/.bash_profile

pyspark

  • Run pyspark
pyspark

(Result)
Python 2.7.10 (default, Oct 23 2015, 19:19:21)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/07/30 00:02:43 INFO SparkContext: Running Spark version 1.6.1
16/07/30 00:02:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/07/30 00:02:44 INFO SecurityManager: Changing view acls to: LowSE01
16/07/30 00:02:44 INFO SecurityManager: Changing modify acls to: LowSE01
16/07/30 00:02:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(LowSE01); users with modify permissions: Set(LowSE01)
16/07/30 00:02:44 INFO Utils: Successfully started service 'sparkDriver' on port 65126.
16/07/30 00:02:45 INFO Slf4jLogger: Slf4jLogger started
16/07/30 00:02:45 INFO Remoting: Starting remoting
16/07/30 00:02:45 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:65127]
16/07/30 00:02:45 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 65127.
16/07/30 00:02:45 INFO SparkEnv: Registering MapOutputTracker
16/07/30 00:02:45 INFO SparkEnv: Registering BlockManagerMaster
16/07/30 00:02:45 INFO DiskBlockManager: Created local directory at /private/var/folders/pq/m1yrrt652vg03wf5q66xk0m00000gn/T/blockmgr-47e0a926-d889-4514-9be9-c5da7aaaeb63
16/07/30 00:02:45 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/07/30 00:02:45 INFO SparkEnv: Registering OutputCommitCoordinator
16/07/30 00:02:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/07/30 00:02:45 INFO SparkUI: Started SparkUI at http://192.168.179.3:4040
16/07/30 00:02:46 INFO Executor: Starting executor ID driver on host localhost
16/07/30 00:02:46 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 65128.
16/07/30 00:02:46 INFO NettyBlockTransferService: Server created on 65128
16/07/30 00:02:46 INFO BlockManagerMaster: Trying to register BlockManager
16/07/30 00:02:46 INFO BlockManagerMasterEndpoint: Registering block manager localhost:65128 with 511.1 MB RAM, BlockManagerId(driver, localhost, 65128)
16/07/30 00:02:46 INFO BlockManagerMaster: Registered BlockManager
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 1.6.1
      /_/

Using Python version 2.7.10 (default, Oct 23 2015 19:19:21)
SparkContext available as sc, HiveContext available as sqlContext.
>>>
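  • (Optional; not part of the original session) The same kind of sanity check works in pyspark, since the SparkContext is available as sc here too. A minimal sketch at the >>> prompt:
>>> sc.parallelize(range(1, 101)).reduce(lambda a, b: a + b)  # sums 1..100; should return 5050
>>> sc.parallelize(["a", "b", "c"]).count()                   # counts the elements; should return 3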
  • Confirmed it works, so exit
>>> quit()

16/07/30 00:03:52 INFO SparkUI: Stopped Spark web UI at http://192.168.179.3:4040
16/07/30 00:03:52 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/07/30 00:03:52 INFO MemoryStore: MemoryStore cleared
16/07/30 00:03:52 INFO BlockManager: BlockManager stopped
16/07/30 00:03:52 INFO BlockManagerMaster: BlockManagerMaster stopped
16/07/30 00:03:52 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/07/30 00:03:52 INFO SparkContext: Successfully stopped SparkContext
16/07/30 00:03:52 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/07/30 00:03:52 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/07/30 00:03:52 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
16/07/30 00:03:52 INFO ShutdownHookManager: Shutdown hook called
16/07/30 00:03:52 INFO ShutdownHookManager: Deleting directory /private/var/folders/pq/m1yrrt652vg03wf5q66xk0m00000gn/T/spark-4f1301d3-b185-4882-b24f-51454a0f575d
16/07/30 00:03:52 INFO ShutdownHookManager: Deleting directory /private/var/folders/pq/m1yrrt652vg03wf5q66xk0m00000gn/T/spark-4f1301d3-b185-4882-b24f-51454a0f575d/pyspark-42c0de51-6381-4400-80f2-392e6aa2f1d0