Getting Started with Distributed Machine Learning on Docker - Part 3: Running spark-shell in Local Mode

FROM java:openjdk-7-jdk

RUN curl -s http://ftp.jaist.ac.jp/pub/apache/spark/spark-1.5.2/spark-1.5.2-bin-hadoop2.6.tgz | tar -xz -C /usr/local/
RUN cd /usr/local && ln -s spark-1.5.2-bin-hadoop2.6 spark

WORKDIR /usr/local/spark
RUN cd conf && cp log4j.properties.template log4j.properties && \
    sed -i 's/log4j.rootCategory=INFO/log4j.rootCategory=WARN/' log4j.properties

CMD ["/bin/bash"]


$ docker build -t spark-local .
$ docker run --rm -it spark-local
$ ./bin/spark-shell --master local[*]
15/11/29 16:35:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2

Using Scala version 2.10.4 (OpenJDK 64-Bit Server VM, Java 1.7.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
15/11/29 16:35:18 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
15/11/29 16:35:23 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/29 16:35:24 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/29 16:35:33 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/11/29 16:35:33 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/11/29 16:35:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/11/29 16:35:39 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/29 16:35:39 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/29 16:35:46 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/11/29 16:35:47 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.
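
The `--master local[*]` option runs Spark in local mode: the driver and executor share a single JVM inside the container, with as many worker threads as the machine has logical cores (`local` means one thread, `local[N]` means N threads). The Scala sketch below spells out that mapping. `LocalMaster` and `threads` are hypothetical names used only for illustration; they are not part of Spark's API, which parses the master URL internally.

```scala
// Hypothetical helper, not part of Spark: maps a local-mode master URL
// to the number of worker threads it implies.
object LocalMaster {
  // Matches "local[N]" and "local[*]"; pattern matching on a Regex
  // requires the whole string to match.
  private val LocalN = """local\[([0-9]+|\*)\]""".r

  def threads(master: String): Option[Int] = master match {
    case "local"     => Some(1)                                        // a single worker thread
    case LocalN("*") => Some(Runtime.getRuntime.availableProcessors()) // one per processor the JVM reports
    case LocalN(n)   => Some(n.toInt)                                  // exactly N threads
    case _           => None                                           // not local mode, e.g. spark://host:7077
  }
}
```

For example, `LocalMaster.threads("local[4]")` evaluates to `Some(4)`, while a cluster URL such as `spark://host:7077` yields `None`.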

scala> val lines = sc.textFile("LICENSE")
lines: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:21

scala> val count = lines.count
count: Long = 294

scala> val sparks = lines.filter(line => line.contains("Spark"))
sparks: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[8] at filter at <console>:23

scala> sparks.count
res7: Long = 3

scala> sparks.foreach(println)
Apache Spark Subcomponents:
The Apache Spark project contains subcomponents with separate copyright
        except for Main.Scala, SparkHelper.scala and ExecutorClassLoader.scala),
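
`textFile` and `filter` only describe new RDDs (the `MapPartitionsRDD` values echoed above); the file is actually read when an action such as `count` or `foreach` runs. As a rough, Spark-free analogy, the sketch below applies the same count-and-filter logic to an in-memory sequence of lines using a lazy Scala view. `LicenseStats` and its methods are hypothetical names, and a view is neither partitioned nor distributed like an RDD.

```scala
import scala.io.Source

// Spark-free analogy (hypothetical helper, not Spark's API) for the
// textFile / filter / count session above.
object LicenseStats {
  // Rough counterpart of sc.textFile(path), except that it loads every line eagerly.
  def readLines(path: String): Vector[String] = {
    val source = Source.fromFile(path)
    try source.getLines().toVector finally source.close()
  }

  // Rough counterpart of lines.filter(...): returns a lazy view, nothing is evaluated yet.
  def sparkLines(lines: Seq[String]): Iterable[String] =
    lines.view.filter(line => line.contains("Spark"))

  // Rough counterpart of .count: forces evaluation and returns a Long like RDD.count.
  def count(lines: Iterable[String]): Long = lines.size.toLong
}
```

Assuming `LICENSE` is in the working directory (as it is under `/usr/local/spark`), `LicenseStats.count(LicenseStats.sparkLines(LicenseStats.readLines("LICENSE")))` computes the same quantity as `sparks.count`. Note also that `sparks.foreach(println)` printed straight into the shell only because local mode runs the executor in the same JVM as the driver; on a cluster, the same call would write to each executor's stdout instead.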
