Spark 2.2.0 Cluster Configuration

Bigdata Spark

1. Introduction to cluster modes

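(1) Local
Mostly used for local testing, for example when writing and testing programs in Eclipse or IDEA.
(2) Standalone
Standalone is Spark's own built-in resource scheduling framework and supports fully distributed deployment.
(3) Yarn
The resource scheduling framework of the Hadoop ecosystem. Spark can run its computations on Yarn, and this is the most popular option.
(4) Mesos
A resource scheduling framework with Docker support, regarded here as the one with the best future prospects.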
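
Each of the four modes is selected through the `--master` value given to spark-submit. The mapping can be sketched roughly as below. The helper name `master_url` is hypothetical, and the host `node01` and the Mesos port 5050 are example values rather than part of the setup described in this post.

```shell
# master_url is a hypothetical helper: map a mode name to a --master value.
# Host names and ports are example values.
master_url() {
  case "$1" in
    local)      echo "local[*]" ;;                   # in-process, one thread per core
    standalone) echo "spark://${2:-node01}:7077" ;;  # Spark's own master, default port 7077
    yarn)       echo "yarn" ;;                       # cluster is found via HADOOP_CONF_DIR
    mesos)      echo "mesos://${2:-node01}:5050" ;;  # Mesos master, default port 5050
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

master_url standalone
```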

2. Resource allocation

Five machines are used here: one Master for resource scheduling, three Workers for processing tasks, and one Client for submitting tasks.

(1 = the role runs on this node, - = it does not)

Node    NameNode  DataNode  Zookeeper  DFSZKFC  JournalNode  Master  Worker  Client
node01  1         -         -          1        -            1       -       -
node02  -         1         1          -        1            -       1       -
node03  -         1         1          -        1            -       1       -
node04  -         1         1          -        1            -       1       -
node05  1         -         -          1        -            -       -       1

3. Cluster configuration

(1) Download and extract
Download: http://spark.apache.org/downloads.html
Extract: tar -zxvf spark-2.2.0-bin-hadoop2.7.tgz
Rename the extracted directory: mv spark-2.2.0-bin-hadoop2.7 spark-2.2.0
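
The extract-and-rename step can be written as a small re-runnable snippet. This is only a sketch: it assumes the archive sits in the current directory under the file name used in this post, and it does nothing if the archive is missing or the target directory already exists.

```shell
# File name as used in this post (Spark 2.2.0 built for Hadoop 2.7).
archive="spark-2.2.0-bin-hadoop2.7.tgz"
extracted_dir="${archive%.tgz}"          # strip the .tgz suffix
target_dir="${extracted_dir%%-bin-*}"    # strip everything from "-bin-" onward

# Only act when the archive is present and the target does not exist yet.
if [ -f "$archive" ] && [ ! -e "$target_dir" ]; then
  tar -zxvf "$archive"
  mv "$extracted_dir" "$target_dir"
fi
echo "$target_dir"
```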
(2) Configuration
Using node01 as the example,
go to /opt/bigdata/spark-2.2.0/conf/

Configure spark-env.sh

Copy the Spark environment template: cp spark-env.sh.template spark-env.sh
vim spark-env.sh and set the following; everything else can stay at its default

# Options for the daemons used in the standalone deploy mode
# - SPARK_MASTER_HOST, to bind the master to a different IP address or hostname
# - SPARK_MASTER_PORT / SPARK_MASTER_WEBUI_PORT, to use non-default ports for the master
# - SPARK_MASTER_OPTS, to set config properties only for the master (e.g. "-Dx=y")
# - SPARK_WORKER_CORES, to set the number of cores to use on this machine
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
# - SPARK_WORKER_PORT / SPARK_WORKER_WEBUI_PORT, to use non-default ports for the worker
# - SPARK_WORKER_DIR, to set the working directory of worker processes
# - SPARK_WORKER_OPTS, to set config properties only for the worker (e.g. "-Dx=y")
# - SPARK_DAEMON_MEMORY, to allocate to the master, worker and history server themselves (default: 1g).
# - SPARK_HISTORY_OPTS, to set config properties only for the history server (e.g. "-Dx=y")
# - SPARK_SHUFFLE_OPTS, to set config properties only for the external shuffle service (e.g. "-Dx=y")
# - SPARK_DAEMON_JAVA_OPTS, to set config properties for all daemons (e.g. "-Dx=y")
# - SPARK_PUBLIC_DNS, to set the public dns name of the master or workers
export SPARK_MASTER_HOST=node01  # host name of the master node
export SPARK_MASTER_PORT=7077    # port for submitting jobs to the Spark master; default 7077
export SPARK_MASTER_WEBUI_PORT=8080 # master web UI (HTTP) port; default 8080
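export SPARK_WORKER_CORES=2      # number of cores each worker may use
export SPARK_WORKER_MEMORY=1g    # total memory each worker may give to executors
export HADOOP_CONF_DIR=/opt/bigdata/hadoop-2.7.4/etc/hadoop  # Hadoop configuration directory; required on the client for YARN submission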
export JAVA_HOME=/usr/local/jdk1.8
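
A misspelled keyword in spark-env.sh (for example `exprot` instead of `export`) leaves the variable unset without any obvious failure. One way to catch this is sketched below; `check_spark_env` is a hypothetical helper, not something shipped with Spark. Note that a variable already set in the calling environment is reported as present even if the file does not set it.

```shell
# check_spark_env is a hypothetical helper: source the given file in a subshell
# and print the name of every listed variable that is still unset or empty.
check_spark_env() {
  (
    set +e                       # a bad line in the file must not abort the check
    env_file="$1"; shift
    . "$env_file" 2>/dev/null    # pass a path containing a slash, e.g. ./spark-env.sh
    for name in "$@"; do
      eval "value=\${$name:-}"
      [ -n "$value" ] || echo "$name"
    done
  )
}

# Run against the real file only if it exists in the current directory.
if [ -f ./spark-env.sh ]; then
  check_spark_env ./spark-env.sh SPARK_MASTER_HOST SPARK_MASTER_PORT \
    SPARK_MASTER_WEBUI_PORT SPARK_WORKER_CORES SPARK_WORKER_MEMORY
fi
```

An empty result means every listed variable was set after sourcing the file.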

Configure slaves

Copy the template: cp slaves.template slaves
vim slaves and set the following

# A Spark Worker will be started on each of the machines listed below.
#localhost
node02
node03
node04
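
Scripts that need the worker list can read it from the slaves file instead of repeating the host names. A minimal sketch, assuming the file uses `#` comments as in the template; `list_workers` is a hypothetical helper name.

```shell
# list_workers is a hypothetical helper: print the host names in a slaves file,
# dropping comments, surrounding whitespace and blank lines.
list_workers() {
  sed -e 's/#.*$//' \
      -e 's/^[[:space:]]*//' \
      -e 's/[[:space:]]*$//' \
      -e '/^$/d' "$1"
}

# Print the workers only if a slaves file exists in the current directory.
if [ -f ./slaves ]; then
  list_workers ./slaves
fi
```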

Configure spark-defaults.conf

Copy the template: cp spark-defaults.conf.template spark-defaults.conf
vim spark-defaults.conf and set the following

spark.yarn.jars = hdfs://mycluster/spark/jars/*

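Create the HDFS directory that will hold the jars: hdfs dfs -mkdir -p /spark/jars
Upload the jar packages: hdfs dfs -put /opt/bigdata/spark-2.2.0/jars/* /spark/jars
This step is optional. Once the jars are on HDFS, they no longer have to be uploaded to the cluster every time a job is submitted, which saves time and resources.
This setting only needs to be made on the client node; the other nodes do not need it.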
(3) Distribute the configuration
Copy /opt/bigdata/spark-2.2.0 to nodes 2, 3, 4 and 5 (run the commands from /opt/bigdata):

scp -r spark-2.2.0 node02:`pwd`
scp -r spark-2.2.0 node03:`pwd`
scp -r spark-2.2.0 node04:`pwd`
scp -r spark-2.2.0 node05:`pwd`
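
The repeated copy commands can also be generated with a loop. The sketch below only prints the commands, so they can be reviewed before anything is copied; `distribute_cmds` is a hypothetical helper, and the paths and host names are the ones used in this post.

```shell
# distribute_cmds is a hypothetical helper: print one scp command per host.
# Arguments: base directory, Spark directory name, then the target hosts.
distribute_cmds() {
  (
    base_dir="$1"; spark_dir="$2"; shift 2
    for host in "$@"; do
      echo "scp -r $base_dir/$spark_dir $host:$base_dir"
    done
  )
}

distribute_cmds /opt/bigdata spark-2.2.0 node02 node03 node04 node05
```

Piping the output to `sh` would run the copies; that step is left out here on purpose.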

4. Starting and stopping the cluster

In /opt/bigdata/spark-2.2.0/sbin:
(1) ./start-all.sh starts the cluster
(2) ./stop-all.sh stops the cluster
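
After starting the cluster, the JDK tool `jps` lists the running Java processes, and the standalone daemons appear there as `Master` and `Worker`. A small check along those lines is sketched below; `has_daemon` is a hypothetical helper name.

```shell
# has_daemon is a hypothetical helper: read jps-style output ("<pid> <name>")
# on stdin and succeed if a process with the given name is listed.
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Check the local machine only when jps is installed.
if command -v jps >/dev/null 2>&1; then
  if jps | has_daemon Master; then
    echo "Master is running"
  else
    echo "Master not found"
  fi
fi
```

On node01 the check would look for `Master`; on node02 to node04 it would look for `Worker`.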

5. Submitting jobs

The client is not part of the cluster and does not take up cluster resources, so jobs should be submitted from the client.
(1) Standalone-client submission (suitable for testing)

nohup ./spark-submit --master spark://node01:7077 --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
 
nohup ./spark-submit --master spark://node01:7077 --deploy-mode client --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &

(2) Standalone-cluster submission (suitable for production)

nohup ./spark-submit --master spark://node01:7077 --deploy-mode cluster --executor-memory 1G --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &

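(3) YARN-client submission (suitable for testing)
A YARN submission hands the job to YARN, which is managed by the Hadoop cluster, so the Hadoop cluster must be running.
In this case the job does not depend on the Spark standalone cluster, so that cluster can be stopped; submitting the job from the client machine is all that is needed (the same applies below).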

nohup ./spark-submit --master yarn --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
 
nohup ./spark-submit --master yarn-client --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
 
nohup ./spark-submit --master yarn --deploy-mode client --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &

(4) YARN-cluster submission (suitable for production)

nohup ./spark-submit --master yarn-cluster --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
 
nohup ./spark-submit --master yarn --deploy-mode cluster --executor-memory 1G  --class org.apache.spark.examples.SparkPi ../examples/jars/spark-examples_2.11-2.2.0.jar 20 &
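
The `--master yarn-client` and `--master yarn-cluster` forms used in some of the commands above are deprecated since Spark 2.0 in favour of `--master yarn` with an explicit `--deploy-mode`. A wrapper script could normalize them roughly as follows; `normalize_master` is a hypothetical helper and not part of spark-submit.

```shell
# normalize_master is a hypothetical helper: turn a legacy master value into
# the --master / --deploy-mode flags; other values are passed through as-is.
normalize_master() {
  case "$1" in
    yarn-client)  echo "--master yarn --deploy-mode client" ;;
    yarn-cluster) echo "--master yarn --deploy-mode cluster" ;;
    *)            echo "--master $1" ;;
  esac
}

# Print (do not run) the submit command for the SparkPi example from this post.
echo "./spark-submit $(normalize_master yarn-cluster) --executor-memory 1G" \
  "--class org.apache.spark.examples.SparkPi" \
  "../examples/jars/spark-examples_2.11-2.2.0.jar 20"
```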
