Real-Time Synchronization of MySQL Data to HDFS


Environment preparation
Cluster layout
10.40.10.246 dbtest1
10.40.10.247 dbtest2
10.40.10.248 dbtest3

MySQL configuration
Edit my.cnf and add the following settings:
[mysqld]
server_id=1
log-bin=master
binlog_format=row

Restart MySQL (service mysql restart), then adjust the database settings and grant the privileges Maxwell needs:
mysql> set global binlog_format=ROW;
mysql> set global binlog_row_image=FULL;
mysql> GRANT ALL on maxwell.* to 'maxwell'@'%' identified by 'XXXXXX';
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE on *.* to 'maxwell'@'%';

# or for running maxwell locally:

mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE on *.* to 'maxwell'@'localhost' identified by 'XXXXXX';
mysql> GRANT ALL on maxwell.* to 'maxwell'@'localhost';

Installing and starting Maxwell
wget https://github.com/zendesk/maxwell/releases/download/v1.17.1/maxwell-1.17.1.tar.gz
tar -zxvf maxwell-1.17.1.tar.gz
cd maxwell-1.17.1
bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' --producer=stdout
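With --producer=stdout, Maxwell prints one JSON document per row change, carrying the top-level fields database, table, type, ts, and data. A minimal sketch of parsing such an event downstream in Python; the test.users table and its columns here are made up for illustration:

```python
import json

# Sample Maxwell row event (database/table/columns are illustrative);
# real events have the same top-level fields.
line = '{"database":"test","table":"users","type":"insert","ts":1533865430,"data":{"id":1,"name":"alice"}}'

event = json.loads(line)
if event["type"] == "insert":
    row = event["data"]
    print("%s.%s inserted: %s" % (event["database"], event["table"], row))
```

The same parsing applies unchanged once the events flow through Kafka, since Maxwell publishes the identical JSON as message values.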

Installing ZooKeeper and Kafka
zookeeper: https://blog.csdn.net/yuandiyzy1987/article/details/81564267
kafka: https://blog.csdn.net/yuandiyzy1987/article/details/81564292
Installing Flume
flume: https://blog.csdn.net/yuandiyzy1987/article/details/81564322
Startup
Starting Maxwell
Start Maxwell so that MySQL's real-time changes are written to Kafka.
  • Create the Kafka topic:
    bin/kafka-topics.sh --create --zookeeper dbtest1:2181,dbtest2:2181,dbtest3:2181/kafka --replication-factor 3 --partitions 5 --topic my-replicated-topic5
  • Describe the topic:
    bin/kafka-topics.sh --describe --zookeeper dbtest1:2181,dbtest2:2181,dbtest3:2181/kafka --topic my-replicated-topic5
  • Start Maxwell with the Kafka producer:
    bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' \
      --producer=kafka --kafka.bootstrap.servers=localhost:9092 --kafka_topic=my-replicated-topic5

Verifying that MySQL changes reach Kafka
In another terminal, start a consumer that subscribes from the beginning of the my-replicated-topic5 topic we created:
bin/kafka-console-consumer.sh --zookeeper dbtest1:2181,dbtest2:2181,dbtest3:2181/kafka --from-beginning --topic my-replicated-topic5
Write some data to MySQL and check whether the consumer prints it. Close the consumer terminal when done.

Starting Flume
Have Flume consume the Kafka data and write it to HDFS in real time:
bin/flume-ng agent --conf conf --conf-file conf/flume-agent.properties --name a1 -Dflume.root.logger=INFO,console
The configuration in conf/flume-agent.properties is as follows:
# Name the agent's source, channel, and sink
agent.sources = r1
agent.channels = c1
agent.sinks = k1

# Configure the source
# Source type: Kafka
agent.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
# ZooKeeper and broker addresses used by Kafka
agent.sources.r1.zookeeperConnect = dbtest1:2181,dbtest2:2182,dbtest3:2183
agent.sources.r1.kafka.bootstrap.servers = dbtest1:9092,dbtest2:9093,dbtest3:9094
agent.sources.r1.brokerList = dbtest1:9092,dbtest2:9093,dbtest3:9094
# Kafka topic to consume
agent.sources.r1.topic = my-replicated-topic5
#agent.sources.r1.kafka.consumer.timeout.ms = 100
# Consumer group id
agent.sources.r1.kafka.consumer.group.id = flume

# Interceptor (optional)
#agent.sources.r1.interceptors=i1
#agent.sources.r1.interceptors.i1.type=com.hadoop.flume.FormatInterceptor$Builder

# Configure the channel
# Channel type
agent.channels.c1.type = memory
# Maximum number of events held in the channel
agent.channels.c1.capacity = 10000
# Transaction capacity
agent.channels.c1.transactionCapacity = 100
# Configure the sink
agent.sinks.k1.type = hdfs
agent.sinks.k1.hdfs.path = hdfs://dbtest1:8020/test/%Y%m%d
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.writeFormat = Text
agent.sinks.k1.hdfs.rollInterval = 3
agent.sinks.k1.hdfs.rollSize = 1024000
agent.sinks.k1.hdfs.rollCount = 0

# Prefix and suffix of the generated files
agent.sinks.k1.hdfs.fileSuffix=.data
agent.sinks.k1.hdfs.filePrefix = localhost-%Y-%m-%d

agent.sinks.k1.hdfs.useLocalTimeStamp = true
agent.sinks.k1.hdfs.idleTimeout = 60

# Prefix/suffix applied to files while they are still being written
#agent.sinks.k1.hdfs.inUsePrefix=_
#agent.sinks.k1.hdfs.inUseSuffix=

# Bind source and sink to the channel
agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1
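The hdfs.path and hdfs.filePrefix values above contain time escape sequences that the HDFS sink expands per event (from the local clock here, since useLocalTimeStamp is true). These escapes behave like strftime patterns, so the resulting object name under this configuration can be sketched in Python; the counter segment is assigned by the sink itself:

```python
from datetime import datetime

# Simulate how the sink expands %Y%m%d / %Y-%m-%d for an event
# written at 2018-08-10 09:43 local time.
ts = datetime(2018, 8, 10, 9, 43)
directory = ts.strftime("hdfs://dbtest1:8020/test/%Y%m%d")
prefix = ts.strftime("localhost-%Y-%m-%d")
# The sink places a numeric counter between prefix and suffix.
print(directory + "/" + prefix + ".<counter>" + ".data")
# → hdfs://dbtest1:8020/test/20180810/localhost-2018-08-10.<counter>.data
```

This matches the directory and file names seen in the HDFS listing in the verification step below.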

Verification
Write data to MySQL and confirm that files are generated on HDFS in real time:
[root@dbtest1 hadoop-2.6.0-cdh5.10.2]# bin/hdfs dfs -ls -R /
    18/08/10 09:44:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    drwxr-xr-x   - root supergroup          0 2018-08-10 09:43 /test
    drwxr-xr-x   - root supergroup          0 2018-08-09 18:58 /test/20180809
    -rw-r--r--   3 root supergroup        440 2018-08-09 18:58 /test/20180809/localhost-2018-08-09.1533812290352
    drwxr-xr-x   - root supergroup          0 2018-08-10 09:44 /test/20180810
    -rw-r--r--   3 root supergroup        440 2018-08-10 09:43 /test/20180810/localhost-2018-08-10.1533865430145
    -rw-r--r--   3 root supergroup        330 2018-08-10 09:44 /test/20180810/localhost-2018-08-10.1533865472399