flink-概要(一)

13334 ワード

flak

1、flink-概要(一)
1.定義:flinkはフレームワークと分散処理エンジンであり、無境界と有界データストリームの状態計算に用いる.特徴:低遅延、高スループット、結果の正確性と許容誤差3.flinkの主なメリット:

イベント駆動:各メッセージまたはレコードはイベント

である.

ストリームの世界観に基づく:すべてはストリームからなり、オフラインデータは境界のあるストリームであり、リアルタイムデータは境界のないストリーム、すなわち境界と境界のないストリーム

である.

階層のAPI:最上位層ほど抽象的で、表現の意味が簡明で、使用が便利である.底層が具体的であればあるほど、表現能力が豊富になり、柔軟に使用できる

4.その他の特徴:

は、イベント時間(event-time)および処理時間(process-time)の意味

をサポートする.

一次(exactly-once)の状態整合性保証

低遅延、毎秒数百万件のイベントを処理、ミリ秒レベルの遅延

と多くの一般的なストレージシステムとのリンク

24,5679172高可用性、ダイナミック拡張、7*24時間24時間365日24,5679182
5.finkとsparkの比較

データモデル

sparkはRDDモデルを採用し、streamingのDStreamも実際には小ロットデータRDDのセット

である.

flink基本データモデルは、データストリームおよびイベントシーケンス

である.

ランタイムアーキテクチャ

sparkはバッチ計算で、DAGを異なるステージに分割し、1つが完了してから次の

を計算することができます.

flinkは標準的なストリーム実行モードであり、1つのイベントは1つのノードの処理が完了すると、次の点に直接送信して処理

を行うことができる.

6.基礎API

pom

 <dependencies> 
        <dependency> 
            <groupId>org.apache.flinkgroupId> 
            <artifactId>flink-scala_2.11artifactId> 
            <version>1.7.2version> 
        dependency> 
         
        <dependency> 
            <groupId>org.apache.flinkgroupId> 
            <artifactId>flink-streaming-scala_2.11artifactId> 
            <version>1.7.2version> 
        dependency> 
    dependencies> 


<build> 
    <plugins> 
     
    <plugin> 
        <groupId>net.alchim31.mavengroupId> 
        <artifactId>scala-maven-pluginartifactId> 
        <version>3.4.6version> 
        <executions> 
            <execution> 
                 
                <goals> 
                    <goal>testCompilegoal> 
                goals> 
            execution> 
        executions> 
    plugin> 
        <plugin> 
            <groupId>org.apache.maven.pluginsgroupId> 
            <artifactId>maven-assembly-pluginartifactId> 
            <version>3.0.0version> 
            <configuration> 
                <descriptorRefs> 
                    <descriptorRef>jar-with-dependenciesdescriptorRef> 
                descriptorRefs> 
            configuration> 
            <executions> 
                <execution> 
                    <id>make-assemblyid> 
                    <phase>packagephase> 
                    <goals> 
                        <goal>singlegoal> 
                    goals> 
                execution> 
            executions> 
        plugin> 
    plugins> 
build> 
project>

バッチwordcount

 def main(args: Array[String]): Unit = { 
    //        
    val env = ExecutionEnvironment.getExecutionEnvironment 
    //          
    val inputPath = "D:\\Projects\\BigData\\TestWC1\\src\\main\\resources\\hello.txt" 
    val inputDS: DataSet[String] = env.readTextFile(inputPath) 
    val wordCountDS: AggregateDataSet[(String, Int)] = inputDS.flatMap(_.split(" 
")).map((_, 1)).groupBy(0).sum(1) 
 
    //      
    wordCountDS.print() 
  } 
}

ストリーム処理

 //            
    val params: ParameterTool =  ParameterTool.fromArgs(args) 
    val host: String = params.get("host") 
    val port: Int = params.getInt("port") 
 
    //         
    val env = StreamExecutionEnvironment.getExecutionEnvironment 
    //    socket
    
    val textDstream: DataStream[String] = env.socketTextStream(host, port) 
 
    // flatMap
  Map
          
    import org.apache.flink.api.scala._ 
    val dataStream: DataStream[(String, Int)] = 
textDstream.flatMap(_.split("\\s")).filter(_.nonEmpty).map((_, 1)).keyBy(0).sum(1) 
  dataStream.print().setParallelism(1) 
 
    //    executor ，     
    env.execute("Socket stream word count") 
  } 
}

TCP/IP　その1

補完1