Secondary Sort with MapReduce


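Secondary sort applies to input that has two columns of data: records are sorted by the first column, and records whose first column is equal are then sorted by the second column. Several records may have the same value in both columns, and these duplicates must be kept rather than collapsed into one.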
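
To make the target ordering concrete, here is a small plain-Java sketch with no Hadoop involved. It assumes the same direction as the NumBean.compareTo in this article (first column descending, second column ascending). The class name SecondarySortDemo and the sample rows are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SecondarySortDemo {
    // Orders {first, second} pairs: first column descending, then second column ascending.
    static final Comparator<int[]> PAIR_ORDER =
            Comparator.<int[]>comparingInt(p -> p[0]).reversed()
                      .thenComparingInt(p -> p[1]);

    // Returns a sorted copy; sorting a list never drops equal elements,
    // so rows that match in both columns are all kept.
    static List<int[]> sortPairs(List<int[]> rows) {
        List<int[]> sorted = new ArrayList<>(rows);
        sorted.sort(PAIR_ORDER);
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical input; the row {1, 5} appears twice on purpose.
        List<int[]> rows = Arrays.asList(
                new int[]{1, 5}, new int[]{3, 2}, new int[]{3, 1},
                new int[]{1, 5}, new int[]{2, 9});
        for (int[] p : sortPairs(rows)) {
            System.out.println(p[0] + "\t" + p[1]);
        }
    }
}
```

For this input the rows come out as (3,1), (3,2), (2,9), (1,5), (1,5). Both copies of the duplicate row survive, which is the behaviour the MapReduce job also has to preserve.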

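MapReduce already sorts records by key (k2, k3), and secondary sort can be built entirely on that mechanism. The difficulty is that the comparison has to look at both columns at once. To handle this, wrap the two values of each row in a bean, implement a compareTo method on the bean that defines the comparison rule, and let the framework's key sorting do the rest.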

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class NumBean implements WritableComparable<NumBean> {
        private int n1;
        private int n2;
        
        public NumBean() {
        }

        public NumBean(int n1, int n2) {
                this.n1 = n1;
                this.n2 = n2;
        }

        public int getN1() {
                return n1;
        }
        public void setN1(int n1) {
                this.n1 = n1;
        }
        public int getN2() {
                return n2;
        }
        public void setN2(int n2) {
                this.n2 = n2;
        }

        @Override
        public void write(DataOutput out) throws IOException {
                out.writeInt(n1);
                out.writeInt(n2);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
                this.n1 = in.readInt();
                this.n2 = in.readInt();
        }

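        @Override
        public int compareTo(NumBean o) {
                //-- Compare the first column first, in descending order
                if(this.n1 != o.n1){
                        return Integer.compare(o.n1, this.n1);
                }else{//-- First column equal: compare the second column, ascending
                        if(this.n2 != o.n2){
                                return Integer.compare(this.n2, o.n2);
                        }else{//-- Both columns equal: returning 0 would make the
                                  //-- reducer merge the duplicates, so do not return 0.
                                  //-- Note: this deliberately breaks the compareTo contract.
                                return -1;
                        }
                }
        }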
        
        
}
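
The reason compareTo avoids returning 0 can be illustrated with a simplified model of reduce-side grouping. This is a sketch in plain Java, not Hadoop code. It assumes the job uses the default grouping behaviour, where two adjacent sorted keys belong to the same reduce call only when they compare as 0. GroupingDemo, order and countGroups are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GroupingDemo {
    // Same ordering as NumBean.compareTo; onEqual is the value
    // returned when both columns are equal.
    static Comparator<int[]> order(int onEqual) {
        return (a, b) -> {
            if (a[0] != b[0]) return Integer.compare(b[0], a[0]); // first column, descending
            if (a[1] != b[1]) return Integer.compare(a[1], b[1]); // second column, ascending
            return onEqual;
        };
    }

    // Counts groups in an already sorted key list: a new group starts
    // whenever two adjacent keys do not compare as 0.
    static int countGroups(List<int[]> sortedKeys, Comparator<int[]> cmp) {
        int groups = sortedKeys.isEmpty() ? 0 : 1;
        for (int i = 1; i < sortedKeys.size(); i++) {
            if (cmp.compare(sortedKeys.get(i - 1), sortedKeys.get(i)) != 0) {
                groups++;
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical keys, already in sorted order; {1, 5} appears twice.
        List<int[]> sorted = Arrays.asList(
                new int[]{3, 1}, new int[]{3, 2}, new int[]{1, 5}, new int[]{1, 5});
        System.out.println(countGroups(sorted, order(0)));  // duplicates share one group
        System.out.println(countGroups(sorted, order(-1))); // every key is its own group
    }
}
```

With 0 for equal keys the four keys fall into three groups, so a reducer that writes each key once would lose one copy of the duplicate row. With -1 they fall into four groups. A possible alternative that keeps the compareTo contract intact is to return 0 and have the reducer write the key once for every value it receives. That variant is not shown in this article.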
