MongoDB Grouping

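Three ways to group in MongoDB

group (filters first, then groups; does not support sharding, is limited in the amount of data it can handle, and is not very efficient) [simple grouping test: 1.5 million records, 12.5 s]

mapreduce (runs on the JS engine, single-threaded, slow; suited to background statistics and similar jobs) [simple grouping test: 1.5 million records, 28.5 s]

aggregate (recommended) [simple grouping test: 1.5 million records, 2.6 s]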

group

db.ad_play_log.group({
    // https://docs.mongodb.org/manual/reference/method/db.collection.group/
    // https://docs.mongodb.org/manual/reference/command/group/#dbcmd.group
    key: {
        // GROUP BY field
        ad_position_id: 1
    },
    cond: {
        // WHERE condition
        ord_dt: {
            $gt: new Date('01/01/2012')
        }
    },
    reduce: function (curr, result) {
        result.count++;
    },
    initial: {
        count: 0
    }
});

// SELECT ad_position_id, COUNT(*) AS count
// FROM ad_play_log
// WHERE ord_dt > '01/01/2012'
// GROUP BY ad_position_id

MapReduce

db.runCommand({
    mapreduce: "ad_play_log",
    map: function Map() {
        var key = {
            ad_position_id: this.ad_position_id
        };
        var value = {
            count: 1
        };

        /**
         * Emit the key/value pair; values emitted under the same key are passed to reduce.
         * @param key
         * @param value
         */
        emit(key, value);
    },
    reduce: function Reduce(key, values) {
        var ret = {
            count: 0
        };
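        for (var i = 0; i < values.length; i++) {
            // Add up the partial counts: reduce may receive values it has already reduced.
            ret.count += values[i].count;
        }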
        return ret;
    },
    out: {
        inline: 1
    }
});

The official MongoDB site introduces MapReduce as follows:
Map/reduce in MongoDB is useful for batch processing of data and aggregation operations. It is similar in spirit to using something like Hadoop with all input coming from a collection and output going to a collection. Often, in a situation where you would have used GROUP BY in SQL, map/reduce is the right tool in MongoDB.

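To use MapReduce you implement two functions, Map and Reduce. The Map function is run over every document in the collection and calls emit(key, value); the emitted keys and values are then handed to the Reduce function for processing. Both functions are written in JavaScript, and a MapReduce operation can be run through db.runCommand or the mapReduce command.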
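
The map/emit/reduce flow described here can be imitated in plain JavaScript without a server. The sketch below is only an in-memory illustration, not the MongoDB API: runMapReduce, the chunk size, and the sample documents are all hypothetical, and emit is passed to map as an argument instead of being a global as it is in MongoDB. It deliberately reduces each key's values in chunks and then reduces the partial results again, because MongoDB may call reduce more than once for the same key.

```javascript
// In-memory sketch of map/emit/reduce semantics. NOT the MongoDB API:
// runMapReduce, chunkSize and the sample data are hypothetical.
function runMapReduce(docs, map, reduce, chunkSize) {
    var size = Math.max(2, chunkSize || 2);   // chunks of at least 2 so the loop shrinks
    var groups = {};                          // serialized key -> { key, values }

    function emit(key, value) {
        var id = JSON.stringify(key);
        if (!groups[id]) groups[id] = { key: key, values: [] };
        groups[id].values.push(value);
    }

    // Map phase: call map once per document, with `this` bound to the document.
    docs.forEach(function (doc) { map.call(doc, emit); });

    // Reduce phase: reduce in chunks, then re-reduce the partial results,
    // imitating reduce being invoked several times for one key.
    return Object.keys(groups).map(function (id) {
        var g = groups[id];
        var values = g.values;
        while (values.length > 1) {
            var partial = [];
            for (var i = 0; i < values.length; i += size) {
                partial.push(reduce(g.key, values.slice(i, i + size)));
            }
            values = partial;
        }
        return { _id: g.key, value: values[0] };
    });
}

// Hypothetical sample: five plays on position 613, one on position 7.
var logs = [613, 613, 613, 613, 613, 7].map(function (id) {
    return { ad_position_id: id };
});

function map(emit) {
    emit({ ad_position_id: this.ad_position_id }, { count: 1 });
}

// Sums the count fields, so it stays correct when fed its own output.
function sumCounts(key, values) {
    var ret = { count: 0 };
    for (var i = 0; i < values.length; i++) ret.count += values[i].count;
    return ret;
}

// Counts the inputs instead of summing them: breaks under re-reduce.
function countInputs(key, values) {
    return { count: values.length };
}

var correct = runMapReduce(logs, map, sumCounts, 2);
var wrong = runMapReduce(logs, map, countInputs, 2);
console.log(JSON.stringify(correct));
console.log(JSON.stringify(wrong));
```

In this sketch sumCounts reports 5 plays for position 613, while countInputs reports fewer, because on the second pass it counts partial results rather than plays. A reduce function should therefore return a value of the same shape as the values it receives and combine them, not count them.
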

Aggregate

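db.ad_play_log.aggregate([
    {
        // https://docs.mongodb.org/manual/reference/method/db.collection.aggregate/
        // Grouping stage
        $group: {
            // Group by ad_position_id
            _id: "$ad_position_id",
            count: {
                // Number of documents in each group, stored as count
                $sum: 1
            },
            total: {
                // Sum of material_id within each group
                $sum: "$material_id"
            }
        }
    },
    {
        $sort: {
            // Sort by ad_position_id (the group _id); -1 means descending
            _id: -1
        }
    },
    {
        // Limit the number of results
        $limit: 10
    },
    {
        // Filter; placed after $group, it filters the grouped results: _id 613 with count below 700
        $match: {_id: 613, count: {$lt: 700}}
    }
// Roughly: SELECT ad_position_id, COUNT(1) AS count FROM ad_play_log GROUP BY ad_position_id
]);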
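
Stages run in the order they are listed, so a $match placed after $limit only sees the groups that survived the limit. The following in-memory sketch illustrates that with plain arrays; it is not the MongoDB API, and the helper names, the sample data, and the limit of 2 (chosen to keep the example small) are all hypothetical.

```javascript
// In-memory sketch of pipeline stage ordering. Plain arrays, NOT the MongoDB API;
// every helper name and the sample data are hypothetical.

// Like $group on ad_position_id with count: {$sum: 1}.
function group(docs) {
    var counts = new Map();
    docs.forEach(function (d) {
        counts.set(d.ad_position_id, (counts.get(d.ad_position_id) || 0) + 1);
    });
    return Array.from(counts, function (entry) {
        return { _id: entry[0], count: entry[1] };
    });
}

// Like $sort: {_id: -1}.
function sortDesc(rows) {
    return rows.slice().sort(function (a, b) { return b._id - a._id; });
}

// Like $limit: n.
function limit(rows, n) {
    return rows.slice(0, n);
}

// Like $match: {_id: id, count: {$lt: maxCount}}.
function match(rows, id, maxCount) {
    return rows.filter(function (r) { return r._id === id && r.count < maxCount; });
}

// Sample log: position 611 played once, 612 twice, ... 615 five times.
var logs = [];
[611, 612, 613, 614, 615].forEach(function (id, i) {
    for (var k = 0; k <= i; k++) logs.push({ ad_position_id: id });
});

// $group -> $sort -> $limit -> $match: 613 is cut by the limit before the match runs.
var limitThenMatch = match(limit(sortDesc(group(logs)), 2), 613, 700);

// $group -> $match -> $sort -> $limit: the match sees every group.
var matchThenLimit = limit(sortDesc(match(group(logs), 613, 700)), 2);

console.log(JSON.stringify(limitThenMatch));
console.log(JSON.stringify(matchThenLimit));
```

With this sample the first ordering returns nothing, because only positions 615 and 614 pass the limit, while the second returns the group for 613. If the intent is to look up one specific group, putting the filter before the sort and limit is usually what is wanted.
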

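Java implementation:

// Imports needed at the top of the source file
import static java.util.Arrays.asList;
import com.mongodb.Block;
import com.mongodb.client.AggregateIterable;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public void test_aggregate() {
    // https://docs.mongodb.org/getting-started/java/aggregation/
    // MongoUtil is a helper class that is not shown here
    MongoCollection<Document> collection = MongoUtil.getCollection("ad_play_log");
    // Group by ad_position_id and count the documents in each group
    AggregateIterable<Document> iterable = collection.aggregate(asList(
            new Document("$group", new Document("_id", "$ad_position_id")
                    .append("count", new Document("$sum", 1)))));
    iterable.forEach(new Block<Document>() {
        @Override
        public void apply(final Document document) {
            System.out.println(document.toJson());
        }
    });
}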

Counting the rows of an aggregate result

Use $project to save tag and count into tmp

Use $push or $addToSet to store tmp into your data list.

Code:

db.test.aggregate({$unwind:'$tags'},{$group:{_id:'$tags', count:{$sum:1}}},{$project:{tmp:{tag:'$_id', count:'$count'}}},{$group:{_id:null, total:{$sum:1}, data:{$addToSet:'$tmp'}}})

Output:

{"result":[{"_id":null,"total":5,"data":[{"tag":"SOME","count":1},{"tag":"RANDOM","count":2},{"tag":"TAGS1","count":1},{"tag":"TAGS","count":1},{"tag":"SOME1","count":1}]}],"ok":1}
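
To see why the final $group yields both the row count and the rows themselves, the four stages can be replayed by hand on a small data set. The sketch below does this with plain arrays; it is not the MongoDB API, and the two sample documents are an assumption chosen to give the same tag counts as the output shown (the real contents of the test collection are not given).

```javascript
// In-memory replay of $unwind -> $group -> $project -> $group. NOT the MongoDB API.
// The sample documents are hypothetical.
var docs = [
    { tags: ["SOME", "RANDOM", "TAGS"] },
    { tags: ["SOME1", "RANDOM", "TAGS1"] }
];

// $unwind: '$tags' produces one row per array element.
var unwound = [];
docs.forEach(function (d) {
    d.tags.forEach(function (t) { unwound.push({ tags: t }); });
});

// $group: {_id: '$tags', count: {$sum: 1}} counts the rows per tag.
var counts = new Map();
unwound.forEach(function (r) {
    counts.set(r.tags, (counts.get(r.tags) || 0) + 1);
});

// $project: wrap each group's tag and count in a tmp sub-document.
var projected = Array.from(counts, function (entry) {
    return { tmp: { tag: entry[0], count: entry[1] } };
});

// $group: {_id: null, ...} folds all rows into one document:
// total adds 1 per incoming row, data collects every tmp.
// (Each tmp is unique here, so $addToSet and $push would collect the same items.)
var result = {
    _id: null,
    total: projected.length,
    data: projected.map(function (p) { return p.tmp; })
};

console.log(JSON.stringify(result));
```

Because the second $group uses _id: null, every row from the first grouping lands in the same bucket, so $sum: 1 ends up being the number of groups. Note that $addToSet does not guarantee any particular order for data, so the order printed by this sketch should not be relied on.
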

References
   http://stackoverflow.com/questions/13529323/obtaining-group-result-with-group-count
   http://www.cnblogs.com/shanyou/p/3494854.html
   http://www.cnblogs.com/fx2008/p/3572169.html
Code
   https://github.com/JeromeSuz/demo_nosql