Writing MapReduce programs in Eclipse


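My old blog is about to expire, so I'm moving the posts that are still useful over here.
First, download the plugin.
There is also another plugin; you may want to take a look at that one too.
Then put it under eclipse/plugins. I'm on Fedora, so I put it under /usr/lib/eclipse/plugins.
Rename the plugin to hadoop-eclipse-plugin-1.0.0.jar.
My Eclipse version: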
Eclipse Platform
Version: 3.6.1
Build id: M20100909-0800
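I found that it does not work unless you rename it, and trying a lot of other plugins did not help either. I chose this particular name because I had noticed that a plugin with this name was detected by my Eclipse, even though that one would not run.
Next, restart Eclipse. Under Window -> Open Perspective you should see Map/Reduce (if you don't, it may be the naming problem I just described). After selecting it, a dialog pops up where you need to configure a Location name, for example myhadoop, along with the Map/Reduce Master and the DFS Master. The Host and Port for these are the address and port you configured in mapred-site.xml and core-site.xml, respectively.
Now it shows up next to the usual project list: there is a DFS Locations entry under Project Explorer.
Under it is myhadoop, the location we just configured. If something is wrong, you can reconfigure it. Clicking it shows two directories: the Hadoop home directory and the user directory.
Let's write wordcount first.
Create a new MapReduce project, then a new MapReduce Driver. You can also put the map, the reduce, and the driver in separate classes; here I keep them together.
Code: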
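The Host and Port fields in the Location dialog come from the values in your own configuration files. As a rough illustration only, the sketch below splits such values into host and port using plain Java. The property names in the comments (fs.default.name, mapred.job.tracker) are the usual Hadoop 1.x keys, while the class name HadoopEndpoints and the sample addresses are made up for this example; substitute whatever your own core-site.xml and mapred-site.xml contain.

```java
import java.net.URI;

public class HadoopEndpoints {

    /** Splits a "host:port" or "scheme://host:port" value into {host, port}. */
    static String[] hostAndPort(String value) {
        // A bare host:port has no scheme, so prepend a dummy one to let URI parse it.
        String normalized = value.contains("://") ? value : "dummy://" + value;
        URI uri = URI.create(normalized);
        return new String[] { uri.getHost(), String.valueOf(uri.getPort()) };
    }

    public static void main(String[] args) {
        // Hypothetical sample values, not taken from the original post.
        String[] dfs = hostAndPort("hdfs://localhost:9000"); // e.g. fs.default.name in core-site.xml
        String[] mr = hostAndPort("localhost:9001");         // e.g. mapred.job.tracker in mapred-site.xml

        System.out.println("DFS Master        -> Host: " + dfs[0] + ", Port: " + dfs[1]);
        System.out.println("Map/Reduce Master -> Host: " + mr[0] + ", Port: " + mr[1]);
    }
}
```

The two pairs it prints are what would go into the DFS Master and Map/Reduce Master fields of the dialog.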

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: emits (word, 1) for every whitespace-separated token in a line.
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts collected for each word.
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    // Driver: args[0] is the input directory, args[1] is the output directory.
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
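To see what the Map and Reduce classes compute without a cluster, here is a minimal local sketch in plain Java (Java 8 or later, no Hadoop on the classpath). It only mimics the logic: it tokenizes each line on whitespace the way the mapper does and sums a count per word the way the reducer does. The class name LocalWordCount and the sample lines are invented for illustration and are not part of the job above.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    /** Counts whitespace-separated words across all lines, sorted by word. */
    static Map<String, Integer> count(String... lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // Stands in for output.collect(word, one) plus the reducer's summation.
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        System.out.println(count("hello hadoop", "hello eclipse"));
        // prints {eclipse=1, hadoop=1, hello=2}
    }
}
```

The real job writes tab-separated word and count pairs into files in the output directory rather than printing a map, but the per-word totals should match for the same input.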

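Next, the program needs command-line arguments at run time, so click Run -> Run Configurations -> Arguments.
Under Program arguments, enter: input output
input is the input directory on HDFS. It holds the files to be counted, and those files contain the words.
output is the output directory on HDFS. Hadoop refuses to write into an existing output directory so that earlier, still useful results are not overwritten, so make sure this directory does not exist yet.
Then the console prints the progress of the job: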
12/03/27 02:46:13 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/27 02:46:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/27 02:46:13 INFO mapred.FileInputFormat: Total input paths to process : 3
……
Finally, you can see that Hadoop now has an output directory containing the result files.
You can list it with hadoop fs -ls output.
That's all.
