Installing and Testing Mahout - Based on a Single-Node Pseudo-Distributed Hadoop Setup


Installing the JDK
See my earlier blog post on installing JDK 1.7:
http://blog.csdn.net/stanely_hwang/article/details/18883599
Installing Single-Node Pseudo-Distributed Hadoop
See my earlier blog post on setting up single-node pseudo-distributed Hadoop:
http://blog.csdn.net/stanely_hwang/article/details/18884181
Installing and Configuring Mahout
1: Download the binary distribution.
Mahout download location:
http://www.apache.org/dyn/closer.cgi/mahout/
After downloading Mahout, simply extract it. Here Mahout was downloaded to /opt/hadoop; enter that directory and run the extraction:

$ cd /opt/hadoop

$ tar -zxvf mahout-distribution-0.9.tar.gz

2: Configure the environment variables:

Edit /etc/profile with vim and append the $HADOOP_HOME, $HADOOP_CONF_DIR, and $MAHOUT_HOME environment variables at the end of the file. The full configuration is shown below:

JAVA_HOME=/opt/java/jdk
JRE_HOME=/opt/java/jdk
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/bin
export JAVA_HOME
export JRE_HOME

export HADOOP_HOME=/home/andy/hadoop-2.2.0
export HADOOP_CONF_DIR=/home/andy/hadoop-2.2.0/conf
export MAHOUT_HOME=/opt/hadoop/mahout-distribution-0.9
export PATH=$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$PATH
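After editing, reload the profile in the current shell and verify that the variables took effect:

$ source /etc/profile

$ echo $MAHOUT_HOME
/opt/hadoop/mahout-distribution-0.9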

3: Start Hadoop:

Go to the sbin directory of the Hadoop installation (here, ~/hadoop-2.2.0/sbin) and start the HDFS and YARN daemons:

 $ ./hadoop-daemon.sh start namenode
 $ ./hadoop-daemon.sh start datanode
 $ ./yarn-daemon.sh start resourcemanager
 $ ./yarn-daemon.sh start nodemanager
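Once all four daemons are started, jps (bundled with the JDK) should list NameNode, DataNode, ResourceManager, and NodeManager, plus Jps itself:

 $ jps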
4: Run mahout --help to check whether Mahout is fully installed and to see which algorithms it provides.

Enter the $MAHOUT_HOME/bin directory:

 $ cd $MAHOUT_HOME/bin
 $ ./mahout --help

The output looks like the following:
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /home/andy/hadoop-2.2.0/bin/hadoop and HADOOP_CONF_DIR=/home/andy/hadoop-2.2.0/conf
MAHOUT-JOB: /opt/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Unknown program '--help' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
[andy@localhost bin]$
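Note that --help is not itself a program name, which is why the driver prints "Unknown program '--help' chosen." before the list; seeing the list at all is what confirms the installation. To see the options of a single driver, put --help after its name, for example:

 $ ./mahout kmeans --help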
5: Preparing to use Mahout:
  • Prepare the data:
  • Test data download location:
    http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
    After downloading, place the data file under $MAHOUT_HOME.
  • Create a test directory:
  • Create the test directory testdata in HDFS and import the data into it (a quick check follows the commands):
    $ cd $HADOOP_HOME/bin/
    $ hadoop fs -mkdir testdata
    $ hadoop fs -put $MAHOUT_HOME/synthetic_control.data testdata
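    To confirm the upload succeeded, list the directory in HDFS; synthetic_control.data should appear under testdata:

    $ hadoop fs -ls testdata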

    • Run the kmeans algorithm (the jar path matches the Mahout 0.9 installation configured above):

    $ hadoop jar /opt/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
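    Since $MAHOUT_HOME/bin is on the PATH, the same job can also be launched through the mahout wrapper script; a minimal sketch, assuming the driver falls back to loading fully qualified class names that are not in its program list:

    $ mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

    Either way, the synthetic-control job reads testdata and writes its clustering output to the output directory in HDFS by default.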

    • View the results:

    $ hadoop fs -lsr output
    $ hadoop fs -get output $MAHOUT_HOME/result
    $ cd $MAHOUT_HOME/result
    $ ls
    (Screenshot: directory listing of the downloaded clustering results)
    If the result files are listed as in the screenshot above, the installation was successful.
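    To inspect the clusters in readable form rather than as raw SequenceFiles, the clusterdump driver from the program list above can be used. A minimal sketch, assuming the job converged within its default 10 iterations so that the final clusters sit in output/clusters-10-final (check the actual directory name with hadoop fs -ls output):

    $ cd $MAHOUT_HOME/bin
    $ ./mahout clusterdump --input output/clusters-10-final --pointsDir output/clusteredPoints --output $MAHOUT_HOME/clusters.txt

    The resulting local text file lists each cluster's center and the points assigned to it.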