Installing and Testing Mahout - Based on a Single-Node Pseudo-Distributed Hadoop Setup


Installing the JDK
See my earlier blog post on installing JDK 1.7:
http://blog.csdn.net/stanely_hwang/article/details/18883599
Installing Single-Node Pseudo-Distributed Hadoop
See my earlier blog post on setting up single-node pseudo-distributed Hadoop:
http://blog.csdn.net/stanely_hwang/article/details/18884181
Installing and Configuring Mahout
1: Download the binary distribution.
Mahout download location:
http://www.apache.org/dyn/closer.cgi/mahout/
After downloading Mahout, simply extract it. Here Mahout was downloaded to /opt/hadoop; enter that directory and run the extraction:

$ cd /opt/hadoop

$ tar -zxvf mahout-distribution-0.9.tar.gz

2: Configure the environment variables:

Edit /etc/profile with vim and append the $HADOOP_HOME, $HADOOP_CONF_DIR, and $MAHOUT_HOME environment variables at the end of the file. The full configuration is shown below:

JAVA_HOME=/opt/java/jdk
JRE_HOME=/opt/java/jdk
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/bin
export JAVA_HOME
export JRE_HOME

export HADOOP_HOME=/home/andy/hadoop-2.2.0
export HADOOP_CONF_DIR=/home/andy/hadoop-2.2.0/conf
export MAHOUT_HOME=/opt/hadoop/mahout-distribution-0.9
export PATH=$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$PATH
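After editing, reload the profile in the current shell and verify that the variables took effect:

$ source /etc/profile

$ echo $MAHOUT_HOME
/opt/hadoop/mahout-distribution-0.9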

3: Start Hadoop:

Go to the sbin directory of the Hadoop installation (here, ~/hadoop-2.2.0/sbin) and start the HDFS and YARN daemons:

 $ ./hadoop-daemon.sh start namenode
 $ ./hadoop-daemon.sh start datanode
 $ ./yarn-daemon.sh start resourcemanager
 $ ./yarn-daemon.sh start nodemanager
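Once all four daemons are started, jps (bundled with the JDK) should list NameNode, DataNode, ResourceManager, and NodeManager, plus Jps itself:

 $ jps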
4: Run mahout --help to check whether Mahout is fully installed and to see which algorithms it provides.

Enter the $MAHOUT_HOME/bin directory:

 $ cd $MAHOUT_HOME/bin
 $ ./mahout --help

The output looks like the following:
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /home/andy/hadoop-2.2.0/bin/hadoop and HADOOP_CONF_DIR=/home/andy/hadoop-2.2.0/conf
MAHOUT-JOB: /opt/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Unknown program '--help' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  concatmatrices: : Concatenates 2 matrices of same cardinality into a single matrix
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  lucene2seq: : Generate Text SequenceFiles from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  parallelALS: : ALS-WR factorization of a rating matrix
  qualcluster: : Runs clustering experiments and summarizes results in a CSV
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  resplit: : Splits a set of SequenceFiles into a number of equal splits
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  streamingkmeans: : Streaming k-means clustering
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
[andy@localhost bin]$
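Note that --help is not itself a program name, which is why the driver prints "Unknown program '--help' chosen." before the list; seeing the list at all is what confirms the installation. To see the options of a single driver, put --help after its name, for example:

 $ ./mahout kmeans --help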
5: Preparing to use Mahout:
  • Prepare the data:
  • Test data download location:
    http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
    After downloading, place the data file under $MAHOUT_HOME.
  • Create a test directory:
  • Create the test directory testdata in HDFS and import the data into it (a quick check follows the commands):
    $ cd $HADOOP_HOME/bin/
    $ hadoop fs -mkdir testdata
    $ hadoop fs -put $MAHOUT_HOME/synthetic_control.data testdata
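    To confirm the upload succeeded, list the directory in HDFS; synthetic_control.data should appear under testdata:

    $ hadoop fs -ls testdata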

    • Run the kmeans algorithm (the jar path matches the Mahout 0.9 installation configured above):

    $ hadoop jar /opt/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
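    Since $MAHOUT_HOME/bin is on the PATH, the same job can also be launched through the mahout wrapper script; a minimal sketch, assuming the driver falls back to loading fully qualified class names that are not in its program list:

    $ mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

    Either way, the synthetic-control job reads testdata and writes its clustering output to the output directory in HDFS by default.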

    • View the results:

    $ hadoop fs -lsr output
    $ hadoop fs -get output $MAHOUT_HOME/result
    $ cd $MAHOUT_HOME/result
    $ ls
    (Screenshot: directory listing of the downloaded clustering results)
    If the result files are listed as in the screenshot above, the installation was successful.
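    To inspect the clusters in readable form rather than as raw SequenceFiles, the clusterdump driver from the program list above can be used. A minimal sketch, assuming the job converged within its default 10 iterations so that the final clusters sit in output/clusters-10-final (check the actual directory name with hadoop fs -ls output):

    $ cd $MAHOUT_HOME/bin
    $ ./mahout clusterdump --input output/clusters-10-final --pointsDir output/clusteredPoints --output $MAHOUT_HOME/clusters.txt

    The resulting local text file lists each cluster's center and the points assigned to it.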