Installing and Testing Mahout (on a Single-Node Pseudo-Distributed Hadoop Setup)
Installing the JDK
For installing JDK 1.7, see my earlier blog post:
http://blog.csdn.net/stanely_hwang/article/details/18883599
Installing Hadoop in Single-Node Pseudo-Distributed Mode
For the single-node pseudo-distributed Hadoop installation, see my earlier blog post:
http://blog.csdn.net/stanely_hwang/article/details/18884181
Installing and Configuring Mahout
1: Download the binary distribution.
Mahout download location:
http://www.apache.org/dyn/closer.cgi/mahout/
After downloading Mahout, simply extract it. I downloaded Mahout to /opt/hadoop, so enter that directory and run the extraction:
$ cd /opt/hadoop
$ tar -zxvf mahout-distribution-0.9.tar.gz
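If you would rather fetch the tarball from the command line, the Apache release archive keeps the 0.9 distribution; a minimal sketch (verify the archive URL against the mirror page above before relying on it):
$ cd /opt/hadoop
$ wget http://archive.apache.org/dist/mahout/0.9/mahout-distribution-0.9.tar.gz
$ tar -zxvf mahout-distribution-0.9.tar.gz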
2: Configure environment variables:
Edit the /etc/profile file with vim and append the $HADOOP_HOME, $HADOOP_CONF_DIR, and $MAHOUT_HOME environment variables at the end of the file.
The detailed configuration is shown below:
JAVA_HOME=/opt/java/jdk
JRE_HOME=/opt/java/jdk
PATH=/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
export JAVA_HOME
export JRE_HOME
export HADOOP_HOME=/home/andy/hadoop-2.2.0
export HADOOP_CONF_DIR=/home/andy/hadoop-2.2.0/conf
export MAHOUT_HOME=/opt/hadoop/mahout-distribution-0.9
export PATH=$HADOOP_HOME/bin:$MAHOUT_HOME/bin:$PATH
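To make the new variables take effect in the current shell and sanity-check them, a quick check using the paths configured above:
$ source /etc/profile
$ echo $MAHOUT_HOME
/opt/hadoop/mahout-distribution-0.9
$ which mahout
/opt/hadoop/mahout-distribution-0.9/bin/mahout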
3: Start Hadoop:
Run the following from the sbin directory of the Hadoop installation (i.e. under ~/hadoop-2.2.0/sbin):
$ ./hadoop-daemon.sh start namenode
$ ./hadoop-daemon.sh start datanode
$ ./yarn-daemon.sh start resourcemanager
$ ./yarn-daemon.sh start nodemanager
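Before moving on, it is worth checking that all four daemons actually started; jps (bundled with the JDK) should show something like the following (the PIDs are illustrative):
$ jps
3201 NameNode
3298 DataNode
3375 ResourceManager
3467 NodeManager
3552 Jps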
4: Run mahout --help to check whether Mahout is fully installed and to see which algorithms it lists.
Enter the $MAHOUT_HOME/bin directory:
$ cd $MAHOUT_HOME/bin
$ ./mahout --help
The output is as follows:
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /home/andy/hadoop-2.2.0/bin/hadoop and HADOOP_CONF_DIR=/home/andy/hadoop-2.2.0/conf
MAHOUT-JOB: /opt/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Unknown program '--help' chosen.
Valid program names are:
arff.vector: Generate Vectors from an ARFF file or directory
baumwelch: Baum-Welch algorithm for unsupervised HMM training
canopy: Canopy clustering
cat: Print a file or resource as the logistic regression models would see it
cleansvd: Cleanup and verification of SVD output
clusterdump: Dump cluster output to text
clusterpp: Groups Clustering Output In Clusters
cmdump: Dump confusion matrix in HTML or text formats
concatmatrices: Concatenates 2 matrices of same cardinality into a single matrix
cvb: LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: LDA via Collapsed Variation Bayes, in memory locally.
evaluateFactorization: compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: Fuzzy K-means clustering
hmmpredict: Generate random sequence of observations by given HMM
itemsimilarity: Compute the item-item-similarities for item-based collaborative filtering
kmeans: K-means clustering
lucene.vector: Generate Vectors from a Lucene index
lucene2seq: Generate Text SequenceFiles from a Lucene index
matrixdump: Dump matrix in CSV format
matrixmult: Take the product of two matrices
parallelALS: ALS-WR factorization of a rating matrix
qualcluster: Runs clustering experiments and summarizes results in a CSV
recommendfactorized: Compute recommendations using the factorization of a rating matrix
recommenditembased: Compute recommendations using item-based collaborative filtering
regexconverter: Convert text files on a per line basis based on regular expressions
resplit: Splits a set of SequenceFiles into a number of equal splits
rowid: Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
rowsimilarity: Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: Run a logistic regression model against CSV data
seq2encoded: Encoded Sparse Vector generation from Text sequence files
seq2sparse: Sparse Vector generation from Text sequence files
seqdirectory: Generate sequence files (of Text) from a directory
seqdumper: Generic Sequence File dumper
seqmailarchives: Creates SequenceFile from a directory containing gzipped mail archives
seqwiki: Wikipedia xml dump to sequence file
spectralkmeans: Spectral k-means clustering
split: Split Input data into test and train sets
splitDataset: split a rating dataset into training and probe parts
ssvd: Stochastic SVD
streamingkmeans: Streaming k-means clustering
svd: Lanczos Singular Value Decomposition
testnb: Test the Vector-based Bayes classifier
trainAdaptiveLogistic: Train an AdaptivelogisticRegression model
trainlogistic: Train a logistic regression using stochastic gradient descent
trainnb: Train the Vector-based Bayes classifier
transpose: Take the transpose of a matrix
validateAdaptiveLogistic: Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: Dump vectors from a sequence file to text
viterbi: Viterbi decoding of hidden states from given output states sequence
[andy@localhost bin]$
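The "Unknown program '--help' chosen." line is expected: --help is not itself a program name, so the launcher falls back to printing the catalog above. To inspect the options of one particular driver, pass --help to that driver instead; for example, the following should print the full option list for the k-means driver:
$ ./mahout kmeans --help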
5: Prepare to use Mahout: prepare the data.
Test data download location:
http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
After downloading, put the data file under $MAHOUT_HOME.
Create the test directory: create a testdata directory in HDFS and import the data into it:
$ cd $HADOOP_HOME/bin/
$ hadoop fs -mkdir testdata
$ hadoop fs -put $MAHOUT_HOME/synthetic_control.data testdata
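With the data sitting in HDFS, a common smoke test is the synthetic-control k-means example bundled in the examples job jar; a sketch of that run, assuming the Mahout 0.9 defaults (the driver reads from testdata and writes to output in HDFS):
$ cd $MAHOUT_HOME/bin
$ ./mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
$ hadoop fs -ls output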