Using Hive on CDH5 (Embedded Mode)


Introduction

This article describes how to use Hive in Embedded Mode on CDH5.

Environment

CentOS 6.5
CDH 5
Hive 0.12.0-cdh5.1.3
jdk 1.7.0_55

Cluster layout

Hostname	IP address	ResourceManager	Namenode	NodeManager	Datanode	JobHistoryServer
hadoop-master	192.168.122.101	○	○	-	-	○
hadoop-master2	192.168.122.102	○	○	-	-	-
hadoop-slave	192.168.122.111	-	-	○	○	-
hadoop-slave2	192.168.122.112	-	-	○	○	-
hadoop-slave3	192.168.122.113	-	-	○	○	-
hadoop-client	192.168.122.201	-	-	-	-	-

Note: For how to build the Hadoop cluster, see the article "Building a Hadoop cluster on CDH5" (CDH5でhadoopのクラスタを構築する).

Setting up Hive

Note: Hive is installed on hadoop-client.

Installing Hive

$ sudo yum install hive

Create a directory for Hive on HDFS.

$ sudo -u hdfs hadoop fs -mkdir /user/hive
$ sudo -u hdfs hadoop fs -chown hive:hadoop /user/hive
$ sudo -u hdfs hadoop fs -ls /user/
Found 3 items
drwxr-xr-x   - hdfs   hadoop          0 2014-09-20 08:09 /user/hdfs
drwxrwxrwt   - mapred hadoop          0 2014-09-20 05:39 /user/history
drwxr-xr-x   - hive   hadoop          0 2014-10-06 13:34 /user/hive

Adjusting ownership of the local directory

$ sudo chown hive /var/lib/hive
$ ls -ld /var/lib/hive
drwxr-xr-x 3 hive root 4096 Oct  6 13:34 /var/lib/hive

Preparing the data

This example uses Japan Post postal code data.

$ cd /tmp
$ curl -O http://www.post.japanpost.jp/zipcode/dl/roman/ken_all_rome.zip
$ unzip ken_all_rome.zip
$ nkf -S -w ken_all_rome/KEN_ALL_ROME.CSV > ken_all_rome/KEN_ALL_ROME.UTF8.CSV
$ head ken_all_rome/KEN_ALL_ROME.UTF8.CSV
"0600000","北海道","札幌市　中央区","以下に掲載がない場合","HOKKAIDO","SAPPORO SHI CHUO KU","IKANIKEISAIGANAIBAAI"
"0640941","北海道","札幌市　中央区","旭ケ丘","HOKKAIDO","SAPPORO SHI CHUO KU","ASAHIGAOKA"
"0600041","北海道","札幌市　中央区","大通東","HOKKAIDO","SAPPORO SHI CHUO KU","ODORIHIGASHI"
"0600042","北海道","札幌市　中央区","大通西（１～１９丁目）","HOKKAIDO","SAPPORO SHI CHUO KU","ODORINISHI(1-19-CHOME)"
"0640820","北海道","札幌市　中央区","大通西（２０～２８丁目）","HOKKAIDO","SAPPORO SHI CHUO KU","ODORINISHI(20-28-CHOME)"
"0600031","北海道","札幌市　中央区","北一条東","HOKKAIDO","SAPPORO SHI CHUO KU","KITA1-JOHIGASHI"
"0600001","北海道","札幌市　中央区","北一条西（１～１９丁目）","HOKKAIDO","SAPPORO SHI CHUO KU","KITA1-JONISHI(1-19-CHOME)"
"0640821","北海道","札幌市　中央区","北一条西（２０～２８丁目）","HOKKAIDO","SAPPORO SHI CHUO KU","KITA1-JONISHI(20-28-CHOME)"
"0600032","北海道","札幌市　中央区","北二条東","HOKKAIDO","SAPPORO SHI CHUO KU","KITA2-JOHIGASHI"
"0600002","北海道","札幌市　中央区","北二条西（１～１９丁目）","HOKKAIDO","SAPPORO SHI CHUO KU","KITA2-JONISHI(1-19-CHOME)"

Note: The postal code data is encoded in Shift_JIS, which is awkward to work with as is, so it is converted to UTF-8 before use.
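
The table created later in this article has seven columns and splits each line on commas, so it can be worth confirming that every line of the converted file really has seven comma-separated fields before loading it. The following is a minimal sketch and not part of the original procedure. The path /tmp/zip_sample.csv is hypothetical, and the two rows are copied from the head output above as a stand-in for KEN_ALL_ROME.UTF8.CSV.

```shell
#!/bin/sh
# Hypothetical sample file standing in for ken_all_rome/KEN_ALL_ROME.UTF8.CSV.
sample=/tmp/zip_sample.csv
cat > "$sample" <<'EOF'
"0600000","北海道","札幌市　中央区","以下に掲載がない場合","HOKKAIDO","SAPPORO SHI CHUO KU","IKANIKEISAIGANAIBAAI"
"0640941","北海道","札幌市　中央区","旭ケ丘","HOKKAIDO","SAPPORO SHI CHUO KU","ASAHIGAOKA"
EOF

# Count the lines whose number of comma-separated fields is not 7.
# "n + 0" makes awk print 0 instead of an empty string when no line matches.
bad_lines=$(awk -F',' 'NF != 7 { n++ } END { print n + 0 }' "$sample")
echo "lines with unexpected field count: $bad_lines"
```

For this sample the script reports 0. A non-zero result on the real file would suggest that some field contains a comma, in which case a plain comma-delimited table would misalign the columns.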

Loading the data

Creating the database and table

$ cd /tmp
$ sudo -u hive hive

hive> create database sample;
OK

hive> show databases;
OK
default
sample
Time taken: 5.021 seconds, Fetched: 2 row(s)

hive> use sample;
OK

hive> create table zip_all (
    > zip string,
    > pref string,
    > city string,
    > town string,
    > pref_r string,
    > city_r string,
    > town_r string
    > )
    > row format delimited
    > fields terminated by ','
    > lines terminated by '\n'
    > ;

hive> show tables;
OK
zip_all
Time taken: 0.022 seconds, Fetched: 1 row(s)

hive> load data local inpath '/tmp/ken_all_rome/KEN_ALL_ROME.UTF8.CSV' into table zip_all;
Copying data from file:/tmp/ken_all_rome/KEN_ALL_ROME.UTF8.CSV
Copying file: file:/tmp/ken_all_rome/KEN_ALL_ROME.UTF8.CSV
Loading data to table sample.zip_all
Table sample.zip_all stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 12527284, raw_data_size: 0]
OK
Time taken: 0.817 seconds

Querying the data

hive> select count(*) from zip_all;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
14/10/06 15:19:14 WARN conf.Configuration: file:/tmp/hive/hive_2014-10-06_15-19-10_668_7801925482149728044-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/10/06 15:19:14 WARN conf.Configuration: file:/tmp/hive/hive_2014-10-06_15-19-10_668_7801925482149728044-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/10/06 15:19:14 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
Execution log at: /tmp/hive/hive_20141006151919_0ae7a324-9f85-4b3f-8036-61e18070c4bd.log
Job running in-process (local Hadoop)
2014-10-06 15:19:18,218 null map = 100%,  reduce = 0%
2014-10-06 15:19:19,226 null map = 100%,  reduce = 100%
Ended Job = job_local1780116823_0001
Execution completed successfully
MapredLocal task succeeded
OK
123699
Time taken: 9.292 seconds, Fetched: 1 row(s)
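
The row count returned by Hive can be cross-checked against the line count of the local file, on the assumption that each line is exactly one record (no newlines embedded in fields). The following is a minimal sketch with a hypothetical three-line sample file, /tmp/zip_count_sample.csv. On the real data the same command would be pointed at /tmp/ken_all_rome/KEN_ALL_ROME.UTF8.CSV, and if the assumption holds the result would be expected to equal the 123699 reported by count(*).

```shell
#!/bin/sh
# Hypothetical sample file; the rows are taken from the head output earlier in this article.
sample=/tmp/zip_count_sample.csv
printf '%s\n' \
  '"0600000","北海道","札幌市　中央区","以下に掲載がない場合","HOKKAIDO","SAPPORO SHI CHUO KU","IKANIKEISAIGANAIBAAI"' \
  '"0640941","北海道","札幌市　中央区","旭ケ丘","HOKKAIDO","SAPPORO SHI CHUO KU","ASAHIGAOKA"' \
  '"0600041","北海道","札幌市　中央区","大通東","HOKKAIDO","SAPPORO SHI CHUO KU","ODORIHIGASHI"' \
  > "$sample"

# wc -l counts newline characters; tr removes the padding some wc implementations add.
line_count=$(wc -l < "$sample" | tr -d ' ')
echo "$line_count"
```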

hive> select * from zip_all where pref_r = '"TOKYO TO"' and city_r = '"SHIBUYA KU"' and town_r = '"SHIBUYA"';
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
14/10/06 15:22:59 WARN conf.Configuration: file:/tmp/hive/hive_2014-10-06_15-22-56_612_8533133809152280651-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/10/06 15:22:59 WARN conf.Configuration: file:/tmp/hive/hive_2014-10-06_15-22-56_612_8533133809152280651-1/-local-10002/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/10/06 15:22:59 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
Execution log at: /tmp/hive/hive_20141006152222_ad9fa2f4-76da-4d3b-b083-6d84a7c48ab8.log
Job running in-process (local Hadoop)
2014-10-06 15:23:03,607 null map = 0%,  reduce = 0%
2014-10-06 15:23:04,617 null map = 100%,  reduce = 0%
Ended Job = job_local2040699265_0001
Execution completed successfully
MapredLocal task succeeded
OK
"1500002"       "東京都"        "渋谷区"        "渋谷"  "TOKYO TO"      "SHIBUYA KU"    "SHIBUYA"
Time taken: 8.706 seconds, Fetched: 1 row(s)
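
The double quotes in the WHERE clause and in the result row are there because `fields terminated by ','` only splits on commas, so the quote characters in the CSV are stored as part of each value. If unquoted values are preferred, one option is to strip the quotes from the file before loading it. The following is a minimal sketch under the assumption that no field contains an ASCII comma or an embedded double quote, which has not been verified against the full data set. Both file names are hypothetical, and the sample row is reconstructed from the query result above.

```shell
#!/bin/sh
# Hypothetical input and output paths for illustration only.
src=/tmp/zip_quoted_sample.csv
dst=/tmp/zip_unquoted_sample.csv

# One row in the quoted CSV form, reconstructed from the query result above.
cat > "$src" <<'EOF'
"1500002","東京都","渋谷区","渋谷","TOKYO TO","SHIBUYA KU","SHIBUYA"
EOF

# Remove every double quote. This is only safe under the assumptions stated above.
sed 's/"//g' "$src" > "$dst"
cat "$dst"
```

If a file processed this way were loaded instead, the conditions could be written without the inner quotes, for example pref_r = 'TOKYO TO'.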

Author and Source

Original article: https://qiita.com/toshiro3/items/41a1c78f99da998071ae
