Importing data from a SQL Server view into Hive with Sqoop


    Environment: HDP-2.5.3 · Hive 1.2.1 · Sqoop 1.4.6 · SQL Server 2012

Table of Contents
  • 1. Download sqljdbc4.jar and place it under $SQOOP_HOME/lib
  • 2. Testing the SQL Server connection
  • 2.1 List available databases on a server
  • 2.2 List available tables in a database
  • 2.3 Running a query statement
  • 3. Full import into HDFS
  • 4. Full import into Hive
  • 5. Overwriting existing data
  • Notes
    1. Download sqljdbc4.jar and place it under $SQOOP_HOME/lib
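    A minimal sketch of this step (paths follow the HDP layout used in this article; download sqljdbc4.jar from Microsoft first):
    export SQOOP_HOME=/usr/hdp/2.5.3.0-37/sqoop
    cp sqljdbc4.jar $SQOOP_HOME/lib/
    ls $SQOOP_HOME/lib/ | grep -i sqljdbc   # confirm the driver is in place
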
    2. Testing the SQL Server connection
    2.1 List available databases on a server
    [root@hqc-test-hdp1 ~]# sqoop list-databases --connect jdbc:sqlserver://10.35.xx.xx -username xx -password xx
    Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    19/10/29 16:13:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
    19/10/29 16:13:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    19/10/29 16:13:24 INFO manager.SqlManager: Using default fetchSize of 1000
    master
    AI
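
    As the WARN line above says, putting -password on the command line is insecure. Two standard alternatives (the password-file path below is only an illustration):
    # Prompt for the password instead of passing it as an argument:
    sqoop list-databases --connect jdbc:sqlserver://10.35.xx.xx -username xx -P
    # Or keep it in a mode-400 file and point Sqoop at it:
    echo -n 'secret' > /home/hdfs/.sqoop.pwd && chmod 400 /home/hdfs/.sqoop.pwd
    sqoop list-databases --connect jdbc:sqlserver://10.35.xx.xx -username xx --password-file file:///home/hdfs/.sqoop.pwd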
    

    2.2 List available tables in a database
    Only the tables under dbo are listed; views are not shown.
    [root@hqc-test-hdp1 ~]# sqoop list-tables --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=AI" -username xx -password xx
    Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    19/10/30 08:52:21 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
    19/10/30 08:52:21 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    19/10/30 08:52:21 INFO manager.SqlManager: Using default fetchSize of 1000
    tt
    table1
    Dictionary
    
    # SELECT TOP 1000  * FROM [dbo].[Dictionary]
    

    2.3 Running a query statement
    [root@hqc-test-hdp1 ~]# sqoop eval --connect jdbc:sqlserver://10.35.xx.xx -username xx -password xx --query "SELECT TOP 5 * from [xx.xx]"
    Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    19/10/29 16:22:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
    19/10/29 16:22:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    19/10/29 16:22:22 INFO manager.SqlManager: Using default fetchSize of 1000
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    | Name                 | Code                 | Expr1    | Sname                | Cname                | Aname                | Longitude            | Latitude             | Position             | 
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     
    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
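
    Before the full import, sqoop eval can also sanity-check how many rows the view will return; a small sketch using the same masked names (add DatabaseName to the JDBC URL if the view is not in the default database):
    sqoop eval --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=xx" -username xx -P --query "SELECT COUNT(*) FROM [xx.xx]"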
    

    3. Full import into HDFS
    Because a view is not a table, --table cannot be used; only --query works.
    [root@hqc-test-hdp1 ~]# su hdfs
    [hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
    Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    find: failed to restore initial working directory: Permission denied
    19/10/30 09:27:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
    19/10/30 09:27:22 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    19/10/30 09:27:22 INFO manager.SqlManager: Using default fetchSize of 1000
    19/10/30 09:27:22 INFO tool.CodeGenTool: Beginning code generation
    19/10/30 09:27:23 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE 1=1 AND  (1 = 0) 
    19/10/30 09:27:23 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE 1=1 AND  (1 = 0) 
    19/10/30 09:27:23 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
    error: error reading /usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar; cannot read zip file
    error: error reading /usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar; cannot read zip file
    Note: /tmp/sqoop-hdfs/compile/3c7b7eafcd1020b0b4e6d390fb32265b/QueryResult.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    19/10/30 09:27:26 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/3c7b7eafcd1020b0b4e6d390fb32265b/QueryResult.jar
    19/10/30 09:27:26 INFO mapreduce.ImportJobBase: Beginning query import.
    19/10/30 09:27:27 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
    19/10/30 09:27:27 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
    19/10/30 09:27:28 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
    19/10/30 09:27:30 INFO db.DBInputFormat: Using read commited transaction isolation
    19/10/30 09:27:30 INFO mapreduce.JobSubmitter: number of splits:1
    19/10/30 09:27:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0008
    19/10/30 09:27:30 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0008
    19/10/30 09:27:30 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0008/
    19/10/30 09:27:30 INFO mapreduce.Job: Running job: job_1564035532438_0008
    19/10/30 09:27:46 INFO mapreduce.Job: Job job_1564035532438_0008 running in uber mode : false
    19/10/30 09:27:46 INFO mapreduce.Job:  map 0% reduce 0%
    19/10/30 09:27:55 INFO mapreduce.Job:  map 100% reduce 0%
    19/10/30 09:27:55 INFO mapreduce.Job: Job job_1564035532438_0008 completed successfully
    19/10/30 09:27:55 INFO mapreduce.Job: Counters: 30
    	File System Counters
    		FILE: Number of bytes read=0
    		FILE: Number of bytes written=158794
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    		HDFS: Number of bytes read=87
    		HDFS: Number of bytes written=717075
    		HDFS: Number of read operations=4
    		HDFS: Number of large read operations=0
    		HDFS: Number of write operations=2
    	Job Counters 
    		Launched map tasks=1
    		Other local map tasks=1
    		Total time spent by all maps in occupied slots (ms)=6086
    		Total time spent by all reduces in occupied slots (ms)=0
    		Total time spent by all map tasks (ms)=6086
    		Total vcore-milliseconds taken by all map tasks=6086
    		Total megabyte-milliseconds taken by all map tasks=31160320
    	Map-Reduce Framework
    		Map input records=5676
    		Map output records=5676
    		Input split bytes=87
    		Spilled Records=0
    		Failed Shuffles=0
    		Merged Map outputs=0
    		GC time elapsed (ms)=133
    		CPU time spent (ms)=6280
    		Physical memory (bytes) snapshot=369975296
    		Virtual memory (bytes) snapshot=6399401984
    		Total committed heap usage (bytes)=329777152
    	File Input Format Counters 
    		Bytes Read=0
    	File Output Format Counters 
    		Bytes Written=717075
    19/10/30 09:27:55 INFO mapreduce.ImportJobBase: Transferred 700.2686 KB in 28.168 seconds (24.8605 KB/sec)
    19/10/30 09:27:55 INFO mapreduce.ImportJobBase: Retrieved 5676 records.
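
    To confirm the import landed, list the target directory and peek at the first rows (with -m 1 the output is a single part-m-00000 file):
    hdfs dfs -ls /apps/hive/warehouse/hqc.db/xx
    hdfs dfs -cat /apps/hive/warehouse/hqc.db/xx/part-m-00000 | head -n 5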
    

    4. Full import into Hive
    You do not need to create the Hive table manually beforehand (really: no manual table creation is needed!). --hive-table only needs a table name, though behavior may differ across Sqoop versions, so note the environment versions listed at the top. Because a view is not a table, --table cannot be used; only --query works. With import --query, --split-by can be omitted (when running a single mapper), but --target-dir is required; any temporary path will do. From what I observed, Sqoop first extracts the data into that HDFS path, then loads it into Hive (the HDFS directory backing the table), and finally deletes the folder and the files inside it.
    [hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --hive-import -hive-database hqc --hive-table xx --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
    Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    find: failed to restore initial working directory: Permission denied
    19/10/30 10:14:24 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
    19/10/30 10:14:24 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    19/10/30 10:14:24 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
    19/10/30 10:14:24 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
    19/10/30 10:14:25 INFO manager.SqlManager: Using default fetchSize of 1000
    19/10/30 10:14:25 INFO tool.CodeGenTool: Beginning code generation
    19/10/30 10:14:25 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
    19/10/30 10:14:26 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
    19/10/30 10:14:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
    error: error reading /usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar; cannot read zip file
    error: error reading /usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar; cannot read zip file
    Note: /tmp/sqoop-hdfs/compile/f1f7d212fbef24d849cb7d0604d2b0e5/QueryResult.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    19/10/30 10:14:28 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/f1f7d212fbef24d849cb7d0604d2b0e5/QueryResult.jar
    19/10/30 10:14:29 INFO mapreduce.ImportJobBase: Beginning query import.
    19/10/30 10:14:30 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
    19/10/30 10:14:30 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
    19/10/30 10:14:31 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
    19/10/30 10:14:33 INFO db.DBInputFormat: Using read commited transaction isolation
    19/10/30 10:14:33 INFO mapreduce.JobSubmitter: number of splits:1
    19/10/30 10:14:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0010
    19/10/30 10:14:34 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0010
    19/10/30 10:14:34 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0010/
    19/10/30 10:14:34 INFO mapreduce.Job: Running job: job_1564035532438_0010
    19/10/30 10:14:44 INFO mapreduce.Job: Job job_1564035532438_0010 running in uber mode : false
    19/10/30 10:14:44 INFO mapreduce.Job:  map 0% reduce 0%
    19/10/30 10:14:59 INFO mapreduce.Job:  map 100% reduce 0%
    19/10/30 10:14:59 INFO mapreduce.Job: Job job_1564035532438_0010 completed successfully
    19/10/30 10:14:59 INFO mapreduce.Job: Counters: 30
    	File System Counters
    		FILE: Number of bytes read=0
    		FILE: Number of bytes written=158786
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    		HDFS: Number of bytes read=87
    		HDFS: Number of bytes written=717075
    		HDFS: Number of read operations=4
    		HDFS: Number of large read operations=0
    		HDFS: Number of write operations=2
    	Job Counters 
    		Launched map tasks=1
    		Other local map tasks=1
    		Total time spent by all maps in occupied slots (ms)=12530
    		Total time spent by all reduces in occupied slots (ms)=0
    		Total time spent by all map tasks (ms)=12530
    		Total vcore-milliseconds taken by all map tasks=12530
    		Total megabyte-milliseconds taken by all map tasks=64153600
    	Map-Reduce Framework
    		Map input records=5676
    		Map output records=5676
    		Input split bytes=87
    		Spilled Records=0
    		Failed Shuffles=0
    		Merged Map outputs=0
    		GC time elapsed (ms)=99
    		CPU time spent (ms)=5950
    		Physical memory (bytes) snapshot=362852352
    		Virtual memory (bytes) snapshot=6400389120
    		Total committed heap usage (bytes)=336068608
    	File Input Format Counters 
    		Bytes Read=0
    	File Output Format Counters 
    		Bytes Written=717075
    19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Transferred 700.2686 KB in 29.3386 seconds (23.8685 KB/sec)
    19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Retrieved 5676 records.
    19/10/30 10:14:59 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
    19/10/30 10:14:59 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
    19/10/30 10:15:00 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
    19/10/30 10:15:00 WARN hive.TableDefWriter: Column Longitude had to be cast to a less precise type in Hive
    19/10/30 10:15:00 WARN hive.TableDefWriter: Column Latitude had to be cast to a less precise type in Hive
    19/10/30 10:15:00 INFO hive.HiveImport: Loading uploaded data into Hive
    
    Logging initialized using configuration in jar:file:/usr/hdp/2.5.3.0-37/hive/lib/hive-common-1.2.1000.2.5.3.0-37.jar!/hive-log4j.properties
    OK
    Time taken: 2.898 seconds
    Loading data to table hqc.xx
    Table hqc.xx stats: [numFiles=1, numRows=0, totalSize=717075, rawDataSize=0]
    OK
    Time taken: 0.627 seconds
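
    A quick way to verify the auto-created table from the shell (a sketch, assuming the hive CLI is on the PATH; the count should match the "Retrieved 5676 records" line above):
    hive -e "DESCRIBE hqc.xx; SELECT COUNT(*) FROM hqc.xx;"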
    

    5. Overwriting existing data
    Add --hive-overwrite.
    [hdfs@hqc-test-hdp1 root]$ sqoop import --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -password xx --query "SELECT * from [xx.xx] WHERE \$CONDITIONS" --hive-overwrite --hive-import -hive-database hqc --hive-table xx --target-dir /apps/hive/warehouse/hqc.db/xx -m 1
    Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    find: failed to restore initial working directory: Permission denied
    19/10/30 16:12:59 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
    19/10/30 16:12:59 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    19/10/30 16:12:59 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
    19/10/30 16:12:59 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
    19/10/30 16:13:00 INFO manager.SqlManager: Using default fetchSize of 1000
    19/10/30 16:13:00 INFO tool.CodeGenTool: Beginning code generation
    19/10/30 16:13:00 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
    19/10/30 16:13:01 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
    19/10/30 16:13:01 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.3.0-37/hadoop-mapreduce
    error: error reading /usr/hdp/2.5.3.0-37/sqoop/lib/mysql-connector-java.jar; cannot read zip file
    error: error reading /usr/hdp/2.5.3.0-37/hive/lib/mysql-connector-java.jar; cannot read zip file
    Note: /tmp/sqoop-hdfs/compile/2dcbcc5ee20eac3b80e2f2109726b44c/QueryResult.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    19/10/30 16:13:03 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/2dcbcc5ee20eac3b80e2f2109726b44c/QueryResult.jar
    19/10/30 16:13:03 INFO mapreduce.ImportJobBase: Beginning query import.
    19/10/30 16:13:05 INFO impl.TimelineClientImpl: Timeline service address: http://hqc-test-hdp2:8188/ws/v1/timeline/
    19/10/30 16:13:05 INFO client.RMProxy: Connecting to ResourceManager at hqc-test-hdp1/10.35:8050
    19/10/30 16:13:06 INFO client.AHSProxy: Connecting to Application History server at hqc-test-hdp2/10.35:10200
    19/10/30 16:13:08 INFO db.DBInputFormat: Using read commited transaction isolation
    19/10/30 16:13:08 INFO mapreduce.JobSubmitter: number of splits:1
    19/10/30 16:13:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1564035532438_0023
    19/10/30 16:13:09 INFO impl.YarnClientImpl: Submitted application application_1564035532438_0023
    19/10/30 16:13:09 INFO mapreduce.Job: The url to track the job: http://hqc-test-hdp1:8088/proxy/application_1564035532438_0023/
    19/10/30 16:13:09 INFO mapreduce.Job: Running job: job_1564035532438_0023
    19/10/30 16:13:19 INFO mapreduce.Job: Job job_1564035532438_0023 running in uber mode : false
    19/10/30 16:13:19 INFO mapreduce.Job:  map 0% reduce 0%
    19/10/30 16:13:29 INFO mapreduce.Job:  map 100% reduce 0%
    19/10/30 16:13:29 INFO mapreduce.Job: Job job_1564035532438_0023 completed successfully
    19/10/30 16:13:29 INFO mapreduce.Job: Counters: 30
    	File System Counters
    		FILE: Number of bytes read=0
    		FILE: Number of bytes written=158455
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    		HDFS: Number of bytes read=87
    		HDFS: Number of bytes written=5217236
    		HDFS: Number of read operations=4
    		HDFS: Number of large read operations=0
    		HDFS: Number of write operations=2
    	Job Counters 
    		Launched map tasks=1
    		Other local map tasks=1
    		Total time spent by all maps in occupied slots (ms)=7426
    		Total time spent by all reduces in occupied slots (ms)=0
    		Total time spent by all map tasks (ms)=7426
    		Total vcore-milliseconds taken by all map tasks=7426
    		Total megabyte-milliseconds taken by all map tasks=38021120
    	Map-Reduce Framework
    		Map input records=16993
    		Map output records=16993
    		Input split bytes=87
    		Spilled Records=0
    		Failed Shuffles=0
    		Merged Map outputs=0
    		GC time elapsed (ms)=108
    		CPU time spent (ms)=9300
    		Physical memory (bytes) snapshot=398262272
    		Virtual memory (bytes) snapshot=6417539072
    		Total committed heap usage (bytes)=415760384
    	File Input Format Counters 
    		Bytes Read=0
    	File Output Format Counters 
    		Bytes Written=5217236
    19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Transferred 4.9755 MB in 24.7647 seconds (205.7342 KB/sec)
    19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Retrieved 16993 records.
    19/10/30 16:13:29 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
    19/10/30 16:13:30 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
    19/10/30 16:13:30 INFO manager.SqlManager: Executing SQL statement: SELECT * from [xx.xx] WHERE  (1 = 0) 
    19/10/30 16:13:30 WARN hive.TableDefWriter: Column AlarmTimeStamp had to be cast to a less precise type in Hive
    19/10/30 16:13:30 WARN hive.TableDefWriter: Column HandlerTime had to be cast to a less precise type in Hive
    19/10/30 16:13:30 INFO hive.HiveImport: Loading uploaded data into Hive
    
    Logging initialized using configuration in jar:file:/usr/hdp/2.5.3.0-37/hive/lib/hive-common-1.2.1000.2.5.3.0-37.jar!/hive-log4j.properties
    OK
    Time taken: 2.809 seconds
    Loading data to table hqc.xx
    Table hqc.xx stats: [numFiles=1, numRows=0, totalSize=5217236, rawDataSize=0]
    OK
    Time taken: 1.321 seconds
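
    Because --hive-overwrite replaces the existing data instead of appending, the table should now hold exactly this run's 16993 rows (not 5676 + 16993); a quick check:
    hive -e "SELECT COUNT(*) FROM hqc.xx;"   # expect 16993 after the overwrite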
    

    Notes
    1. The difference between create-hive-table & hive-import
    Q:
    Can anyone tell the difference between create-hive-table & hive-import method? Both will create a hive table, but still what is the significance of each?
    A:
    The difference is that create-hive-table will create table in Hive based on the source table in database but will NOT transfer any data. Command "import --hive-import" will both create table in Hive and import data from the source table.
    Example: create the table definition only:
    sqoop create-hive-table --connect "jdbc:sqlserver://192.168.13.1:1433;username=root;password=12345;databasename=test" --table test --hive-table myhive2 --hive-partition-key partition_time --map-column-hive ID=String,name=String,addr=String
    Example: import with a query (with SQL Server, \$CONDITIONS follows WHERE; with Oracle, AND):
    sqoop import --connect "jdbc:sqlserver://192.168.13.1:1433;username=root;password=12345;databasename=test" --query "select * from test i where \$CONDITIONS" --target-dir /user/hive/warehouse/myhive2/partition_time=20171023 --hive-import -m 5 --hive-table myhive2 --split-by ID --hive-partition-key partition_time --hive-partition-value 20171023
    
    2. sqoop help
    Warning: /usr/hdp/2.5.3.0-37/accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    19/10/30 08:42:05 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.5.3.0-37
    usage: sqoop COMMAND [ARGS]
    
    Available commands:
      codegen            Generate code to interact with database records
      create-hive-table  Import a table definition into Hive
      eval               Evaluate a SQL statement and display the results
      export             Export an HDFS directory to a database table
      help               List available commands
      import             Import a table from a database to HDFS
      import-all-tables  Import tables from a database to HDFS
      import-mainframe   Import datasets from a mainframe server to HDFS
      job                Work with saved jobs
      list-databases     List available databases on a server
      list-tables        List available tables in a database
      merge              Merge results of incremental imports
      metastore          Run a standalone Sqoop metastore
      version            Display version information
      
    3. sqoop export writes HDFS/Hive data back to a relational database; it supports plain insert, update-and-insert, and update-only modes
    --update-mode takes updateonly or allowinsert. (The table names in the HQL/Sqoop examples below were masked in the original and are shown as "...".)
    Plain insert (no update mode):
    HQL: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from ... ;
    SQOOP: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table ... --fields-terminated-by ',' --columns F1,F2,F3 --export-dir /user/root/export/test

    --update-mode allowinsert (update matching rows, insert the rest):
    HQL: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from ... where ... ;
    SQOOP: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table ... --fields-terminated-by ',' --columns F1,F2,F3 --update-key F4 --update-mode allowinsert --export-dir /user/root/export/test

    --update-mode updateonly (update matching rows only):
    HQL: insert overwrite directory '/user/root/export/test' row format delimited fields terminated by ',' STORED AS textfile select F1,F2,F3 from ... where ... ;
    SQOOP: sqoop export --connect jdbc:mysql://localhost:3306/wht --username root --password cloudera --table ... --fields-terminated-by ',' --columns F1,F2,F3 --update-key F4 --update-mode updateonly --export-dir /user/root/export/test
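
    The examples above target MySQL; a hedged sketch of the same plain-insert export against the SQL Server instance used in this article (names masked as elsewhere; --update-mode support varies by connector, so test updateonly/allowinsert against SQL Server before relying on them):
    sqoop export --connect "jdbc:sqlserver://10.35.xx.xx:1433;DatabaseName=ICP" -username xx -P \
      --table xx --columns F1,F2,F3 --fields-terminated-by ',' \
      --export-dir /user/root/export/test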
    
    usage: sqoop export [GENERIC-ARGS] [TOOL-ARGS]

    Common arguments:
       --connect <jdbc-uri>                        Specify JDBC connect string
       --connection-manager <class-name>           Specify connection manager class name
       --connection-param-file <properties-file>   Specify connection parameters file
       --driver <class-name>                       Manually specify JDBC driver class to use
       --hadoop-home <hdir>                        Override $HADOOP_MAPRED_HOME_ARG
       --hadoop-mapred-home <dir>                  Override $HADOOP_MAPRED_HOME_ARG
       --help                                      Print usage instructions
    -P                                             Read password from console
       --password <password>                       Set authentication password
       --password-alias <password-alias>           Credential provider password alias
       --password-file <password-file>             Set authentication password file path
       --relaxed-isolation                         Use read-uncommitted isolation for imports
       --skip-dist-cache                           Skip copying jars to distributed cache
       --temporary-rootdir <rootdir>               Defines the temporary root directory for the import
       --username <username>                       Set authentication username
       --verbose                                   Print more information while working

    Export control arguments:
       --batch                                     Indicates underlying statements to be executed in batch mode
       --call <arg>                                Populate the table using this stored procedure (one call per row)
       --clear-staging-table                       Indicates that any data in staging table can be deleted
       --columns <col,col,col...>                  Columns to export to table
       --direct                                    Use direct export fast path
       --export-dir <dir>                          HDFS source path for the export
    -m,--num-mappers <n>                           Use 'n' map tasks to export in parallel
       --mapreduce-job-name <name>                 Set name for generated mapreduce job
       --staging-table <staging-table-name>        Intermediate staging table
       --table <table-name>                        Table to populate
       --update-key <key>                          Update records by specified key column
       --update-mode <mode>                        Specifies how updates are performed when new rows are found with non-matching keys in database
       --validate                                  Validate the copy using the configured validator
       --validation-failurehandler <failure-handler>   Fully qualified class name for ValidationFailureHandler
       --validation-threshold <threshold>          Fully qualified class name for ValidationThreshold
       --validator <validator>                     Fully qualified class name for the Validator

    Input parsing arguments:
       --input-enclosed-by <char>                  Sets a required field encloser
       --input-escaped-by <char>                   Sets the input escape character
       --input-fields-terminated-by <char>         Sets the input field separator
       --input-lines-terminated-by <char>          Sets the input end-of-line char
       --input-optionally-enclosed-by <char>       Sets a field enclosing character

    Output line formatting arguments:
       --enclosed-by <char>                        Sets a required field enclosing character
       --escaped-by <char>                         Sets the escape character
       --fields-terminated-by <char>               Sets the field separator character
       --lines-terminated-by <char>                Sets the end-of-line character
       --mysql-delimiters                          Uses MySQL's default delimiter set: fields: ,  lines: \n  escaped-by: \  optionally-enclosed-by: '
       --optionally-enclosed-by <char>             Sets a field enclosing character

    Code generation arguments:
       --bindir <dir>                              Output directory for compiled objects
       --class-name <name>                         Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class.
       --input-null-non-string <null-str>          Input null non-string representation
       --input-null-string <null-str>              Input null string representation
       --jar-file <file>                           Disable code generation; use specified jar
       --map-column-java <arg>                     Override mapping for specific columns to java types
       --null-non-string <null-str>                Null non-string representation
       --null-string <null-str>                    Null string representation
       --outdir <dir>                              Output directory for generated code
       --package-name <name>                       Put auto-generated classes in this package

    HCatalog arguments:
       --hcatalog-database <arg>                   HCatalog database name
       --hcatalog-home <hdir>                      Override $HCAT_HOME
       --hcatalog-partition-keys <partition-key>   Sets the partition keys to use when importing to hive
       --hcatalog-partition-values <partition-value>   Sets the partition values to use when importing to hive
       --hcatalog-table <arg>                      HCatalog table name
       --hive-home <dir>                           Override $HIVE_HOME
       --hive-partition-key <partition-key>        Sets the partition key to use when importing to hive
       --hive-partition-value <partition-value>    Sets the partition value to use when importing to hive
       --map-column-hive <arg>                     Override mapping for specific column to hive types.

    Generic Hadoop command-line arguments:
    (must preceed any tool-specific arguments)
    Generic options supported are
    -conf <configuration file>                     specify an application configuration file
    -D <property=value>                            use value for given property
    -fs <local|namenode:port>                      specify a namenode
    -jt <local|resourcemanager:port>               specify a ResourceManager
    -files <comma separated list of files>         specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>        specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>   specify comma separated archives to be unarchived on the compute machines.

    The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]

    At minimum, you must specify --connect, --export-dir, and --table

    4. sqoop import help
    usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

    Common arguments:
       (identical to the sqoop export common arguments listed above)

    Import control arguments:
       --append                                    Imports data in append mode
       --as-avrodatafile                           Imports data to Avro data files
       --as-parquetfile                            Imports data to Parquet files
       --as-sequencefile                           Imports data to SequenceFiles
       --as-textfile                               Imports data as plain text (default)
       --autoreset-to-one-mapper                   Reset the number of mappers to one mapper if no split key available
       --boundary-query <statement>                Set boundary query for retrieving max and min value of the primary key
       --columns <col,col,col...>                  Columns to import from table
       --compression-codec <codec>                 Compression codec to use for import
       --delete-target-dir                         Imports data in delete mode
       --direct                                    Use direct import fast path
       --direct-split-size <n>                     Split the input stream every 'n' bytes when importing in direct mode
    -e,--query <statement>                         Import results of SQL 'statement'
       --fetch-size <n>                            Set number 'n' of rows to fetch from the database when more rows are needed
       --inline-lob-limit <n>                      Set the maximum size for an inline LOB
    -m,--num-mappers <n>                           Use 'n' map tasks to import in parallel
       --mapreduce-job-name <name>                 Set name for generated mapreduce job
       --merge-key <column>                        Key column to use to join results
       --split-by <column-name>                    Column of the table used to split work units
       --split-limit <size>                        Upper Limit of rows per split for split columns of Date/Time/Timestamp and integer types. For date or timestamp fields it is calculated in seconds. split-limit should be greater than 0
       --table <table-name>                        Table to read
       --target-dir <dir>                          HDFS plain table destination
       --validate                                  Validate the copy using the configured validator
       --validation-failurehandler <failure-handler>   Fully qualified class name for ValidationFailureHandler
       --validation-threshold <threshold>          Fully qualified class name for ValidationThreshold
       --validator <validator>                     Fully qualified class name for the Validator
       --warehouse-dir <dir>                       HDFS parent for table destination
       --where <where clause>                      WHERE clause to use during import
    -z,--compress                                  Enable compression

    Incremental import arguments:
       --check-column <column>                     Source column to check for incremental change
       --incremental <import-type>                 Define an incremental import of type 'append' or 'lastmodified'
       --last-value <value>                        Last imported value in the incremental check column

    Output line formatting arguments / Input parsing arguments:
       (identical to the sqoop export lists above)

    Hive arguments:
       --create-hive-table                         Fail if the target hive table exists
       --hive-compute-stats                        Overwrite existing data in the Hive table
       --hive-database <database-name>             Sets the database name to use when importing to hive
       --hive-delims-replacement <arg>             Replace Hive record \0x01 and row delimiters (\n\r) from imported string fields with user-defined string
       --hive-drop-import-delims                   Drop Hive record \0x01 and row delimiters (\n\r) from imported string fields
       --hive-home <dir>                           Override $HIVE_HOME
       --hive-import                               Import tables into Hive (Uses Hive's default delimiters if none are set.)
       --hive-overwrite                            Overwrite existing data in the Hive table
       --hive-partition-key <partition-key>        Sets the partition key to use when importing to hive
       --hive-partition-value <partition-value>    Sets the partition value to use when importing to hive
       --hive-table <table-name>                   Sets the table name to use when importing to hive
       --map-column-hive <arg>                     Override mapping for specific column to hive types.

    HBase arguments:
       --column-family <family>                    Sets the target column family for the import
       --hbase-bulkload                            Enables HBase bulk loading
       --hbase-create-table                        If specified, create missing HBase tables
       --hbase-row-key <col>                       Specifies which input column to use as the row key
       --hbase-table <table>                       Import to <table> in HBase

    HCatalog arguments:
       (identical to the sqoop export list above)

    HCatalog import specific options:
       --create-hcatalog-table                     Create HCatalog before import
       --drop-and-create-hcatalog-table            Drop and Create HCatalog before import
       --hcatalog-storage-stanza <arg>             HCatalog storage stanza for table creation

    Accumulo arguments:
       --accumulo-batch-size <size>                Batch size in bytes
       --accumulo-column-family <family>           Sets the target column family for the import
       --accumulo-create-table                     If specified, create missing Accumulo tables
       --accumulo-instance <instance>              Accumulo instance name.
       --accumulo-max-latency <latency>            Max write latency in milliseconds
       --accumulo-password <password>              Accumulo password.
       --accumulo-row-key <col>                    Specifies which input column to use as the row key
       --accumulo-table <table>                    Import to <table> in Accumulo
       --accumulo-user <user>                      Accumulo user name.
       --accumulo-visibility <vis>                 Visibility token to be applied to all rows imported
       --accumulo-zookeepers <host:port>           Comma-separated list of zookeepers (host:port)

    Code generation arguments:
       (identical to the sqoop export list above)

    Generic Hadoop command-line arguments:
       (identical to the sqoop export list above)

    At minimum, you must specify --connect and --table
    Arguments to mysqldump and other subprograms may be supplied after a '--' on the command line.