Using the Hadoop HttpFS API
For usage of the official REST interface, refer to: WebHDFS REST API
Operations

HTTP GET
OPEN (see FileSystem.open)
GETFILESTATUS (see FileSystem.getFileStatus)
LISTSTATUS (see FileSystem.listStatus)
GETCONTENTSUMMARY (see FileSystem.getContentSummary)
GETFILECHECKSUM (see FileSystem.getFileChecksum)
GETHOMEDIRECTORY (see FileSystem.getHomeDirectory)
GETDELEGATIONTOKEN (see FileSystem.getDelegationToken)
GETDELEGATIONTOKENS (see FileSystem.getDelegationTokens)
GETXATTRS (see FileSystem.getXAttr)
GETXATTRS (see FileSystem.getXAttrs)
LISTXATTRS (see FileSystem.listXAttrs)
CHECKACCESS (see FileSystem.access)

HTTP PUT
CREATE (see FileSystem.create)
MKDIRS (see FileSystem.mkdirs)
CREATESYMLINK (see FileContext.createSymlink)
RENAME (see FileSystem.rename)
SETREPLICATION (see FileSystem.setReplication)
SETOWNER (see FileSystem.setOwner)
SETPERMISSION (see FileSystem.setPermission)
SETTIMES (see FileSystem.setTimes)
RENEWDELEGATIONTOKEN (see FileSystem.renewDelegationToken)
CANCELDELEGATIONTOKEN (see FileSystem.cancelDelegationToken)
CREATESNAPSHOT (see FileSystem.createSnapshot)
RENAMESNAPSHOT (see FileSystem.renameSnapshot)
SETXATTR (see FileSystem.setXAttr)
REMOVEXATTR (see FileSystem.removeXAttr)

HTTP POST
APPEND (see FileSystem.append)
CONCAT (see FileSystem.concat)

HTTP DELETE
DELETE (see FileSystem.delete)
DELETESNAPSHOT (see FileSystem.deleteSnapshot)
FileSystem URIs vs HTTP URLs
The FileSystem scheme of WebHDFS is "webhdfs://". A WebHDFS FileSystem URI has the following format.

webhdfs://<HOST>:<HTTP_PORT>/<PATH>

The above WebHDFS URI corresponds to the below HDFS URI.

hdfs://<HOST>:<RPC_PORT>/<PATH>

In the REST API, the prefix "/webhdfs/v1" is inserted in the path and a query is appended at the end. Therefore, the corresponding HTTP URL has the following format.

http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=...
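The URI-to-URL mapping above can be sketched as a small Python helper (the function name is ours, purely for illustration; it is not part of any Hadoop library):

```python
from urllib.parse import urlsplit

def webhdfs_to_http(fs_uri: str, op: str) -> str:
    """Map webhdfs://<HOST>:<HTTP_PORT>/<PATH> to the REST URL
    http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=... (illustrative helper)."""
    parts = urlsplit(fs_uri)
    assert parts.scheme == "webhdfs"
    # The "/webhdfs/v1" prefix is inserted before the path, and the
    # operation name is appended as the "op" query parameter.
    return "http://%s/webhdfs/v1%s?op=%s" % (parts.netloc, parts.path, op)

print(webhdfs_to_http("webhdfs://namenode:50070/user/alice/data.txt", "OPEN"))
# http://namenode:50070/webhdfs/v1/user/alice/data.txt?op=OPEN
```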
HDFS Configuration Options
Below are the HDFS configuration options for WebHDFS.
Property Name: dfs.webhdfs.enabled
Description: Enable/disable WebHDFS in Namenodes and Datanodes

Property Name: dfs.web.authentication.kerberos.principal
Description: The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. The HTTP Kerberos principal MUST start with 'HTTP/' per the Kerberos HTTP SPNEGO specification. A value of "*" will use all HTTP principals found in the keytab.

Property Name: dfs.web.authentication.kerberos.keytab
Description: The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.
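For reference, these options go into hdfs-site.xml; a minimal sketch, where the principal realm and keytab path are placeholders you would replace for your cluster:

```xml
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>  <!-- placeholder realm -->
</property>
<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>  <!-- placeholder path -->
</property>
```

The two Kerberos properties are only consulted when security is on.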
Authentication
When security is off, the authenticated user is the username specified in the user.name query parameter. If the user.name parameter is not set, the server may either set the authenticated user to a default web user, if there is any, or return an error response.
When security is on, authentication is performed by either Hadoop delegation token or Kerberos SPNEGO. If a token is set in the delegation query parameter, the authenticated user is the user encoded in the token. If the delegation parameter is not set, the user is authenticated by Kerberos SPNEGO.
Below are examples using the curl command tool.

Authentication when security is off:

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?[user.name=<USER>&]op=..."

Authentication using Kerberos SPNEGO when security is on:

curl -i --negotiate -u : "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=..."

Authentication using Hadoop delegation token when security is on:

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?delegation=<TOKEN>&op=..."
See also: Authentication for Hadoop HTTP web-consoles
Proxy Users
When the proxy user feature is enabled, a proxy user P may submit a request on behalf of another user U. The username of U must be specified in the doas query parameter unless a delegation token is presented in authentication. In such a case, the information of both users P and U must be encoded in the delegation token.

A proxy request when security is off:

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?[user.name=<USER>&]doas=<USER>&op=..."

A proxy request using Kerberos SPNEGO when security is on:

curl -i --negotiate -u : "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?doas=<USER>&op=..."

A proxy request using Hadoop delegation token when security is on:

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?delegation=<TOKEN>&op=..."
File and Directory Operations
Create and Write to a File
Step 1: Submit an HTTP PUT request without automatically following redirects and without sending the file data.

curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
[&overwrite=<true|false>][&blocksize=<LONG>][&replication=<SHORT>]
[&permission=<OCTAL>][&buffersize=<INT>]"
The request is redirected to a datanode where the file data is to be written:
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
Content-Length: 0
Step 2: Submit another HTTP PUT request using the URL in the Location header with the file data to be written.

curl -i -X PUT -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE..."
The client receives a 201 Created response with zero content length and the WebHDFS URI of the file in the Location header:
HTTP/1.1 201 Created
Location: webhdfs://<HOST>:<PORT>/<PATH>
Content-Length: 0
Note that the reason for having a two-step create/append is to prevent clients from sending data out before the redirect. This issue is addressed by the "Expect: 100-continue" header in HTTP/1.1; see RFC 2616, Section 8.2.3. Unfortunately, there are software library bugs (e.g. the Jetty 6 HTTP server and the Java 6 HTTP client) which do not correctly implement "Expect: 100-continue". The two-step create/append is a temporary workaround for these software library bugs.
See also: overwrite, blocksize, replication, permission, buffersize, FileSystem.create
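The two-step protocol above can be sketched with Python's standard library. This is a hedged sketch, not a Hadoop-provided client: the function names are ours, and the host/port are whatever your namenode exposes. http.client never follows redirects, which is exactly what step 1 requires:

```python
import http.client
from urllib.parse import quote, urlsplit

def create_request_path(path, user=None, overwrite=None):
    """Build the /webhdfs/v1 request path for op=CREATE (illustrative helper)."""
    p = "/webhdfs/v1" + quote(path) + "?op=CREATE"
    if user is not None:
        p += "&user.name=" + user
    if overwrite is not None:
        p += "&overwrite=" + ("true" if overwrite else "false")
    return p

def two_step_create(host, port, path, data, user=None):
    # Step 1: PUT with no body. http.client does not follow redirects,
    # so the 307 response and its Location header come back to us directly.
    nn = http.client.HTTPConnection(host, port)
    nn.request("PUT", create_request_path(path, user))
    resp = nn.getresponse()
    resp.read()
    location = resp.getheader("Location")  # datanode URL for the actual write
    # Step 2: PUT the file data to the URL taken from the Location header.
    dn_parts = urlsplit(location)
    dn = http.client.HTTPConnection(dn_parts.netloc)
    dn.request("PUT", dn_parts.path + "?" + dn_parts.query, body=data)
    return dn.getresponse().status  # 201 Created on success
```

The same shape works for APPEND by swapping the method to POST and the op to APPEND.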
Append to a File
Step 1: Submit an HTTP POST request without automatically following redirects and without sending the file data.

curl -i -X POST "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=APPEND[&buffersize=<INT>]"
The request is redirected to a datanode where the file data is to be appended:
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND...
Content-Length: 0
Step 2: Submit another HTTP POST request using the URL in the Location header with the file data to be appended.

curl -i -X POST -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND..."
The client receives a response with zero content length:
HTTP/1.1 200 OK
Content-Length: 0
See the note in the previous section for the description of why this operation requires two steps.
See also: buffersize, FileSystem.append
Open and Read a File
Submit an HTTP GET request with automatically following redirects.

curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN
[&offset=<LONG>][&length=<LONG>][&buffersize=<INT>]"
The request is redirected to a datanode where the file data can be read:
HTTP/1.1 307 TEMPORARY_REDIRECT
Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=OPEN...
Content-Length: 0
The client follows the redirect to the datanode and receives the file data:
HTTP/1.1 200 OK
Content-Type: application/octet-stream
Content-Length: 22
Hello, webhdfs user!
See also: offset, length, buffersize, FileSystem.open
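Since reads can simply follow the redirect, the stdlib urlopen (which follows redirects on GET, mirroring curl -L) is enough. A minimal sketch; the helper names and host are ours, not part of Hadoop:

```python
from urllib.request import urlopen
from urllib.parse import urlencode

def open_url(host, port, path, offset=None, length=None):
    """Build the op=OPEN URL; offset/length select a byte range (illustrative)."""
    params = {"op": "OPEN"}
    if offset is not None:
        params["offset"] = offset
    if length is not None:
        params["length"] = length
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, path, urlencode(params))

def read_file(host, port, path, **kw) -> bytes:
    # urlopen follows the 307 redirect to the datanode automatically.
    with urlopen(open_url(host, port, path, **kw)) as r:
        return r.read()
```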
Make a Directory
Submit an HTTP PUT request.

curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=MKDIRS[&permission=<OCTAL>]"
The client receives a response with a boolean JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
{"boolean": true}
See also: permission, FileSystem.mkdirs
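MKDIRS, RENAME, and DELETE all answer with the same {"boolean": ...} body, so one decoder covers them; a minimal sketch (the function name is ours):

```python
import json

def succeeded(response_body: str) -> bool:
    """Interpret a WebHDFS {"boolean": ...} response body."""
    return json.loads(response_body)["boolean"]

print(succeeded('{"boolean": true}'))   # True
print(succeeded('{"boolean": false}'))  # False
```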
Create a Symbolic Link
Submit an HTTP PUT request.

curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATESYMLINK
&destination=<PATH>[&createParent=<true|false>]"
The client receives a response with zero content length:
HTTP/1.1 200 OK
Content-Length: 0
See also: destination, createParent, FileSystem.createSymlink
Rename a File/Directory
Submit an HTTP PUT request.

curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=RENAME&destination=<PATH>"
The client receives a response with a boolean JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
{"boolean": true}
See also: destination, FileSystem.rename
Delete a File/Directory
Submit an HTTP DELETE request.

curl -i -X DELETE "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=DELETE
[&recursive=<true|false>]"
The client receives a response with a boolean JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
{"boolean": true}
See also: recursive, FileSystem.delete
Status of a File/Directory
Submit an HTTP GET request.

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS"
The client receives a response with a FileStatus JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Transfer-Encoding: chunked
{
"FileStatus":
{
"accessTime" : 0,
"blockSize" : 0,
"childrenNum" : 1,
"fileId" : 16386,
"group" : "supergroup",
"length" : 0, //in bytes, zero for directories
"modificationTime": 1320173277227,
"owner" : "webuser",
"pathSuffix" : "",
"permission" : "777",
"replication" : 0,
"type" : "DIRECTORY" //enum {FILE, DIRECTORY, SYMLINK}
}
}
See also: FileSystem.getFileStatus
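A client would typically decode the FileStatus object before using it; a minimal sketch against the response shape shown above (the function name is ours):

```python
import json

def file_status(body: str) -> dict:
    """Extract the FileStatus object from a GETFILESTATUS response body."""
    return json.loads(body)["FileStatus"]

status = file_status('{"FileStatus": {"type": "DIRECTORY", "length": 0, "permission": "777"}}')
print(status["type"])    # DIRECTORY
print(status["length"])  # 0  (length is in bytes and zero for directories)
```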
List a Directory
Submit an HTTP GET request.

curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"
The client receives a response with a FileStatuses JSON object:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 427
{
"FileStatuses":
{
"FileStatus":
[
{
"accessTime" : 1320171722771,
"blockSize" : 33554432,
"childrenNum" : 0,
"fileId" : 16387,
"group" : "supergroup",
"length" : 24930,
"modificationTime": 1320171722771,
"owner" : "webuser",
"pathSuffix" : "a.patch",
"permission" : "644",
"replication" : 1,
"type" : "FILE"
},
{
"accessTime" : 0,
"blockSize" : 0,
"childrenNum" : 2,
"fileId" : 16388,
"group" : "supergroup",
"length" : 0,
"modificationTime": 1320895981256,
"owner" : "szetszwo",
"pathSuffix" : "bar",
"permission" : "711",
"replication" : 0,
"type" : "DIRECTORY"
},
...
]
}
}
See also: FileSystem.listStatus
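The FileStatuses object wraps an array of FileStatus entries, so listing the child names amounts to collecting each pathSuffix; a minimal sketch (the function name is ours):

```python
import json

def list_names(body: str):
    """Return the child names (pathSuffix) from a LISTSTATUS response body."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [s["pathSuffix"] for s in statuses]

body = ('{"FileStatuses": {"FileStatus": ['
        '{"pathSuffix": "a.patch", "type": "FILE"},'
        '{"pathSuffix": "bar", "type": "DIRECTORY"}]}}')
print(list_names(body))  # ['a.patch', 'bar']
```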
Other File System Operations