# Python WebHDFS

WebHDFS Python client library and simple shell.
## Prerequisites

- Python 3.4+
- Python `requests` module
## Installation

Install `python-webhdfs` as a Debian package by building a deb:

```
dpkg-buildpackage
# or
pdebuild
```

Install `python-webhdfs` using the standard setuptools script:

```
python setup.py install
```
## API

To use the WebHDFS client API, start by importing the class from the module:

```python
>>> from webhdfs import WebHDFSClient
```
All functions may throw a `WebHDFSError` exception or one of these subclasses:

| Exception Type                     | Remote Exception                | Description                                |
|------------------------------------|---------------------------------|--------------------------------------------|
| `WebHDFSConnectionError`           |                                 | Unable to connect to active NameNode       |
| `WebHDFSIncompleteTransferError`   |                                 | Transferred file doesn't match origin size |
| `WebHDFSAccessControlError`        | `AccessControlException`        | Access to specified path denied            |
| `WebHDFSIllegalArgumentError`      | `IllegalArgumentException`      | Invalid parameter value                    |
| `WebHDFSFileNotFoundError`         | `FileNotFoundException`         | Specified path does not exist              |
| `WebHDFSSecurityError`             | `SecurityException`             | Failed to obtain user/group information    |
| `WebHDFSUnsupportedOperationError` | `UnsupportedOperationException` | Requested operation is not implemented     |
| `WebHDFSUnknownRemoteError`        |                                 | Remote exception unrecognized              |
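Since every subclass derives from `WebHDFSError`, callers can trap a specific failure and let the rest propagate. A runnable sketch of that catch-order pattern (the stub classes below only mirror the hierarchy for illustration; the real ones are exported by the `webhdfs` module):

```python
# Stand-in hierarchy mirroring the table above. The real classes live in the
# webhdfs module; these stubs exist only so the example runs on its own.
class WebHDFSError(Exception):
    pass

class WebHDFSFileNotFoundError(WebHDFSError):
    pass

class WebHDFSConnectionError(WebHDFSError):
    pass

def stat_or_none(fetch, path):
    """Call fetch(path), mapping 'not found' to None and re-raising the rest."""
    try:
        return fetch(path)
    except WebHDFSFileNotFoundError:
        return None   # missing path: not fatal for this caller
    except WebHDFSError:
        raise         # connection/permission problems: let them surface

def fake_fetch(path):
    """Hypothetical fetcher used in place of a live client."""
    if path == '/missing':
        raise WebHDFSFileNotFoundError(path)
    return {'path': path}

print(stat_or_none(fake_fetch, '/user'))     # {'path': '/user'}
print(stat_or_none(fake_fetch, '/missing'))  # None
```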
### WebHDFSClient

#### `__init__(base, user, conf=None, wait=None)`

Creates a new `WebHDFSClient` object.

Parameters:

- `base`: base WebHDFS URL (e.g. `http://localhost:50070`)
- `user`: user name with which to access all resources
- `conf`: (optional) path to Hadoop configuration directory for NameNode HA resolution
- `wait`: (optional) request timeout in seconds, as a floating point number

```python
>>> import getpass
>>> hdfs = WebHDFSClient('http://localhost:50070', getpass.getuser(), conf='/etc/hadoop/conf', wait=1.5)
```
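The `conf` directory lets the client discover the NameNodes of an HA cluster. One plausible way to read them out of `hdfs-site.xml` with the standard library (the property names follow Hadoop's HA convention; the XML sample and function are illustrative, not this client's actual implementation):

```python
import xml.etree.ElementTree as ET

# Fabricated hdfs-site.xml fragment following Hadoop's HA naming convention.
HDFS_SITE = """<configuration>
  <property><name>dfs.nameservices</name><value>mycluster</value></property>
  <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
  <property><name>dfs.namenode.http-address.mycluster.nn1</name><value>nn1.example.com:50070</value></property>
  <property><name>dfs.namenode.http-address.mycluster.nn2</name><value>nn2.example.com:50070</value></property>
</configuration>"""

def ha_http_addresses(xml_text):
    """Collect the HTTP address of every configured NameNode."""
    props = {p.findtext('name'): p.findtext('value')
             for p in ET.fromstring(xml_text).iter('property')}
    service = props['dfs.nameservices']
    nodes = props['dfs.ha.namenodes.%s' % service].split(',')
    return [props['dfs.namenode.http-address.%s.%s' % (service, nn)]
            for nn in nodes]

print(ha_http_addresses(HDFS_SITE))
# ['nn1.example.com:50070', 'nn2.example.com:50070']
```

The client can then probe each address in turn until it finds the active NameNode.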
#### `stat(path, catch=False)`

Retrieves metadata about the specified HDFS item. Uses this WebHDFS REST request:

```
GET <BASE>/webhdfs/v1/<PATH>?op=GETFILESTATUS
```

Parameters:

- `path`: HDFS path to fetch
- `catch`: (optional) trap `WebHDFSFileNotFoundError` instead of raising the exception

Returns:

- A single `WebHDFSObject` object for the specified path. `False` if the object is not found in HDFS and `catch=True`.

```python
>>> o = hdfs.stat('/user')
>>> print(o.full)
/user
>>> print(o.kind)
DIRECTORY
>>> o = hdfs.stat('/foo', catch=True)
>>> print(o)
False
```
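On the wire, `GETFILESTATUS` returns a JSON `FileStatus` record that the client wraps in a `WebHDFSObject`. A standalone sketch of parsing one with the standard library (the payload below is an abridged sample in the shape the WebHDFS REST spec defines, not output captured from this client):

```python
import json

# Abridged FileStatus payload in the shape defined by the WebHDFS REST spec.
RESPONSE = """{
  "FileStatus": {
    "type": "DIRECTORY",
    "length": 0,
    "owner": "hdfs",
    "group": "supergroup",
    "permission": "755",
    "replication": 0,
    "modificationTime": 1548676220000
  }
}"""

status = json.loads(RESPONSE)["FileStatus"]
print(status["type"])        # DIRECTORY
print(status["permission"])  # 755
```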
#### `ls(path, recurse=False, request=False)`

Lists a specified HDFS path. Uses this WebHDFS REST request:

```
GET <BASE>/webhdfs/v1/<PATH>?op=LISTSTATUS
```

Parameters:

- `path`: HDFS path to list
- `recurse`: (optional) descend down the directory tree
- `request`: (optional) filter callback applied to each returned object

Returns:

- Generator producing child `WebHDFSObject` objects for the specified path.

```python
>>> l = list(hdfs.ls('/'))  # must convert to list if referencing by index
>>> print(l[0].full)
/user
>>> print(l[0].kind)
DIRECTORY
>>> l = list(hdfs.ls('/user', request=lambda x: x.name.startswith('m')))
>>> print(l[0].full)
/user/max
```
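The `request` callback acts as a per-object predicate: only entries for which it returns a truthy value are yielded. The filtering pattern is equivalent to this standalone sketch (a namedtuple stands in for `WebHDFSObject` so the example runs on its own):

```python
from collections import namedtuple

# Hypothetical stand-in for WebHDFSObject; only the fields the filter touches.
Entry = namedtuple('Entry', 'name full')

def ls(entries, request=None):
    """Yield each entry, skipping those the optional predicate rejects."""
    for entry in entries:
        if request is None or request(entry):
            yield entry

listing = [Entry('max', '/user/max'), Entry('alice', '/user/alice')]
print([e.full for e in ls(listing, request=lambda x: x.name.startswith('m'))])
# ['/user/max']
```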
#### `glob(path)`

Lists a specified HDFS path pattern. Uses this WebHDFS REST request:

```
GET <BASE>/webhdfs/v1/<PATH>?op=LISTSTATUS
```

Parameters:

- `path`: HDFS path pattern to list

Returns:

- List of `WebHDFSObject` objects for the specified pattern.

```python
>>> l = hdfs.glob('/us*')
>>> print(l[0].full)
/user
>>> print(l[0].kind)
DIRECTORY
```
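Shell-style patterns like `/us*` can be matched against listed paths with the standard `fnmatch` module; a standalone sketch of the matching step (whether this client uses `fnmatch` internally is an assumption):

```python
import fnmatch

# Hypothetical children returned by a LISTSTATUS call on '/'.
children = ['/user', '/tmp', '/usr-data']

# Keep only the paths that match the shell-style pattern.
matches = [path for path in children if fnmatch.fnmatch(path, '/us*')]
print(matches)  # ['/user', '/usr-data']
```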
#### `du(path, real=False)`

Gets the usage of a specified HDFS path. Uses this WebHDFS REST request:

```
GET <BASE>/webhdfs/v1/<PATH>?op=GETCONTENTSUMMARY
```

Parameters:

- `path`: HDFS path to analyze
- `real`: (optional) specifies return type

Returns:

- If `real` is `None`: instance of a `du` object: `du(dirs=, files=, hdfs_usage=, disk_usage=, hdfs_quota=, disk_quota=)`
- If `real` is a string: integer value of the named `du` object attribute.
- If `real` is boolean `True`: integer of disk bytes (including replication) used by the specified path.
- If `real` is boolean `False`: integer of HDFS bytes used by the specified path.

```python
>>> u = hdfs.du('/user')
>>> print(u)
110433
>>> u = hdfs.du('/user', real=True)
>>> print(u)
331299
>>> u = hdfs.du('/user', real='disk_quota')
>>> print(u)
-1
>>> u = hdfs.du('/user', real=None)
>>> print(u)
du(dirs=3, files=5, hdfs_usage=110433, disk_usage=331299, hdfs_quota=-1, disk_quota=-1)
```
#### `mkdir(path)`

Creates the specified HDFS path. Uses this WebHDFS REST request:

```
PUT <BASE>/webhdfs/v1/<PATH>?op=MKDIRS
```

Parameters:

- `path`: HDFS path to create

Returns:

- Boolean `True`

```python
>>> hdfs.mkdir('/user/%s/test' % getpass.getuser())
True
```
#### `mv(path, dest)`

Moves/renames the specified HDFS path to the specified destination. Uses this WebHDFS REST request:

```
PUT <BASE>/webhdfs/v1/<PATH>?op=RENAME&destination=<DEST>
```

Parameters:

- `path`: HDFS path to move/rename
- `dest`: destination path

Returns:

- Boolean `True` on success and `False` on error

```python
>>> hdfs.mv('/user/%s/test' % getpass.getuser(), '/user/%s/test.old' % getpass.getuser())
True
>>> hdfs.mv('/user/%s/test.old' % getpass.getuser(), '/some/non-existent/path')
False
```
#### `rm(path)`

Removes the specified HDFS path. Uses this WebHDFS REST request:

```
DELETE <BASE>/webhdfs/v1/<PATH>?op=DELETE
```

Parameters:

- `path`: HDFS path to remove

Returns:

- Boolean `True`

```python
>>> hdfs.rm('/user/%s/test' % getpass.getuser())
True
```
#### `repl(path, num)`

Sets the replication factor for the specified HDFS path. Uses this WebHDFS REST request:

```
PUT <BASE>/webhdfs/v1/<PATH>?op=SETREPLICATION
```

Parameters:

- `path`: HDFS path to change
- `num`: new replication factor to apply

Returns:

- Boolean `True` on success, `False` otherwise

```python
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).repl
1
>>> hdfs.repl('/user/%s/test' % getpass.getuser(), 3)
True
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).repl
3
```
#### `chown(path, owner='', group='')`

Sets the owner and/or group of a specified HDFS path. Uses this WebHDFS REST request:

```
PUT <BASE>/webhdfs/v1/<PATH>?op=SETOWNER[&owner=<OWNER>][&group=<GROUP>]
```

Parameters:

- `path`: HDFS path to change
- `owner`: (optional) new object owner
- `group`: (optional) new object group

Returns:

- Boolean `True` if ownership successfully applied

Raises:

- `WebHDFSIllegalArgumentError` if both owner and group are unspecified or empty

```python
>>> hdfs.chown('/user/%s/test' % getpass.getuser(), owner='other_owner', group='other_group')
True
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).owner
'other_owner'
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).group
'other_group'
```
#### `chmod(path, perm)`

Sets the permission of a specified HDFS path. Uses this WebHDFS REST request:

```
PUT <BASE>/webhdfs/v1/<PATH>?op=SETPERMISSION&permission=<PERM>
```

Parameters:

- `path`: HDFS path to change
- `perm`: new object permission

Returns:

- Boolean `True` if permission successfully applied

Raises:

- `WebHDFSIllegalArgumentError` if permission is not an octal integer under `0o777`

```python
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).mode
'-rwxr-xr-x'
>>> hdfs.chmod('/user/%s/test' % getpass.getuser(), perm=0o644)
True
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).mode
'-rw-r--r--'
```
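The `mode` strings shown above follow the usual Unix convention, and the standard `stat` module renders an octal permission the same way, which is handy for checking what a given `perm` value will look like:

```python
import stat

# S_IFREG (the regular-file type bit) ORed with the permission bits yields
# the familiar ls-style mode string.
print(stat.filemode(stat.S_IFREG | 0o644))  # -rw-r--r--
print(stat.filemode(stat.S_IFREG | 0o755))  # -rwxr-xr-x
```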
#### `touch(path, time=None)`

Sets the modification time of a specified HDFS path, optionally creating it. Uses this WebHDFS REST request:

```
PUT <BASE>/webhdfs/v1/<PATH>?op=SETTIMES&modificationtime=<TIME>
```

Parameters:

- `path`: HDFS path to change
- `time`: (optional) object modification time, represented as a Python `datetime` object or `int` epoch timestamp, defaulting to the current time

Returns:

- Boolean `True` if modification time successfully changed

Raises:

- `WebHDFSIllegalArgumentError` if time is not a valid type

```python
>>> hdfs.touch('/user/%s/new_test' % getpass.getuser())
True
>>> hdfs.stat('/user/%s/new_test' % getpass.getuser()).date
datetime.datetime(2019, 1, 28, 12, 10, 20)
>>> import datetime
>>> hdfs.touch('/user/%s/new_test' % getpass.getuser(), datetime.datetime(2018, 9, 27, 11, 1, 17))
True
>>> hdfs.stat('/user/%s/new_test' % getpass.getuser()).date
datetime.datetime(2018, 9, 27, 11, 1, 17)
```
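WebHDFS expresses `modificationtime` as milliseconds since the Unix epoch, so a `datetime` argument has to be converted before it goes on the query string. A standalone sketch of that conversion (treating the datetime as naive UTC; the helper name is illustrative, not this client's internal API):

```python
import datetime

def to_epoch_millis(when):
    """Convert a naive UTC datetime to the millisecond timestamp WebHDFS expects."""
    epoch = datetime.datetime(1970, 1, 1)
    return int((when - epoch).total_seconds() * 1000)

print(to_epoch_millis(datetime.datetime(2018, 9, 27, 11, 1, 17)))
```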
#### `get(path, data=None)`

Fetches the specified HDFS
