Python WebHDFS

WebHDFS Python client library and simple shell.

Table of Contents

  • Prerequisites
  • Installation
  • API

Installation

Install python-webhdfs as a Debian package by building a deb:

dpkg-buildpackage
# or
pdebuild

Install python-webhdfs using the standard setuptools script:

python setup.py install

API

To use the WebHDFS client API, start by importing the WebHDFSClient class from the webhdfs module:

>>> from webhdfs import WebHDFSClient

All methods may raise a WebHDFSError exception or one of these subclasses:

| Exception Type                   | Remote Exception              | Description                                |
|----------------------------------|-------------------------------|--------------------------------------------|
| WebHDFSConnectionError           |                               | Unable to connect to active NameNode       |
| WebHDFSIncompleteTransferError   |                               | Transferred file doesn't match origin size |
| WebHDFSAccessControlError        | AccessControlException        | Access to specified path denied            |
| WebHDFSIllegalArgumentError      | IllegalArgumentException      | Invalid parameter value                    |
| WebHDFSFileNotFoundError         | FileNotFoundException         | Specified path does not exist              |
| WebHDFSSecurityError             | SecurityException             | Failed to obtain user/group information    |
| WebHDFSUnsupportedOperationError | UnsupportedOperationException | Requested operation is not implemented     |
| WebHDFSUnknownRemoteError        |                               | Remote exception unrecognized              |
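Because every error derives from WebHDFSError, handlers must catch the specific subclasses before the base class. The sketch below illustrates that ordering with local stand-in classes whose names match the table above; the stub bodies are assumptions for illustration, not the library's implementation.

```python
# Local stand-ins mirroring the documented exception hierarchy.
class WebHDFSError(Exception):
    """Base class for all WebHDFS client errors."""

class WebHDFSFileNotFoundError(WebHDFSError):
    """Specified path does not exist (FileNotFoundException)."""

class WebHDFSAccessControlError(WebHDFSError):
    """Access to specified path denied (AccessControlException)."""

def classify(exc):
    # Catch the most specific subclass first; a bare WebHDFSError
    # handler would otherwise swallow every subclass.
    try:
        raise exc
    except WebHDFSFileNotFoundError:
        return 'missing'
    except WebHDFSAccessControlError:
        return 'denied'
    except WebHDFSError:
        return 'other'

print(classify(WebHDFSFileNotFoundError('/foo')))  # missing
print(classify(WebHDFSError('timeout')))           # other
```

The same ordering applies when calling the real client: put `except WebHDFSError` last, or it will shadow the more informative subclasses.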

WebHDFSClient

__init__(base, user, conf=None, wait=None)

Creates a new WebHDFSClient object

Parameters:

  • base: base WebHDFS URL (e.g. http://localhost:50070)
  • user: user name with which to access all resources
  • conf: (optional) path to hadoop configuration directory for NameNode HA resolution
  • wait: (optional) floating point number in seconds for request timeout waits
>>> import getpass
>>> hdfs = WebHDFSClient('http://localhost:50070', getpass.getuser(), conf='/etc/hadoop/conf', wait=1.5)

stat(path, catch=False)

Retrieves metadata about the specified HDFS item. Uses this WebHDFS REST request:

GET <BASE>/webhdfs/v1/<PATH>?op=GETFILESTATUS

Parameters:

  • path: HDFS path to fetch
  • catch: (optional) trap WebHDFSFileNotFoundError instead of raising the exception

Returns:

  • A single WebHDFSObject object for the specified path.
  • False if object not found in HDFS and catch=True.
>>> o = hdfs.stat('/user')
>>> print o.full
/user
>>> print o.kind
DIRECTORY
>>> o = hdfs.stat('/foo', catch=True)
>>> print o
False

ls(path, recurse=False, request=False)

Lists a specified HDFS path. Uses this WebHDFS REST request:

GET <BASE>/webhdfs/v1/<PATH>?op=LISTSTATUS

Parameters:

  • path: HDFS path to list
  • recurse: (optional) descend down the directory tree
  • request: (optional) filter callback applied to each object; only objects for which it returns a true value are listed

Returns:

  • Generator yielding child WebHDFSObject objects for the specified path.
>>> l = list(hdfs.ls('/')) # must convert to list if referencing by index
>>> print l[0].full
/user
>>> print l[0].kind
DIRECTORY
>>> l = list(hdfs.ls('/user', request=lambda x: x.name.startswith('m')))
>>> print l[0].full
/user/max

glob(path)

Lists a specified HDFS path pattern. Uses this WebHDFS REST request:

GET <BASE>/webhdfs/v1/<PATH>?op=LISTSTATUS

Parameters:

  • path: HDFS path pattern to list

Returns:

  • List of WebHDFSObject objects for paths matching the specified pattern.
>>> l = hdfs.glob('/us*')
>>> print l[0].full
/user
>>> print l[0].kind
DIRECTORY
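The wildcard semantics can be pictured with Python's fnmatch module, which implements the same shell-style patterns; whether the library uses fnmatch internally is an assumption, and the listing below is a hypothetical stand-in for the paths ls('/') would return.

```python
import fnmatch

# Hypothetical directory listing as plain path names; in the library
# these would be WebHDFSObject paths produced by hdfs.ls('/').
names = ['/user', '/tmp', '/usr-data']

# Shell-style wildcard match, as in hdfs.glob('/us*').
matches = [n for n in names if fnmatch.fnmatch(n, '/us*')]
print(matches)  # ['/user', '/usr-data']
```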

du(path, real=False)

Gets the usage of a specified HDFS path. Uses this WebHDFS REST request:

GET <BASE>/webhdfs/v1/<PATH>?op=GETCONTENTSUMMARY

Parameters:

  • path: HDFS path to analyze
  • real: (optional) selects the return type (None, a du attribute name string, or a boolean; see Returns)

Returns:

  • If real is None: Instance of a du object: du(dirs=, files=, hdfs_usage=, disk_usage=, hdfs_quota=, disk_quota=)
  • If real is a string: Integer value of the named du attribute.
  • If real is boolean True: Integer of disk bytes (hdfs bytes times replication) used by the specified path.
  • If real is boolean False: Integer of hdfs bytes used by the specified path.
>>> u = hdfs.du('/user')
>>> print u
110433
>>> u = hdfs.du('/user', real=True)
>>> print u
331299
>>> u = hdfs.du('/user', real='disk_quota')
>>> print u
-1
>>> u = hdfs.du('/user', real=None)
>>> print u
du(dirs=3, files=5, hdfs_usage=110433, disk_usage=331299, hdfs_quota=-1, disk_quota=-1)
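The four-way behaviour of real can be mirrored by a small dispatch over the summary tuple. This is an illustrative sketch using the field names of the du tuple shown above, not the library's actual code.

```python
from collections import namedtuple

# Field names taken from the du(...) repr shown above.
Du = namedtuple('Du', 'dirs files hdfs_usage disk_usage hdfs_quota disk_quota')

def du_result(summary, real=False):
    # Mirrors the documented return rules for WebHDFSClient.du().
    if real is None:
        return summary                  # whole summary tuple
    if isinstance(real, str):
        return getattr(summary, real)   # named attribute; checked before
                                        # truthiness, since any non-empty
                                        # string is also "true"
    if real:
        return summary.disk_usage       # real=True: disk bytes
    return summary.hdfs_usage           # real=False: hdfs bytes

s = Du(dirs=3, files=5, hdfs_usage=110433, disk_usage=331299,
       hdfs_quota=-1, disk_quota=-1)
print(du_result(s))                     # 110433
print(du_result(s, real=True))          # 331299
print(du_result(s, real='disk_quota'))  # -1
```

Checking the string case before the boolean case is the essential design point: without it, real='disk_quota' would fall into the real=True branch.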

mkdir(path)

Creates the specified HDFS path. Uses this WebHDFS REST request:

PUT <BASE>/webhdfs/v1/<PATH>?op=MKDIRS

Parameters:

  • path: HDFS path to create

Returns:

  • Boolean True
>>> hdfs.mkdir('/user/%s/test' % getpass.getuser())
True

mv(path, dest)

Moves/renames the specified HDFS path to the specified destination. Uses this WebHDFS REST request:

PUT <BASE>/webhdfs/v1/<PATH>?op=RENAME&destination=<DEST>

Parameters:

  • path: HDFS path to move/rename
  • dest: Destination path

Returns:

  • Boolean True on success and False on error
>>> hdfs.mv('/user/%s/test' % getpass.getuser(), '/user/%s/test.old' % getpass.getuser())
True
>>> hdfs.mv('/user/%s/test.old' % getpass.getuser(), '/some/non-existant/path')
False

rm(path)

Removes the specified HDFS path. Uses this WebHDFS REST request:

DELETE <BASE>/webhdfs/v1/<PATH>?op=DELETE

Parameters:

  • path: HDFS path to remove

Returns:

  • Boolean True
>>> hdfs.rm('/user/%s/test' % getpass.getuser())
True

repl(path, num)

Sets the replication factor for the specified HDFS path. Uses this WebHDFS REST request:

PUT <BASE>/webhdfs/v1/<PATH>?op=SETREPLICATION

Parameters:

  • path: HDFS path to change
  • num: new replication factor to apply

Returns:

  • Boolean True on success, False otherwise
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).repl
1
>>> hdfs.repl('/user/%s/test' % getpass.getuser(), 3)
True
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).repl
3

chown(path, owner='', group='')

Sets the owner and/or group of a specified HDFS path. Uses this WebHDFS REST request:

PUT <BASE>/webhdfs/v1/<PATH>?op=SETOWNER[&owner=<OWNER>][&group=<GROUP>]

Parameters:

  • path: HDFS path to change
  • owner: (optional) new object owner
  • group: (optional) new object group

Returns:

  • Boolean True if ownership successfully applied

Raises:

  • WebHDFSIllegalArgumentError if both owner and group are unspecified or empty
>>> hdfs.chown('/user/%s/test' % getpass.getuser(), owner='other_owner', group='other_group')
True
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).owner
'other_owner'
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).group
'other_group'

chmod(path, perm)

Sets the permission of a specified HDFS path. Uses this WebHDFS REST request:

PUT <BASE>/webhdfs/v1/<PATH>?op=SETPERMISSION&permission=<PERM>

Parameters:

  • path: HDFS path to change
  • perm: new object permission

Returns:

  • Boolean True if permission successfully applied

Raises:

  • WebHDFSIllegalArgumentError if perm is not an octal integer between 0 and 0777
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).mode
'-rwxr-xr-x'
>>> hdfs.chmod('/user/%s/test' % getpass.getuser(), perm=0644)
True
>>> hdfs.stat('/user/%s/test' % getpass.getuser()).mode
'-rw-r--r--'
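The mode strings shown above can be derived from an octal permission with a short helper. This is an illustrative sketch of the rwx rendering, not the library's implementation, and the leading '-' assumes a regular file.

```python
def mode_string(perm, directory=False):
    # Render an octal permission (e.g. 0o644) as an ls-style string.
    out = 'd' if directory else '-'
    for shift in (6, 3, 0):              # owner, group, other triplets
        triplet = (perm >> shift) & 0o7
        out += ''.join(b if triplet & (4 >> i) else '-'
                       for i, b in enumerate('rwx'))
    return out

print(mode_string(0o644))  # -rw-r--r--
print(mode_string(0o755))  # -rwxr-xr-x
```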

touch(path, time=None)

Sets the modification time of a specified HDFS path, optionally creating it. Uses this WebHDFS REST request:

PUT <BASE>/webhdfs/v1/<PATH>?op=SETTIMES&modificationtime=<TIME>

Parameters:

  • path: HDFS path to change
  • time: (optional) object modification time, represented as a Python datetime object or int epoch timestamp, defaulting to current time

Returns:

  • Boolean True if modification time successfully changed

Raises:

  • WebHDFSIllegalArgumentError if time is not a valid type
>>> hdfs.touch('/user/%s/new_test' % getpass.getuser())
True
>>> hdfs.stat('/user/%s/new_test' % getpass.getuser()).date
datetime.datetime(2019, 1, 28, 12, 10, 20)
>>> hdfs.touch('/user/%s/new_test' % getpass.getuser(), datetime.datetime(2018, 9, 27, 11, 1, 17))
True
>>> hdfs.stat('/user/%s/new_test' % getpass.getuser()).date
datetime.datetime(2018, 9, 27, 11, 1, 17)
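The SETTIMES request above carries modificationtime as an epoch value, which the WebHDFS REST API expresses in milliseconds. Converting the two time types touch() accepts can be sketched as follows; treating the datetime as UTC here is an assumption.

```python
import datetime

EPOCH = datetime.datetime(1970, 1, 1)

def to_millis(when):
    # Accept either a datetime or an int epoch timestamp in seconds,
    # mirroring the two time types the touch() docs allow.
    if isinstance(when, datetime.datetime):
        return int((when - EPOCH).total_seconds() * 1000)
    if isinstance(when, int):
        return when * 1000
    raise TypeError('time must be a datetime or int epoch timestamp')

print(to_millis(datetime.datetime(2018, 9, 27, 11, 1, 17)))
print(to_millis(1538046077))  # the same instant, given as int seconds
```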

get(path, data=None)

Fetches the specified HDFS file.
