Furl
The easiest way to parse and modify URLs in Python.
furl is a small Python library that makes parsing and modifying URLs easy.
Python's standard urllib and urlparse modules (urllib.parse in Python 3) provide a number of URL-related functions, but using them to perform common URL operations proves tedious. furl removes that tedium.
Furl is well tested, Unlicensed in the public domain, and supports Python 3 and PyPy3.
Furl is maintained by Alex Cochran, with support from the confidential computing folks at Lunal.
Usage
Code time: Paths and query arguments are easy. Really easy.
>>> from furl import furl
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f /= 'path'
>>> del f.args['one']
>>> f.args['three'] = '3'
>>> f.url
'http://www.google.com/path?two=2&three=3'
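For comparison, here is a rough sketch of the same edit using only the standard library's urllib.parse. It is not how furl works internally; it only illustrates the manual split, edit, and re-join that furl saves you from.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

url = 'http://www.google.com/?one=1&two=2'
parts = urlsplit(url)

# Append a path segment by hand, avoiding a doubled slash.
path = parts.path.rstrip('/') + '/path'

# Decode the query, drop 'one', add 'three', then re-encode it.
params = [(key, value) for key, value in parse_qsl(parts.query) if key != 'one']
params.append(('three', '3'))

new_url = urlunsplit(parts._replace(path=path, query=urlencode(params)))
print(new_url)
```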
Or use furl's inline modification methods.
>>> furl('http://www.google.com/?one=1').add({'two':'2'}).url
'http://www.google.com/?one=1&two=2'
>>> furl('http://www.google.com/?one=1&two=2').set({'three':'3'}).url
'http://www.google.com/?three=3'
>>> furl('http://www.google.com/?one=1&two=2').remove(['one']).url
'http://www.google.com/?two=2'
Encoding is handled for you. Unicode, too.
>>> f = furl('http://www.google.com/')
>>> f.path = 'some encoding here'
>>> f.args['and some encoding'] = 'here, too'
>>> f.url
'http://www.google.com/some%20encoding%20here?and+some+encoding=here,+too'
>>> f.set(host=u'ドメイン.テスト', path=u'джк', query=u'☃=☺')
>>> f.url
'http://xn--eckwd4c7c.xn--zckzah/%D0%B4%D0%B6%D0%BA?%E2%98%83=%E2%98%BA'
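To make the encoding rules above concrete, here is a minimal standard-library sketch of the three kinds of encoding involved: percent-encoding for paths, plus-encoding for queries, and IDNA for hostnames. This is an approximation for illustration, not furl's implementation, and the choice of safe characters is an assumption made to match the output shown above.

```python
from urllib.parse import quote, quote_plus

# Path segments: spaces become %20, non-ASCII text becomes UTF-8 percent-escapes.
encoded_path = quote('some encoding here')
encoded_cyrillic = quote('джк')

# Query keys and values: spaces become '+'. Keeping ',' literal is an
# assumption made here to match the output shown above.
encoded_query = (
    quote_plus('and some encoding') + '=' + quote_plus('here, too', safe=',')
)

# Hostnames: non-ASCII labels are converted to Punycode by the IDNA codec.
encoded_host = 'ドメイン.テスト'.encode('idna').decode('ascii')

print(encoded_path, encoded_cyrillic, encoded_query, encoded_host)
```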
Fragments also have a path and a query.
>>> f = furl('http://www.google.com/')
>>> f.fragment.path.segments = ['two', 'directories']
>>> f.fragment.args = {'one': 'argument'}
>>> f.url
'http://www.google.com/#two/directories?one=argument'
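The fragment structure can be seen with the standard library alone. The sketch below splits a fragment at its first ? into a path part and a query part; it illustrates the idea and is not furl's own parsing code.

```python
from urllib.parse import parse_qsl, urlsplit

url = 'http://www.google.com/#two/directories?one=argument'

# urlsplit() returns everything after '#' as one opaque string.
fragment = urlsplit(url).fragment

# Treat the text before the first '?' as a path and the rest as a query.
fragment_path, _, fragment_query = fragment.partition('?')
fragment_segments = fragment_path.split('/')
fragment_args = parse_qsl(fragment_query)

print(fragment_segments, fragment_args)
```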
Installation
Installing furl with pip is easy.
$ pip install furl
API
- Basics
- Scheme, Username, Password, Host, Port, Network Location, and Origin
- Path
- Query
- Fragment
- Encoding
- Inline modification
- Miscellaneous
Basics
furl objects let you access and modify the various components of a URL.
scheme://username:password@host:port/path?query#fragment
- scheme is the scheme string (all lowercase) or None. None means no scheme. An empty string means a protocol relative URL, like //www.google.com.
- username is the username string for authentication.
- password is the password string for authentication with username.
- host is the domain name, IPv4, or IPv6 address as a string. Domain names are all lowercase.
- port is an integer or None. A value of None means no port specified and the default port for the given scheme should be inferred, if possible (e.g. port 80 for the scheme http).
- path is a Path object comprised of path segments.
- query is a Query object comprised of key:value query arguments.
- fragment is a Fragment object comprised of a Path object and Query object separated by an optional ? separator.
Scheme, Username, Password, Host, Port, Network Location, and Origin
scheme, username, password, and host are strings or None. port is an integer or None.
>>> f = furl('http://user:pass@www.google.com:99/')
>>> f.scheme, f.username, f.password, f.host, f.port
('http', 'user', 'pass', 'www.google.com', 99)
furl infers the default port for common schemes.
>>> f = furl('https://secure.google.com/')
>>> f.port
443
>>> f = furl('unknown://www.google.com/')
>>> print(f.port)
None
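Default port inference amounts to a lookup keyed on the scheme. The sketch below shows the idea with the standard library; the DEFAULT_PORTS table and the effective_port helper are hypothetical names for illustration, and furl's own table of schemes may differ.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately short table of well-known default ports.
DEFAULT_PORTS = {'http': 80, 'https': 443, 'ftp': 21, 'ssh': 22}


def effective_port(url):
    """Return the explicit port, else the scheme's default, else None."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port('https://secure.google.com/'))
print(effective_port('unknown://www.google.com/'))
```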
netloc is the string combination of username, password, host, and port, not including port if it's None or the default port for the provided scheme.
>>> furl('http://www.google.com/').netloc
'www.google.com'
>>> furl('http://www.google.com:99/').netloc
'www.google.com:99'
>>> furl('http://user:pass@www.google.com:99/').netloc
'user:pass@www.google.com:99'
origin is the string combination of scheme, host, and port, not including port if it's None or the default port for the provided scheme.
>>> furl('http://www.google.com/').origin
'http://www.google.com'
>>> furl('http://www.google.com:99/').origin
'http://www.google.com:99'
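The rule "omit the port if it's None or the scheme's default" can be sketched as follows. The build_origin helper and its small DEFAULT_PORTS table are hypothetical, written against the standard library only, and the sketch ignores IPv6 hosts, which would need their brackets restored.

```python
from urllib.parse import urlsplit

# Hypothetical table; only the schemes used in this sketch.
DEFAULT_PORTS = {'http': 80, 'https': 443}


def build_origin(url):
    """Combine scheme, host, and port, dropping a missing or default port."""
    parts = urlsplit(url)
    origin = f'{parts.scheme}://{parts.hostname}'
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(parts.scheme):
        origin += f':{parts.port}'
    return origin


print(build_origin('http://www.google.com:80/'))
print(build_origin('http://www.google.com:99/'))
```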
Path
URL paths in furl are Path objects that have segments, a list of zero or more path segments that can be modified directly. Path segments in segments are percent-decoded and all interaction with segments should take place with percent-decoded strings.
>>> f = furl('http://www.google.com/a/large%20ish/path')
>>> f.path
Path('/a/large ish/path')
>>> f.path.segments
['a', 'large ish', 'path']
>>> str(f.path)
'/a/large%20ish/path'
Modification
>>> f.path.segments = ['a', 'new', 'path', '']
>>> str(f.path)
'/a/new/path/'
>>> f.path = 'o/hi/there/with%20some%20encoding/'
>>> f.path.segments
['o', 'hi', 'there', 'with some encoding', '']
>>> str(f.path)
'/o/hi/there/with%20some%20encoding/'
>>> f.url
'http://www.google.com/o/hi/there/with%20some%20encoding/'
>>> f.path.segments = ['segments', 'are', 'maintained', 'decoded', '^`<>[]"#/?']
>>> str(f.path)
'/segments/are/maintained/decoded/%5E%60%3C%3E%5B%5D%22%23%2F%3F'
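The relationship between decoded segments and the encoded path string can be approximated with urllib.parse. The helpers below are hypothetical and assume an absolute path; furl's real encoding rules leave more characters unescaped than quote(..., safe='') does, so treat this as a sketch of the round trip rather than a drop-in equivalent.

```python
from urllib.parse import quote, unquote


def decode_segments(encoded_path):
    """Split an encoded path into percent-decoded segments."""
    trimmed = encoded_path[1:] if encoded_path.startswith('/') else encoded_path
    return [unquote(segment) for segment in trimmed.split('/')]


def encode_segments(segments):
    """Join decoded segments into an absolute, percent-encoded path."""
    # safe='' also escapes '/', so a slash inside a segment stays inside it.
    return '/' + '/'.join(quote(segment, safe='') for segment in segments)


print(decode_segments('/a/large%20ish/path'))
print(encode_segments(['a', 'new', 'path', '']))
```

Note how a trailing empty string in the segment list is what produces the trailing slash.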
A path that starts with / is considered absolute, and a Path can be absolute
or not as specified (or set) by the boolean attribute isabsolute. URL Paths
have a special restriction: they must be absolute if a netloc (username,
password, host, and/or port) is present. This restriction exists because a URL
path must start with / to separate itself from the netloc, if
present. Fragment Paths have no such limitation, and their isabsolute can be
True or False without restriction.
Here's a URL Path example that illustrates how isabsolute becomes True and read-only in the presence of a netloc.
>>> f = furl('/url/path')
>>> f.path.isabsolute
True
>>> f.path.isabsolute = False
>>> f.url
'url/path'
>>> f.host = 'blaps.ru'
>>> f.url
'blaps.ru/url/path'
>>> f.path.isabsolute
True
>>> f.path.isabsolute = False
Traceback (most recent call last):
...
AttributeError: Path.isabsolute is True and read-only for URLs with a netloc (a username, password, host, and/or port). URL paths must be absolute if a netloc exists.
>>> f.url
'blaps.ru/url/path'
Conversely, the isabsolute attribute of Fragment Paths isn't bound by the
same read-only restriction. URL fragments are always prefixed by a # character
and don't need to be separated from the netloc.
>>> f = furl('http://www.google.com/#/absolute/fragment/path/')
>>> f.fragment.path.isabsolute
True
>>> f.fragment.path.isabsolute = False
>>> f.url
'http://www.google.com/#absolute/fragment/path/'
>>> f.fragment.path.isabsolute = True
>>> f.url
'http://www.google.com/#/absolute/fragment/path/'
A path that ends with / is considered a directory, and otherwise considered a
file. The Path attribute isdir returns True if the path is a directory,
False otherwise. Conversely, the attribute isfile returns True if the path
is a file, False otherwise.
>>> f = furl('http://www.google.com/a/directory/')
>>> f.path.isdir
True
>>> f.path.isfile
False
>>> f = furl('http://www.google.com/a/file')
>>> f.path.isdir
False
>>> f.path.isfile
True
A path can be normalized with normalize(), and normalize() returns the Path object for method chaining.
>>> f = furl('http://www.google.com////a/./b/lolsup/../c/')
>>> f.path.normalize()
Path('/a/b/c/')
>>> f.url
'http://www.google.com/a/b/c/'
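A similar normalization can be approximated with posixpath.normpath from the standard library. The normalize_path helper below is a hypothetical sketch, not furl's implementation: normpath drops the trailing slash and preserves exactly two leading slashes, so both cases are patched up by hand.

```python
import posixpath


def normalize_path(path):
    """Collapse repeated slashes and resolve '.' and '..' segments."""
    normalized = posixpath.normpath(path)
    # POSIX keeps exactly two leading slashes; URLs want just one.
    if normalized.startswith('//'):
        normalized = '/' + normalized.lstrip('/')
    # normpath strips the trailing slash, which marks a directory in a URL.
    if path.endswith('/') and not normalized.endswith('/'):
        normalized += '/'
    return normalized


print(normalize_path('////a/./b/lolsup/../c/'))
```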
Path segments can also be appended with the slash operator, like with pathlib.Path.
>>> from __future__ import division # For Python 2.x.
>>>
>>> f = furl('path')
>>> f.path /= 'with'
>>> f.path = f.path / 'more' / 'path segments/'
>>> f.url
'/path/with/more/path%20segments/'
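The slash operator works on any Python class that defines __truediv__. The toy SegmentPath class below is hypothetical and much simpler than furl's Path; it only shows how the operator can append segments and return a new object for chaining.

```python
from urllib.parse import quote


class SegmentPath:
    """Toy path type: a list of decoded segments joined by '/'."""

    def __init__(self, segments=()):
        self.segments = list(segments)

    def __truediv__(self, other):
        # Return a new object so the operator can be chained.
        return SegmentPath(self.segments + other.split('/'))

    def __str__(self):
        return '/' + '/'.join(quote(s, safe='') for s in self.segments)


path = SegmentPath(['path']) / 'with' / 'more' / 'path segments/'
print(str(path))
```

Because __itruediv__ is not defined, Python falls back to __truediv__ for path /= 'segment', which rebinds the name to the new object.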
For a dictionary representation of a path, use asdict().
>>> f = furl('http://www.google.com/some/enc%20oding')
>>> f.path.asdict()
{ 'encoded': '/some/enc%20oding',
'isabsolute': True,
'isdir': False,
'isfile': True,
'segments': ['some', 'enc oding'] }
Query
URL queries in furl are Query objects that have params, a one-dimensional ordered multivalue dictionary of query keys and values. Query keys and values in params are percent-decoded, and all interaction with params should take place with percent-decoded strings.
>>> f = furl('http://www.google.com/?one=1&two=2')
>>> f.query
Query('one=1&two=2')
>>> f.query.params
omdict1D([('one', '1'), ('two', '2')])
>>> str(f.query)
'one=1&two=2'
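"Ordered" and "multivalue" mean that key order is preserved and a key may appear more than once. The standard library models the same idea as a list of pairs, as in the sketch below; the repeated one key is an assumption added here to show the multivalue case.

```python
from urllib.parse import parse_qsl, urlencode

# parse_qsl() keeps every pair, in order, including repeated keys.
pairs = parse_qsl('one=1&two=2&one=uno', keep_blank_values=True)

# All values for one key, in their original order.
values_for_one = [value for key, value in pairs if key == 'one']

# Re-encoding the pairs reproduces the original query string.
round_trip = urlencode(pairs)

print(pairs, values_for_one, round_trip)
```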
furl objects and Fragment objects (covered below) contain a Query object, and args is provided as a shortcut on these objects to access query.params.
>>> f = furl('http://www.google.com/?one=1&two
