Wcwidth

Python library that measures the width of strings in a terminal

============
Introduction
============

This library is mainly for CLI/TUI programs that carefully produce output for terminals.

Installation

The stable version of this package is maintained on PyPI. Install or upgrade using pip::

    pip install --upgrade wcwidth

Problem

All Python string-formatting functions, including textwrap.wrap(), str.ljust(), str.rjust(), and str.center(), incorrectly measure the displayed width of a string as equal to the number of its codepoints.

Some examples of incorrect results:

.. code-block:: python

    >>> # result consumes 16 total cells, 11 expected,
    >>> 'コンニチハ'.rjust(11, 'X')
    'XXXXXXコンニチハ'

    >>> # 'e' + combining acute accent: result consumes 5 total cells, 6 expected,
    >>> 'cafe\u0301'.center(6, 'X')
    'caféX'
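
The mismatch can be reproduced with the standard library alone. The sketch below is an illustration and not this library's algorithm: approx_width() is a hypothetical helper that assumes a simplified rule, where East Asian Wide and Fullwidth characters occupy two cells, combining marks occupy none, and everything else occupies one.

.. code-block:: python

    import unicodedata


    def approx_width(text):
        """Approximate cell width; hypothetical helper, not part of wcwidth."""
        total = 0
        for char in text:
            if unicodedata.combining(char):
                continue  # combining marks are drawn over the previous character
            wide = unicodedata.east_asian_width(char) in ('W', 'F')
            total += 2 if wide else 1
        return total


    # len() counts codepoints, which is what str.rjust() and str.center() use
    katakana = 'コンニチハ'
    accented = 'cafe\u0301'
    print(len(katakana), approx_width(katakana))   # 5 10
    print(len(accented), approx_width(accented))   # 5 4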

Solution

The lowest-level functions in this library are the POSIX.1-2001 and POSIX.1-2008 wcwidth(3)_ and wcswidth(3), whose interfaces this library copies precisely as wcwidth() and wcswidth()_. These functions return -1 when C0 or C1 control codes are present.

An easy-to-use width()_ function is provided as a wrapper of wcswidth()_ that is also capable of measuring most terminal control codes and sequences, like colors, bold, tabstops, and horizontal cursor movement.

Text justification is solved by the grapheme- and sequence-aware functions ljust(), rjust(), center(), and wrap(), which serve as drop-in replacements for the Python standard library functions of the same names.

The iterator functions iter_graphemes()_ and iter_sequences()_ allow for careful navigation of grapheme and terminal control sequence boundaries. iter_graphemes_reverse() and grapheme_boundary_before() are useful for editing and searching complex Unicode text. The clip()_ function extracts substrings by display column positions, and strip_sequences()_ removes terminal escape sequences from text altogether.

Discrepancies

You may find that terminal support varies for complex Unicode sequences or codepoints.

A companion utility, jquast/ucs-detect_, was authored to gather and publish test results for wide characters, language and grapheme clustering, complex script support, emoji with zero-width joiners, variation selectors, and regional indicators (flags). The results are published as a General Tabulated Summary_ by terminal emulator software and version.

========
Overview
========

A brief overview, through examples, for all of the public API functions.

Full API Documentation at https://wcwidth.readthedocs.io/en/latest/api.html

wcwidth()

Measures width of a single codepoint:

.. code-block:: python

    >>> # '♀' narrow emoji
    >>> wcwidth.wcwidth('\u2640')
    1

Use function wcwidth()_ to determine the display width of a single Unicode character.

See specification_ of character measurements. Note that -1 is returned for control codes.

wcswidth()

Measures width of a string, returns -1 for control codes.

.. code-block:: python

    >>> # '♀️' emoji w/vs-16
    >>> wcwidth.wcswidth('\u2640\ufe0f')
    2

Use function wcswidth()_ to determine the display width of a string of Unicode characters.

See specification_ of character measurements. Note that -1 is returned if a control code occurs anywhere in the string.

width()

Use function width()_ to measure a string with improved handling of control codes.

.. code-block:: python

    >>> # same support as wcswidth(), eg. regional indicator flag:
    >>> wcwidth.width('\U0001F1FF\U0001F1FC')
    2
    >>> # but also supports SGR colored text, 'WARN', followed by SGR reset
    >>> wcwidth.width('\x1b[38;2;255;150;100mWARN\x1b[0m')
    4
    >>> # tabs,
    >>> wcwidth.width('\t', tabsize=4)
    4
    >>> # or, tab and all other control characters can be ignored
    >>> wcwidth.width('\t', control_codes='ignore')
    0
    >>> # "vertical" control characters are ignored
    >>> wcwidth.width('\n')
    0
    >>> # as well as sequences with "indeterminate" effects like Home + Clear
    >>> wcwidth.width('\x1b[H\x1b[2J')
    0
    >>> # or, raise ValueError for "indeterminate" effects using control_codes='strict'
    >>> wcwidth.width('\n', control_codes='strict')
    Traceback (most recent call last):
    ...
    ValueError: Vertical movement character 0xa at position 0

Use control_codes='ignore' for slightly improved performance when the input is known not to contain any control characters or terminal sequences. Note that TAB ('\t') is a control character and is also ignored, so you may want to call str.expandtabs()_ first.
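
As a small illustration of that last point, str.expandtabs() replaces each tab with the spaces needed to reach the next tab stop, so no tab remains to be ignored afterwards. Only the standard library is used here, and the variable names are illustrative.

.. code-block:: python

    line = 'a\tbc\td'

    # tab stops every 4 columns: 'a' + 3 spaces, 'bc' + 2 spaces, 'd'
    expanded = line.expandtabs(4)
    print(repr(expanded))   # 'a   bc  d'
    print(len(expanded))    # 9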

iter_sequences()

Iterates through text, segmented by terminal sequence:

.. code-block:: python

    >>> list(wcwidth.iter_sequences('hello'))
    [('hello', False)]
    >>> list(wcwidth.iter_sequences('\x1b[31mred\x1b[0m'))
    [('\x1b[31m', True), ('red', False), ('\x1b[0m', True)]

Use iter_sequences()_ to split text into segments of plain text and escape sequences. Each tuple contains the segment string and a boolean indicating whether it is an escape sequence (True) or text (False).
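
To make the shape of these tuples concrete, here is a minimal standard-library sketch of the same kind of segmentation. It is not how iter_sequences() is implemented: split_csi() is a hypothetical helper, and it assumes that only CSI sequences (ESC, '[', parameters, final byte) need to be recognized, whereas the library function may recognize other sequence types as well.

.. code-block:: python

    import re

    # Assumption: CSI only, as ESC [ parameter bytes, intermediate bytes, final byte
    CSI = re.compile(r'\x1b\[[0-?]*[ -/]*[@-~]')


    def split_csi(text):
        """Yield (segment, is_sequence) tuples; hypothetical helper."""
        position = 0
        for match in CSI.finditer(text):
            if match.start() > position:
                yield text[position:match.start()], False
            yield match.group(), True
            position = match.end()
        if position < len(text):
            yield text[position:], False


    print(list(split_csi('\x1b[31mred\x1b[0m')))
    # [('\x1b[31m', True), ('red', False), ('\x1b[0m', True)]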

iter_graphemes()

Use iter_graphemes()_ to iterate over grapheme clusters of a string.

.. code-block:: python

    >>> import wcwidth
    >>> # ok + Regional Indicator 'Z', 'W' (Zimbabwe)
    >>> list(wcwidth.iter_graphemes('ok\U0001F1FF\U0001F1FC'))
    ['o', 'k', '🇿🇼']

    >>> # cafe + combining acute accent
    >>> list(wcwidth.iter_graphemes('cafe\u0301'))
    ['c', 'a', 'f', 'é']

    >>> # ok + Emoji Man + ZWJ + Woman + ZWJ + Girl
    >>> list(wcwidth.iter_graphemes('ok\U0001F468\u200D\U0001F469\u200D\U0001F467'))
    ['o', 'k', '👨\u200d👩\u200d👧']

A grapheme cluster is what a user perceives as a single character, even if it is composed of multiple Unicode codepoints. This function implements Unicode Standard Annex #29_ grapheme cluster boundary rules.
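
For intuition only, the sketch below handles a single case of cluster formation: attaching combining marks to the character before them. The helper simple_clusters() is hypothetical and uses only the standard library. It deliberately ignores the other Annex #29 rules, such as ZWJ emoji sequences, regional indicator pairs, and Hangul syllables, so it is not a substitute for iter_graphemes().

.. code-block:: python

    import unicodedata


    def simple_clusters(text):
        """Group each base character with its combining marks (simplified)."""
        clusters = []
        for char in text:
            if clusters and unicodedata.combining(char):
                clusters[-1] += char  # extend the previous cluster
            else:
                clusters.append(char)  # start a new cluster
        return clusters


    # 5 codepoints, 4 user-perceived characters
    print(len(simple_clusters('cafe\u0301')))   # 4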

ljust()

Use ljust()_ as replacement of str.ljust()_:

.. code-block:: python

    >>> 'コンニチハ'.ljust(11, '*')             # don't do this
    'コンニチハ******'
    >>> wcwidth.ljust('コンニチハ', 11, '*')    # do this!
    'コンニチハ*'
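
The difference comes down to what is counted when computing the padding. A rough standard-library sketch of width-aware padding follows. Both cell_width() and sketch_ljust() are hypothetical helpers, the width rule is a simplification (Wide and Fullwidth characters as 2 cells, combining marks as 0), and the fill character is assumed to be one cell wide.

.. code-block:: python

    import unicodedata


    def cell_width(char):
        """Simplified width of one character: 0, 1, or 2 cells (assumption)."""
        if unicodedata.combining(char):
            return 0
        return 2 if unicodedata.east_asian_width(char) in ('W', 'F') else 1


    def sketch_ljust(text, width, fillchar=' '):
        """Pad on the right until the text occupies `width` cells."""
        used = sum(cell_width(char) for char in text)
        return text + fillchar * max(0, width - used)


    # 10 cells of text, so only 1 fill character is needed to reach 11
    print(sketch_ljust('コンニチハ', 11, '*'))   # コンニチハ*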

rjust()

Use rjust()_ as replacement of str.rjust()_:

.. code-block:: python

    >>> 'コンニチハ'.rjust(11, '*')             # don't do this
    '******コンニチハ'
    >>> wcwidth.rjust('コンニチハ', 11, '*')    # do this!
    '*コンニチハ'

center()

Use center()_ as replacement of str.center()_:

.. code-block:: python

    >>> 'cafe\u0301'.center(6, '*')             # don't do this
    'café*'
    >>> wcwidth.center('cafe\u0301', 6, '*')    # do this!
    '*café*'

wrap()

Use function wrap()_ to wrap text containing terminal sequences, Unicode grapheme clusters, and wide characters to a given display width.

.. code-block:: python

    >>> from wcwidth import wrap
    >>> # Basic wrapping
    >>> wrap('hello world', 5)
    ['hello', 'world']

    >>> # Wrapping CJK text (each character is 2 cells wide)
    >>> wrap('コンニチハ', 4)
    ['コン', 'ニチ', 'ハ']

    >>> # Text with ANSI color sequences - SGR codes are propagated by default
    >>> # Each line ends with reset, next line starts with restored style
    >>> wrap('\x1b[1;31mhello world\x1b[0m', 5)
    ['\x1b[1;31mhello\x1b[0m', '\x1b[1;31mworld\x1b[0m']
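
The CJK case above can be approximated with a few lines of standard-library code, which may help show why the break falls after every second character. This is a sketch under simplifying assumptions, not the library's wrap(): cell_width() and hard_wrap() are hypothetical helpers, lines are broken at any character rather than at word boundaries, and terminal sequences are not handled.

.. code-block:: python

    import unicodedata


    def cell_width(char):
        """Simplified width of one character: 0, 1, or 2 cells (assumption)."""
        if unicodedata.combining(char):
            return 0
        return 2 if unicodedata.east_asian_width(char) in ('W', 'F') else 1


    def hard_wrap(text, width):
        """Break text into lines that each fit within `width` cells."""
        lines, current, used = [], '', 0
        for char in text:
            cells = cell_width(char)
            if current and used + cells > width:
                lines.append(current)  # the next character would not fit
                current, used = '', 0
            current += char
            used += cells
        if current:
            lines.append(current)
        return lines


    print(hard_wrap('コンニチハ', 4))   # ['コン', 'ニチ', 'ハ']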

clip()

Use clip()_ to extract a substring by column positions, preserving terminal sequences.

.. code-block:: python

    >>> from wcwidth import clip
    >>> # Wide characters split to Narrow boundaries using fillchar=' '
    >>> clip('中文字', 0, 3)
    '中 '
    >>> clip('中文字', 1, 5, fillchar='.')
    '.文.'

    >>> # SGR codes are propagated by default - result begins with active style
    >>> # and ends with reset if styles are active
    >>> clip('\x1b[1;31mHello world\x1b[0m', 6, 11)
    '\x1b[1;31mworld\x1b[0m'

    >>> # Disable SGR propagation to preserve original sequences as-is
    >>> clip('\x1b[31m中文\x1b[0m', 0, 3, propagate_sgr=False)
    '\x1b[31m中 \x1b[0m'
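
The fillchar behaviour in the first two examples can be sketched with the standard library. The code below is illustrative and not the library's implementation: cell_width() and sketch_clip() are hypothetical helpers, the width rule is simplified (Wide and Fullwidth characters as 2 cells, everything else as 1), and terminal sequences and combining marks are not handled.

.. code-block:: python

    import unicodedata


    def cell_width(char):
        """Simplified rule (assumption): Wide and Fullwidth are 2 cells, else 1."""
        return 2 if unicodedata.east_asian_width(char) in ('W', 'F') else 1


    def sketch_clip(text, start, end, fillchar=' '):
        """Return the part of text shown in columns start (inclusive) to end."""
        pieces = []
        column = 0
        for char in text:
            left, right = column, column + cell_width(char)
            column = right
            if right <= start or left >= end:
                continue  # entirely outside the requested columns
            if start <= left and right <= end:
                pieces.append(char)  # entirely inside
            else:
                # a wide character cut by a boundary: fill its visible cells
                pieces.append(fillchar * (min(right, end) - max(left, start)))
        return ''.join(pieces)


    print(repr(sketch_clip('中文字', 0, 3)))                 # '中 '
    print(repr(sketch_clip('中文字', 1, 5, fillchar='.')))   # '.文.'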

strip_sequences()

Use strip_sequences()_ to remove all terminal escape sequences from text.

.. code-block:: python

    >>> from wcwidth import strip_sequences
    >>> strip_sequences('\x1b[31mred\x1b[0m')
    'red'

.. _ambiguous_width:

ambiguous_width

Some Unicode characters have "East Asian Ambiguous" (A) width. These characters display as 1 cell by default, matching Western terminal contexts, but many CJK (Chinese, Japanese, Korean) environments may prefer 2 cells. This is often found as a boolean option, "Ambiguous width as wide", in terminal emulator preferences.

By default, wcwidth treats ambiguous characters as narrow (width 1). For CJK environments where your terminal is configured to display ambiguous characters as double-width, pass ambiguous_width=2:

.. code-block:: python

    >>> # CIRCLED DIGIT ONE - ambiguous width
    >>> wcwidth.width('\u2460')
    1
    >>> wcwidth.width('\u2460', ambiguous_width=2)
    2

The ambiguous_width parameter is available on all width-measuring functions: wcwidth(), wcswidth(), width(), ljust(), rjust(), center(), wrap(), and clip().
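
Which characters are affected can be checked with the standard library: unicodedata.east_asian_width() reports 'A' for ambiguous characters. The sketch below shows how such a parameter could act on that property. The helper sketch_width() is hypothetical and uses a simplified rule, so it is not the library's own lookup.

.. code-block:: python

    import unicodedata


    def sketch_width(char, ambiguous_width=1):
        """Width of one character under a simplified East Asian Width rule."""
        category = unicodedata.east_asian_width(char)
        if category in ('W', 'F'):
            return 2
        if category == 'A':
            return ambiguous_width  # 1 by default, 2 for "ambiguous as wide"
        return 1


    print(unicodedata.east_asian_width('\u2460'))      # A
    print(sketch_width('\u2460'))                      # 1
    print(sketch_width('\u2460', ambiguous_width=2))   # 2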

Terminal Detection

The most reliable method to detect whether a terminal profile is set for "Ambiguous width as wide" mode is to display an ambiguous character surrounded by a pair of Cursor Position Report (CPR) queries, with the terminal in cooked or raw mode, then parse the two responses for their (y, x) locations and measure the difference in x.

Such code should also check that it is attached to a terminal, and allow for timeouts, slow networks, or no response at all when working with "dumb terminals", such as in a CI build.
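
A minimal sketch of this technique for Unix-like systems follows, using only the standard library. All function names are hypothetical, the 0.5 second timeout is an arbitrary choice, and the sketch switches the terminal to cbreak mode with tty.setcbreak() as its own assumption about how to read the reply unbuffered. It leaves the test character on screen and does not account for the cursor sitting at the right margin.

.. code-block:: python

    import os
    import re
    import select
    import sys

    # A CPR reply has the form ESC [ row ; column R
    CPR_RESPONSE = re.compile(r'\x1b\[(\d+);(\d+)R')


    def parse_cpr(response):
        """Return (row, column) from a CPR reply, or None if absent."""
        match = CPR_RESPONSE.search(response)
        return (int(match.group(1)), int(match.group(2))) if match else None


    def query_column(fd_in, fd_out, timeout):
        """Send a CPR query and return the reported column, or None on timeout."""
        os.write(fd_out, b'\x1b[6n')
        reply = ''
        while 'R' not in reply:
            ready, _, _ = select.select([fd_in], [], [], timeout)
            if not ready:
                return None  # no response, e.g. a "dumb terminal"
            reply += os.read(fd_in, 32).decode('ascii', 'ignore')
        position = parse_cpr(reply)
        return position[1] if position else None


    def detect_ambiguous_cells(char='\u2460', timeout=0.5):
        """Return the cells used by `char`, or None when detection fails."""
        if not (sys.stdin.isatty() and sys.stdout.isatty()):
            return None  # not attached to a terminal, e.g. a CI build
        import termios  # Unix-only modules, imported lazily
        import tty
        fd_in, fd_out = sys.stdin.fileno(), sys.stdout.fileno()
        saved = termios.tcgetattr(fd_in)
        try:
            tty.setcbreak(fd_in)
            before = query_column(fd_in, fd_out, timeout)
            os.write(fd_out, char.encode('utf-8'))
            after = query_column(fd_in, fd_out, timeout)
        finally:
            termios.tcsetattr(fd_in, termios.TCSADRAIN, saved)
        if before is None or after is None:
            return None
        return after - before

A result of 2 from detect_ambiguous_cells() would suggest passing ambiguous_width=2 to the functions listed above, while None means the answer could not be determined.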

The jquast/blessed_ library provides a helper for this, the Terminal.detect_ambiguous_width()_ method.
