ubelt.util_str module

Functions for working with text and strings.

The ensure_unicode function does its best to coerce Python 2/3 bytes and text into a consistent unicode text representation.

The codeblock and paragraph functions wrap multiline strings so that text blocks can be written without disrupting the indentation of the surrounding code.

The hzcat function horizontally concatenates multiline text.

The indent function prefixes every line in a text block with a given prefix. By default the prefix is 4 spaces.

ubelt.util_str.indent(text, prefix='    ')

Indents a block of text

Parameters:
  • text (str) – text to indent
  • prefix (str) – prefix to add to each line (default = '    ')
Returns: indented text
Return type: str

CommandLine:
python -m ubelt.util_str indent

Example

>>> from ubelt.util_str import *  # NOQA
>>> NL = chr(10)  # newline character
>>> text = 'Lorem ipsum' + NL + 'dolor sit amet'
>>> prefix = '    '
>>> result = indent(text, prefix)
>>> assert all(t.startswith(prefix) for t in result.split(NL))
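For illustration, the prefixing behavior described above can be approximated in plain Python. This is a minimal sketch under the assumption that indenting means inserting the prefix at the start of the text and after every newline; `indent_sketch` is a hypothetical helper, not part of ubelt, and is not presented as ubelt's implementation.

```python
def indent_sketch(text, prefix='    '):
    """Hypothetical helper (not part of ubelt): prefix every line of text."""
    # Prefix the first line, then every line that follows a newline character.
    return prefix + text.replace('\n', '\n' + prefix)


result = indent_sketch('Lorem ipsum' + chr(10) + 'dolor sit amet')
print(result)
```

Note that this sketch also prefixes empty lines: `indent_sketch('a\n\nb', '> ')` returns `'> a\n> \n> b'`.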

ubelt.util_str.codeblock(block_str)

Create a block of text that preserves all newlines and relative indentation

Wraps multiline string blocks and returns unindented code. Useful for templated code defined in indented parts of code.

Parameters: block_str (str) – typically in the form of a multiline string
Returns: the unindented string
Return type: str
CommandLine:
python -m ubelt.util_str codeblock

Example

>>> from ubelt.util_str import *  # NOQA
>>> # Simulate an indented part of code
>>> if True:
>>>     # notice the indentation on this will be normal
>>>     codeblock_version = codeblock(
...             '''
...             def foo():
...                 return 'bar'
...             '''
...         )
>>>     # notice the indentation and newlines on this will be odd
>>>     normal_version = ('''
...         def foo():
...             return 'bar'
...     ''')
>>> assert normal_version != codeblock_version
>>> print('Without codeblock')
>>> print(normal_version)
>>> print('With codeblock')
>>> print(codeblock_version)
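The dedent-then-trim idea can be sketched with the standard library's `textwrap.dedent`. This assumes that the function removes the common leading indentation and then strips the newlines that surround a triple-quoted literal; `codeblock_sketch` is a hypothetical helper written for illustration, not ubelt's code.

```python
import textwrap


def codeblock_sketch(block_str):
    """Hypothetical helper (not part of ubelt): dedent and trim a text block."""
    # Remove the leading indentation shared by all non-blank lines, then
    # drop the newlines that surround a triple-quoted literal.
    return textwrap.dedent(block_str).strip('\n')


if True:
    # The literal is indented to match the surrounding code ...
    snippet = codeblock_sketch(
        '''
        def foo():
            return 'bar'
        '''
    )

# ... but the result starts at column zero and keeps relative indentation.
print(snippet)
```

`textwrap.dedent` only removes whitespace common to all non-blank lines, which is why the extra four spaces before `return` survive.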

ubelt.util_str.paragraph(block_str)

Wraps multiline strings and restructures the text to remove all newlines and leading, trailing, and double spaces.

Useful for writing log messages.

Parameters: block_str (str) – typically in the form of a multiline string
Returns: the reduced text block
Return type: str
CommandLine:
xdoctest -m ubelt.util_str paragraph

Example

>>> from ubelt.util_str import *  # NOQA
>>> block_str = (
>>>     '''
>>>     Lorem ipsum dolor sit amet, consectetur adipiscing
>>>     elit, sed do eiusmod tempor incididunt ut labore et
>>>     dolore magna aliqua.
>>>     ''')
>>> out = paragraph(block_str)
>>> assert chr(10) in block_str
>>> assert chr(10) not in out
>>> print('block_str = {!r}'.format(block_str))
>>> print('out = {!r}'.format(out))
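One way to obtain the behavior described for paragraph is to collapse every whitespace run with a regular expression. This is a minimal sketch, assuming tabs and newlines should be treated the same as spaces; `paragraph_sketch` is a hypothetical helper, not part of ubelt.

```python
import re


def paragraph_sketch(block_str):
    """Hypothetical helper (not part of ubelt): reflow text onto one line."""
    # Replace every run of whitespace (spaces, tabs, newlines) with one space,
    # then trim the space left over at each end of the literal.
    return re.sub(r'\s+', ' ', block_str).strip()


block_str = '''
    Lorem ipsum dolor sit amet, consectetur adipiscing
    elit, sed do eiusmod tempor incididunt ut labore et
    dolore magna aliqua.
    '''
out = paragraph_sketch(block_str)
print(out)
```

Replacing newlines with a space (rather than deleting them) keeps the words at line boundaries separated, e.g. "adipiscing elit".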

ubelt.util_str.hzcat(args, sep='')

Horizontally concatenates strings preserving indentation

Concatenates a list of (possibly multiline) strings so that each item starts to the right of the widest line of all previous items.

Parameters:
  • args (List[str]) – strings to concatenate
  • sep (str) – separator (defaults to '')
CommandLine:
python -m ubelt.util_str hzcat
Example1:
>>> import ubelt as ub
>>> B = ub.repr2([[1, 2], [3, 457]], nl=1, cbr=True, trailsep=False)
>>> C = ub.repr2([[5, 6], [7, 8]], nl=1, cbr=True, trailsep=False)
>>> args = ['A = ', B, ' * ', C]
>>> print(ub.hzcat(args))
A = [[1, 2],   * [[5, 6],
     [3, 457]]    [7, 8]]
Example2:
>>> from ubelt.util_str import *
>>> import ubelt as ub
>>> import unicodedata
>>> aa = unicodedata.normalize('NFD', 'á')  # a unicode char with len 2
>>> B = ub.repr2([['θ', aa], [aa, aa, aa]], nl=1, si=True, cbr=True, trailsep=False)
>>> C = ub.repr2([[5, 6], [7, 'θ']], nl=1, si=True, cbr=True, trailsep=False)
>>> args = ['A', '=', B, '*', C]
>>> print(ub.hzcat(args, sep='|'))
A|=|[[θ, á],   |*|[[5, 6],
 | | [á, á, á]]| | [7, θ]]
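The padding strategy can be sketched as follows: split each argument into lines, pad all blocks to the same height, and after each block right-pad every row to the widest row so far before appending the separator. `hzcat_sketch` is a hypothetical helper written for illustration, not ubelt's implementation; it measures width with `len` and performs no Unicode normalization, so a decomposed character such as an NFD `á` (length 2) would be counted as two columns.

```python
def hzcat_sketch(args, sep=''):
    """Hypothetical helper (not part of ubelt): join text blocks side by side."""
    columns = [str(arg).split('\n') for arg in args]
    height = max(len(lines) for lines in columns)
    rows = [''] * height
    for idx, lines in enumerate(columns):
        # Pad shorter blocks with empty lines so every block has `height` rows.
        lines = lines + [''] * (height - len(lines))
        rows = [row + line for row, line in zip(rows, lines)]
        if idx < len(columns) - 1:
            # Right-pad every row to the widest row so far, so the next block
            # starts to the right of everything already placed.
            width = max(len(row) for row in rows)
            rows = [row.ljust(width) + sep for row in rows]
    # Trailing spaces added by padding carry no information; drop them.
    return '\n'.join(row.rstrip(' ') for row in rows)


B = '[[1, 2],\n [3, 457]]'
C = '[[5, 6],\n [7, 8]]'
result = hzcat_sketch(['A = ', B, ' * ', C])
print(result)
```

With the same inputs as Example1, this sketch reproduces the layout shown there.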

ubelt.util_str.ensure_unicode(text)

Decodes UTF-8 bytes into unicode text and returns text unchanged (mostly for Python 2 compatibility).

References

http://stackoverflow.com/questions/12561063/extract-data-from-file

Example

>>> from ubelt.util_str import *
>>> import codecs  # NOQA
>>> assert ensure_unicode('my ünicôdé strįng') == 'my ünicôdé strįng'
>>> assert ensure_unicode('text1') == 'text1'
>>> assert ensure_unicode('text1'.encode('utf8')) == 'text1'
>>> assert (codecs.BOM_UTF8 + 'text»¿'.encode('utf8')).decode('utf8')
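On Python 3 the core of this coercion can be sketched in a few lines. This is a minimal sketch assuming that byte input is UTF-8 encoded and that any other input type is an error; `ensure_unicode_sketch` is a hypothetical helper, not ubelt's implementation.

```python
def ensure_unicode_sketch(text):
    """Hypothetical helper (not part of ubelt): return str, decoding bytes."""
    if isinstance(text, str):
        # Already text: return it unchanged.
        return text
    if isinstance(text, bytes):
        # Assumption: bytes are UTF-8 encoded.
        return text.decode('utf8')
    raise TypeError('expected str or bytes, got {!r}'.format(type(text)))


print(ensure_unicode_sketch('my ünicôdé strįng'))
print(ensure_unicode_sketch('text1'.encode('utf8')))
```

Decoding with 'utf8' keeps a leading byte-order mark as the character '\ufeff'; if the mark should be dropped, the standard 'utf-8-sig' codec removes it.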