ubelt.util_str module

Functions for working with text strings.

ubelt.util_str.indent(text, prefix='    ')

Indents a block of text by prepending a prefix to each line.

Parameters:
    text (str) – text to indent
    prefix (str) – prefix to add to each line (default = '    ')
Returns:	indented text
Return type:	str

CommandLine: python -m ubelt.util_str indent

Example

>>> from ubelt.util_str import *  # NOQA
>>> NL = chr(10)  # newline character
>>> text = 'Lorem ipsum' + NL + 'dolor sit amet'
>>> prefix = '    '
>>> result = indent(text, prefix)
>>> assert all(t.startswith(prefix) for t in result.split(NL))
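The behavior described above can be sketched in a few lines of plain Python. This is a minimal illustration, not ubelt's actual source: `indent_sketch` is a hypothetical name, and it assumes every line, blank ones included, receives the prefix. For contrast, the standard library's `textwrap.indent` skips whitespace-only lines by default.

```python
import textwrap


def indent_sketch(text: str, prefix: str = '    ') -> str:
    """Prepend ``prefix`` to every line of ``text`` (hypothetical helper).

    Assumption: blank lines are prefixed too, so every line starts with
    ``prefix`` after the call.
    """
    # The first line has no preceding newline, so prefix it explicitly;
    # every later line starts right after a newline character.
    return prefix + text.replace('\n', '\n' + prefix)


text = 'Lorem ipsum\n\ndolor sit amet'
print(repr(indent_sketch(text, '> ')))    # the blank middle line gets '> ' too
print(repr(textwrap.indent(text, '> ')))  # textwrap leaves the blank line alone
```

Prefixing blank lines keeps the invariant checked in the example above (every line starts with the prefix); `textwrap.indent` trades that for cleaner output without trailing whitespace.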

ubelt.util_str.codeblock(block_str)

Wraps a multiline string block and returns it unindented, with the indentation common to every line removed. Useful for templated code that is defined inside an indented part of a program.

Parameters:	block_str (str) – typically in the form of a multiline string
Returns:	the unindented string
Return type:	str

CommandLine: python -m ubelt.util_str codeblock

Example

>>> from ubelt.util_str import *  # NOQA
>>> # Simulate an indented part of code
>>> if True:
...     # notice the indentation on this will be normal
...     codeblock_version = codeblock(
...             '''
...             def foo():
...                 return 'bar'
...             '''
...         )
...     # notice the indentation and newlines on this will be odd
...     normal_version = ('''
...         def foo():
...             return 'bar'
...     ''')
>>> assert normal_version != codeblock_version
>>> print('Without codeblock')
>>> print(normal_version)
>>> print('With codeblock')
>>> print(codeblock_version)
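A minimal sketch of the same idea using only the standard library, assuming the behavior is "remove the common indentation, then trim the blank lines at either end". `codeblock_sketch` and `make_source` are hypothetical names used for illustration, not ubelt functions. Running the result through `exec` confirms that the dedented template is valid Python.

```python
import textwrap


def codeblock_sketch(block_str: str) -> str:
    """Dedent a triple-quoted block (hypothetical helper)."""
    # Remove the indentation shared by all lines, then drop the newlines
    # that a triple-quoted literal picks up at its start and end.
    return textwrap.dedent(block_str).strip('\n')


def make_source() -> str:
    # Simulate a template defined inside an indented function body
    return codeblock_sketch(
        '''
        def foo():
            return 'bar'
        '''
    )


src = make_source()
print(src)

namespace = {}
exec(src, namespace)       # would raise IndentationError without the dedent
print(namespace['foo']())
```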

ubelt.util_str.hzcat(args, sep='')

Horizontally concatenates strings, preserving their indentation.

Concatenates a list of objects so that each item is placed entirely to the right of all previous items; multiline strings are therefore laid out side by side.

Parameters:
    args (List[str]) – strings to concatenate
    sep (str) – separator placed between adjacent items (default = '')
Returns:	the horizontally concatenated text
Return type:	str

CommandLine: python -m ubelt.util_str hzcat

Example1:

>>> import ubelt as ub
>>> B = ub.repr2([[1, 2], [3, 457]], nl=1, cbr=True, trailsep=False)
>>> C = ub.repr2([[5, 6], [7, 8]], nl=1, cbr=True, trailsep=False)
>>> args = ['A = ', B, ' * ', C]
>>> print(ub.hzcat(args))
A = [[1, 2],   * [[5, 6],
     [3, 457]]    [7, 8]]

Example2:

>>> from ubelt.util_str import *
>>> import ubelt as ub
>>> import unicodedata
>>> aa = unicodedata.normalize('NFD', 'á')  # one visible character with len 2
>>> B = ub.repr2([['θ', aa], [aa, aa, aa]], nl=1, si=True, cbr=True, trailsep=False)
>>> C = ub.repr2([[5, 6], [7, 'θ']], nl=1, si=True, cbr=True, trailsep=False)
>>> args = ['A', '=', B, '*', C]
>>> print(ub.hzcat(args, sep='｜'))
A｜=｜[[θ, á],   ｜*｜[[5, 6],
 ｜ ｜ [á, á, á]]｜ ｜ [7, θ]]

ubelt.util_str.ensure_unicode(text)

Casts bytes into a unicode string by decoding them as UTF-8; text that is already unicode is returned unchanged (mostly useful for Python 2 compatibility).

References

http://stackoverflow.com/questions/12561063/extract-data-from-file

Example

>>> from ubelt.util_str import *
>>> import codecs  # NOQA
>>> assert ensure_unicode('my ünicôdé strįng') == 'my ünicôdé strįng'
>>> assert ensure_unicode('text1') == 'text1'
>>> assert ensure_unicode('text1'.encode('utf8')) == 'text1'
>>> assert ensure_unicode('ï»¿text1'.encode('utf8')) == 'ï»¿text1'
>>> assert (codecs.BOM_UTF8 + 'text»¿'.encode('utf8')).decode('utf8')
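The sketch below is a hypothetical stand-in (`ensure_unicode_sketch` is not a ubelt function) that assumes the behavior stated above: `str` passes through and `bytes` is decoded as UTF-8. It also illustrates two details behind the last example lines: a plain UTF-8 decode keeps a byte order mark as the character U+FEFF (the `'utf-8-sig'` codec strips it), and the odd-looking `'ï»¿'` is simply the BOM bytes decoded as Latin-1. Raising `TypeError` for other input types is an assumption of this sketch.

```python
import codecs


def ensure_unicode_sketch(text):
    """Return ``text`` as ``str``, decoding ``bytes`` as UTF-8 (hypothetical helper)."""
    if isinstance(text, str):
        return text  # already unicode: return unchanged
    if isinstance(text, bytes):
        return text.decode('utf8')
    # Assumption: anything else is rejected rather than converted with str()
    raise TypeError(f'expected str or bytes, got {type(text).__name__}')


data = codecs.BOM_UTF8 + 'text1'.encode('utf8')

# A plain UTF-8 decode keeps the BOM as the character U+FEFF
print(repr(ensure_unicode_sketch(data)))
# The 'utf-8-sig' codec strips a leading BOM instead
print(repr(data.decode('utf-8-sig')))
# The three BOM bytes (EF BB BF) read as Latin-1 produce the mojibake 'ï»¿'
print(codecs.BOM_UTF8.decode('latin-1'))
```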