Source code for ubelt.util_format

"""
Defines the function :func:`repr2`, which allows for a bit more customization
than :func:`repr` or :func:`pprint`. See the docstring for more details.


Two main goals of repr2 are to provide nice string representations of nested
data structures and make those "eval-able" whenever possible. As an example
take the value ``float('inf')``, which normally has a non-evalable repr of
``inf``:

>>> import ubelt as ub
>>> ub.repr2(float('inf'))
"float('inf')"

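The idea behind the eval-able float output can be sketched in plain Python.
This is only an illustration under stated assumptions: the helper name
``evalable_float_repr`` is hypothetical and is not part of ubelt.

```python
import math


def evalable_float_repr(value):
    # Hypothetical helper for illustration; not ubelt's implementation.
    # Plain repr gives 'inf' / 'nan', which eval() cannot resolve, so the
    # special values are wrapped in a float('...') call instead.
    text = repr(value)
    if math.isinf(value) or math.isnan(value):
        text = "float('{}')".format(text)
    return text


print(evalable_float_repr(float('inf')))
print(evalable_float_repr(1.5))
```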
The ``newline`` (or ``nl``) keyword argument can control how deep in the
nesting newlines are allowed.

>>> print(ub.repr2({1: float('nan'), 2: float('inf'), 3: 3.0}))
{
    1: float('nan'),
    2: float('inf'),
    3: 3.0,
}

>>> print(ub.repr2({1: float('nan'), 2: float('inf'), 3: 3.0}, nl=0))
{1: float('nan'), 2: float('inf'), 3: 3.0}

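One way to picture the ``nl`` countdown is a recursive formatter that spends
one level of newlines per nesting level. The sketch below is an assumption
made for illustration: ``sketch_lines`` is a hypothetical name, it only
handles lists and non-negative integer ``nl`` values, and it is not how
repr2 itself is written.

```python
def sketch_lines(data, nl=1, indent='    '):
    # Hypothetical helper for illustration; returns the output as a list
    # of lines rather than a single string.
    if not isinstance(data, list):
        return [repr(data)]
    # Each nesting level uses up one unit of the newline budget.
    child_nl = max(nl - 1, 0)
    children = [sketch_lines(item, child_nl, indent) for item in data]
    if nl > 0 and children:
        lines = ['[']
        for child in children:
            lines.extend(indent + line for line in child[:-1])
            lines.append(indent + child[-1] + ',')
        lines.append(']')
        return lines
    # With no budget left, every child is a single line, so join them.
    return ['[' + ', '.join(child[0] for child in children) + ']']


for line in sketch_lines([1, [2, 3]], nl=2):
    print(line)
```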

You can also define or overwrite how representations for different types are
created. You can either create your own extension object, or you can
monkey-patch `ub.util_format._FORMATTER_EXTENSIONS` without specifying the
extensions keyword argument (although this will be a global change).

>>> extensions = ub.util_format.FormatterExtensions()
>>> @extensions.register(float)
>>> def my_float_formater(data, **kw):
>>>     return "monkey({})".format(data)
>>> print(ub.repr2({1: float('nan'), 2: float('inf'), 3: 3.0}, nl=0, extensions=extensions))
{1: monkey(nan), 2: monkey(inf), 3: monkey(3.0)}

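The register-then-lookup pattern used by the extensions object can be
sketched with a small stand-alone class. ``TinyRegistry`` is a hypothetical
stand-in written for this illustration, not the ubelt class, and it only
covers registration by type.

```python
class TinyRegistry:
    # Hypothetical stand-in for illustration; this is not ubelt's class.

    def __init__(self):
        self._type_registry = {}

    def register(self, type_):
        # Return a decorator that stores ``func`` as the formatter for
        # ``type_`` and hands the function back unchanged.
        def _decorator(func):
            self._type_registry[type_] = func
            return func
        return _decorator

    def lookup(self, data):
        # The first registered type that matches via isinstance wins.
        for type_, func in self._type_registry.items():
            if isinstance(data, type_):
                return func
        return None


registry = TinyRegistry()


@registry.register(float)
def format_float(data):
    return 'monkey({})'.format(data)


print(registry.lookup(3.0)(3.0))
print(registry.lookup('text'))
```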
As of ubelt 1.1.0 you can now access and update the default extensions via the
repr2 function itself.

>>> # xdoctest: +SKIP
>>> # We skip this at test time to not modify global state
>>> @ub.repr2.extensions.register(float)
>>> def my_float_formater(data, **kw):
>>>     return "monkey2({})".format(data)
>>> print(ub.repr2({1: float('nan'), 2: float('inf'), 3: 3.0}, nl=0))
"""
import collections
from ubelt import util_str
from ubelt import util_list

__all__ = ['repr2', 'FormatterExtensions']


def repr2(data, **kwargs):
    """
    Makes a pretty string representation of ``data``.

    Makes a pretty and easy-to-doctest string representation. Has nice handling
    of common nested datatypes. This is an alternative to repr, and
    :func:`pprint.pformat`.

    The output of this function is configurable. By default it aims to
    produce strings that are consistent, compact, and executable. This makes
    them great for doctests.

    Note:
        This function has many keyword arguments that can be used to customize
        the final representation. For convenience some of the more frequently
        used kwargs have short aliases. See "Kwargs" for more details.

    Args:
        data (object): an arbitrary python object to form the string
            "representation" of

    Kwargs:
        si, stritems (bool):
            dict/list items use str instead of repr

        strkeys, sk (bool):
            dict keys use str instead of repr

        strvals, sv (bool):
            dict values use str instead of repr

        nl, newlines (int | bool):
            number of top level nestings to place a newline after. If true all
            items are followed by newlines regardless of nesting level.
            Defaults to 1 for lists and True for dicts.

        nobr, nobraces (bool, default=False):
            if True, text will not contain outer braces for containers

        cbr, compact_brace (bool, default=False):
            if True, braces are compactified (i.e. they will not have newlines
            placed directly after them, think java / K&R / 1TBS)

        trailsep, trailing_sep (bool):
            if True, a separator is placed after the last item in a sequence.
            By default this is True if there are any ``nl > 0``.

        explicit (bool, default=False):
            changes dict representation from ``{k1: v1, ...}`` to
            ``dict(k1=v1, ...)``.

            Modifies:
                default kvsep is modified to ``'='``
                dict braces from `{}` to `dict()`.

        compact (bool, default=False):
            Produces values more suitable for space constrained environments

            Modifies:
                default kvsep is modified to ``'='``
                default itemsep is modified to ``''``
                default nobraces is modified to ``1``.
                default newlines is modified to ``0``.
                default strkeys to ``True``
                default strvals to ``True``

        precision (int, default=None):
            if specified floats are formatted with this precision

        kvsep (str, default=': '):
            separator between keys and values

        itemsep (str, default=' '):
            separator between items. This separator is placed after commas,
            which are currently not configurable. This may be modified in the
            future.

        sort (bool | callable, default=None):
            if None, then sort unordered collections, but keep the ordering of
            ordered collections. This option attempts to be deterministic in
            most cases.

            New in 0.8.0: if ``sort`` is callable, it will be used as a
            key-function to sort all collections.

            if False, then nothing will be sorted, and the representation of
            unordered collections will be arbitrary and possibly
            non-deterministic.

            if True, attempts to sort all collections in the returned text.
            Currently if True this WILL sort lists.
            Currently if True this WILL NOT sort OrderedDicts.
            NOTE:
                The previous behavior may not be intuitive, as such the
                behavior of this arg is subject to change.

        suppress_small (bool):
            passed to :func:`numpy.array2string` for ndarrays

        max_line_width (int):
            passed to :func:`numpy.array2string` for ndarrays

        with_dtype (bool):
            only relevant to numpy.ndarrays. if True includes the dtype.
            Defaults to `not strvals`.

        align (bool | str, default=False):
            if True, will align multi-line dictionaries by the kvsep

        extensions (FormatterExtensions):
            a custom :class:`FormatterExtensions` instance that can overwrite or
            define how different types of objects are formatted.

    Returns:
        str: outstr - output string

    Note:
        There are also internal kwargs, which should not be used:

            _return_info (bool):  return information about child context

            _root_info (depth): information about parent context

    RelatedWork:
        :func:`rich.pretty.pretty_repr`
        :func:`pprint.pformat`

    Example:
        >>> import ubelt as ub
        >>> dict_ = {
        ...     'custom_types': [slice(0, 1, None), 1/3],
        ...     'nest_dict': {'k1': [1, 2, {3: {4, 5}}],
        ...                   'key2': [1, 2, {3: {4, 5}}],
        ...                   'key3': [1, 2, {3: {4, 5}}],
        ...                   },
        ...     'nest_dict2': {'k': [1, 2, {3: {4, 5}}]},
        ...     'nested_tuples': [tuple([1]), tuple([2, 3]), frozenset([4, 5, 6])],
        ...     'one_tup': tuple([1]),
        ...     'simple_dict': {'spam': 'eggs', 'ham': 'jam'},
        ...     'simple_list': [1, 2, 'red', 'blue'],
        ...     'odict': ub.odict([(1, '1'), (2, '2')]),
        ... }
        >>> # In the interest of saving space we are only going to show the
        >>> # output for the first example.
        >>> result = ub.repr2(dict_, nl=1, precision=2)
        >>> print(result)
        {
            'custom_types': [slice(0, 1, None), 0.33],
            'nest_dict': {'k1': [1, 2, {3: {4, 5}}], 'key2': [1, 2, {3: {4, 5}}], 'key3': [1, 2, {3: {4, 5}}]},
            'nest_dict2': {'k': [1, 2, {3: {4, 5}}]},
            'nested_tuples': [(1,), (2, 3), {4, 5, 6}],
            'odict': {1: '1', 2: '2'},
            'one_tup': (1,),
            'simple_dict': {'ham': 'jam', 'spam': 'eggs'},
            'simple_list': [1, 2, 'red', 'blue'],
        }
        >>> # You can try the rest yourself.
        >>> result = ub.repr2(dict_, nl=3, precision=2); print(result)
        >>> result = ub.repr2(dict_, nl=2, precision=2); print(result)
        >>> result = ub.repr2(dict_, nl=1, precision=2, itemsep='', explicit=True); print(result)
        >>> result = ub.repr2(dict_, nl=1, precision=2, nobr=1, itemsep='', explicit=True); print(result)
        >>> result = ub.repr2(dict_, nl=3, precision=2, cbr=True); print(result)
        >>> result = ub.repr2(dict_, nl=3, precision=2, si=True); print(result)
        >>> result = ub.repr2(dict_, nl=3, sort=True); print(result)
        >>> result = ub.repr2(dict_, nl=3, sort=False, trailing_sep=False); print(result)
        >>> result = ub.repr2(dict_, nl=3, sort=False, trailing_sep=False, nobr=True); print(result)

    Example:
        >>> import ubelt as ub
        >>> def _nest(d, w):
        ...     if d == 0:
        ...         return {}
        ...     else:
        ...         return {'n{}'.format(d): _nest(d - 1, w + 1), 'm{}'.format(d): _nest(d - 1, w + 1)}
        >>> dict_ = _nest(d=4, w=1)
        >>> result = ub.repr2(dict_, nl=6, precision=2, cbr=1)
        >>> print('---')
        >>> print(result)
        >>> result = ub.repr2(dict_, nl=-1, precision=2)
        >>> print('---')
        >>> print(result)

    Example:
        >>> import ubelt as ub
        >>> data = {'a': 100, 'b': [1, '2', 3], 'c': {20:30, 40: 'five'}}
        >>> print(ub.repr2(data, nl=1))
        {
            'a': 100,
            'b': [1, '2', 3],
            'c': {20: 30, 40: 'five'},
        }
        >>> # Compact is useful for things like timerit.Timerit labels
        >>> print(ub.repr2(data, compact=True))
        a=100,b=[1,2,3],c={20=30,40=five}
        >>> print(ub.repr2(data, compact=True, nobr=False))
        {a=100,b=[1,2,3],c={20=30,40=five}}
    """
    custom_extensions = kwargs.get('extensions', None)

    _return_info = kwargs.get('_return_info', False)
    kwargs['_root_info'] = _rectify_root_info(kwargs.get('_root_info', None))

    if kwargs.get('compact', False):
        # Compact profile defaults
        kwargs['newlines'] = kwargs.get('newlines', 0)
        kwargs['strkeys'] = kwargs.get('strkeys', True)
        kwargs['strvals'] = kwargs.get('strvals', True)
        kwargs['nobraces'] = kwargs.get('nobraces', 1)
        kwargs['itemsep'] = kwargs.get('itemsep', '')
        kwargs['kvsep'] = kwargs.get('kvsep', '=')

    outstr = None
    _leaf_info = None

    if custom_extensions:
        func = custom_extensions.lookup(data)
        if func is not None:
            outstr = func(data, **kwargs)

    if outstr is None:
        if isinstance(data, dict):
            outstr, _leaf_info = _format_dict(data, **kwargs)
        elif isinstance(data, (list, tuple, set, frozenset)):
            outstr, _leaf_info = _format_list(data, **kwargs)

    if outstr is None:
        # check any globally registered functions for special formatters
        func = _FORMATTER_EXTENSIONS.lookup(data)
        if func is not None:
            outstr = func(data, **kwargs)
        else:
            outstr = _format_object(data, **kwargs)

    if _return_info:
        _leaf_info = _rectify_leaf_info(_leaf_info)
        return outstr, _leaf_info
    else:
        return outstr


def _rectify_root_info(_root_info):
    if _root_info is None:
        _root_info = {
            'depth': 0,
        }
    return _root_info


def _rectify_leaf_info(_leaf_info):
    if _leaf_info is None:
        _leaf_info = {
            'max_height': 0,
            'min_height': 0,
        }
    return _leaf_info


class FormatterExtensions(object):
    """
    Helper class for managing non-builtin (e.g. numpy) format types.

    This module (:mod:`ubelt.util_format`) maintains a global set of basic
    extensions, but it is also possible to create a locally scoped set of
    extensions and explicitly pass it to repr2. The following example
    demonstrates this.

    Example:
        >>> import ubelt as ub
        >>> class MyObject(object):
        >>>     pass
        >>> data = {'a': [1, 2.2222, MyObject()], 'b': MyObject()}
        >>> # Create a custom set of extensions
        >>> extensions = ub.FormatterExtensions()
        >>> # Register a function to format your specific type
        >>> @extensions.register(MyObject)
        >>> def format_myobject(data, **kwargs):
        >>>     return 'I can do anything here'
        >>> # Repr2 will now respect the passed custom extensions
        >>> # Note that the global extensions will still be respected
        >>> # unless they are overloaded.
        >>> print(ub.repr2(data, nl=-1, precision=1, extensions=extensions))
        {
            'a': [1, 2.2, I can do anything here],
            'b': I can do anything here
        }
        >>> # Overload the formatter for float and int
        >>> @extensions.register((float, int))
        >>> def format_myobject(data, **kwargs):
        >>>     return str((data + 10) // 2)
        >>> print(ub.repr2(data, nl=-1, precision=1, extensions=extensions))
        {
            'a': [5, 6.0, I can do anything here],
            'b': I can do anything here
        }
    """
    # set_types = [set, frozenset]
    # list_types = [list, tuple]
    # dict_types = [dict]
    # custom_types = {
    #     'numpy': [],
    #     'pandas': [],
    # }
    # @classmethod
    # def sequence_types(cls):
    #     return cls.list_types + cls.set_types

    def __init__(self):
        self._type_registry = {}  # type: Dict[Type, Callable]  # NOQA
        self._typename_registry = {}  # type: Dict[str, Callable]  # NOQA
        self._lazy_queue = []  # type: List[Callable] # NOQA
        # self._lazy_registrations = [
        #     self._register_numpy_extensions,
        #     self._register_builtin_extensions,
        # ]

    def register(self, key):
        """
        Registers a custom formatting function with ub.repr2

        Args:
            key (Type | Tuple[Type] | str): indicator of the type

        Returns:
            Callable: decorator function
        """
        def _decorator(func):
            if isinstance(key, tuple):
                # Register each member type of the tuple individually
                for t in key:
                    self._type_registry[t] = func
            elif isinstance(key, str):
                self._typename_registry[key] = func
            else:
                self._type_registry[key] = func
            return func
        return _decorator

    def lookup(self, data):
        """
        Returns an appropriate function to format ``data`` if one has been
        registered.
        """
        # Evaluate the lazy queue if anything is in it
        if self._lazy_queue:
            for func in self._lazy_queue:
                func()
            self._lazy_queue = []

        for type_, func in self._type_registry.items():
            if isinstance(data, type_):
                return func

        # Fallback to registered typenames.
        # If we cannot find a formatter for this type, then return None
        typename = type(data).__name__
        func = self._typename_registry.get(typename, None)
        return func

    def _register_pandas_extensions(self):
        """
        Example:
            >>> # xdoctest: +REQUIRES(module:pandas)
            >>> # xdoctest: +IGNORE_WHITESPACE
            >>> import pandas as pd
            >>> import numpy as np
            >>> import ubelt as ub
            >>> rng = np.random.RandomState(0)
            >>> data = pd.DataFrame(rng.rand(3, 3))
            >>> print(ub.repr2(data))
            >>> print(ub.repr2(data, precision=2))
            >>> print(ub.repr2({'akeyfdfj': data}, precision=2))
        """
        @self.register('DataFrame')
        def format_pandas(data, **kwargs):  # nocover
            precision = kwargs.get('precision', None)
            float_format = (None if precision is None
                            else '%.{}f'.format(precision))
            formatted = data.to_string(float_format=float_format)
            return formatted

    # def _register_torch_extensions(self):
    #     @self.register('Tensor')
    #     def format_tensor(data, **kwargs):
    #         """
    #         Example:
    #             >>> # xdoctest: +REQUIRES(module:torch)
    #             >>> # xdoctest: +IGNORE_WHITESPACE
    #             >>> import torch
    #             >>> import numpy as np
    #             >>> data = np.array([[.2, 42, 5], [21.2, 3, .4]])
    #             >>> data = torch.from_numpy(data)
    #             >>> data = torch.rand(100, 100)
    #             >>> print('data = {}'.format(ub.repr2(data, nl=1)))
    #             >>> print(ub.repr2(data))
    #         """
    #         import numpy as np
    #         func = self._type_registry[np.ndarray]
    #         npdata = data.data.cpu().numpy()
    #         # kwargs['strvals'] = True
    #         kwargs['with_dtype'] = False
    #         formatted = func(npdata, **kwargs)
    #         # hack for prefix class
    #         formatted = formatted.replace('np.array', '__Tensor')
    #         # import ubelt as ub
    #         # formatted = ub.hzcat('Tensor(' + formatted + ')')
    #         return formatted

    def _register_numpy_extensions(self):
        """
        Example:
            >>> # xdoctest: +REQUIRES(module:numpy)
            >>> import sys
            >>> import pytest
            >>> import ubelt as ub
            >>> if not ub.modname_to_modpath('numpy'):
            ...     raise pytest.skip()
            >>> # xdoctest: +IGNORE_WHITESPACE
            >>> import numpy as np
            >>> data = np.array([[.2, 42, 5], [21.2, 3, .4]])
            >>> print(ub.repr2(data))
            np.array([[ 0.2, 42. ,  5. ],
                      [21.2,  3. ,  0.4]], dtype=np.float64)
            >>> print(ub.repr2(data, with_dtype=False))
            np.array([[ 0.2, 42. ,  5. ],
                      [21.2,  3. ,  0.4]])
            >>> print(ub.repr2(data, strvals=True))
            [[ 0.2, 42. ,  5. ],
             [21.2,  3. ,  0.4]]
            >>> data = np.empty((0, 10), dtype=np.float64)
            >>> print(ub.repr2(data, strvals=False))
            np.empty((0, 10), dtype=np.float64)
            >>> print(ub.repr2(data, strvals=True))
            []
            >>> data = np.ma.empty((0, 10), dtype=np.float64)
            >>> print(ub.repr2(data, strvals=False))
            np.ma.empty((0, 10), dtype=np.float64)
        """
        # TODO: should we register numpy using the new string method?
        import numpy as np

        @self.register(np.ndarray)
        def format_ndarray(data, **kwargs):
            import re
            strvals = kwargs.get('sv', kwargs.get('strvals', False))
            itemsep = kwargs.get('itemsep', ' ')
            precision = kwargs.get('precision', None)
            # The documented kwarg is spelled ``suppress_small``
            suppress_small = kwargs.get('suppress_small', None)
            max_line_width = kwargs.get('max_line_width', None)
            with_dtype = kwargs.get('with_dtype', kwargs.get('dtype', not strvals))
            newlines = kwargs.pop('nl', kwargs.pop('newlines', 1))

            # if with_dtype and strvals:
            #     raise ValueError('cannot format with strvals and dtype')

            separator = ',' + itemsep

            if strvals:
                prefix = ''
                suffix = ''
            else:
                modname = type(data).__module__
                # substitute shorthand for numpy module names
                np_nice = 'np'
                modname = re.sub('\\bnumpy\\b', np_nice, modname)
                modname = re.sub('\\bma.core\\b', 'ma', modname)

                class_name = type(data).__name__
                if class_name == 'ndarray':
                    class_name = 'array'

                prefix = modname + '.' + class_name + '('

                if with_dtype:
                    dtype_repr = data.dtype.name
                    # dtype_repr = np.core.arrayprint.dtype_short_repr(data.dtype)
                    suffix = ',{}dtype={}.{})'.format(itemsep, np_nice, dtype_repr)
                else:
                    suffix = ')'

            if not strvals and data.size == 0 and data.shape != (0,):
                # Special case for displaying empty data
                prefix = modname + '.empty('
                body = repr(tuple(map(int, data.shape)))
            else:
                body = np.array2string(data, precision=precision,
                                       separator=separator,
                                       suppress_small=suppress_small,
                                       prefix=prefix,
                                       max_line_width=max_line_width)
            if not strvals:
                # Handle special float values inf / nan
                body = re.sub('\\binf\\b', np_nice + '.inf', body)
                body = re.sub('\\bnan\\b', np_nice + '.nan', body)

            if not newlines:
                # remove newlines if we need to
                body = re.sub('\n *', '', body)
            formatted = prefix + body + suffix
            return formatted

        # Hack, make sure we also register numpy floats
        self.register(np.float32)(self._type_registry[float])

    def _register_builtin_extensions(self):
        @self.register(float)
        def format_float(data, **kwargs):
            precision = kwargs.get('precision', None)
            strvals = kwargs.get('sv', kwargs.get('strvals', False))

            if precision is None:
                text = str(data)
            else:
                text = ('{:.%df}' % precision).format(data)

            if not strvals:
                # Ensure the representation of inf and nan is evaluatable
                # NOTE: sometimes this function is used to make json objects
                # how can we ensure that this doesn't break things?
                # Turns out json, never handled these cases. In the future we
                # may want to add a json flag to repr2 to encourage it to
                # output json-like representations.
                # json.loads("[0, 1, 2, nan]")
                # json.loads("[Infinity, NaN]")
                # json.dumps([float('inf'), float('nan')])
                import math
                if math.isinf(data) or math.isnan(data):
                    text = "float('{}')".format(text)

            return text

        @self.register(slice)
        def format_slice(data, **kwargs):
            if kwargs.get('itemsep', ' ') == '':
                return 'slice(%r,%r,%r)' % (data.start, data.stop, data.step)
            else:
                return _format_object(data, **kwargs)


_FORMATTER_EXTENSIONS = FormatterExtensions()
_FORMATTER_EXTENSIONS._register_builtin_extensions()


def _lazy_init():
    """
    Only called in the case where we encounter an unknown type that a commonly
    used external library might have. For now this is numpy and pandas. Numpy
    is ubiquitous.
    """
    try:
        # TODO: can we use lazy loading to prevent trying to import numpy until
        # some attribute of _FORMATTER_EXTENSIONS is used?
        _FORMATTER_EXTENSIONS._register_numpy_extensions()
        _FORMATTER_EXTENSIONS._register_pandas_extensions()
        # _FORMATTER_EXTENSIONS._register_torch_extensions()
    except ImportError:  # nocover
        pass

_FORMATTER_EXTENSIONS._lazy_queue.append(_lazy_init)


def _format_object(val, **kwargs):
    stritems = kwargs.get('si', kwargs.get('stritems', False))
    strvals = stritems or kwargs.get('sv', kwargs.get('strvals', False))
    base_valfunc = str if strvals else repr
    itemstr = base_valfunc(val)
    return itemstr


def _format_list(list_, **kwargs):
    """
    Makes a pretty printable / human-readable string representation of a
    sequence. In most cases this string could be evaled.

    Args:
        list_ (list): input list
        **kwargs: nl, newlines, packed, nobr, nobraces, itemsep, trailing_sep,
            strvals, indent_, precision, use_numpy, with_dtype, force_dtype,
            stritems, strkeys, explicit, sort, key_order, maxlen

    Returns:
        Tuple[str, Dict] : retstr, _leaf_info

    Example:
        >>> print(_format_list([])[0])
        []
        >>> print(_format_list([], nobr=True)[0])
        []
        >>> print(_format_list([1], nl=0)[0])
        [1]
        >>> print(_format_list([1], nobr=True)[0])
        1,
    """
    kwargs['_root_info'] = _rectify_root_info(kwargs.get('_root_info', None))
    kwargs['_root_info']['depth'] += 1

    newlines = kwargs.pop('nl', kwargs.pop('newlines', 1))
    kwargs['nl'] = _rectify_countdown_or_bool(newlines)

    nobraces = kwargs.pop('nobr', kwargs.pop('nobraces', False))
    kwargs['nobraces'] = _rectify_countdown_or_bool(nobraces)

    itemsep = kwargs.get('itemsep', ' ')

    compact_brace = kwargs.get('cbr', kwargs.get('compact_brace', False))
    # kwargs['cbr'] = _rectify_countdown_or_bool(compact_brace)

    itemstrs, _leaf_info = _list_itemstrs(list_, **kwargs)
    if len(itemstrs) == 0:
        nobraces = False  # force braces to prevent empty output

    is_tuple = isinstance(list_, tuple)
    is_set = isinstance(list_, (set, frozenset,))
    if nobraces:
        lbr, rbr = '', ''
    elif is_tuple:
        lbr, rbr = '(', ')'
    elif is_set:
        lbr, rbr = '{', '}'
    else:
        lbr, rbr = '[', ']'

    # Doesn't actually put in trailing comma if on same line
    trailing_sep = kwargs.get('trailsep', kwargs.get(
        'trailing_sep', newlines > 0 and len(itemstrs)))

    # The trailing separator is always needed for single item tuples
    if is_tuple and len(list_) <= 1:
        trailing_sep = True

    if len(itemstrs) == 0:
        newlines = False

    retstr = _join_itemstrs(itemstrs, itemsep, newlines, _leaf_info, nobraces,
                            trailing_sep, compact_brace, lbr, rbr)
    return retstr, _leaf_info


def _format_dict(dict_, **kwargs):
    """
    Makes a pretty printable / human-readable string representation of a
    dictionary. In most cases this string could be evaled.

    Args:
        dict_ (dict): a dictionary

        **kwargs: si, stritems, strkeys, strvals, sk, sv, nl, newlines, nobr,
                  nobraces, cbr, compact_brace, trailing_sep,
                  explicit, itemsep, precision, kvsep, sort

    Kwargs:
        sort (None, default=None):
            if True, sorts ALL collections and subcollections,
            note, collections with undefined orders (e.g. dicts, sets) are
            sorted by default.

        nl (int, default=None):
            preferred alias for newline. can be a countdown variable

        explicit (int, default=False):
            can be a countdown variable.
            if True, uses dict(a=b) syntax instead of {'a': b}

        nobr (bool, default=False): removes outer braces

    Returns:
        Tuple[str, Dict] : retstr, _leaf_info

    Example:
        >>> from ubelt.util_format import *  # NOQA
        >>> dict_ = {'a': 'edf', 'bc': 'ghi'}
        >>> print(_format_dict(dict_)[0])
        {
            'a': 'edf',
            'bc': 'ghi',
        }
        >>> print(_format_dict(dict_, align=True)[0])
        >>> print(_format_dict(dict_, align=':')[0])
        {
            'a' : 'edf',
            'bc': 'ghi',
        }
        >>> print(_format_dict(dict_, explicit=True, align=True)[0])
        dict(
            a ='edf',
            bc='ghi',
        )
    """
    kwargs['_root_info'] = _rectify_root_info(kwargs.get('_root_info', None))
    kwargs['_root_info']['depth'] += 1

    stritems = kwargs.pop('si', kwargs.pop('stritems', False))
    if stritems:
        kwargs['strkeys'] = True
        kwargs['strvals'] = True

    kwargs['strkeys'] = kwargs.pop('sk', kwargs.pop('strkeys', False))
    kwargs['strvals'] = kwargs.pop('sv', kwargs.pop('strvals', False))

    newlines = kwargs.pop('nl', kwargs.pop('newlines', True))
    kwargs['nl'] = _rectify_countdown_or_bool(newlines)

    nobraces = kwargs.pop('nobr', kwargs.pop('nobraces', False))
    kwargs['nobraces'] = _rectify_countdown_or_bool(nobraces)

    compact_brace = kwargs.get('cbr', kwargs.get('compact_brace', False))
    # kwargs['cbr'] = _rectify_countdown_or_bool(compact_brace)

    # Doesn't actually put in trailing comma if on same line
    trailing_sep = kwargs.get('trailsep', kwargs.get('trailing_sep', newlines > 0))
    explicit = kwargs.get('explicit', False)
    itemsep = kwargs.get('itemsep', ' ')

    align = kwargs.get('align', False)
    if align and not isinstance(align, str):
        default_kvsep = ': '
        if explicit:
            default_kvsep = '='
        kvsep = kwargs.get('kvsep', default_kvsep)
        align = kvsep

    if len(dict_) == 0:
        retstr = 'dict()' if explicit else '{}'
        _leaf_info = None
    else:
        itemstrs, _leaf_info = _dict_itemstrs(dict_, **kwargs)
        if nobraces:
            lbr, rbr = '', ''
        elif explicit:
            lbr, rbr = 'dict(', ')'
        else:
            lbr, rbr = '{', '}'
        retstr = _join_itemstrs(itemstrs, itemsep, newlines, _leaf_info,
                                nobraces, trailing_sep, compact_brace, lbr,
                                rbr, align)
    return retstr, _leaf_info


def _join_itemstrs(itemstrs, itemsep, newlines, _leaf_info, nobraces,
                   trailing_sep, compact_brace, lbr, rbr, align=False):
    """
    Joins string-ified items with separators newlines and container-braces.
    """
    # positive newlines means start counting from the root
    use_newline = newlines > 0

    # negative countdown values mean start counting from the leafs
    # if compact_brace < 0:
    #     compact_brace = (-compact_brace) >= _leaf_info['max_height']
    if newlines < 0:
        use_newline = (-newlines) < _leaf_info['max_height']

    if use_newline:
        sep = ',\n'
        if nobraces:
            body_str = sep.join(itemstrs)
            if trailing_sep and len(itemstrs) > 0:
                body_str += ','
            retstr = body_str
        else:
            if compact_brace:
                # Why must we modify the indentation below and not here?
                # prefix = ''
                # rest = [util_str.indent(s, prefix) for s in itemstrs[1:]]
                # indented = itemstrs[0:1] + rest
                indented = itemstrs
            else:
                prefix = ' ' * 4
                indented = [util_str.indent(s, prefix) for s in itemstrs]

            if align:
                indented = _align_lines(indented, character=align)

            body_str = sep.join(indented)
            if trailing_sep and len(itemstrs) > 0:
                body_str += ','
            if compact_brace:
                # Why can we modify the indentation here but not above?
                braced_body_str = (lbr + body_str.replace('\n', '\n ') + rbr)
            else:
                braced_body_str = (lbr + '\n' + body_str + '\n' + rbr)
            retstr = braced_body_str
    else:
        sep = ',' + itemsep
        body_str = sep.join(itemstrs)
        if trailing_sep and len(itemstrs) > 0:
            body_str += ','
        retstr = (lbr + body_str + rbr)
    return retstr


def _dict_itemstrs(dict_, **kwargs):
    """
    Create a string representation for each item in a dict.

    Args:
        dict_ (dict): the dict
        **kwargs: explicit, precision, kvsep, strkeys, _return_info, cbr,
            compact_brace, sort

    Ignore:
        from ubelt.util_format import _dict_itemstrs
        import xinspect
        print(', '.join(xinspect.get_kwargs(_dict_itemstrs, max_depth=0).keys()))

    Example:
        >>> from ubelt.util_format import *
        >>> dict_ = {'b': .1, 'l': 'st', 'g': 1.0, 's': 10, 'm': 0.9, 'w': .5}
        >>> kwargs = {'strkeys': True}
        >>> itemstrs, _ = _dict_itemstrs(dict_, **kwargs)
        >>> char_order = [p[0] for p in itemstrs]
        >>> assert char_order == ['b', 'g', 'l', 'm', 's', 'w']
    """
    import ubelt as ub
    explicit = kwargs.get('explicit', False)
    kwargs['explicit'] = _rectify_countdown_or_bool(explicit)
    precision = kwargs.get('precision', None)
    default_kvsep = ': '
    default_strkeys = False
    if explicit:
        default_strkeys = True
        default_kvsep = '='
    kvsep = kwargs.get('kvsep', default_kvsep)

    def make_item_str(key, val):
        if explicit or kwargs.get('strkeys', default_strkeys):
            key_str = str(key)
        else:
            key_str = repr2(key, precision=precision, newlines=0)

        prefix = key_str + kvsep
        kwargs['_return_info'] = True
        val_str, _leaf_info = repr2(val, **kwargs)

        # If the first line does not end with an open nest char
        # (e.g. for ndarrays), then we need to worry about residual
        # indentation; otherwise the prefix can simply be prepended.
        pos = val_str.find('\n')
        first_line = val_str if pos == -1 else val_str[:pos]

        compact_brace = kwargs.get('cbr', kwargs.get('compact_brace', False))

        if compact_brace or not first_line.rstrip().endswith(tuple('([{<')):
            rest = '' if pos == -1 else val_str[pos:]
            # val_str = first_line.lstrip() + rest
            val_str = first_line + rest
            if '\n' in prefix:
                # Fix issue with keys that span new lines
                item_str = prefix + val_str
            else:
                item_str = ub.hzcat([prefix, val_str])
        else:
            item_str = prefix + val_str
        return item_str, _leaf_info

    items = list(dict_.items())
    _tups = [make_item_str(key, val) for (key, val) in items]
    itemstrs = [t[0] for t in _tups]
    max_height = max([t[1]['max_height'] for t in _tups]) if _tups else 0
    _leaf_info = {
        'max_height': max_height + 1,
    }

    sort = kwargs.get('sort', None)
    if sort is None:
        # if sort is None, force orderings on unordered collections like dicts,
        # but keep ordering of ordered collections like OrderedDicts.

        # NOTE: WE WANT TO CHANGE THIS TO FALSE BY DEFAULT.
        # MIGHT REQUIRE DEPRECATING PYTHON 3.6 SUPPORT
        sort = True  # LEGACY UBELT BEHAVIOR
        # HOW DO WE INTRODUCE A BACKWARDS COMPATIBLE WAY TO MAKE THIS CHANGE?
        # sort = False  # cannot make this change safely
        if isinstance(dict_, collections.OrderedDict):
            # never sort ordered dicts; they are perfect just the way they are!
            sort = False
    if sort:
        key = sort if callable(sort) else None
        itemstrs = _sort_itemstrs(items, itemstrs, key)
    return itemstrs, _leaf_info


def _list_itemstrs(list_, **kwargs):
    """
    Create a string representation for each item in a list.

    Args:
        list_ (Sequence):
        **kwargs: _return_info, sort
    """
    items = list(list_)
    kwargs['_return_info'] = True
    _tups = [repr2(item, **kwargs) for item in items]
    itemstrs = [t[0] for t in _tups]
    max_height = max([t[1]['max_height'] for t in _tups]) if _tups else 0
    _leaf_info = {
        'max_height': max_height + 1,
    }

    sort = kwargs.get('sort', None)
    if sort is None:
        # if sort is None, force orderings on unordered collections like sets,
        # but keep ordering of ordered collections like lists.
        sort = isinstance(list_, (set, frozenset))
    if sort:
        key = sort if callable(sort) else None
        itemstrs = _sort_itemstrs(items, itemstrs, key)
    return itemstrs, _leaf_info


def _sort_itemstrs(items, itemstrs, key=None):
    """
    Equivalent to ``sorted(items)`` except if ``items`` are unorderable, then
    string values are used to define an ordering.
    """
    # First try to sort items by their normal values
    # If that does not work, then sort by their string values
    try:
        # Set ordering is not unique. Sort by string values instead.
        if len(items) > 0 and isinstance(items[0], (set, frozenset)):
            raise TypeError
        sortx = util_list.argsort(items, key=key)
    except TypeError:
        sortx = util_list.argsort(itemstrs, key=key)
    itemstrs = [itemstrs[x] for x in sortx]
    return itemstrs


def _rectify_countdown_or_bool(count_or_bool):
    """
    Used by recursive functions to specify which level to turn a bool on in.
    Counting down yields True, True, ..., False.
    Negative values are passed through unchanged (they are no longer counted
    up).

    Args:
        count_or_bool (bool | int): if positive and an integer, it will count
            down, otherwise it will remain the same.

    Returns:
        int or bool: count_or_bool_

    Example:
        >>> from ubelt.util_format import _rectify_countdown_or_bool  # NOQA
        >>> count_or_bool = True
        >>> a1 = (_rectify_countdown_or_bool(2))
        >>> a2 = (_rectify_countdown_or_bool(1))
        >>> a3 = (_rectify_countdown_or_bool(0))
        >>> a4 = (_rectify_countdown_or_bool(-1))
        >>> a5 = (_rectify_countdown_or_bool(-2))
        >>> a6 = (_rectify_countdown_or_bool(True))
        >>> a7 = (_rectify_countdown_or_bool(False))
        >>> a8 = (_rectify_countdown_or_bool(None))
        >>> result = [a1, a2, a3, a4, a5, a6, a7, a8]
        >>> print(result)
        [1, 0, 0, -1, -2, True, False, False]
    """
    if count_or_bool is True or count_or_bool is False:
        count_or_bool_ = count_or_bool
    elif isinstance(count_or_bool, int):
        if count_or_bool == 0:
            return 0
        elif count_or_bool > 0:
            count_or_bool_ = count_or_bool - 1
        else:
            # We don't count up negatives anymore
            count_or_bool_ = count_or_bool
    else:
        count_or_bool_ = False
    return count_or_bool_


def _align_text(text, character='=', replchar=None, pos=0):
    r"""
    Left justifies text on the left side of character

    Args:
        text (str): text to align
        character (str): character to align at
        replchar (str): replacement character (default=None)

    Returns:
        str: new_text

    Example:
        >>> character = '='
        >>> text = 'a = b=\none = two\nthree = fish\n'
        >>> print(text)
        >>> result = (_align_text(text, '='))
        >>> print(result)
        a     = b=
        one   = two
        three = fish
    """
    line_list = text.splitlines()
    new_lines = _align_lines(line_list, character, replchar, pos=pos)
    new_text = '\n'.join(new_lines)
    return new_text


def _align_lines(line_list, character='=', replchar=None, pos=0):
    r"""
    Left justifies text on the left side of character

    Args:
        line_list (list of strs):
        character (str):
        pos (int or list or None): does one alignment for all chars beyond this
            column position. If pos is None, then all chars are aligned.

    Returns:
        list: new_lines

    Example:
        >>> line_list = 'a = b\none = two\nthree = fish'.split('\n')
        >>> character = '='
        >>> new_lines = _align_lines(line_list, character)
        >>> result = ('\n'.join(new_lines))
        >>> print(result)
        a     = b
        one   = two
        three = fish

    Example:
        >>> line_list = 'foofish:\n a = b\n one = two\n three = fish'.split('\n')
        >>> character = '='
        >>> new_lines = _align_lines(line_list, character)
        >>> result = ('\n'.join(new_lines))
        >>> print(result)
        foofish:
         a     = b
         one   = two
         three = fish

    Example:
        >>> import ubelt as ub
        >>> character = ':'
        >>> text = ub.codeblock('''
            {'max': '1970/01/01 02:30:13',
             'mean': '1970/01/01 01:10:15',
             'min': '1970/01/01 00:01:41',
             'range': '2:28:32',
             'std': '1:13:57',}''').split('\n')
        >>> new_lines = _align_lines(text, ':', ' :')
        >>> result = '\n'.join(new_lines)
        >>> print(result)
        {'max'   : '1970/01/01 02:30:13',
         'mean'  : '1970/01/01 01:10:15',
         'min'   : '1970/01/01 00:01:41',
         'range' : '2:28:32',
         'std'   : '1:13:57',}

    Example:
        >>> line_list = 'foofish:\n a = b = c\n one = two = three\nthree=4= fish'.split('\n')
        >>> character = '='
        >>> # align the second occurrence of a character
        >>> new_lines = _align_lines(line_list, character, pos=None)
        >>> print(('\n'.join(line_list)))
        >>> result = ('\n'.join(new_lines))
        >>> print(result)
        foofish:
         a   = b   = c
         one = two = three
        three=4    = fish
    """
    import re

    # FIXME: continue to fix ansi
    if pos is None:
        # Align all occurrences
        num_pos = max([line.count(character) for line in line_list])
        pos = list(range(num_pos))

    # Allow multiple alignments
    if isinstance(pos, list):
        pos_list = pos
        # recursive calls
        new_lines = line_list
        for pos in pos_list:
            new_lines = _align_lines(new_lines, character=character,
                                     replchar=replchar, pos=pos)
        return new_lines

    # base case
    if replchar is None:
        replchar = character

    # the pos-th character to align
    lpos = pos
    rpos = lpos + 1

    tup_list = [line.split(character) for line in line_list]

    handle_ansi = True
    if handle_ansi:  # nocover
        # Remove ansi from length calculation
        # References: http://stackoverflow.com/questions/14693701remove-ansi
        ansi_escape = re.compile(r'\x1b[^m]*m')

    # Find how much padding is needed
    maxlen = 0
    for tup in tup_list:
        if len(tup) >= rpos + 1:
            if handle_ansi:  # nocover
                tup = [ansi_escape.sub('', x) for x in tup]
            left_lenlist = list(map(len, tup[0:rpos]))
            left_len = sum(left_lenlist) + lpos * len(replchar)
            maxlen = max(maxlen, left_len)

    # Pad each line to align the pos-th occurrence of the chosen character
    new_lines = []
    for tup in tup_list:
        if len(tup) >= rpos + 1:
            lhs = character.join(tup[0:rpos])
            rhs = character.join(tup[rpos:])
            # pad the new line with requested justification
            newline = lhs.ljust(maxlen) + replchar + rhs
            new_lines.append(newline)
        else:
            new_lines.append(replchar.join(tup))
    return new_lines


# Give the repr2 function itself a reference to the default extensions
# register method so the user can modify them without accessing this module
repr2.extensions = _FORMATTER_EXTENSIONS
repr2.register = _FORMATTER_EXTENSIONS.register