ubelt.util_dict module

Functions for working with dictionaries.

The UDict is a subclass of dict with quality-of-life improvements. It contains methods for n-ary key-wise set operations and supports the corresponding binary operators, in addition to other methods for mapping, inversion, subdicts, and peeking. It can be accessed via the alias ubelt.udict.

The SetDict only contains the key-wise set extensions to dict. It can be accessed via the alias ubelt.sdict.

The dict_hist() function counts the number of discrete occurrences of hashable items. Similarly, find_duplicates() looks for the indices of items that occur at least k times (k=2 by default, i.e. more than once).

The map_keys() and map_values() functions are useful for transforming the keys and values of a dictionary with less syntax than a dict comprehension.

The dict_union(), dict_isect(), and dict_diff() functions are similar to the set equivalents.

The dzip() function zips two iterables and packs them into a dictionary where the first iterable is used to generate keys and the second generates values.

The group_items() function takes two lists and returns a dict mapping values in the second list to all items in corresponding locations in the first list.

The invert_dict() function swaps keys and values. See the function docs for details on dealing with unique and non-unique values.

The ddict and odict names are aliases for the commonly used collections.defaultdict and collections.OrderedDict classes.

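Related Work: PyPIAddict, SetDictRecipe1, SetDictRecipe2, PypiDictDiffer, DictView, Pep3106, GHDictMap (see References below).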

References

PyPIAddict

https://github.com/mewwts/addict

SetDictRecipe1

https://gist.github.com/rossmacarthur/38fa948b175abb512e12c516cc3b936d

SetDictRecipe2

https://code.activestate.com/recipes/577471-setdict/

PypiDictDiffer

https://pypi.org/project/dictdiffer/

DictView

https://docs.python.org/3.0/library/stdtypes.html#dictionary-view-objects

Pep3106

https://peps.python.org/pep-3106/

GHDictMap

https://github.com/ulisesojeda/dictionary_map

class ubelt.util_dict.AutoDict

Bases: UDict

An infinitely nested default dict of dicts.

Implementation of Perl’s autovivification feature that follows [SO_651794].

References

SO_651794

http://stackoverflow.com/questions/651794/init-dict-of-dicts

Example

>>> import ubelt as ub
>>> auto = ub.AutoDict()
>>> auto[0][10][100] = None
>>> assert str(auto) == '{0: {10: {100: None}}}'

to_dict()

Recursively casts an AutoDict into a regular dictionary. All directly nested AutoDict values are also converted.

This effectively de-defaults the structure.

Returns

a copy of this dict without autovivification

Return type

dict

Example

>>> import ubelt as ub
>>> auto = ub.AutoDict()
>>> auto[1] = 1
>>> auto['n1'] = ub.AutoDict()
>>> static = auto.to_dict()
>>> assert not isinstance(static, ub.AutoDict)
>>> assert not isinstance(static['n1'], ub.AutoDict)

Example

>>> import ubelt as ub
>>> auto = ub.AutoOrderedDict()
>>> auto[0][3] = 3
>>> auto[0][2] = 2
>>> auto[0][1] = 1
>>> assert list(auto[0].values()) == [3, 2, 1]
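
The autovivification idea can be sketched with a plain dict subclass that uses the __missing__ hook: looking up a missing key creates, stores, and returns a new nested instance. This is a minimal sketch under that assumption, not ubelt's actual implementation; the class name AutoDictSketch is hypothetical, and its to_dict() only mirrors the documented behavior of converting nested instances into plain dicts.

```python
class AutoDictSketch(dict):
    """Hypothetical autovivifying dict: missing keys spring into existence."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when ``key`` is absent: create a new
        # nested instance, store it, and return it so chained indexing works.
        value = self[key] = type(self)()
        return value

    def to_dict(self):
        # Recursively convert nested AutoDictSketch values into plain dicts,
        # which removes the autovivification behavior from the copy.
        return {
            k: v.to_dict() if isinstance(v, AutoDictSketch) else v
            for k, v in self.items()
        }


auto = AutoDictSketch()
auto[0][10][100] = None
print(auto)  # {0: {10: {100: None}}}
static = auto.to_dict()
print(type(static[0]))  # <class 'dict'>
```

In this sketch even a plain read such as auto['missing'] inserts a new empty entry, so membership checks should use the in operator instead of indexing.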

ubelt.util_dict.AutoOrderedDict

alias of AutoDict

ubelt.util_dict.dzip(items1, items2, cls=<class 'dict'>)

Zips elementwise pairs between items1 and items2 into a dictionary.

Values from items2 can be broadcast onto items1.

Parameters
  • items1 (Iterable[KT]) – full sequence

  • items2 (Iterable[VT]) – can either be a sequence of one item or a sequence of equal length to items1

  • cls (Type[dict], default=dict) – dictionary type to use.

Returns

similar to dict(zip(items1, items2)).

Return type

Dict[KT, VT]

Example

>>> import ubelt as ub
>>> assert ub.dzip([1, 2, 3], [4]) == {1: 4, 2: 4, 3: 4}
>>> assert ub.dzip([1, 2, 3], [4, 4, 4]) == {1: 4, 2: 4, 3: 4}
>>> assert ub.dzip([], [4]) == {}
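
A minimal sketch of the broadcasting rule described above, assuming that a length-one items2 is repeated to match items1 and that any other length mismatch is an error. The function name dzip_sketch is hypothetical, and raising ValueError is an assumption about how the mismatch is reported.

```python
def dzip_sketch(items1, items2, cls=dict):
    """Hypothetical sketch of dzip's broadcasting rule."""
    items1 = list(items1)
    items2 = list(items2)
    if len(items2) == 1 and len(items1) != 1:
        # Broadcast the single value onto every key (also covers empty items1)
        items2 = items2 * len(items1)
    if len(items1) != len(items2):
        # Assumption: report a length mismatch as a ValueError
        raise ValueError(
            'items2 must have length 1 or the same length as items1, '
            'got {} and {}'.format(len(items1), len(items2)))
    return cls(zip(items1, items2))


print(dzip_sketch([1, 2, 3], [4]))      # {1: 4, 2: 4, 3: 4}
print(dzip_sketch(['a', 'b'], [1, 2]))  # {'a': 1, 'b': 2}
```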

ubelt.util_dict.ddict

alias of defaultdict

ubelt.util_dict.dict_hist(items, weights=None, ordered=False, labels=None)

Builds a histogram of items, counting the number of times each item appears in the input.

Parameters
  • items (Iterable[T]) – hashable items (usually containing duplicates)

  • weights (Iterable[float], default=None) – Corresponding weights for each item.

  • ordered (bool, default=False) – If True, the result is ordered by frequency.

  • labels (Iterable[T], default=None) – Expected labels. Allows this function to pre-initialize the histogram. If specified, the frequency of each label is initialized to zero, and items may only contain values listed in labels (any other item raises a KeyError).

Returns

dictionary where the keys are unique elements from items, and the values are the number of times each item appears in items (or the sum of its weights, if weights is given).

Return type

dict[T, int]

Example

>>> import ubelt as ub
>>> items = [1, 2, 39, 900, 1232, 900, 1232, 2, 2, 2, 900]
>>> hist = ub.dict_hist(items)
>>> print(ub.repr2(hist, nl=0))
{1: 1, 2: 4, 39: 1, 900: 3, 1232: 2}

Example

>>> import ubelt as ub
>>> items = [1, 2, 39, 900, 1232, 900, 1232, 2, 2, 2, 900]
>>> hist1 = ub.dict_hist(items)
>>> hist2 = ub.dict_hist(items, ordered=True)
>>> try:
>>>     hist3 = ub.dict_hist(items, labels=[])
>>> except KeyError:
>>>     pass
>>> else:
>>>     raise AssertionError('expected key error')
>>> weights = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]
>>> hist4 = ub.dict_hist(items, weights=weights)
>>> print(ub.repr2(hist1, nl=0))
{1: 1, 2: 4, 39: 1, 900: 3, 1232: 2}
>>> print(ub.repr2(hist4, nl=0))
{1: 1, 2: 4, 39: 1, 900: 1, 1232: 0}
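
For comparison, the unweighted counts above can be reproduced with the standard library's collections.Counter, and the weighted variant by summing each occurrence's weight instead of adding one. This is an illustrative sketch, not ubelt's implementation, and it does not reproduce the ordered or labels options.

```python
from collections import Counter, defaultdict

items = [1, 2, 39, 900, 1232, 900, 1232, 2, 2, 2, 900]
weights = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1]

# Unweighted: Counter counts each occurrence once
counts = dict(Counter(items))

# Weighted: add the weight of each occurrence instead of 1
weighted = defaultdict(float)
for item, weight in zip(items, weights):
    weighted[item] += weight
weighted = dict(weighted)

print(counts)    # {1: 1, 2: 4, 39: 1, 900: 3, 1232: 2}
print(weighted)  # {1: 1.0, 2: 4.0, 39: 1.0, 900: 1.0, 1232: 0.0}
```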

ubelt.util_dict.dict_subset(dict_, keys, default=NoParam, cls=<class 'collections.OrderedDict'>)

Get a subset of a dictionary

Parameters
  • dict_ (Dict[KT, VT]) – superset dictionary

  • keys (Iterable[KT]) – keys to take from dict_

  • default (Optional[object] | NoParamType) – if specified, this value is used for any key missing from dict_; otherwise a missing key raises a KeyError.

  • cls (Type[Dict], default=OrderedDict) – type of the returned dictionary.

Returns

subset dictionary

Return type

Dict[KT, VT]

SeeAlso:

dict_isect() - similar functionality, but ignores missing keys

Example

>>> import ubelt as ub
>>> dict_ = {'K': 3, 'dcvs_clip_max': 0.2, 'p': 0.1}
>>> keys = ['K', 'dcvs_clip_max']
>>> subdict_ = ub.dict_subset(dict_, keys)
>>> print(ub.repr2(subdict_, nl=0))
{'K': 3, 'dcvs_clip_max': 0.2}
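
A minimal sketch of the subset logic, assuming a private sentinel object in place of ubelt's NoParam so that None remains a usable default. The names dict_subset_sketch and _MISSING are hypothetical.

```python
from collections import OrderedDict

_MISSING = object()  # hypothetical stand-in for ubelt's NoParam sentinel


def dict_subset_sketch(dict_, keys, default=_MISSING, cls=OrderedDict):
    """Hypothetical sketch: take ``keys`` from ``dict_`` in the given order."""
    if default is _MISSING:
        # Plain indexing: a missing key raises KeyError
        return cls((key, dict_[key]) for key in keys)
    # Otherwise fill missing keys with the default value
    return cls((key, dict_.get(key, default)) for key in keys)


dict_ = {'K': 3, 'dcvs_clip_max': 0.2, 'p': 0.1}
print(dict(dict_subset_sketch(dict_, ['K', 'dcvs_clip_max'])))
# {'K': 3, 'dcvs_clip_max': 0.2}
print(dict(dict_subset_sketch(dict_, ['K', 'x'], default=None)))
# {'K': 3, 'x': None}
```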

ubelt.util_dict.dict_union(*args)

Dictionary set extension for set.union

Combines items from multiple dictionaries. For items with intersecting keys, dictionaries towards the end of the sequence are given precedence.

Parameters

*args (List[Dict]) – A sequence of dictionaries. Values are taken from the last dictionary that contains the key.

Returns

OrderedDict if the first argument is an OrderedDict, otherwise dict

Return type

Dict | OrderedDict

Notes

In Python 3.9+, the dictionary union operator “|” (PEP 584) performs a similar operation, but as of 2022-06-01 there is still no public method for dictionary union (or any other dictionary set operator).

References

https://stackoverflow.com/questions/38987/merge-two-dict

SeeAlso:

collections.ChainMap() - a standard python builtin data structure that provides a view that treats multiple dicts as a single dict. https://docs.python.org/3/library/collections.html#chainmap-objects

Example

>>> import ubelt as ub
>>> result = ub.dict_union({'a': 1, 'b': 1}, {'b': 2, 'c': 2})
>>> assert result == {'a': 1, 'b': 2, 'c': 2}
>>> ub.dict_union(
>>>     ub.odict([('a', 1), ('b', 2)]),
>>>     ub.odict([('c', 3), ('d', 4)]))
OrderedDict([('a', 1), ('b', 2), ('c', 3), ('d', 4)])
>>> ub.dict_union()
{}
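
The "last one wins" precedence rule matches Python's own dictionary merging idioms, which makes them a useful mental model for dict_union. The sketch below compares {**a, **b}, the Python 3.9+ | operator, and a hypothetical n-ary helper union_all built on dict.update(); unlike dict_union, union_all always returns a plain dict.

```python
import sys

a = {'a': 1, 'b': 1}
b = {'b': 2, 'c': 2}

# Unpacking: keys from ``b`` override keys from ``a``
merged = {**a, **b}
print(merged)  # {'a': 1, 'b': 2, 'c': 2}

if sys.version_info >= (3, 9):
    # PEP 584 dict union operator, same precedence rule
    assert (a | b) == merged


def union_all(*dicts):
    """Hypothetical n-ary helper: fold the inputs left to right with update()."""
    result = {}
    for d in dicts:
        result.update(d)  # later dictionaries overwrite earlier ones
    return result


print(union_all(a, b, {'c': 3}))  # {'a': 1, 'b': 2, 'c': 3}
print(union_all())                # {}
```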

ubelt.util_dict.dict_isect(*args)

Dictionary set extension for set.intersection()

Constructs a dictionary that contains the keys common to all inputs. The returned values are always taken from the first dictionary.

Parameters

*args (List[Dict[KT, VT] | Iterable[KT]]) – A sequence of dictionaries (or sets of keys). The first argument should always be a dictionary, but the subsequent arguments can just be sets of keys.

Returns

OrderedDict if the first argument is an OrderedDict, otherwise dict

Return type

Dict[KT, VT] | OrderedDict[KT, VT]

Note

This function can be used as an alternative to dict_subset() where any key not in the dictionary is ignored. See the following example:

>>> import ubelt as ub
>>> # xdoctest: +IGNORE_WANT
>>> ub.dict_isect({'a': 1, 'b': 2, 'c': 3}, ['a', 'c', 'd'])
{'a': 1, 'c': 3}

Example

>>> import ubelt as ub
>>> ub.dict_isect({'a': 1, 'b': 1}, {'b': 2, 'c': 2})
{'b': 1}
>>> ub.dict_isect(ub.odict([('a', 1), ('b', 2)]), ub.odict([('c', 3)]))
OrderedDict()
>>> ub.dict_isect()
{}
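
A minimal sketch of the key-wise intersection described above, assuming the type of the first argument can be rebuilt from (key, value) pairs (true for dict and OrderedDict, not for defaultdict). The name dict_isect_sketch is hypothetical.

```python
from collections import OrderedDict


def dict_isect_sketch(*args):
    """Hypothetical sketch: keep keys of args[0] that appear in every other arg.

    Values always come from the first dictionary; later arguments may be
    dictionaries or plain iterables of keys.
    """
    if not args:
        return {}
    first, *rest = args
    common = set(first)
    for other in rest:
        # set(dict) yields the dict's keys, so dicts and key lists both work
        common &= set(other)
    # Iterate over ``first`` to preserve its key order
    return type(first)((k, first[k]) for k in first if k in common)


example = dict_isect_sketch(OrderedDict([('a', 1), ('b', 2)]), OrderedDict([('c', 3)]))
print(example)  # OrderedDict()
print(dict_isect_sketch({'a': 1, 'b': 2, 'c': 3}, ['a', 'c', 'd']))  # {'a': 1, 'c': 3}
```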

ubelt.util_dict.dict_diff(*args)

Dictionary set extension for set.difference()

Constructs a dictionary containing the items of the first argument whose keys do not appear in any of the subsequent arguments.

Parameters

*args (List[Dict[KT, VT] | Iterable[KT]]) – A sequence of dictionaries (or sets of keys). The first argument should always be a dictionary, but the subsequent arguments can just be sets of keys.

Returns

OrderedDict if the first argument is an OrderedDict, otherwise dict

Return type

Dict[KT, VT] | OrderedDict[KT, VT]

Todo

  • [ ] Add an inplace keyword argument that modifies the first dictionary in place.

Example

>>> import ubelt as ub
>>> ub.dict_diff({'a': 1, 'b': 1}, {'a'}, {'c'})
{'b': 1}
>>> ub.dict_diff(ub.odict([('a', 1), ('b', 2)]), ub.odict([('c', 3)]))
OrderedDict([('a', 1), ('b', 2)])
>>> ub.dict_diff()
{}
>>> ub.dict_diff({'a': 1, 'b': 2}, {'c'})
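
A minimal sketch of the key-wise difference, under the same assumption as above that the first argument's type can be rebuilt from (key, value) pairs. The name dict_diff_sketch is hypothetical.

```python
def dict_diff_sketch(*args):
    """Hypothetical sketch: keep items of args[0] whose keys appear in none of
    the remaining arguments (dictionaries or iterables of keys)."""
    if not args:
        return {}
    first, *rest = args
    excluded = set()
    for other in rest:
        excluded.update(other)  # iterating a dict yields its keys
    return type(first)((k, v) for k, v in first.items() if k not in excluded)


print(dict_diff_sketch({'a': 1, 'b': 1}, {'a'}, {'c'}))  # {'b': 1}
print(dict_diff_sketch({'a': 1, 'b': 2}, {'c'}))         # {'a': 1, 'b': 2}
```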

ubelt.util_dict.find_duplicates(items, k=2, key=None)

Find all duplicate items in a list.

Search for all items that appear at least k times and return a mapping from each such item to the positions at which it appears.

Parameters
  • items (Iterable[T]) – hashable items possibly containing duplicates

  • k (int, default=2) – only return items that appear at least k times.

  • key (Callable[[T], Any], default=None) – Returns indices where key(items[i]) maps to a particular value at least k times.

Returns

maps each duplicate item to the indices at which it appears

Return type

dict[T, List[int]]

Notes

Similar to more_itertools.duplicates_everseen() and more_itertools.duplicates_justseen().

Example

>>> import ubelt as ub
>>> items = [0, 0, 1, 2, 3, 3, 0, 12, 2, 9]
>>> duplicates = ub.find_duplicates(items)
>>> # Duplicates are a mapping from each item that occurs 2 or more
>>> # times to the indices at which they occur.
>>> assert duplicates == {0: [0, 1, 6], 2: [3, 8], 3: [4, 5]}
>>> # Set k=3 to ignore items that appear only twice and instead find
>>> # items that appear three or more times (triplicates, quadruplicates, etc.)
>>> assert ub.find_duplicates(items, k=3) == {0: [0, 1, 6]}

Example

>>> import ubelt as ub
>>> items = [0, 0, 1, 2, 3, 3, 0, 12, 2, 9]
>>> # note: k can be less than 2
>>> duplicates = ub.find_duplicates(items, k=0)
>>> print(ub.repr2(duplicates, nl=0))
{0: [0, 1, 6], 1: [2], 2: [3, 8], 3: [4, 5], 9: [9], 12: [7]}

Example

>>> import ubelt as ub
>>> items = [10, 11, 12, 13, 14, 15, 16]
>>> duplicates = ub.find_duplicates(items, key=lambda x: x // 2)
>>> print(ub.repr2(duplicates, nl=0))
{5: [0, 1], 6: [2, 3], 7: [4, 5]}
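
The behavior shown in these examples can be sketched by recording every index at which each item (or each key(item)) occurs and then keeping only the groups with at least k indices. This is a minimal sketch, not ubelt's implementation; the name find_duplicates_sketch is hypothetical.

```python
from collections import defaultdict


def find_duplicates_sketch(items, k=2, key=None):
    """Hypothetical sketch: map each item (or key(item)) to the indices where
    it occurs, keeping only entries that occur at least ``k`` times."""
    positions = defaultdict(list)
    for index, item in enumerate(items):
        group = item if key is None else key(item)
        positions[group].append(index)
    return {group: idxs for group, idxs in positions.items() if len(idxs) >= k}


items = [0, 0, 1, 2, 3, 3, 0, 12, 2, 9]
print(find_duplicates_sketch(items))       # {0: [0, 1, 6], 2: [3, 8], 3: [4, 5]}
print(find_duplicates_sketch(items, k=3))  # {0: [0, 1, 6]}
```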

ubelt.util_dict.group_items(items, key)

Groups a list of items by group id.

Parameters
  • items (Iterable[VT]) – a list of items to group

  • key (Iterable[KT] | Callable[[VT], KT]) – either a corresponding list of group-ids for each item or a function used to map each item to a group-id.

Returns

a mapping from each group id to the list of corresponding items

Return type

dict[KT, List[VT]]

Example

>>> import ubelt as ub
>>> items    = ['ham',     'jam',   'spam',     'eggs',    'cheese', 'banana']
>>> groupids = ['protein', 'fruit', 'protein',  'protein', 'dairy',  'fruit']
>>> id_to_items = ub.group_items(items, groupids)
>>> print(ub.repr2(id_to_items, nl=0))
{'dairy': ['cheese'], 'fruit': ['jam', 'banana'], 'protein': ['ham', 'spam', 'eggs']}
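
Both forms of key (a parallel list of group ids or a callable) can be sketched as follows. This is a minimal sketch, not ubelt's implementation; the name group_items_sketch is hypothetical, and raising ValueError for lists of unequal length is an assumption.

```python
from collections import defaultdict


def group_items_sketch(items, key):
    """Hypothetical sketch: ``key`` is a parallel list of group ids or a callable."""
    items = list(items)
    if callable(key):
        groupids = [key(item) for item in items]
    else:
        groupids = list(key)
        if len(groupids) != len(items):
            # Assumption: mismatched lengths are reported as ValueError
            raise ValueError('items and key must have the same length')
    groups = defaultdict(list)
    for groupid, item in zip(groupids, items):
        groups[groupid].append(item)  # preserves the original item order
    return dict(groups)


items = ['ham', 'jam', 'spam', 'eggs', 'cheese', 'banana']
groupids = ['protein', 'fruit', 'protein', 'protein', 'dairy', 'fruit']
print(group_items_sketch(items, groupids))
# {'protein': ['ham', 'spam', 'eggs'], 'fruit': ['jam', 'banana'], 'dairy': ['cheese']}
print(group_items_sketch(items, len))
# {3: ['ham', 'jam'], 4: ['spam', 'eggs'], 6: ['cheese', 'banana']}
```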

ubelt.util_dict.invert_dict(dict_, unique_vals=True, cls=None)

Swaps the keys and values in a dictionary.

Parameters
  • dict_ (Dict[KT, VT]) – dictionary to invert

  • unique_vals (bool, default=True) – if False, the values of the new dictionary are sets of the original keys.

  • cls (type | None) – specifies the dict subclass of the result. If unspecified, it will be dict or OrderedDict. This behavior may change.

Returns

the inverted dictionary

Return type

Dict[VT, KT] | Dict[VT, Set[KT]]

Note

The values must be hashable.

If the original dictionary contains duplicate values, then only one of the corresponding keys will be returned and the others will be discarded. This can be prevented by setting unique_vals=False, causing the inverted keys to be returned in a set.

Example

>>> import ubelt as ub
>>> dict_ = {'a': 1, 'b': 2}
>>> inverted = ub.invert_dict(dict_)
>>> assert inverted == {1: 'a', 2: 'b'}

Example

>>> import ubelt as ub
>>> dict_ = ub.odict([(2, 'a'), (1, 'b'), (0, 'c'), (None, 'd')])
>>> inverted = ub.invert_dict(dict_)
>>> assert list(inverted.keys())[0] == 'a'

Example

>>> import ubelt as ub
>>> dict_ = {'a': 1, 'b': 0, 'c': 0, 'd': 0, 'f': 2}
>>> inverted = ub.invert_dict(dict_, unique_vals=False)
>>> assert inverted == {0: {'b', 'c', 'd'}, 1: {'a'}, 2: {'f'}}
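
The two inversion modes can be sketched with a dict comprehension and a set-accumulating loop. This is a minimal sketch for hashable values; the name invert_dict_sketch is hypothetical, and the choice that the last key wins on collisions is a property of this sketch, not a documented guarantee of invert_dict.

```python
def invert_dict_sketch(dict_, unique_vals=True):
    """Hypothetical sketch of invert_dict for hashable values."""
    if unique_vals:
        # If several keys share a value, the last one seen wins here
        return {value: key for key, value in dict_.items()}
    inverted = {}
    for key, value in dict_.items():
        # Collect every original key that maps to ``value``
        inverted.setdefault(value, set()).add(key)
    return inverted


print(invert_dict_sketch({'a': 1, 'b': 2}))  # {1: 'a', 2: 'b'}
print(invert_dict_sketch({'x': 0, 'y': 0}))  # {0: 'y'}
```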

ubelt.util_dict.map_keys(func, dict_, cls=None)

Apply a function to every key in a dictionary.

Creates a new dictionary with the same values and modified keys. An error is raised if the new keys are not unique.

Parameters
  • func (Callable[[KT], T] | Mapping[KT, T]) – a function or indexable object

  • dict_ (Dict[KT, VT]) – a dictionary

  • cls (type | None) – specifies the dict subclass of the result. If unspecified, it will be dict or OrderedDict. This behavior may change.

Returns

transformed dictionary

Return type

Dict[T, VT]

Raises

Exception – if multiple keys map to the same value

Example

>>> import ubelt as ub
>>> dict_ = {'a': [1, 2, 3], 'b': []}
>>> func = ord
>>> newdict = ub.map_keys(func, dict_)
>>> print(newdict)
>>> assert newdict == {97: [1, 2, 3], 98: []}
>>> dict_ = {0: [1, 2, 3], 1: []}
>>> func = ['a', 'b']
>>> newdict = ub.map_keys(func, dict_)
>>> print(newdict)
>>> assert newdict == {'a': [1, 2, 3], 'b': []}
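
A minimal sketch of map_keys, showing how a callable and an indexable func can be handled uniformly and how a key collision can be detected by comparing lengths. The name map_keys_sketch is hypothetical, and raising ValueError is an assumption (the documentation above only promises an Exception).

```python
def map_keys_sketch(func, dict_):
    """Hypothetical sketch: ``func`` is a callable or an indexable object."""
    lookup = func if callable(func) else func.__getitem__
    newdict = {lookup(key): value for key, value in dict_.items()}
    if len(newdict) != len(dict_):
        # Two or more original keys collapsed onto the same new key
        raise ValueError('new keys are not unique')
    return newdict


print(map_keys_sketch(ord, {'a': [1, 2, 3], 'b': []}))     # {97: [1, 2, 3], 98: []}
print(map_keys_sketch(['a', 'b'], {0: [1, 2, 3], 1: []}))  # {'a': [1, 2, 3], 'b': []}
```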

ubelt.util_dict.map_vals(func, dict_, cls=None)

alias of map_values


ubelt.util_dict.map_values(func, dict_, cls=None)

Apply a function to every value in a dictionary.

Creates a new dictionary with the same keys and modified values.

Parameters
  • func (Callable[[VT], T] | Mapping[VT, T]) – a function or indexable object

  • dict_ (Dict[KT, VT]) – a dictionary

  • cls (type | None) – specifies the dict subclass of the result. If unspecified, it will be dict or OrderedDict. This behavior may change.

Returns

transformed dictionary

Return type

Dict[KT, T]

Notes

Similar to dictmap.dict_map (see GHDictMap in the module references).

Example

>>> import ubelt as ub
>>> dict_ = {'a': [1, 2, 3], 'b': []}
>>> newdict = ub.map_values(len, dict_)
>>> assert newdict == {'a': 3, 'b': 0}

Example

>>> # Can also use an indexable as ``func``
>>> import ubelt as ub
>>> dict_ = {'a': 0, 'b': 1}
>>> func = [42, 21]
>>> newdict = ub.map_values(func, dict_)
>>> assert newdict == {'a': 42, 'b': 21}
>>> print(newdict)
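
A minimal sketch of map_values that accepts either a callable or an indexable func, as the examples above do. The name map_values_sketch is hypothetical; because keys are unchanged, no uniqueness check is needed here, unlike map_keys.

```python
def map_values_sketch(func, dict_, cls=dict):
    """Hypothetical sketch: ``func`` is a callable or an indexable object."""
    lookup = func if callable(func) else func.__getitem__
    # Keys are unchanged, so no uniqueness check is needed (unlike map_keys)
    return cls((key, lookup(value)) for key, value in dict_.items())


print(map_values_sketch(len, {'a': [1, 2, 3], 'b': []}))  # {'a': 3, 'b': 0}
print(map_values_sketch([42, 21], {'a': 0, 'b': 1}))      # {'a': 42, 'b': 21}
```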

ubelt.util_dict.sorted_keys(dict_, key=None, reverse=False, cls=<class 'collections.OrderedDict'>)

Return an ordered dictionary sorted by its keys

Parameters
  • dict_ (Dict[KT, VT]) – dictionary to sort. The keys must be of comparable types.

  • key (Callable[[KT], Any] | None) – If given as a callable, customizes the sorting by ordering using transformed keys.

  • reverse (bool, default=False) – if True returns in descending order

  • cls (type) – specifies the dict return type

Returns

new dictionary where the keys are ordered

Return type

OrderedDict[KT, VT]

Example

>>> import ubelt as ub
>>> dict_ = {'spam': 2.62, 'eggs': 1.20, 'jam': 2.92}
>>> newdict = ub.sorted_keys(dict_)
>>> print(ub.repr2(newdict, nl=0))
{'eggs': 1.2, 'jam': 2.92, 'spam': 2.62}
>>> newdict = ub.sorted_keys(dict_, reverse=True)
>>> print(ub.repr2(newdict, nl=0))
{'spam': 2.62, 'jam': 2.92, 'eggs': 1.2}
>>> newdict = ub.sorted_keys(dict_, key=lambda x: sum(map(ord, x)))
>>> print(ub.repr2(newdict, nl=0))
{'jam': 2.92, 'eggs': 1.2, 'spam': 2.62}
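
Sorting by key amounts to sorting the (key, value) pairs on their first element and rebuilding an ordered mapping. A minimal sketch under that assumption; the name sorted_keys_sketch is hypothetical.

```python
from collections import OrderedDict


def sorted_keys_sketch(dict_, key=None, reverse=False, cls=OrderedDict):
    """Hypothetical sketch: order (key, value) pairs by their keys."""
    def sort_key(item):
        k = item[0]
        # Compare transformed keys when a ``key`` callable is given
        return k if key is None else key(k)
    return cls(sorted(dict_.items(), key=sort_key, reverse=reverse))


prices = {'spam': 2.62, 'eggs': 1.20, 'jam': 2.92}
print(list(sorted_keys_sketch(prices)))                # ['eggs', 'jam', 'spam']
print(list(sorted_keys_sketch(prices, reverse=True)))  # ['spam', 'jam', 'eggs']
```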

ubelt.util_dict.sorted_vals(dict_, key=None, reverse=False, cls=<class 'collections.OrderedDict'>)

alias of sorted_values


ubelt.util_dict.sorted_values(dict_, key=None, reverse=False, cls=<class 'collections.OrderedDict'>)

Return an ordered dictionary sorted by its values

Parameters
  • dict_ (Dict[KT, VT]) – dictionary to sort. The values must be of comparable types.

  • key (Callable[[VT], Any] | None) – If given as a callable, customizes the sorting by ordering using transformed values.

  • reverse (bool, default=False) – if True returns in descending order

  • cls (type) – specifies the dict return type

Returns

new dictionary where the values are ordered

Return type

OrderedDict[KT, VT]

Example

>>> import ubelt as ub
>>> dict_ = {'spam': 2.62, 'eggs': 1.20, 'jam': 2.92}
>>> newdict = ub.sorted_values(dict_)
>>> print(ub.repr2(newdict, nl=0))
{'eggs': 1.2, 'spam': 2.62, 'jam': 2.92}
>>> newdict = ub.sorted_values(dict_, reverse=True)
>>> print(ub.repr2(newdict, nl=0))
{'jam': 2.92, 'spam': 2.62, 'eggs': 1.2}
>>> newdict = ub.sorted_values(dict_, key=lambda x: x % 1.6)
>>> print(ub.repr2(newdict, nl=0))
{'spam': 2.62, 'eggs': 1.2, 'jam': 2.92}
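
The value-ordered variant differs only in sorting on the second element of each (key, value) pair. A minimal sketch; the name sorted_values_sketch is hypothetical.

```python
from collections import OrderedDict


def sorted_values_sketch(dict_, key=None, reverse=False, cls=OrderedDict):
    """Hypothetical sketch: order (key, value) pairs by their values."""
    def sort_key(item):
        v = item[1]
        # Compare transformed values when a ``key`` callable is given
        return v if key is None else key(v)
    return cls(sorted(dict_.items(), key=sort_key, reverse=reverse))


prices = {'spam': 2.62, 'eggs': 1.20, 'jam': 2.92}
print(list(sorted_values_sketch(prices)))                        # ['eggs', 'spam', 'jam']
print(list(sorted_values_sketch(prices, key=lambda x: x % 1.6)))  # ['spam', 'eggs', 'jam']
```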

ubelt.util_dict.odict

alias of OrderedDict

ubelt.util_dict.named_product(_=None, **basis)

Generates the Cartesian product of basis.values(), where each generated item is labeled by basis.keys().

In other words, given a dictionary that maps each “axis” (i.e. some variable) to its “basis” (i.e. the possible values that it can take), generate all possible points in that grid (i.e. unique assignments of variables to values).

Parameters
  • _ (Dict[str, List[VT]] | None, default=None) – Use of this positional argument is not recommended. Instead, specify all arguments as keyword args.

    If specified, this should be a dictionary that is unioned with the keyword args. This exists to support ordered dictionaries before Python 3.6, and may eventually be removed.

  • basis (Dict[str, List[VT]]) – A dictionary where the keys correspond to “columns” and the values are a list of possible values that “column” can take.

    I.e. each key corresponds to an “axis”, and its value is the list of possible values for that “axis”.

Yields

Dict[str, VT] – a “row” in the “longform” data containing a point in the Cartesian product.

Note

This function is similar to itertools.product(); the only difference is that the generated items are dictionaries that retain the input keys instead of tuples.

This function used to be called “basis_product”, but “named_product” might be more appropriate. This function exists in other places ([minstrel271_namedproduct], [pytb_namedproduct], and [Hettinger_namedproduct]).

References

minstrel271_namedproduct

https://gist.github.com/minstrel271/d51654af3fa4e6411267

pytb_namedproduct

https://py-toolbox.readthedocs.io/en/latest/modules/itertools.html#

Hettinger_namedproduct

https://twitter.com/raymondh/status/970380630822305792

Example

>>> # An example use case is looping over all possible settings in a
>>> # configuration dictionary for a grid search over parameters.
>>> import ubelt as ub
>>> basis = {
>>>     'arg1': [1, 2, 3],
>>>     'arg2': ['A1', 'B1'],
>>>     'arg3': [9999, 'Z2'],
>>>     'arg4': ['always'],
>>> }
>>> # sort input data for older python versions
>>> basis = ub.odict(sorted(basis.items()))
>>> got = list(ub.named_product(basis))
>>> print(ub.repr2(got, nl=-1))
[
    {'arg1': 1, 'arg2': 'A1', 'arg3': 9999, 'arg4': 'always'},
    {'arg1': 1, 'arg2': 'A1', 'arg3': 'Z2', 'arg4': 'always'},
    {'arg1': 1, 'arg2': 'B1', 'arg3': 9999, 'arg4': 'always'},
    {'arg1': 1, 'arg2': 'B1', 'arg3': 'Z2', 'arg4': 'always'},
    {'arg1': 2, 'arg2': 'A1', 'arg3': 9999, 'arg4': 'always'},
    {'arg1': 2, 'arg2': 'A1', 'arg3': 'Z2', 'arg4': 'always'},
    {'arg1': 2, 'arg2': 'B1', 'arg3': 9999, 'arg4': 'always'},
    {'arg1': 2, 'arg2': 'B1', 'arg3': 'Z2', 'arg4': 'always'},
    {'arg1': 3, 'arg2': 'A1', 'arg3': 9999, 'arg4': 'always'},
    {'arg1': 3, 'arg2': 'A1', 'arg3': 'Z2', 'arg4': 'always'},
    {'arg1': 3, 'arg2': 'B1', 'arg3': 9999, 'arg4': 'always'},
    {'arg1': 3, 'arg2': 'B1', 'arg3': 'Z2', 'arg4': 'always'}
]

Example

>>> import ubelt as ub
>>> list(ub.named_product(a=[1, 2, 3]))
[{'a': 1}, {'a': 2}, {'a': 3}]
>>> # xdoctest: +IGNORE_WANT
>>> list(ub.named_product(a=[1, 2, 3], b=[4, 5]))
[{'a': 1, 'b': 4},
 {'a': 1, 'b': 5},
 {'a': 2, 'b': 4},
 {'a': 2, 'b': 5},
 {'a': 3, 'b': 4},
 {'a': 3, 'b': 5}]
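
As the Note says, this is itertools.product() with the axis names re-attached. A minimal sketch of that idea, keyword arguments only; the name named_product_sketch is hypothetical, and it omits the positional _ argument.

```python
import itertools


def named_product_sketch(**basis):
    """Hypothetical sketch: itertools.product over the values, labelled by keys."""
    keys = list(basis)
    for values in itertools.product(*(basis[k] for k in keys)):
        # Re-attach the axis names to each point in the grid
        yield dict(zip(keys, values))


for point in named_product_sketch(a=[1, 2], b=['x', 'y']):
    print(point)
# {'a': 1, 'b': 'x'}
# {'a': 1, 'b': 'y'}
# {'a': 2, 'b': 'x'}
# {'a': 2, 'b': 'y'}
```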

ubelt.util_dict.varied_values(longform, min_variations=0, default=NoParam)

Given a list of dictionaries, find the values that differ between them.

Parameters
  • longform (List[Dict[KT, VT]]) – This is longform data, as described in [SeabornLongform]. It is a list of dictionaries.

    Each item in the list - or row - is a dictionary and can be thought of as an observation. The keys in each dictionary are the columns. The values of the dictionary must be hashable. Lists will be converted into tuples.

  • min_variations (int, default=0) – “columns” with min_variations or fewer unique values are removed from the result.

  • default (VT | NoParamType) – if specified, columns missing from a row are given this value. Defaults to NoParam.

Returns

a mapping from each “column” to the set of unique values it took over each “row”. If a column is not specified for each row, it is assumed to take a default value, if it is specified.

Return type

Dict[KT, Set[VT]]

Raises

KeyError – If default is unspecified and not all rows contain the same columns.

References

SeabornLongform

https://seaborn.pydata.org/tutorial/data_structure.html#long-form-data

Example

>>> # An example use case is to determine what values of a
>>> # configuration dictionary were tried in a random search
>>> # over a parameter grid.
>>> import ubelt as ub
>>> longform = [
>>>     {'col1': 1, 'col2': 'foo', 'col3': None},
>>>     {'col1': 1, 'col2': 'foo', 'col3': None},
>>>     {'col1': 2, 'col2': 'bar', 'col3': None},
>>>     {'col1': 3, 'col2': 'bar', 'col3': None},
>>>     {'col1': 9, 'col2': 'bar', 'col3': None},
>>>     {'col1': 1, 'col2': 'bar', 'col3': None},
>>> ]
>>> varied = ub.varied_values(longform)
>>> print('varied = {}'.format(ub.repr2(varied, nl=1)))
varied = {
    'col1': {1, 2, 3, 9},
    'col2': {'bar', 'foo'},
    'col3': {None},
}

Example

>>> import ubelt as ub
>>> import random
>>> longform = [
>>>     {'col1': 1, 'col2': 'foo', 'col3': None},
>>>     {'col1': 1, 'col2': [1, 2], 'col3': None},
>>>     {'col1': 2, 'col2': 'bar', 'col3': None},
>>>     {'col1': 3, 'col2': 'bar', 'col3': None},
>>>     {'col1': 9, 'col2': 'bar', 'col3': None},
>>>     {'col1': 1, 'col2': 'bar', 'col3': None, 'extra_col': 3},
>>> ]
>>> # Operation fails without a default
>>> import pytest
>>> with pytest.raises(KeyError):
>>>     varied = ub.varied_values(longform)
>>> #
>>> # Operation works with a default
>>> varied = ub.varied_values(longform, default='<unset>')
>>> expected = {
>>>     'col1': {1, 2, 3, 9},
>>>     'col2': {'bar', 'foo', (1, 2)},
>>>     'col3': set([None]),
>>>     'extra_col': {'<unset>', 3},
>>> }
>>> print('varied = {!r}'.format(varied))
>>> assert varied == expected

Example

>>> # xdoctest: +REQUIRES(PY3)
>>> # Random numbers are different in Python2, so skip in that case
>>> import ubelt as ub
>>> import random
>>> num_cols = 11
>>> num_rows = 17
>>> rng = random.Random(0)
>>> # Generate a set of columns
>>> columns = sorted(ub.hash_data(i)[0:8] for i in range(num_cols))
>>> # Generate rows for each column
>>> longform = [
>>>     {key: ub.hash_data(key)[0:8] for key in columns}
>>>     for _ in range(num_rows)
>>> ]
>>> # Add in some varied values in random positions
>>> for row in longform:
>>>     if rng.random() > 0.5:
>>>         for key in sorted(row.keys()):
>>>             if rng.random() > 0.95:
>>>                 row[key] = 'special-' + str(rng.randint(1, 32))
>>> varied = ub.varied_values(longform, min_variations=1)
>>> print('varied = {}'.format(ub.repr2(varied, nl=1, sort=True)))
varied = {
    '095f3e44': {'8fb4d4c9', 'special-23'},
    '365d11a1': {'daa409da', 'special-31', 'special-32'},
    '5815087d': {'1b823610', 'special-3'},
    '7b54b668': {'349a782c', 'special-10'},
    'b8244d02': {'d57bca90', 'special-8'},
    'f27b5bf8': {'fa0f90d1', 'special-19'},
}
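
The core of varied_values can be sketched as collecting a set of values per column and then dropping columns with min_variations or fewer distinct values, which matches the examples above. This is a minimal sketch, not ubelt's implementation; the names varied_values_sketch and _NOPARAM are hypothetical, and only top-level list values are converted to tuples.

```python
from collections import defaultdict

_NOPARAM = object()  # hypothetical stand-in for ubelt's NoParam sentinel


def varied_values_sketch(longform, min_variations=0, default=_NOPARAM):
    """Hypothetical sketch: collect the set of values each column takes."""
    columns = set()
    for row in longform:
        columns.update(row)
    varied = defaultdict(set)
    for row in longform:
        for col in columns:
            if col in row:
                value = row[col]
            elif default is _NOPARAM:
                # Rows disagree on their columns and no default was given
                raise KeyError(col)
            else:
                value = default
            if isinstance(value, list):
                value = tuple(value)  # lists are unhashable; store a tuple
            varied[col].add(value)
    # Drop columns with min_variations or fewer distinct values
    return {col: vals for col, vals in varied.items() if len(vals) > min_variations}


rows = [
    {'lr': 0.1, 'opt': 'sgd', 'seed': 0},
    {'lr': 0.2, 'opt': 'sgd', 'seed': 0},
    {'lr': 0.1, 'opt': 'adam', 'seed': 0},
]
result = varied_values_sketch(rows, min_variations=1)
print(sorted(result))  # ['lr', 'opt']  ('seed' only ever took one value)
```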

class ubelt.util_dict.SetDict

Bases: dict

A dictionary subclass where all set operations are defined.

All of the set operations are defined in a key-wise fashion; that is, each operation behaves as if it were performed on the dictionaries' sets of keys.

Note

The SetDict class only defines key-wise set operations. Value-wise or item-wise operations are not supported, because values are not hashable in general. A heavier extension would be needed for that.

Example

>>> import ubelt as ub
>>> primes = ub.sdict({v: f'prime_{v}' for v in [2, 3, 5, 7, 11]})
>>> evens = ub.sdict({v: f'even_{v}' for v in [0, 2, 4, 6, 8, 10]})
>>> odds = ub.sdict({v: f'odd_{v}' for v in [1, 3, 5, 7, 9, 11]})
>>> squares = ub.sdict({v: f'square_{v}' for v in [0, 1, 4, 9]})
>>> div3 = ub.sdict({v: f'div3_{v}' for v in [0, 3, 6, 9]})
>>> # All of the set methods are defined
>>> results1 = {}
>>> results1['ints'] = ints = odds.union(evens)
>>> results1['composites'] = ints.difference(primes)
>>> results1['even_primes'] = evens.intersection(primes)
>>> results1['odd_nonprimes_and_two'] = odds.symmetric_difference(primes)
>>> print('results1 = {}'.format(ub.repr2(results1, nl=2, sort=True)))
results1 = {
    'composites': {
        0: 'even_0',
        1: 'odd_1',
        4: 'even_4',
        6: 'even_6',
        8: 'even_8',
        9: 'odd_9',
        10: 'even_10',
    },
    'even_primes': {
        2: 'even_2',
    },
    'ints': {
        0: 'even_0',
        1: 'odd_1',
        2: 'even_2',
        3: 'odd_3',
        4: 'even_4',
        5: 'odd_5',
        6: 'even_6',
        7: 'odd_7',
        8: 'even_8',
        9: 'odd_9',
        10: 'even_10',
        11: 'odd_11',
    },
    'odd_nonprimes_and_two': {
        1: 'odd_1',
        2: 'prime_2',
        9: 'odd_9',
    },
}
>>> # As well as their corresponding binary operators
>>> assert results1['ints'] == odds | evens
>>> assert results1['composites'] == ints - primes
>>> assert results1['even_primes'] == evens & primes
>>> assert results1['odd_nonprimes_and_two'] == odds ^ primes
>>> # These can also be used as classmethods
>>> assert results1['ints'] == ub.sdict.union(odds, evens)
>>> assert results1['composites'] == ub.sdict.difference(ints, primes)
>>> assert results1['even_primes'] == ub.sdict.intersection(evens, primes)
>>> assert results1['odd_nonprimes_and_two'] == ub.sdict.symmetric_difference(odds, primes)
>>> # The n-ary variants are also implemented
>>> results2 = {}
>>> results2['nary_union'] = ub.sdict.union(primes, div3, odds)
>>> results2['nary_difference'] = ub.sdict.difference(primes, div3, odds)
>>> results2['nary_intersection'] = ub.sdict.intersection(primes, div3, odds)
>>> # Note that the definition of symmetric difference might not be what you think in the nary case.
>>> results2['nary_symmetric_difference'] = ub.sdict.symmetric_difference(primes, div3, odds)
>>> print('results2 = {}'.format(ub.repr2(results2, nl=2, sort=True)))
results2 = {
    'nary_difference': {
        2: 'prime_2',
    },
    'nary_intersection': {
        3: 'prime_3',
    },
    'nary_symmetric_difference': {
        0: 'div3_0',
        1: 'odd_1',
        2: 'prime_2',
        3: 'odd_3',
        6: 'div3_6',
    },
    'nary_union': {
        0: 'div3_0',
        1: 'odd_1',
        2: 'prime_2',
        3: 'odd_3',
        5: 'odd_5',
        6: 'div3_6',
        7: 'odd_7',
        9: 'odd_9',
        11: 'odd_11',
    },
}
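
The n-ary symmetric difference in the example above keeps the keys that occur in an odd number of the inputs, the odd-count definition that symmetric_difference documents. A minimal sketch of that rule on plain dicts; the function name is hypothetical, and taking each value from the last input that contains the key is an assumption that happens to match the output shown above.

```python
from collections import Counter


def nary_symmetric_difference_sketch(*dicts):
    """Hypothetical sketch: keep keys found in an odd number of the inputs."""
    counts = Counter(key for d in dicts for key in d)
    result = {}
    for d in dicts:
        for key, value in d.items():
            if counts[key] % 2 == 1:
                result[key] = value  # later inputs overwrite earlier ones
    return result


primes = {v: f'prime_{v}' for v in [2, 3, 5, 7, 11]}
div3 = {v: f'div3_{v}' for v in [0, 3, 6, 9]}
odds = {v: f'odd_{v}' for v in [1, 3, 5, 7, 9, 11]}
result = nary_symmetric_difference_sketch(primes, div3, odds)
print(dict(sorted(result.items())))
# {0: 'div3_0', 1: 'odd_1', 2: 'prime_2', 3: 'odd_3', 6: 'div3_6'}
```

Key 3 survives because it appears in all three inputs (an odd count), which is why the n-ary result can look surprising compared with chaining the binary operator pairwise in one's head.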

Example

>>> # A neat thing about our implementation is that often the right
>>> # hand side is not required to be a dictionary, just something
>>> # that can be cast to a set.
>>> import ubelt as ub
>>> primes = ub.sdict({2: 'a', 3: 'b', 5: 'c', 7: 'd', 11: 'e'})
>>> assert primes - {2, 3} == {5: 'c', 7: 'd', 11: 'e'}
>>> assert primes & {2, 3} == {2: 'a', 3: 'b'}
>>> # Union, however, does require the right-hand side to be a dictionary
>>> import pytest
>>> with pytest.raises(AttributeError):
>>>     primes | {2, 3}

copy()

Example

>>> import ubelt as ub
>>> a = ub.sdict({1: 1, 2: 2, 3: 3})
>>> b = ub.udict({1: 1, 2: 2, 3: 3})
>>> c = a.copy()
>>> d = b.copy()
>>> assert c is not a
>>> assert d is not b
>>> assert d == b
>>> assert c == a
>>> list(map(type, [a, b, c, d]))
>>> assert isinstance(c, ub.sdict)
>>> assert isinstance(d, ub.udict)

union(*others, cls=None)

Return the key-wise union of two or more dictionaries.

For items with intersecting keys, dictionaries towards the end of the sequence are given precedence.

Parameters
  • self (SetDict | dict) – if called as a static method this must be provided.

  • *others – other dictionary-like objects that have an items method (i.e. it must return an iterable of 2-tuples where the first item is hashable).

  • cls (type) – the desired return dictionary type.

Returns

whatever the dictionary type of the first argument is

Return type

dict

Example

>>> import ubelt as ub
>>> a = ub.SetDict({k: 'A_' + chr(97 + k) for k in [2, 3, 5, 7]})
>>> b = ub.SetDict({k: 'B_' + chr(97 + k) for k in [2, 4, 0, 7]})
>>> c = ub.SetDict({k: 'C_' + chr(97 + k) for k in [2, 8, 3]})
>>> d = ub.SetDict({k: 'D_' + chr(97 + k) for k in [9, 10, 11]})
>>> e = ub.SetDict({k: 'E_' + chr(97 + k) for k in []})
>>> assert a | b == {2: 'B_c', 3: 'A_d', 5: 'A_f', 7: 'B_h', 4: 'B_e', 0: 'B_a'}
>>> a.union(b)
>>> a | b | c
>>> res = ub.SetDict.union(a, b, c, d, e)
>>> print(ub.repr2(res, sort=1, nl=0, si=1))
{0: B_a, 2: C_c, 3: C_d, 4: B_e, 5: A_f, 7: B_h, 8: C_i, 9: D_j, 10: D_k, 11: D_l}

intersection(*others, cls=None)

Return the key-wise intersection of two or more dictionaries.

All items returned will be from the first dictionary for keys that exist in all other dictionaries / sets provided.

Parameters
  • self (SetDict | dict) – if called as a static method this must be provided.

  • *others – other dictionary or set like objects that can be coerced into a set of keys.

  • cls (type) – the desired return dictionary type.

Returns

whatever the dictionary type of the first argument is

Return type

dict

Example

>>> import ubelt as ub
>>> a = ub.SetDict({k: 'A_' + chr(97 + k) for k in [2, 3, 5, 7]})
>>> b = ub.SetDict({k: 'B_' + chr(97 + k) for k in [2, 4, 0, 7]})
>>> c = ub.SetDict({k: 'C_' + chr(97 + k) for k in [2, 8, 3]})
>>> d = ub.SetDict({k: 'D_' + chr(97 + k) for k in [9, 10, 11]})
>>> e = ub.SetDict({k: 'E_' + chr(97 + k) for k in []})
>>> assert a & b == {2: 'A_c', 7: 'A_h'}
>>> a.intersection(b)
>>> a & b & c
>>> res = ub.SetDict.intersection(a, b, c, d, e)
>>> print(ub.repr2(res, sort=1, nl=0, si=1))
{}
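Because only keys are compared, the later arguments can be dictionaries or plain sets. Below is a minimal sketch of that rule using a hypothetical `keywise_intersection` helper (not ubelt's implementation) that keeps the first dictionary's values and key order.

```python
def keywise_intersection(first, *others):
    """Hypothetical n-ary key-wise intersection; values come from `first`."""
    common = set(first)
    for other in others:
        # Iterating a dict yields its keys, so dicts and sets are both accepted
        common &= set(other)
    # Filter `first` so the result keeps its values and insertion order
    return {key: value for key, value in first.items() if key in common}


a = {k: 'A_' + chr(97 + k) for k in [2, 3, 5, 7]}
b = {k: 'B_' + chr(97 + k) for k in [2, 4, 0, 7]}
print(keywise_intersection(a, b))          # {2: 'A_c', 7: 'A_h'}
print(keywise_intersection(a, {3, 5, 9}))  # {3: 'A_d', 5: 'A_f'}
```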
difference(*others, cls=None)[source]

Return the key-wise difference between this dictionary and one or more other dictionaries or sets of keys.

The returned items will be from the first dictionary, and will only contain keys that do not appear in any of the other dictionaries / sets.

Parameters
  • self (SetDict | dict) – if called as a static method this must be provided.

  • *others – other dictionary-like or set-like objects that can be coerced into a set of keys.

  • cls (type) – the desired return dictionary type.

Returns

whatever the dictionary type of the first argument is

Return type

dict

Example

>>> import ubelt as ub
>>> a = ub.SetDict({k: 'A_' + chr(97 + k) for k in [2, 3, 5, 7]})
>>> b = ub.SetDict({k: 'B_' + chr(97 + k) for k in [2, 4, 0, 7]})
>>> c = ub.SetDict({k: 'C_' + chr(97 + k) for k in [2, 8, 3]})
>>> d = ub.SetDict({k: 'D_' + chr(97 + k) for k in [9, 10, 11]})
>>> e = ub.SetDict({k: 'E_' + chr(97 + k) for k in []})
>>> assert a - b == {3: 'A_d', 5: 'A_f'}
>>> a.difference(b)
>>> a - b - c
>>> res = ub.SetDict.difference(a, b, c, d, e)
>>> print(ub.repr2(res, sort=1, nl=0, si=1))
{5: A_f}
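The difference only needs the union of all keys to exclude. Here is a minimal sketch under the same assumptions, using a hypothetical `keywise_difference` helper whose later arguments may be dictionaries or sets.

```python
def keywise_difference(first, *others):
    """Hypothetical n-ary key-wise difference; values come from `first`."""
    excluded = set()
    for other in others:
        # set.update iterates its argument, which yields the keys of a dict
        excluded.update(other)
    return {key: value for key, value in first.items() if key not in excluded}


a = {k: 'A_' + chr(97 + k) for k in [2, 3, 5, 7]}
b = {k: 'B_' + chr(97 + k) for k in [2, 4, 0, 7]}
c = {k: 'C_' + chr(97 + k) for k in [2, 8, 3]}
print(keywise_difference(a, b))     # {3: 'A_d', 5: 'A_f'}
print(keywise_difference(a, b, c))  # {5: 'A_f'}
```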
symmetric_difference(*others, cls=None)[source]

Return the key-wise symmetric difference between this dictionary and one or more other dictionaries.

Returns items that are (key-wise) in an odd number of the given dictionaries. This is consistent with the standard n-ary definition of symmetric difference [WikiSymDiff] and corresponds to the xor operation.

Parameters
  • self (SetDict | dict) – if called as a static method this must be provided.

  • *others – other dictionary-like or set-like objects that can be coerced into a set of keys.

  • cls (type) – the desired return dictionary type.

Returns

whatever the dictionary type of the first argument is

Return type

dict

References

WikiSymDiff

https://en.wikipedia.org/wiki/Symmetric_difference

Example

>>> import ubelt as ub
>>> a = ub.SetDict({k: 'A_' + chr(97 + k) for k in [2, 3, 5, 7]})
>>> b = ub.SetDict({k: 'B_' + chr(97 + k) for k in [2, 4, 0, 7]})
>>> c = ub.SetDict({k: 'C_' + chr(97 + k) for k in [2, 8, 3]})
>>> d = ub.SetDict({k: 'D_' + chr(97 + k) for k in [9, 10, 11]})
>>> e = ub.SetDict({k: 'E_' + chr(97 + k) for k in []})
>>> a ^ b
{3: 'A_d', 5: 'A_f', 4: 'B_e', 0: 'B_a'}
>>> a.symmetric_difference(b)
>>> a ^ b ^ c
>>> res = ub.SetDict.symmetric_difference(a, b, c, d, e)
>>> print(ub.repr2(res, sort=1, nl=0, si=1))
{0: B_a, 2: C_c, 4: B_e, 5: A_f, 8: C_i, 9: D_j, 10: D_k, 11: D_l}
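The odd-count rule can be checked directly with a counter. This is a minimal sketch, assuming all arguments are dictionaries. `keywise_symmetric_difference` is hypothetical, and taking each value from the last dictionary that contains its key is an assumption of the sketch that happens to reproduce the outputs above (for example, key 2 appears in three dictionaries, an odd count, so it survives with the value from c).

```python
from collections import Counter


def keywise_symmetric_difference(*dicts):
    """Hypothetical n-ary symmetric difference: keep keys seen an odd number of times."""
    counts = Counter()
    merged = {}
    for mapping in dicts:
        counts.update(mapping.keys())  # count each key once per dictionary
        merged.update(mapping)         # assumption: the last occurrence supplies the value
    return {key: value for key, value in merged.items() if counts[key] % 2 == 1}


a = {k: 'A_' + chr(97 + k) for k in [2, 3, 5, 7]}
b = {k: 'B_' + chr(97 + k) for k in [2, 4, 0, 7]}
c = {k: 'C_' + chr(97 + k) for k in [2, 8, 3]}
print(keywise_symmetric_difference(a, b))     # {3: 'A_d', 5: 'A_f', 4: 'B_e', 0: 'B_a'}
print(keywise_symmetric_difference(a, b, c))  # {2: 'C_c', 5: 'A_f', 4: 'B_e', 0: 'B_a', 8: 'C_i'}
```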
class ubelt.util_dict.UDict[source]

Bases: SetDict

A subclass of dict with ubelt enhancements

This builds on top of SetDict, which is itself a simple extension of dict that contains only the key-wise set operations. The extra invert, map, sorted, and peek methods are less fundamental, and there are at least reasonable workarounds when they are not available.

The UDict class is a simple subclass of dict that provides the following upgrades:

  • set operations - inherited from SetDict
    • intersection - find items in common

    • union - merge dicts

    • difference - find items in one but not the other

    • symmetric_difference - find items that appear an odd number of times

  • subdict - take a subset with optional default values (similar to intersection, but the latter ignores non-common keys)

  • inversion -
    • invert - swaps a dictionary's keys and values (with options for dealing with duplicates).

  • mapping -
    • map_keys - applies a function over each key and keeps the values the same

    • map_values - applies a function over each value and keeps the keys the same

  • sorting -
    • sorted_keys - returns a dictionary ordered by the keys

    • sorted_values - returns a dictionary ordered by the values

IMO, key-wise set operations on dictionaries are fundamentally and sorely missing from the stdlib. Mapping is super convenient. Sorting and inversion are less common, but still useful to have.

Todo

  • [ ] UbeltDict, UltraDict, not sure what the name is. We may just rename this to Dict.

Example

>>> import ubelt as ub
>>> a = ub.udict({1: 20, 2: 20, 3: 30, 4: 40})
>>> b = ub.udict({0: 0, 2: 20, 4: 42})
>>> c = ub.udict({3: -1, 5: -1})
>>> # Demo key-wise set operations
>>> assert a & b == {2: 20, 4: 40}
>>> assert a - b == {1: 20, 3: 30}
>>> assert a ^ b == {1: 20, 3: 30, 0: 0}
>>> assert a | b == {1: 20, 2: 20, 3: 30, 4: 42, 0: 0}
>>> # Demo new n-ary set methods
>>> assert a.union(b, c) == {1: 20, 2: 20, 3: -1, 4: 42, 0: 0, 5: -1}
>>> assert a.intersection(b, c) == {}
>>> assert a.difference(b, c) == {1: 20}
>>> assert a.symmetric_difference(b, c) == {1: 20, 0: 0, 5: -1}
>>> # Demo new quality of life methods
>>> assert a.subdict({2, 4, 6, 8}, default=None) == {8: None, 2: 20, 4: 40, 6: None}
>>> assert a.invert() == {20: 2, 30: 3, 40: 4}
>>> assert a.invert(unique_vals=0) == {20: {1, 2}, 30: {3}, 40: {4}}
>>> assert a.peek_key() == ub.peek(a.keys())
>>> assert a.peek_value() == ub.peek(a.values())
>>> assert a.map_keys(lambda x: x * 10) == {10: 20, 20: 20, 30: 30, 40: 40}
>>> assert a.map_values(lambda x: x * 10) == {1: 200, 2: 200, 3: 300, 4: 400}
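The binary operators in the example above behave like thin wrappers around the key-wise rules. The following is a minimal sketch of that wiring for the two-operand case, using a hypothetical `OpDict` class rather than ubelt's actual code. Since Python 3.9 the built-in dict already defines `|` as a merge (PEP 584); this sketch overrides it so that every operator returns an `OpDict`.

```python
class OpDict(dict):
    """Hypothetical dict subclass with key-wise set operators."""

    def __or__(self, other):
        # Union: copy self, then let the right operand win on shared keys
        result = OpDict(self)
        result.update(other)
        return result

    def __and__(self, other):
        # Intersection: keep left-hand items whose keys appear in `other`
        return OpDict((k, v) for k, v in self.items() if k in other)

    def __sub__(self, other):
        # Difference: keep left-hand items whose keys are absent from `other`
        return OpDict((k, v) for k, v in self.items() if k not in other)

    def __xor__(self, other):
        # Symmetric difference: keys found in exactly one operand
        left = OpDict((k, v) for k, v in self.items() if k not in other)
        right = OpDict((k, v) for k, v in other.items() if k not in self)
        return left | right


a = OpDict({1: 20, 2: 20, 3: 30, 4: 40})
b = {0: 0, 2: 20, 4: 42}
print(a & b)  # {2: 20, 4: 40}
print(a ^ b)  # {1: 20, 3: 30, 0: 0}
```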
subdict(keys, default=NoParam)[source]

Get a subset of a dictionary

Parameters
  • self (Dict[KT, VT]) – dictionary or the implicit instance

  • keys (Iterable[KT]) – keys to take from self

  • default (Optional[object] | NoParamType) – if specified, this value is used for any missing keys.

Raises

KeyError – if a key does not exist and default is not specified

SeeAlso:

ubelt.util_dict.dict_subset(), ubelt.UDict.take()

Example

>>> import ubelt as ub
>>> a = ub.udict({k: 'A_' + chr(97 + k) for k in [2, 3, 5, 7]})
>>> s = a.subdict({2, 5})
>>> print('s = {}'.format(ub.repr2(s, nl=0, sort=1)))
s = {2: 'A_c', 5: 'A_f'}
>>> import pytest
>>> with pytest.raises(KeyError):
>>>     s = a.subdict({2, 5, 100})
>>> s = a.subdict({2, 5, 100}, default='DEF')
>>> print('s = {}'.format(ub.repr2(s, nl=0, sort=1)))
s = {2: 'A_c', 5: 'A_f', 100: 'DEF'}
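The default parameter needs a sentinel so that None can itself be a valid default. This is a minimal sketch that assumes a private object() sentinel as a stand-in for ubelt's NoParam. The `subdict` function here is a hypothetical plain-dict version, not ubelt's code.

```python
_MISSING = object()  # sentinel meaning "no default was supplied"


def subdict(mapping, keys, default=_MISSING):
    """Hypothetical plain-dict version of a subdict with an optional default."""
    if default is _MISSING:
        # Indexing raises KeyError for any key that is not present
        return {key: mapping[key] for key in keys}
    return {key: mapping.get(key, default) for key in keys}


a = {k: 'A_' + chr(97 + k) for k in [2, 3, 5, 7]}
print(subdict(a, [2, 5]))                  # {2: 'A_c', 5: 'A_f'}
print(subdict(a, [2, 100], default=None))  # {2: 'A_c', 100: None}
```

Unlike a key-wise intersection, the missing key 100 is kept in the result and filled with the default.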
take(keys, default=NoParam)[source]

Get values of an iterable of keys.

Parameters
  • self (Dict[KT, VT]) – dictionary or the implicit instance

  • keys (Iterable[KT]) – keys to take from self

  • default (Optional[object] | NoParamType) – if specified, this value is used for any missing keys.

Yields

VT – a selected value within the dictionary

Raises

KeyError – if a key does not exist and default is not specified

SeeAlso:

ubelt.util_list.take(), ubelt.UDict.subdict()

Example

>>> import ubelt as ub
>>> a = ub.udict({k: 'A_' + chr(97 + k) for k in [2, 3, 5, 7]})
>>> s = list(a.take({2, 5}))
>>> print('s = {}'.format(ub.repr2(s, nl=0, sort=1)))
s = ['A_c', 'A_f']
>>> import pytest
>>> with pytest.raises(KeyError):
>>>     s = list(a.take({2, 5, 100}))
>>> s = list(a.take({2, 5, 100}, default='DEF'))
>>> print('s = {}'.format(ub.repr2(s, nl=0, sort=1)))
s = ['A_c', 'A_f', 'DEF']
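Because take yields its values, a missing key only raises KeyError once the generator is consumed, so the calls to a.take above are wrapped in list() to force evaluation. Here is a minimal generator sketch using a hypothetical plain-dict helper (not ubelt's implementation) with a private sentinel in place of NoParam.

```python
_MISSING = object()  # sentinel meaning "no default was supplied"


def take(mapping, keys, default=_MISSING):
    """Hypothetical generator that yields mapping[key] for each key in order."""
    for key in keys:
        if default is _MISSING:
            yield mapping[key]  # KeyError surfaces only when this item is reached
        else:
            yield mapping.get(key, default)


a = {k: 'A_' + chr(97 + k) for k in [2, 3, 5, 7]}
lazy = take(a, [2, 100])  # no error yet: nothing has been looked up
print(next(lazy))         # A_c
print(list(take(a, [2, 5, 100], default='DEF')))  # ['A_c', 'A_f', 'DEF']
```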
invert(unique_vals=True)[source]

Swaps the keys and values in a dictionary.

Parameters
  • self (Dict[KT, VT]) – dictionary or the implicit instance to invert

  • unique_vals (bool, default=True) – if False, the values of the new dictionary are sets of the original keys.

  • cls (type | None) – specifies the dict subclass of the result. If unspecified, it will be dict or OrderedDict. This behavior may change.

Returns

the inverted dictionary

Return type

Dict[VT, KT] | Dict[VT, Set[KT]]

Note

The values must be hashable.

If the original dictionary contains duplicate values, then only one of the corresponding keys will be returned and the others will be discarded. This can be prevented by setting unique_vals=False, causing the inverted keys to be returned in a set.

Example

>>> import ubelt as ub
>>> inverted = ub.udict({'a': 1, 'b': 2}).invert()
>>> assert inverted == {1: 'a', 2: 'b'}
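Both inversion modes can be sketched with plain dictionaries. This is a minimal sketch, not ubelt's code. In the unique_vals=True branch, which key survives a collision is an assumption of the sketch (here the last one), and it agrees with the a.invert() result in the UDict example above.

```python
def invert(mapping, unique_vals=True):
    """Hypothetical plain-dict inversion with the two documented modes."""
    if unique_vals:
        # Later keys overwrite earlier ones when two keys share a value
        return {value: key for key, value in mapping.items()}
    grouped = {}
    for key, value in mapping.items():
        # Collect every original key that maps to this value
        grouped.setdefault(value, set()).add(key)
    return grouped


a = {1: 20, 2: 20, 3: 30, 4: 40}
print(invert(a))                     # {20: 2, 30: 3, 40: 4}
print(invert(a, unique_vals=False))  # {20: {1, 2}, 30: {3}, 40: {4}}
```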
map_keys(func)[source]

Apply a function to every key in a dictionary.

Creates a new dictionary with modified keys and the same values.

Parameters
  • self (Dict[KT, VT]) – a dictionary or the implicit instance.

  • func (Callable[[KT], T] | Mapping[KT, T]) – a function or indexable object

Returns

transformed dictionary

Return type

Dict[T, VT]

Example

>>> import ubelt as ub
>>> new = ub.udict({'a': [1, 2, 3], 'b': []}).map_keys(ord)
>>> assert new == {97: [1, 2, 3], 98: []}
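The func argument may be a callable or an indexable object such as a dict. Below is a minimal sketch of both forms using a hypothetical plain-dict map_keys helper. The check for colliding keys is an assumption added here, since two keys that map to the same new key would otherwise silently drop an item.

```python
def map_keys(mapping, func):
    """Hypothetical plain-dict map_keys accepting a callable or a lookup table."""
    lookup = func if callable(func) else func.__getitem__
    result = {}
    for key, value in mapping.items():
        new_key = lookup(key)
        if new_key in result:
            # Assumption of this sketch: refuse to merge items silently
            raise ValueError('duplicate key after mapping: {!r}'.format(new_key))
        result[new_key] = value
    return result


print(map_keys({'a': [1, 2, 3], 'b': []}, ord))          # {97: [1, 2, 3], 98: []}
print(map_keys({'a': 1, 'b': 2}, {'a': 'x', 'b': 'y'}))  # {'x': 1, 'y': 2}
```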
map_values(func)[source]

Apply a function to every value in a dictionary.

Creates a new dictionary with the same keys and modified values.

Parameters
  • self (Dict[KT, VT]) – a dictionary or the implicit instance.

  • func (Callable[[VT], T] | Mapping[VT, T]) – a function or indexable object

Returns

transformed dictionary

Return type

Dict[KT, T]

Example

>>> import ubelt as ub
>>> newdict = ub.udict({'a': [1, 2, 3], 'b': []}).map_values(len)
>>> assert newdict ==  {'a': 3, 'b': 0}
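Since func may also be a Mapping, a lookup table can stand in for a function. Here is a minimal sketch with a hypothetical plain-dict map_values helper (not ubelt's implementation).

```python
def map_values(mapping, func):
    """Hypothetical plain-dict map_values accepting a callable or a lookup table."""
    lookup = func if callable(func) else func.__getitem__
    # Keys are untouched, so no collisions are possible here
    return {key: lookup(value) for key, value in mapping.items()}


grades = {'ann': 'A', 'bob': 'C'}
points = {'A': 4, 'B': 3, 'C': 2}
print(map_values(grades, points))                  # {'ann': 4, 'bob': 2}
print(map_values({'a': [1, 2, 3], 'b': []}, len))  # {'a': 3, 'b': 0}
```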
sorted_keys(key=None, reverse=False)[source]

Return an ordered dictionary sorted by its keys

Parameters
  • self (Dict[KT, VT]) – dictionary to sort or the implicit instance. The keys must be of comparable types.

  • key (Callable[[KT], Any] | None) – If given as a callable, customizes the sorting by ordering using transformed keys.

  • reverse (bool, default=False) – if True returns in descending order

Returns

new dictionary where the keys are ordered

Return type

OrderedDict[KT, VT]

Example

>>> import ubelt as ub
>>> new = ub.udict({'spam': 2.62, 'eggs': 1.20, 'jam': 2.92}).sorted_keys()
>>> assert new == ub.odict([('eggs', 1.2), ('jam', 2.92), ('spam', 2.62)])
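Here is a minimal sketch of key-based sorting into an OrderedDict, using a hypothetical plain-dict sorted_keys helper. Sorting the (key, value) pairs by the transformed key alone means values are never compared, even when the key function produces ties.

```python
from collections import OrderedDict


def sorted_keys(mapping, key=None, reverse=False):
    """Hypothetical plain-dict version: order items by their (transformed) keys."""
    transform = key if key is not None else (lambda k: k)
    items = sorted(mapping.items(), key=lambda kv: transform(kv[0]), reverse=reverse)
    return OrderedDict(items)


prices = {'spam': 2.62, 'eggs': 1.20, 'jam': 2.92}
print(list(sorted_keys(prices)))                # ['eggs', 'jam', 'spam']
print(list(sorted_keys(prices, key=len)))       # ['jam', 'spam', 'eggs']
print(list(sorted_keys(prices, reverse=True)))  # ['spam', 'jam', 'eggs']
```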
sorted_values(key=None, reverse=False)[source]

Return an ordered dictionary sorted by its values

Parameters
  • self (Dict[KT, VT]) – dictionary to sort or the implicit instance. The values must be of comparable types.

  • key (Callable[[VT], Any] | None) – If given as a callable, customizes the sorting by ordering using transformed values.

  • reverse (bool, default=False) – if True returns in descending order

Returns

new dictionary where the values are ordered

Return type

OrderedDict[KT, VT]

Example

>>> import ubelt as ub
>>> new = ub.udict({'spam': 2.62, 'eggs': 1.20, 'jam': 2.92}).sorted_values()
>>> assert new == ub.odict([('eggs', 1.2), ('spam', 2.62), ('jam', 2.92)])
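The same pattern sorts by values instead. In this minimal sketch (a hypothetical helper, not ubelt's code), Python's sort is stable, so items with equal values keep their original insertion order, including when reverse=True.

```python
from collections import OrderedDict


def sorted_values(mapping, key=None, reverse=False):
    """Hypothetical plain-dict version: order items by their (transformed) values."""
    transform = key if key is not None else (lambda v: v)
    items = sorted(mapping.items(), key=lambda kv: transform(kv[1]), reverse=reverse)
    return OrderedDict(items)


scores = {'a': 2, 'b': 1, 'c': 2}
print(list(sorted_values(scores)))                      # ['b', 'a', 'c']
print(list(sorted_values(scores, reverse=True)))        # ['a', 'c', 'b']
print(list(sorted_values({'x': -3, 'y': 1}, key=abs)))  # ['y', 'x']
```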
peek_key(default=NoParam)[source]

Get the first key in the dictionary

Parameters
  • self (Dict) – a dictionary or the implicit instance

  • default (T | NoParamType) – default item to return if the dictionary is empty, otherwise a StopIteration error is raised

Returns

the first key or the default

Return type

KT

Example

>>> import ubelt as ub
>>> assert ub.udict({1: 2}).peek_key() == 1
peek_value(default=NoParam)[source]

Get the first value in the dictionary

Parameters
  • self (Dict[KT, VT]) – a dictionary or the implicit instance

  • default (T | NoParamType) – default item to return if the dictionary is empty, otherwise a StopIteration error is raised

Returns

the first value or the default

Return type

VT

Example

>>> import ubelt as ub
>>> assert ub.udict({1: 2}).peek_value() == 2
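Both peek methods reduce to taking the first item of a dictionary view. Below is a minimal sketch with a hypothetical peek helper. The name mirrors the ub.peek call used in the UDict example, but this is not its implementation, and a private sentinel again stands in for NoParam.

```python
_MISSING = object()  # sentinel meaning "no default was supplied"


def peek(iterable, default=_MISSING):
    """Hypothetical helper: return the first item of an iterable."""
    iterator = iter(iterable)
    if default is _MISSING:
        return next(iterator)  # raises StopIteration if the iterable is empty
    return next(iterator, default)


d = {1: 2, 3: 4}
print(peek(d.keys()))                 # 1
print(peek(d.values()))               # 2
print(peek({}.keys(), default=None))  # None
```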
ubelt.util_dict.sdict

alias of SetDict

ubelt.util_dict.udict

alias of UDict