ubelt.util_list module

Utility functions for manipulating iterables, lists, and sequences.

The chunks() function splits a list into smaller parts. There are different strategies for how to do this.

The flatten() function take a list of lists and removees the inner lists. This only removes one level of nesting.

The iterable() function checks if an object is iterable or not. Similar to the callable() builtin function.

The argmax(), argmin(), and argsort() work similarly to the analogous numpy functions, except they operate on dictionaries and other Python builtin types.

The take() and compress() are generators, and also similar to their lesser known, but very useful numpy equivalents.

There are also other numpy inspired functions: unique(), argunique(), unique_flags(), and boolmask().

ubelt.util_list.allsame(iterable, eq=<built-in function eq>)[source]

Determine if all items in a sequence are the same

Parameters
  • iterable (Iterable[T]) – items to determine if they are all the same

  • eq (Callable[[T, T], bool], default=operator.eq) – function used to test for equality

Returns

True if all items are equal, otherwise False

Return type

bool

Example

>>> import ubelt as ub
>>> ub.allsame([1, 1, 1, 1])
True
>>> ub.allsame([])
True
>>> ub.allsame([0, 1])
False
>>> iterable = iter([0, 1, 1, 1])
>>> next(iterable)
>>> ub.allsame(iterable)
True
>>> ub.allsame(range(10))
False
>>> ub.allsame(range(10), lambda a, b: True)
True
ubelt.util_list.argmax(indexable, key=None)[source]

Returns index / key of the item with the largest value.

This is similar to numpy.argmax(), but it is written in pure python and works on both lists and dictionaries.

Parameters
  • indexable (Iterable[VT] | Mapping[KT, VT]) – indexable to sort by

  • key (Callable[[VT], Any], default=None) – customizes the ordering of the indexable

Returns

the index of the item with the maximum value.

Return type

int | KT

Example

>>> import ubelt as ub
>>> assert ub.argmax({'a': 3, 'b': 2, 'c': 100}) == 'c'
>>> assert ub.argmax(['a', 'c', 'b', 'z', 'f']) == 3
>>> assert ub.argmax([[0, 1], [2, 3, 4], [5]], key=len) == 1
>>> assert ub.argmax({'a': 3, 'b': 2, 3: 100, 4: 4}) == 3
>>> assert ub.argmax(iter(['a', 'c', 'b', 'z', 'f'])) == 3
ubelt.util_list.argmin(indexable, key=None)[source]

Returns index / key of the item with the smallest value.

This is similar to numpy.argmin(), but it is written in pure python and works on both lists and dictionaries.

Parameters
  • indexable (Iterable[VT] | Mapping[KT, VT]) – indexable to sort by

  • key (Callable[[VT], VT], default=None) – customizes the ordering of the indexable

Returns

the index of the item with the minimum value.

Return type

int | KT

Example

>>> import ubelt as ub
>>> assert ub.argmin({'a': 3, 'b': 2, 'c': 100}) == 'b'
>>> assert ub.argmin(['a', 'c', 'b', 'z', 'f']) == 0
>>> assert ub.argmin([[0, 1], [2, 3, 4], [5]], key=len) == 2
>>> assert ub.argmin({'a': 3, 'b': 2, 3: 100, 4: 4}) == 'b'
>>> assert ub.argmin(iter(['a', 'c', 'A', 'z', 'f'])) == 2
ubelt.util_list.argsort(indexable, key=None, reverse=False)[source]

Returns the indices that would sort a indexable object.

This is similar to numpy.argsort(), but it is written in pure python and works on both lists and dictionaries.

Parameters
  • indexable (Iterable[VT] | Mapping[KT, VT]) – indexable to sort by

  • key (Callable[[VT], VT] | None, default=None) – customizes the ordering of the indexable

  • reverse (bool, default=False) – if True returns in descending order

Returns

indices - list of indices that sorts the indexable

Return type

List[int] | List[KT]

Example

>>> import ubelt as ub
>>> # argsort works on dicts by returning keys
>>> dict_ = {'a': 3, 'b': 2, 'c': 100}
>>> indices = ub.argsort(dict_)
>>> assert list(ub.take(dict_, indices)) == sorted(dict_.values())
>>> # argsort works on lists by returning indices
>>> indexable = [100, 2, 432, 10]
>>> indices = ub.argsort(indexable)
>>> assert list(ub.take(indexable, indices)) == sorted(indexable)
>>> # Can use iterators, but be careful. It exhausts them.
>>> indexable = reversed(range(100))
>>> indices = ub.argsort(indexable)
>>> assert indices[0] == 99
>>> # Can use key just like sorted
>>> indexable = [[0, 1, 2], [3, 4], [5]]
>>> indices = ub.argsort(indexable, key=len)
>>> assert indices == [2, 1, 0]
>>> # Can use reverse just like sorted
>>> indexable = [0, 2, 1]
>>> indices = ub.argsort(indexable, reverse=True)
>>> assert indices == [1, 2, 0]
ubelt.util_list.argunique(items, key=None)[source]

Returns indices corresponding to the first instance of each unique item.

Parameters
  • items (Sequence[VT]) – indexable collection of items

  • key (Callable[[VT], Any], default=None) – custom normalization function. If specified returns items where key(item) is unique.

Returns

indices of the unique items

Return type

Iterator[int]

Example

>>> import ubelt as ub
>>> items = [0, 2, 5, 1, 1, 0, 2, 4]
>>> indices = list(ub.argunique(items))
>>> assert indices == [0, 1, 2, 3, 7]
>>> indices = list(ub.argunique(items, key=lambda x: x % 2 == 0))
>>> assert indices == [0, 2]
ubelt.util_list.boolmask(indices, maxval=None)[source]

Constructs a list of booleans where an item is True if its position is in indices otherwise it is False.

Parameters
  • indices (List[int]) – list of integer indices

  • maxval (int) – length of the returned list. If not specified this is inferred using max(indices)

Returns

mask - a list of booleans. mask[idx] is True if idx in indices

Return type

List[bool]

Note

In the future the arg maxval may change its name to shape

Example

>>> import ubelt as ub
>>> indices = [0, 1, 4]
>>> mask = ub.boolmask(indices, maxval=6)
>>> assert mask == [True, True, False, False, True, False]
>>> mask = ub.boolmask(indices)
>>> assert mask == [True, True, False, False, True]
class ubelt.util_list.chunks(items, chunksize=None, nchunks=None, total=None, bordermode='none', legacy=False)[source]

Bases: object

Generates successive n-sized chunks from items.

If the last chunk has less than n elements, bordermode is used to determine fill values.

Parameters
  • items (Iterable[T]) – input to iterate over

  • chunksize (int) – size of each sublist yielded

  • nchunks (int) – number of chunks to create ( cannot be specified if chunksize is specified)

  • bordermode (str) – determines how to handle the last case if the length of the input is not divisible by chunksize valid values are: {‘none’, ‘cycle’, ‘replicate’}

  • total (int) – hints about the length of the input

Note

FIXME:

When nchunks is given, thats how many chunks we should get but the issue is that chunksize is not well defined in that instance For instance how do we turn a list with 4 elements into 3 chunks where does the extra item go?

In ubelt <= 0.10.3 there is a bug when specifying nchunks, where it chooses a chunksize that is too large. Specify legacy=True to get the old buggy behavior if needed.

Yields

List[T] – subsequent non-overlapping chunks of the input items

References

http://stackoverflow.com/questions/434287/iterate-over-a-list-in-chunks

Example

>>> import ubelt as ub
>>> items = '1234567'
>>> genresult = ub.chunks(items, chunksize=3)
>>> list(genresult)
[['1', '2', '3'], ['4', '5', '6'], ['7']]

Example

>>> import ubelt as ub
>>> items = [1, 2, 3, 4, 5, 6, 7]
>>> genresult = ub.chunks(items, chunksize=3, bordermode='none')
>>> assert list(genresult) == [[1, 2, 3], [4, 5, 6], [7]]
>>> genresult = ub.chunks(items, chunksize=3, bordermode='cycle')
>>> assert list(genresult) == [[1, 2, 3], [4, 5, 6], [7, 1, 2]]
>>> genresult = ub.chunks(items, chunksize=3, bordermode='replicate')
>>> assert list(genresult) == [[1, 2, 3], [4, 5, 6], [7, 7, 7]]

Example

>>> import ubelt as ub
>>> assert len(list(ub.chunks(range(2), nchunks=2))) == 2
>>> assert len(list(ub.chunks(range(3), nchunks=2))) == 2
>>> # Note: ub.chunks will not do the 2,1,1 split
>>> assert len(list(ub.chunks(range(4), nchunks=3))) == 3
>>> assert len(list(ub.chunks([], 2, bordermode='none'))) == 0
>>> assert len(list(ub.chunks([], 2, bordermode='cycle'))) == 0
>>> assert len(list(ub.chunks([], 2, None, bordermode='replicate'))) == 0

Example

>>> from ubelt.util_list import *  # NOQA
>>> def _check_len(self):
...     assert len(self) == len(list(self))
>>> _check_len(chunks(list(range(3)), nchunks=2))
>>> _check_len(chunks(list(range(2)), nchunks=2))
>>> _check_len(chunks(list(range(2)), nchunks=3))

Example

>>> from ubelt.util_list import *  # NOQA
>>> import pytest
>>> assert pytest.raises(ValueError, chunks, range(9))
>>> assert pytest.raises(ValueError, chunks, range(9), chunksize=2, nchunks=2)
>>> assert pytest.raises(TypeError, len, chunks((_ for _ in range(2)), 2))

Example

>>> from ubelt.util_list import *  # NOQA
>>> import ubelt as ub
>>> basis = {
>>>     'legacy': [False, True],
>>>     'chunker': [{'nchunks': 3}, {'nchunks': 4}, {'nchunks': 5}, {'nchunks': 7}, {'chunksize': 3}],
>>>     'items': [range(2), range(4), range(5), range(7), range(9)],
>>>     'bordermode': ['none', 'cycle', 'replicate'],
>>> }
>>> grid_items = list(ub.named_product(basis))
>>> rows = []
>>> for grid_item in ub.ProgIter(grid_items):
>>>     chunker = grid_item.get('chunker')
>>>     grid_item.update(chunker)
>>>     kw = ub.dict_diff(grid_item, {'chunker'})
>>>     self = chunk_iter = ub.chunks(**kw)
>>>     chunked = list(chunk_iter)
>>>     chunk_lens = list(map(len, chunked))
>>>     row = ub.dict_union(grid_item, {'chunk_lens': chunk_lens, 'chunks': chunked})
>>>     row['chunker'] = str(row['chunker'])
>>>     if not row['legacy'] and 'nchunks' in kw:
>>>         assert kw['nchunks'] == row['nchunks']
>>>     row.update(chunk_iter.__dict__)
>>>     rows.append(row)
>>> # xdoctest: +REQUIRES(module:pandas)
>>> import pandas as pd
>>> df = pd.DataFrame(rows)
>>> for _, subdf in df.groupby('chunker'):
>>>     print(subdf)
static noborder(items, chunksize)[source]
static cycle(items, chunksize)[source]
static replicate(items, chunksize)[source]
ubelt.util_list.compress(items, flags)[source]

Selects from items where the corresponding value in flags is True. This is similar to numpy.compress().

This is actually a simple alias for itertools.compress().

Parameters
  • items (Iterable[Any]) – a sequence to select items from

  • flags (Iterable[bool]) – corresponding sequence of bools

Returns

a subset of masked items

Return type

Iterable[Any]

Example

>>> import ubelt as ub
>>> items = [1, 2, 3, 4, 5]
>>> flags = [False, True, True, False, True]
>>> list(ub.compress(items, flags))
[2, 3, 5]
ubelt.util_list.flatten(nested)[source]

Transforms a nested iterable into a flat iterable.

This is simply an alias for itertools.chain.from_iterable().

Parameters

nested (Iterable[Iterable[Any]]) – list of lists

Returns

flattened items

Return type

Iterable[Any]

Example

>>> import ubelt as ub
>>> nested = [['a', 'b'], ['c', 'd']]
>>> list(ub.flatten(nested))
['a', 'b', 'c', 'd']
ubelt.util_list.iter_window(iterable, size=2, step=1, wrap=False)[source]

Iterates through iterable with a window size. This is essentially a 1D sliding window.

Parameters
  • iterable (Iterable[T]) – an iterable sequence

  • size (int, default=2) – sliding window size

  • step (int, default=1) – sliding step size

  • wrap (bool, default=False) – wraparound flag

Returns

returns a possibly overlapping windows in a sequence

Return type

Iterable[T]

Example

>>> import ubelt as ub
>>> iterable = [1, 2, 3, 4, 5, 6]
>>> size, step, wrap = 3, 1, True
>>> window_iter = ub.iter_window(iterable, size, step, wrap)
>>> window_list = list(window_iter)
>>> print('window_list = %r' % (window_list,))
window_list = [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 1), (6, 1, 2)]

Example

>>> import ubelt as ub
>>> iterable = [1, 2, 3, 4, 5, 6]
>>> size, step, wrap = 3, 2, True
>>> window_iter = ub.iter_window(iterable, size, step, wrap)
>>> window_list = list(window_iter)
>>> print('window_list = {!r}'.format(window_list))
window_list = [(1, 2, 3), (3, 4, 5), (5, 6, 1)]

Example

>>> import ubelt as ub
>>> iterable = [1, 2, 3, 4, 5, 6]
>>> size, step, wrap = 3, 2, False
>>> window_iter = ub.iter_window(iterable, size, step, wrap)
>>> window_list = list(window_iter)
>>> print('window_list = {!r}'.format(window_list))
window_list = [(1, 2, 3), (3, 4, 5)]

Example

>>> import ubelt as ub
>>> iterable = []
>>> size, step, wrap = 3, 2, False
>>> window_iter = ub.iter_window(iterable, size, step, wrap)
>>> window_list = list(window_iter)
>>> print('window_list = {!r}'.format(window_list))
window_list = []
ubelt.util_list.iterable(obj, strok=False)[source]

Checks if the input implements the iterator interface. An exception is made for strings, which return False unless strok is True

Parameters
  • obj (object) – a scalar or iterable input

  • strok (bool, default=False) – if True allow strings to be interpreted as iterable

Returns

True if the input is iterable

Return type

bool

Example

>>> import ubelt as ub
>>> obj_list = [3, [3], '3', (3,), [3, 4, 5], {}]
>>> result = [ub.iterable(obj) for obj in obj_list]
>>> assert result == [False, True, False, True, True, True]
>>> result = [ub.iterable(obj, strok=True) for obj in obj_list]
>>> assert result == [False, True, True, True, True, True]
ubelt.util_list.peek(iterable, default=NoParam)[source]

Look at the first item of an iterable. If the input is an iterator, then the next element is exhausted (i.e. a pop operation).

Parameters
  • iterable (Iterable[T]) – an iterable

  • default (T) – default item to return if the iterable is empty, otherwise a StopIteration error is raised

Returns

item - the first item of ordered sequence, a popped item from an

iterator, or an arbitrary item from an unordered collection.

Return type

T

Example

>>> import ubelt as ub
>>> data = [0, 1, 2]
>>> ub.peek(data)
0
>>> iterator = iter(data)
>>> print(ub.peek(iterator))
0
>>> print(ub.peek(iterator))
1
>>> print(ub.peek(iterator))
2
>>> ub.peek(range(3))
0
>>> ub.peek([], 3)
3
ubelt.util_list.take(items, indices, default=NoParam)[source]

Selects a subset of a list based on a list of indices.

This is similar to np.take, but pure python. This also supports specifying a default element if items is an iterable of dictionaries.

Parameters
  • items (Sequence[VT] | Mapping[KT, VT]) – An indexable object to select items from

  • indices (Iterable[int | KT]) – sequence of indexes into items

  • default (Any, default=NoParam) – if specified items must support the get method.

Yields

VT – a selected item within the list

SeeAlso:

ubelt.dict_subset()

Note

ub.take(items, indices) is equivalent to (items[i] for i in indices) when default is unspecified.

Example

>>> import ubelt as ub
>>> items = [0, 1, 2, 3]
>>> indices = [2, 0]
>>> list(ub.take(items, indices))
[2, 0]

Example

>>> import ubelt as ub
>>> dict_ = {1: 'a', 2: 'b', 3: 'c'}
>>> keys = [1, 2, 3, 4, 5]
>>> result = list(ub.take(dict_, keys, None))
>>> assert result == ['a', 'b', 'c', None, None]

Example

>>> import ubelt as ub
>>> dict_ = {1: 'a', 2: 'b', 3: 'c'}
>>> keys = [1, 2, 3, 4, 5]
>>> try:
>>>     print(list(ub.take(dict_, keys)))
>>>     raise AssertionError('did not get key error')
>>> except KeyError:
>>>     print('correctly got key error')
ubelt.util_list.unique(items, key=None)[source]

Generates unique items in the order they appear.

Parameters
  • items (Iterable[T]) – list of items

  • key (Callable[[T], Any], default=None) – custom normalization function. If specified returns items where key(item) is unique.

Yields

T – a unique item from the input sequence

Example

>>> import ubelt as ub
>>> items = [4, 6, 6, 0, 6, 1, 0, 2, 2, 1]
>>> unique_items = list(ub.unique(items))
>>> assert unique_items == [4, 6, 0, 1, 2]

Example

>>> import ubelt as ub
>>> import six
>>> items = ['A', 'a', 'b', 'B', 'C', 'c', 'D', 'e', 'D', 'E']
>>> unique_items = list(ub.unique(items, key=six.text_type.lower))
>>> assert unique_items == ['A', 'b', 'C', 'D', 'e']
>>> unique_items = list(ub.unique(items))
>>> assert unique_items == ['A', 'a', 'b', 'B', 'C', 'c', 'D', 'e', 'E']
ubelt.util_list.unique_flags(items, key=None)[source]

Returns a list of booleans corresponding to the first instance of each unique item.

Parameters
  • items (Sequence[VT]) – indexable collection of items

  • key (Callable[[VT], Any] | None, default=None) – custom normalization function. If specified returns items where key(item) is unique.

Returns

flags the items that are unique

Return type

List[bool]

Example

>>> import ubelt as ub
>>> items = [0, 2, 1, 1, 0, 9, 2]
>>> flags = ub.unique_flags(items)
>>> assert flags == [True, True, True, False, False, True, False]
>>> flags = ub.unique_flags(items, key=lambda x: x % 2 == 0)
>>> assert flags == [True, False, True, False, False, False, False]