ubelt.util_list module
Utility functions for manipulating iterables, lists, and sequences.
The chunks() function splits a list into smaller parts. There are different strategies for how to do this.
The flatten() function takes a list of lists and removes the inner lists. This only removes one level of nesting.
The iterable() function checks if an object is iterable or not. It is similar to the callable() builtin function.
The argmax(), argmin(), and argsort() functions work similarly to the analogous numpy functions, except they operate on dictionaries and other Python builtin types.
The take() and compress() functions are generators, and are also similar to their lesser known, but very useful, numpy equivalents.
There are also other numpy-inspired functions: unique(), argunique(), unique_flags(), and boolmask().
- ubelt.util_list.allsame(iterable, eq=operator.eq)
Determine if all items in a sequence are the same
- Parameters:
iterable (Iterable[T]) – items to determine if they are all the same
eq (Callable[[T, T], bool], default=operator.eq) – function used to test for equality
- Returns:
True if all items are equal, otherwise False
- Return type:
bool
Notes
Similar to more_itertools.all_equal().
Example
>>> import ubelt as ub
>>> ub.allsame([1, 1, 1, 1])
True
>>> ub.allsame([])
True
>>> ub.allsame([0, 1])
False
>>> iterable = iter([0, 1, 1, 1])
>>> next(iterable)
>>> ub.allsame(iterable)
True
>>> ub.allsame(range(10))
False
>>> ub.allsame(range(10), lambda a, b: True)
True
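The behavior in the example above can be approximated in a few lines of plain Python. This is only an illustrative sketch, not ubelt's source: the name allsame_sketch is hypothetical, and it assumes every item is compared against the first one using eq.

```python
import math
import operator


def allsame_sketch(iterable, eq=operator.eq):
    """Hypothetical stand-in for ub.allsame (illustration only)."""
    iterator = iter(iterable)
    try:
        first = next(iterator)
    except StopIteration:
        # An empty input is vacuously "all the same".
        return True
    # Assumption: each remaining item is compared against the first item.
    return all(eq(first, item) for item in iterator)


print(allsame_sketch([1, 1, 1, 1]))   # True
print(allsame_sketch([0, 1]))         # False
# A custom eq allows approximate comparisons, e.g. for floats.
print(allsame_sketch([1.0, 1.0 + 1e-12], eq=math.isclose))  # True
```

Because the sketch consumes its input, passing a partially advanced iterator only checks the remaining items, which mirrors the iter([0, 1, 1, 1]) case in the example.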
- ubelt.util_list.argmax(indexable, key=None)
Returns index / key of the item with the largest value.
This is similar to numpy.argmax(), but it is written in pure Python and works on both lists and dictionaries.
- Parameters:
indexable (Iterable[VT] | Mapping[KT, VT]) – indexable to sort by
key (Callable[[VT], Any] | None, default=None) – customizes the ordering of the indexable
- Returns:
the index of the item with the maximum value.
- Return type:
int | KT
Example
>>> import ubelt as ub
>>> assert ub.argmax({'a': 3, 'b': 2, 'c': 100}) == 'c'
>>> assert ub.argmax(['a', 'c', 'b', 'z', 'f']) == 3
>>> assert ub.argmax([[0, 1], [2, 3, 4], [5]], key=len) == 1
>>> assert ub.argmax({'a': 3, 'b': 2, 3: 100, 4: 4}) == 3
>>> assert ub.argmax(iter(['a', 'c', 'b', 'z', 'f'])) == 3
- ubelt.util_list.argmin(indexable, key=None)
Returns index / key of the item with the smallest value.
This is similar to numpy.argmin(), but it is written in pure Python and works on both lists and dictionaries.
- Parameters:
indexable (Iterable[VT] | Mapping[KT, VT]) – indexable to sort by
key (Callable[[VT], VT] | None, default=None) – customizes the ordering of the indexable
- Returns:
the index of the item with the minimum value.
- Return type:
int | KT
Example
>>> import ubelt as ub
>>> assert ub.argmin({'a': 3, 'b': 2, 'c': 100}) == 'b'
>>> assert ub.argmin(['a', 'c', 'b', 'z', 'f']) == 0
>>> assert ub.argmin([[0, 1], [2, 3, 4], [5]], key=len) == 2
>>> assert ub.argmin({'a': 3, 'b': 2, 3: 100, 4: 4}) == 'b'
>>> assert ub.argmin(iter(['a', 'c', 'A', 'z', 'f'])) == 2
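For comparison, the simple list and dictionary cases of argmax and argmin can be written with only the max() and min() builtins. This is a sketch of an equivalent idiom, not a description of how ubelt implements these functions, and tie-breaking is not guaranteed to match.

```python
scores = {'a': 3, 'b': 2, 'c': 100}
# For a mapping, rank the keys by their associated value.
best_key = max(scores, key=scores.get)
worst_key = min(scores, key=scores.get)

letters = ['a', 'c', 'b', 'z', 'f']
# For a list, rank the positions by the item stored at each position.
best_index = max(range(len(letters)), key=letters.__getitem__)
worst_index = min(range(len(letters)), key=letters.__getitem__)

print(best_key, worst_key, best_index, worst_index)  # c b 3 0
```

Unlike the iterator example above, this idiom needs an indexable input, because it looks each value up again by key or position.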
- ubelt.util_list.argsort(indexable, key=None, reverse=False)
Returns the indices that would sort an indexable object.
This is similar to numpy.argsort(), but it is written in pure Python and works on both lists and dictionaries.
- Parameters:
indexable (Iterable[VT] | Mapping[KT, VT]) – indexable to sort by
key (Callable[[VT], VT] | None, default=None) – customizes the ordering of the indexable
reverse (bool, default=False) – if True returns in descending order
- Returns:
indices - list of indices that sorts the indexable
- Return type:
List[int] | List[KT]
Example
>>> import ubelt as ub
>>> # argsort works on dicts by returning keys
>>> dict_ = {'a': 3, 'b': 2, 'c': 100}
>>> indices = ub.argsort(dict_)
>>> assert list(ub.take(dict_, indices)) == sorted(dict_.values())
>>> # argsort works on lists by returning indices
>>> indexable = [100, 2, 432, 10]
>>> indices = ub.argsort(indexable)
>>> assert list(ub.take(indexable, indices)) == sorted(indexable)
>>> # Can use iterators, but be careful. It exhausts them.
>>> indexable = reversed(range(100))
>>> indices = ub.argsort(indexable)
>>> assert indices[0] == 99
>>> # Can use key just like sorted
>>> indexable = [[0, 1, 2], [3, 4], [5]]
>>> indices = ub.argsort(indexable, key=len)
>>> assert indices == [2, 1, 0]
>>> # Can use reverse just like sorted
>>> indexable = [0, 2, 1]
>>> indices = ub.argsort(indexable, reverse=True)
>>> assert indices == [1, 2, 0]
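One way to picture what argsort returns is to pair each value with its key or position and sort the pairs by value. The helper below, argsort_sketch, is a hypothetical illustration built on sorted(); it is not ubelt's implementation and may order ties differently.

```python
from collections.abc import Mapping


def argsort_sketch(indexable, key=None, reverse=False):
    """Hypothetical stand-in for ub.argsort (illustration only)."""
    if isinstance(indexable, Mapping):
        pairs = list(indexable.items())      # (key, value) pairs
    else:
        pairs = list(enumerate(indexable))   # (position, value) pairs
    if key is None:
        rank = lambda pair: pair[1]          # order by the value itself
    else:
        rank = lambda pair: key(pair[1])     # order by key(value)
    return [index for index, _ in sorted(pairs, key=rank, reverse=reverse)]


print(argsort_sketch({'a': 3, 'b': 2, 'c': 100}))         # ['b', 'a', 'c']
print(argsort_sketch([100, 2, 432, 10]))                  # [1, 3, 0, 2]
print(argsort_sketch([[0, 1, 2], [3, 4], [5]], key=len))  # [2, 1, 0]
```

Ranking only by value means the keys or positions themselves are never compared, so mixed key types such as 'a' and 3 in one dictionary do not cause a TypeError in this sketch.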
- ubelt.util_list.argunique(items, key=None)
Returns indices corresponding to the first instance of each unique item.
- Parameters:
items (Sequence[VT]) – indexable collection of items
key (Callable[[VT], Any] | None, default=None) – custom normalization function. If specified, returns items where key(item) is unique.
- Returns:
indices of the unique items
- Return type:
Iterator[int]
Example
>>> import ubelt as ub
>>> items = [0, 2, 5, 1, 1, 0, 2, 4]
>>> indices = list(ub.argunique(items))
>>> assert indices == [0, 1, 2, 3, 7]
>>> indices = list(ub.argunique(items, key=lambda x: x % 2 == 0))
>>> assert indices == [0, 2]
- ubelt.util_list.boolmask(indices, maxval=None)
Constructs a list of booleans where an item is True if its position is in indices, otherwise it is False.
- Parameters:
indices (List[int]) – list of integer indices
maxval (int | None) – length of the returned list. If not specified, this is inferred using max(indices)
- Returns:
mask - a list of booleans. mask[idx] is True if idx in indices
- Return type:
List[bool]
Note
In the future the arg maxval may change its name to shape
Example
>>> import ubelt as ub
>>> indices = [0, 1, 4]
>>> mask = ub.boolmask(indices, maxval=6)
>>> assert mask == [True, True, False, False, True, False]
>>> mask = ub.boolmask(indices)
>>> assert mask == [True, True, False, False, True]
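The mask construction can be sketched in plain Python as follows. The name boolmask_sketch is hypothetical, and the inferred length of max(indices) + 1 is an assumption chosen to reproduce the five-element mask in the example above.

```python
def boolmask_sketch(indices, maxval=None):
    """Hypothetical stand-in for ub.boolmask (illustration only)."""
    indices = list(indices)
    if maxval is None:
        # Assumption: use the shortest length that can hold every index.
        maxval = max(indices) + 1
    wanted = set(indices)
    return [position in wanted for position in range(maxval)]


mask = boolmask_sketch([0, 1, 4], maxval=6)
print(mask)  # [True, True, False, False, True, False]

# The mask can be turned back into indices with enumerate.
print([position for position, flag in enumerate(mask) if flag])  # [0, 1, 4]
```

This sketch raises ValueError for an empty index list when maxval is omitted, because max() has nothing to infer a length from.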
- class ubelt.util_list.chunks(items, chunksize=None, nchunks=None, total=None, bordermode='none', legacy=False)
Bases: object
Generates successive n-sized chunks from items.
If the last chunk has fewer than n elements, bordermode is used to determine fill values.
Note
- FIXME:
When nchunks is given, that is how many chunks we should get, but the issue is that chunksize is not well defined in that instance. For instance, how do we turn a list with 4 elements into 3 chunks? Where does the extra item go?
In ubelt <= 0.10.3 there is a bug when specifying nchunks, where it chooses a chunksize that is too large. Specify legacy=True to get the old buggy behavior if needed.
Notes
- This is similar to functionality provided by more_itertools.chunked(), more_itertools.chunked_even(), more_itertools.sliced(), and more_itertools.divide().
- Yields:
List[T] – subsequent non-overlapping chunks of the input items
References
Example
>>> import ubelt as ub
>>> items = '1234567'
>>> genresult = ub.chunks(items, chunksize=3)
>>> list(genresult)
[['1', '2', '3'], ['4', '5', '6'], ['7']]
Example
>>> import ubelt as ub
>>> items = [1, 2, 3, 4, 5, 6, 7]
>>> genresult = ub.chunks(items, chunksize=3, bordermode='none')
>>> assert list(genresult) == [[1, 2, 3], [4, 5, 6], [7]]
>>> genresult = ub.chunks(items, chunksize=3, bordermode='cycle')
>>> assert list(genresult) == [[1, 2, 3], [4, 5, 6], [7, 1, 2]]
>>> genresult = ub.chunks(items, chunksize=3, bordermode='replicate')
>>> assert list(genresult) == [[1, 2, 3], [4, 5, 6], [7, 7, 7]]
Example
>>> import ubelt as ub
>>> assert len(list(ub.chunks(range(2), nchunks=2))) == 2
>>> assert len(list(ub.chunks(range(3), nchunks=2))) == 2
>>> # Note: ub.chunks will not do the 2,1,1 split
>>> assert len(list(ub.chunks(range(4), nchunks=3))) == 3
>>> assert len(list(ub.chunks([], 2, bordermode='none'))) == 0
>>> assert len(list(ub.chunks([], 2, bordermode='cycle'))) == 0
>>> assert len(list(ub.chunks([], 2, None, bordermode='replicate'))) == 0
Example
>>> from ubelt.util_list import *  # NOQA
>>> def _check_len(self):
...     assert len(self) == len(list(self))
>>> _check_len(chunks(list(range(3)), nchunks=2))
>>> _check_len(chunks(list(range(2)), nchunks=2))
>>> _check_len(chunks(list(range(2)), nchunks=3))
Example
>>> from ubelt.util_list import *  # NOQA
>>> import pytest
>>> assert pytest.raises(ValueError, chunks, range(9))
>>> assert pytest.raises(ValueError, chunks, range(9), chunksize=2, nchunks=2)
>>> assert pytest.raises(TypeError, len, chunks((_ for _ in range(2)), 2))
Example
>>> from ubelt.util_list import *  # NOQA
>>> import ubelt as ub
>>> basis = {
>>>     'legacy': [False, True],
>>>     'chunker': [{'nchunks': 3}, {'nchunks': 4}, {'nchunks': 5}, {'nchunks': 7}, {'chunksize': 3}],
>>>     'items': [range(2), range(4), range(5), range(7), range(9)],
>>>     'bordermode': ['none', 'cycle', 'replicate'],
>>> }
>>> grid_items = list(ub.named_product(basis))
>>> rows = []
>>> for grid_item in ub.ProgIter(grid_items):
>>>     chunker = grid_item.get('chunker')
>>>     grid_item.update(chunker)
>>>     kw = ub.dict_diff(grid_item, {'chunker'})
>>>     self = chunk_iter = ub.chunks(**kw)
>>>     chunked = list(chunk_iter)
>>>     chunk_lens = list(map(len, chunked))
>>>     row = ub.dict_union(grid_item, {'chunk_lens': chunk_lens, 'chunks': chunked})
>>>     row['chunker'] = str(row['chunker'])
>>>     if not row['legacy'] and 'nchunks' in kw:
>>>         assert kw['nchunks'] == row['nchunks']
>>>     row.update(chunk_iter.__dict__)
>>>     rows.append(row)
>>> # xdoctest: +SKIP
>>> import pandas as pd
>>> df = pd.DataFrame(rows)
>>> for _, subdf in df.groupby('chunker'):
>>>     print(subdf)
- Parameters:
items (Iterable) – input to iterate over
chunksize (int | None) – size of each sublist yielded
nchunks (int | None) – number of chunks to create (cannot be specified if chunksize is specified)
bordermode (str) – determines how to handle the last chunk if the length of the input is not divisible by chunksize. Valid values are: {‘none’, ‘cycle’, ‘replicate’}
total (int | None) – hints about the length of the input
legacy (bool) – if True use old behavior, defaults to False. This will be removed in the future.
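The three bordermode values can be illustrated with a short generator. This sketch is an assumption-laden simplification, not ubelt's implementation: the name chunks_sketch is hypothetical, it only handles the chunksize case (not nchunks, total, or legacy), and it copies the input into a list first.

```python
from itertools import cycle, islice


def chunks_sketch(items, chunksize, bordermode='none'):
    """Hypothetical stand-in for ub.chunks with chunksize only."""
    if bordermode not in {'none', 'cycle', 'replicate'}:
        raise ValueError('unknown bordermode: {!r}'.format(bordermode))
    items = list(items)
    for start in range(0, len(items), chunksize):
        chunk = items[start:start + chunksize]
        missing = chunksize - len(chunk)
        if missing and bordermode == 'cycle':
            # Fill the short chunk by wrapping around to the start of items.
            chunk.extend(islice(cycle(items), missing))
        elif missing and bordermode == 'replicate':
            # Fill the short chunk by repeating its last element.
            chunk.extend([chunk[-1]] * missing)
        yield chunk


items = [1, 2, 3, 4, 5, 6, 7]
print(list(chunks_sketch(items, 3, 'none')))       # [[1, 2, 3], [4, 5, 6], [7]]
print(list(chunks_sketch(items, 3, 'cycle')))      # [[1, 2, 3], [4, 5, 6], [7, 1, 2]]
print(list(chunks_sketch(items, 3, 'replicate')))  # [[1, 2, 3], [4, 5, 6], [7, 7, 7]]
```

Only the final chunk can be short, so the fill logic never changes the earlier chunks, and an empty input yields no chunks in any mode.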
- ubelt.util_list.compress(items, flags)
Selects from items where the corresponding value in flags is True.
- Parameters:
items (Iterable[Any]) – a sequence to select items from
flags (Iterable[bool]) – corresponding sequence of bools
- Returns:
a subset of masked items
- Return type:
Iterable[Any]
Notes
This function is based on numpy.compress(), but is pure Python and swaps the condition and array argument to be consistent with ubelt.take().
This is equivalent to itertools.compress().
Example
>>> import ubelt as ub
>>> items = [1, 2, 3, 4, 5]
>>> flags = [False, True, True, False, True]
>>> list(ub.compress(items, flags))
[2, 3, 5]
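Since the notes above describe this function as equivalent to itertools.compress(), the same selection can be reproduced with the standard library alone. The snippet below is a small usage sketch of that stdlib function, including flags computed from a predicate.

```python
from itertools import compress

items = [1, 2, 3, 4, 5]
flags = [False, True, True, False, True]
selected = list(compress(items, flags))
print(selected)  # [2, 3, 5]

# Flags are often derived from the items themselves.
evens = list(compress(items, (item % 2 == 0 for item in items)))
print(evens)  # [2, 4]
```

The result is lazy until it is consumed, so wrapping it in list() is only needed when a concrete list is required.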
- ubelt.util_list.flatten(nested)
Transforms a nested iterable into a flat iterable.
- Parameters:
nested (Iterable[Iterable[Any]]) – list of lists
- Returns:
flattened items
- Return type:
Iterable[Any]
Notes
Equivalent to more_itertools.flatten() and itertools.chain.from_iterable().
Example
>>> import ubelt as ub
>>> nested = [['a', 'b'], ['c', 'd']]
>>> list(ub.flatten(nested))
['a', 'b', 'c', 'd']
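Because the notes describe this function as equivalent to itertools.chain.from_iterable(), the one-level behavior is easy to demonstrate with the standard library. The input below is a made-up example in which one inner list contains a further nested list.

```python
from itertools import chain

nested = [['a', 'b'], ['c', ['d', 'e']]]
flat = list(chain.from_iterable(nested))
# Only the outer level of nesting is removed; the innermost list survives.
print(flat)  # ['a', 'b', 'c', ['d', 'e']]
```

Flattening deeper structures requires applying the operation repeatedly or writing a recursive helper.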
- ubelt.util_list.iter_window(iterable, size=2, step=1, wrap=False)
Iterates through iterable with a window size. This is essentially a 1D sliding window.
- Parameters:
iterable (Iterable[T]) – an iterable sequence
size (int, default=2) – sliding window size
step (int, default=1) – sliding step size
wrap (bool, default=False) – wraparound flag
- Returns:
possibly overlapping windows over the sequence
- Return type:
Iterable[T]
Notes
Similar to more_itertools.windowed(), more_itertools.pairwise(), more_itertools.triplewise(), and more_itertools.sliding_window().
Example
>>> import ubelt as ub
>>> iterable = [1, 2, 3, 4, 5, 6]
>>> size, step, wrap = 3, 1, True
>>> window_iter = ub.iter_window(iterable, size, step, wrap)
>>> window_list = list(window_iter)
>>> print('window_list = %r' % (window_list,))
window_list = [(1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 1), (6, 1, 2)]
Example
>>> import ubelt as ub
>>> iterable = [1, 2, 3, 4, 5, 6]
>>> size, step, wrap = 3, 2, True
>>> window_iter = ub.iter_window(iterable, size, step, wrap)
>>> window_list = list(window_iter)
>>> print('window_list = {!r}'.format(window_list))
window_list = [(1, 2, 3), (3, 4, 5), (5, 6, 1)]
Example
>>> import ubelt as ub
>>> iterable = [1, 2, 3, 4, 5, 6]
>>> size, step, wrap = 3, 2, False
>>> window_iter = ub.iter_window(iterable, size, step, wrap)
>>> window_list = list(window_iter)
>>> print('window_list = {!r}'.format(window_list))
window_list = [(1, 2, 3), (3, 4, 5)]
Example
>>> import ubelt as ub
>>> iterable = []
>>> size, step, wrap = 3, 2, False
>>> window_iter = ub.iter_window(iterable, size, step, wrap)
>>> window_list = list(window_iter)
>>> print('window_list = {!r}'.format(window_list))
window_list = []
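The effect of size, step, and wrap can be sketched with index arithmetic over a list. The helper iter_window_sketch is hypothetical and is not how ubelt implements the function: it copies the input into a list, and its behavior when size exceeds the input length is an assumption rather than something the examples above specify.

```python
def iter_window_sketch(iterable, size=2, step=1, wrap=False):
    """Hypothetical stand-in for ub.iter_window (illustration only)."""
    items = list(iterable)
    n = len(items)
    if wrap:
        # Every step-th position starts a window; windows may wrap around.
        starts = range(0, n, step)
    else:
        # Only positions that leave room for a full window are used.
        starts = range(0, n - size + 1, step)
    for start in starts:
        yield tuple(items[(start + offset) % n] for offset in range(size))


data = [1, 2, 3, 4, 5, 6]
print(list(iter_window_sketch(data, 3, 2, wrap=True)))
# [(1, 2, 3), (3, 4, 5), (5, 6, 1)]
print(list(iter_window_sketch(data, 3, 2, wrap=False)))
# [(1, 2, 3), (3, 4, 5)]
```

The modulo only has an effect when wrap is True, because without wrapping every index in a window already lies inside the list.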
- ubelt.util_list.iterable(obj, strok=False)
Checks if the input implements the iterator interface. An exception is made for strings, which return False unless strok is True.
- Parameters:
obj (object) – a scalar or iterable input
strok (bool, default=False) – if True allow strings to be interpreted as iterable
- Returns:
True if the input is iterable
- Return type:
bool
Example
>>> import ubelt as ub
>>> obj_list = [3, [3], '3', (3,), [3, 4, 5], {}]
>>> result = [ub.iterable(obj) for obj in obj_list]
>>> assert result == [False, True, False, True, True, True]
>>> result = [ub.iterable(obj, strok=True) for obj in obj_list]
>>> assert result == [False, True, True, True, True, True]
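A check like this is commonly written by attempting iter() and special-casing strings. The function iterable_sketch below is a hypothetical illustration of that pattern; it assumes only str counts as a string, which may not match ubelt's exact rule for other text or byte types.

```python
def iterable_sketch(obj, strok=False):
    """Hypothetical stand-in for ub.iterable (illustration only)."""
    try:
        iter(obj)
    except TypeError:
        # Scalars such as integers cannot be iterated.
        return False
    # Assumption: only str is treated as a string here.
    return strok or not isinstance(obj, str)


obj_list = [3, [3], '3', (3,), [3, 4, 5], {}]
print([iterable_sketch(obj) for obj in obj_list])
# [False, True, False, True, True, True]
print([iterable_sketch(obj, strok=True) for obj in obj_list])
# [False, True, True, True, True, True]
```

Treating strings as scalars by default is convenient in code that accepts either one item or a collection of items.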
- ubelt.util_list.peek(iterable, default=NoParam)
Look at the first item of an iterable. If the input is an iterator, then the next element is consumed (i.e. a pop operation).
- Parameters:
iterable (Iterable[T]) – an iterable
default (T) – item to return if the iterable is empty. If unspecified, a StopIteration error is raised for an empty iterable
- Returns:
item - the first item of an ordered sequence, a popped item from an iterator, or an arbitrary item from an unordered collection.
- Return type:
T
Notes
Similar to more_itertools.peekable()
Example
>>> import ubelt as ub
>>> data = [0, 1, 2]
>>> ub.peek(data)
0
>>> iterator = iter(data)
>>> print(ub.peek(iterator))
0
>>> print(ub.peek(iterator))
1
>>> print(ub.peek(iterator))
2
>>> ub.peek(range(3))
0
>>> ub.peek([], 3)
3
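The consuming behavior described above follows naturally from next(iter(...)). The sketch below uses hypothetical names (peek_sketch, peek_keep, and a private _MISSING sentinel standing in for NoParam) and is not ubelt's source; peek_keep is an extra illustration of how to look ahead without losing the item.

```python
from itertools import chain

_MISSING = object()  # hypothetical sentinel standing in for NoParam


def peek_sketch(iterable, default=_MISSING):
    """Hypothetical stand-in for ub.peek (illustration only)."""
    if default is _MISSING:
        return next(iter(iterable))       # raises StopIteration if empty
    return next(iter(iterable), default)


def peek_keep(iterator):
    """Return the first item and an iterator that still yields it."""
    first = next(iterator)
    return first, chain([first], iterator)


iterator = iter([0, 1, 2])
print(peek_sketch(iterator), peek_sketch(iterator))  # 0 1
print(peek_sketch([], 3))                            # 3

first, restored = peek_keep(iter([0, 1, 2]))
print(first, list(restored))                         # 0 [0, 1, 2]
```

Calling iter() on a list creates a fresh iterator each time, which is why peeking at a list always returns its first element, while peeking at an iterator advances it.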
- ubelt.util_list.take(items, indices, default=NoParam)
Lookup a subset of an indexable object using a sequence of indices.
The items input is usually a list or dictionary. When items is a list, indices should be a sequence of integers. When items is a dict, indices is a list of keys to look up in that dictionary.
For dictionaries, a default may be specified as a placeholder to use if a key from indices is not in items.
- Parameters:
items (Sequence[VT] | Mapping[KT, VT]) – An indexable object to select items from.
indices (Iterable[int | KT]) – A sequence of indexes into items.
default (Any, default=NoParam) – if specified, items must support the get method.
- Yields:
VT – a selected item within the list
- SeeAlso:
Note
ub.take(items, indices) is equivalent to (items[i] for i in indices) when default is unspecified.
Notes
This is based on the numpy.take() function, but written in pure Python.
Do not confuse this with more_itertools.take(); the behavior is very different.
Example
>>> import ubelt as ub
>>> items = [0, 1, 2, 3]
>>> indices = [2, 0]
>>> list(ub.take(items, indices))
[2, 0]
Example
>>> import ubelt as ub
>>> dict_ = {1: 'a', 2: 'b', 3: 'c'}
>>> keys = [1, 2, 3, 4, 5]
>>> result = list(ub.take(dict_, keys, None))
>>> assert result == ['a', 'b', 'c', None, None]
Example
>>> import ubelt as ub
>>> dict_ = {1: 'a', 2: 'b', 3: 'c'}
>>> keys = [1, 2, 3, 4, 5]
>>> try:
>>>     print(list(ub.take(dict_, keys)))
>>>     raise AssertionError('did not get key error')
>>> except KeyError:
>>>     print('correctly got key error')
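The two lookup modes described in this section can be sketched as a small generator. The names take_sketch and _MISSING are hypothetical (the sentinel stands in for NoParam), and the code simply follows the documented equivalence with (items[i] for i in indices) plus a get-based fallback; it is not ubelt's source.

```python
_MISSING = object()  # hypothetical sentinel standing in for NoParam


def take_sketch(items, indices, default=_MISSING):
    """Hypothetical stand-in for ub.take (illustration only)."""
    if default is _MISSING:
        for index in indices:
            yield items[index]               # may raise KeyError / IndexError
    else:
        for index in indices:
            yield items.get(index, default)  # requires a get method


print(list(take_sketch([0, 1, 2, 3], [2, 0])))  # [2, 0]

mapping = {1: 'a', 2: 'b', 3: 'c'}
print(list(take_sketch(mapping, [1, 2, 3, 4, 5], None)))
# ['a', 'b', 'c', None, None]
```

Because the sketch is a generator, a missing key only raises once iteration reaches it, not when take_sketch is called.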
- ubelt.util_list.unique(items, key=None)
Generates unique items in the order they appear.
- Parameters:
items (Iterable[T]) – list of items
key (Callable[[T], Any] | None, default=None) – custom normalization function. If specified, returns items where key(item) is unique.
- Yields:
T – a unique item from the input sequence
Notes
Functionally equivalent to more_itertools.unique_everseen().
Example
>>> import ubelt as ub
>>> items = [4, 6, 6, 0, 6, 1, 0, 2, 2, 1]
>>> unique_items = list(ub.unique(items))
>>> assert unique_items == [4, 6, 0, 1, 2]
Example
>>> import ubelt as ub
>>> items = ['A', 'a', 'b', 'B', 'C', 'c', 'D', 'e', 'D', 'E']
>>> unique_items = list(ub.unique(items, key=str.lower))
>>> assert unique_items == ['A', 'b', 'C', 'D', 'e']
>>> unique_items = list(ub.unique(items))
>>> assert unique_items == ['A', 'a', 'b', 'B', 'C', 'c', 'D', 'e', 'E']
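An order-preserving de-duplication with a key function can also be expressed with an insertion-ordered dict. The helper unique_sketch is hypothetical and differs from the generator documented above in that it builds the whole result eagerly; it assumes the items (or their keys) are hashable and relies on dict ordering as guaranteed since Python 3.7.

```python
def unique_sketch(items, key=None):
    """Hypothetical eager counterpart to ub.unique (illustration only)."""
    first_seen = {}
    for item in items:
        marker = item if key is None else key(item)
        # setdefault keeps the first item recorded for each marker.
        first_seen.setdefault(marker, item)
    return list(first_seen.values())


letters = ['A', 'a', 'b', 'B', 'C', 'c', 'D', 'e', 'D', 'E']
print(unique_sketch(letters, key=str.lower))  # ['A', 'b', 'C', 'D', 'e']
print(unique_sketch([4, 6, 6, 0, 6, 1, 0, 2, 2, 1]))  # [4, 6, 0, 1, 2]
```

When no key is needed, list(dict.fromkeys(items)) gives the same result in one expression.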
- ubelt.util_list.unique_flags(items, key=None)
Returns a list of booleans corresponding to the first instance of each unique item.
- Parameters:
items (Sequence[VT]) – indexable collection of items
key (Callable[[VT], Any] | None, default=None) – custom normalization function. If specified, returns items where key(item) is unique.
- Returns:
flags marking the first instance of each unique item
- Return type:
List[bool]
Example
>>> import ubelt as ub
>>> items = [0, 2, 1, 1, 0, 9, 2]
>>> flags = ub.unique_flags(items)
>>> assert flags == [True, True, True, False, False, True, False]
>>> flags = ub.unique_flags(items, key=lambda x: x % 2 == 0)
>>> assert flags == [True, False, True, False, False, False, False]
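The flags can be pictured as a record of whether each item's key has been seen before. The helper unique_flags_sketch is a hypothetical illustration (not ubelt's source) that assumes hashable keys; the last lines show how such flags combine with itertools.compress() to recover the unique items.

```python
from itertools import compress


def unique_flags_sketch(items, key=None):
    """Hypothetical stand-in for ub.unique_flags (illustration only)."""
    seen = set()
    flags = []
    for item in items:
        marker = item if key is None else key(item)
        flags.append(marker not in seen)  # True only on first occurrence
        seen.add(marker)
    return flags


items = [0, 2, 1, 1, 0, 9, 2]
flags = unique_flags_sketch(items)
print(flags)  # [True, True, True, False, False, True, False]

# Selecting the flagged positions yields the unique items in order.
print(list(compress(items, flags)))  # [0, 2, 1, 9]
```

A flag list has the same length as the input, which makes it convenient for filtering several parallel lists with one mask.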