Select

dabstract.abstract.abstract.Select(data, selector: Union[List[int], Callable, numbers.Integral], eval_data: Any = None, lazy: bool = True, workers: int = 1, buffer_len: int = 3, *args: List, **kwargs: Dict) → Union[dabstract.abstract.abstract.SelectAbstract, dabstract.abstract.abstract.DataAbstract, numpy.ndarray, list]

Factory function to allow for choice between lazy and direct example selection.

In both cases an instance of SelectAbstract is created. The difference from lazy selection is that direct selection evaluates all examples immediately.

For more information on the functionality of Select please check the docstring of SelectAbstract().

Parameters

data (Iterable): input data to perform selection on, if eval_data is None
selector (List[int] OR Callable OR numbers.Integral): selection criterion
eval_data (Any): if eval_data is not None, selection is performed on eval_data, else on data (default = None)
lazy (bool): apply lazily or not (default = True)
workers (int): number of workers used for loading the data (default = 1)
buffer_len (int): buffer length of the pool (default = 3)
args/kwargs: additional parameters to provide to the function if needed

Returns

SelectAbstract OR DataAbstract OR np.ndarray OR list
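
The difference between lazy and direct selection can be illustrated with a small, library-independent sketch. The names `LazySelection` and `select` are hypothetical stand-ins and not dabstract's implementation. The sketch only assumes that a lazy selection stores indices and resolves each example on access, while a direct selection evaluates all selected examples right away.

```python
from typing import Any, List, Sequence


class LazySelection:
    """Hypothetical lazy view: stores indices, reads data only on access."""

    def __init__(self, data: Sequence[Any], indices: List[int]):
        self._data = data
        self._indices = indices

    def __len__(self) -> int:
        return len(self._indices)

    def __getitem__(self, index: int) -> Any:
        # Map the position in the selection back to the underlying data.
        return self._data[self._indices[index]]


def select(data: Sequence[Any], indices: List[int], lazy: bool = True):
    """Hypothetical factory mirroring the lazy/direct choice (an assumption)."""
    view = LazySelection(data, indices)
    if lazy:
        return view  # nothing has been read from `data` yet
    # Direct selection: evaluate every selected example immediately.
    return [view[k] for k in range(len(view))]


data = ["a", "b", "c", "d", "e"]
lazy_result = select(data, [0, 2, 4], lazy=True)
direct_result = select(data, [0, 2, 4], lazy=False)

print(lazy_result[1])   # resolved only when accessed
print(direct_result)    # already a plain list
```

With lazy selection the cost of reading an example is paid per access, which matters when examples are expensive to load. Direct selection pays the full cost up front.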

class dabstract.abstract.abstract.SelectAbstract(data: Iterable, selector: Union[List[int], Callable, numbers.Integral], eval_data: Any = None, *args, **kwargs: Dict)

Bases: dabstract.abstract.abstract.Abstract

Select a subset of your input sequence.

Selection is based on a so-called ‘selector’, which may be a Callable or a list/np.ndarray of integers. A Callable selector must accept two arguments: (1) the data to base the selection on and (2) the index of the example to be evaluated.
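
As a minimal sketch of the two-argument contract described above, the following shows one way a callable selector could be turned into a list of indices. The helper `indices_from_selector` is hypothetical, and the assumption that an example is kept when the selector returns a truthy value is ours, not taken from the dabstract source.

```python
from typing import Any, Callable, List, Sequence


def indices_from_selector(
    data: Sequence[Any], selector: Callable[[Sequence[Any], int], bool]
) -> List[int]:
    """Hypothetical helper: keep every index for which the selector is truthy."""
    return [k for k in range(len(data)) if selector(data, k)]


# Toy data; the selector receives (1) the data and (2) the index to evaluate.
data = [3, 8, 1, 9, 4]
keep_large = lambda x, k: x[k] > 3

kept = indices_from_selector(data, keep_large)
print(kept)
```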

For the selector, one can use the set of built-in selectors in dabstract.dataset.select, a lambda function, a custom function, or indices. For example:

random subsampling with:

$  SelectAbstract(data, dabstract.dataset.select.random_subsample(ratio=0.5))

select based on a key and a particular value:

$  SelectAbstract(data, dabstract.dataset.select.subsample_by_str(ratio=0.5))

use a lambda function, such as:

$  SelectAbstract(data, (lambda x,k: x['data']['subdb'][k]))

directly use indices:

$  indices = np.array([0, 1, 2, 3, 4])
$  SelectAbstract(data, indices)

If no ‘eval_data’ is given, the selector is evaluated on the data in ‘data’. If ‘eval_data’ is given, the selector is evaluated on ‘eval_data’ instead.
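
The role of ‘eval_data’ can be sketched as follows: the selector is evaluated on one sequence, and the resulting indices are applied to another. The function `select_examples` is a hypothetical illustration, not dabstract code, and it assumes that ‘data’ and ‘eval_data’ have the same length and ordering.

```python
from typing import Any, Callable, List, Optional, Sequence


def select_examples(
    data: Sequence[Any],
    selector: Callable[[Sequence[Any], int], bool],
    eval_data: Optional[Sequence[Any]] = None,
) -> List[Any]:
    """Hypothetical sketch: evaluate on eval_data if given, else on data."""
    source = data if eval_data is None else eval_data
    indices = [k for k in range(len(source)) if selector(source, k)]
    # The indices found on `source` are always applied to `data`.
    return [data[k] for k in indices]


data = ["clip_0", "clip_1", "clip_2", "clip_3"]
labels = ["dog", "cat", "dog", "bird"]

# Selection decided by the labels, applied to the clips.
dogs = select_examples(data, lambda x, k: x[k] == "dog", eval_data=labels)
# Without eval_data the selector sees the clips themselves.
last = select_examples(data, lambda x, k: x[k].endswith("3"))
print(dogs, last)
```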

The SelectAbstract contains the following methods:

.get - return entry from SelectAbstract
.keys - return the list of keys

The full explanation of each method is provided in that method's docstring.

Parameters

data (Iterable): input data to perform selection on, if eval_data is None
selector (List[int] OR Callable OR numbers.Integral): selection criterion
eval_data (Any): if eval_data is not None, selection is performed on eval_data, else on data (default = None)
kwargs (Dict): additional parameters to provide to the function if needed

Returns

SelectAbstract class

get(index: int, return_info: bool = False, *args: List, **kwargs: Dict) → Union[List, numpy.ndarray, Any]

Parameters

index (int): index to retrieve data from
return_info (bool): return tuple (data, info) if True else data (default = False)
args (List): additional parameters to provide to the function if needed
kwargs (Dict): additional parameters to provide to the function if needed

Returns

List OR np.ndarray OR Any
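
A rough sketch of how a `get` with a `return_info` flag could behave is given below. The class `SelectionSketch` is hypothetical. The empty dict used for `info` is a placeholder assumption, since the contents of the info object are not specified here.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union


class SelectionSketch:
    """Hypothetical stand-in for a selection over an underlying sequence."""

    def __init__(self, data: Sequence[Any], indices: List[int]):
        self._data = data
        self._indices = list(indices)

    def __len__(self) -> int:
        return len(self._indices)

    def get(
        self, index: int, return_info: bool = False
    ) -> Union[Any, Tuple[Any, Dict]]:
        # Translate the position in the selection to the underlying index.
        item = self._data[self._indices[index]]
        info: Dict = {}  # placeholder: real info contents are an unknown here
        return (item, info) if return_info else item


selection = SelectionSketch(["a", "b", "c", "d"], [1, 2])
print(selection.get(1))
print(selection.get(1, return_info=True))
```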

get_indices()

set_indices(selector, *args, **kwargs)