Select

dabstract.abstract.abstract.Select(data, selector: Union[List[int], Callable, numbers.Integral], eval_data: Any = None, lazy: bool = True, workers: int = 1, buffer_len: int = 3, *args: List, **kwargs: Dict) → Union[dabstract.abstract.abstract.SelectAbstract, dabstract.abstract.abstract.DataAbstract, numpy.ndarray, list]

Factory function to allow for choice between lazy and direct example selection.

In both cases an instance of SelectAbstract is created. The difference is that direct selection evaluates all selected examples immediately, whereas lazy selection does not.

For more information on the functionality of Select please check the docstring of SelectAbstract().
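The lazy/direct distinction can be illustrated with a minimal sketch. The names `LazySelection`, `select` and `CountingData` are hypothetical and this is not the dabstract implementation. It assumes only what the text above states: a lazy selection holds the selected indices and reads an example when it is requested, while direct selection evaluates every selected example up front.

```python
from typing import Any, List, Sequence, Union


class LazySelection:
    """Hypothetical stand-in for SelectAbstract: stores indices, reads examples on access."""

    def __init__(self, data: Sequence, indices: List[int]):
        self._data = data
        self._indices = list(indices)

    def __len__(self) -> int:
        return len(self._indices)

    def __getitem__(self, i: int) -> Any:
        # The example is only read from `data` when it is requested
        return self._data[self._indices[i]]


def select(data: Sequence, indices: List[int], lazy: bool = True) -> Union[LazySelection, list]:
    """Hypothetical factory mirroring the lazy/direct choice of Select."""
    selection = LazySelection(data, indices)
    if lazy:
        return selection
    # Direct selection: evaluate every selected example immediately
    return [selection[i] for i in range(len(selection))]


class CountingData:
    """Toy data source that counts how often an example is read."""

    def __init__(self, values: Sequence):
        self.values = list(values)
        self.reads = 0

    def __len__(self) -> int:
        return len(self.values)

    def __getitem__(self, k: int) -> Any:
        self.reads += 1
        return self.values[k]


src = CountingData([10, 20, 30, 40])
lazy_sel = select(src, [1, 3])  # nothing has been read yet
first = lazy_sel[0]             # reads exactly one example
eager = select(CountingData([10, 20, 30, 40]), [1, 3], lazy=False)
```

With the lazy variant, the read count of `src` only grows as examples are accessed. The direct variant returns a plain list with all selected examples already loaded.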

Parameters
data : Iterable

input data to perform selection on; the selector is also evaluated on it if eval_data is None

selector : List[int] OR Callable OR numbers.Integral

selection criterion

eval_data : Any

if eval_data is not None, the selector is evaluated on eval_data; otherwise it is evaluated on data (default = None)

lazy : bool

whether to apply the selection lazily (default = True)

workers : int

number of workers used for loading the data (default = 1)

buffer_len : int

buffer length of the pool (default = 3)

*args, **kwargs

additional parameters to provide to the function if needed

Returns
SelectAbstract OR DataAbstract OR np.ndarray OR list
class dabstract.abstract.abstract.SelectAbstract(data: Iterable, selector: Union[List[int], Callable, numbers.Integral], eval_data: Any = None, *args, **kwargs: Dict)

Bases: dabstract.abstract.abstract.Abstract

Select a subset of your input sequence.

Selection is based on a so-called ‘selector’, which may be a Callable or a list/np.ndarray of integers. Such Callables must accept two arguments: (1) the data to base the selection on and (2) the index of the entry to be evaluated.
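How the two selector forms could be turned into indices is shown in the minimal sketch below. The helper `resolve_selector` is hypothetical, not part of dabstract. It assumes that a Callable selector returns a truthy value for every index that should be kept. The `numbers.Integral` form from the signature is deliberately left out because its exact semantics are not described here.

```python
from typing import Any, Callable, Iterable, List, Sequence, Union

SelectorType = Union[Iterable[int], Callable[[Any, int], Any]]


def resolve_selector(data: Sequence, selector: SelectorType) -> List[int]:
    """Hypothetical helper: turn a selector into a list of integer indices."""
    if callable(selector):
        # Callable selectors receive (data, index); a truthy result keeps the index
        return [k for k in range(len(data)) if selector(data, k)]
    # A list or np.ndarray of integers is taken as the indices directly
    return [int(k) for k in selector]


records = [{"label": "a"}, {"label": "b"}, {"label": "a"}]
by_callable = resolve_selector(records, lambda x, k: x[k]["label"] == "a")
by_indices = resolve_selector(records, [2, 0])
```

Here `by_callable` keeps indices 0 and 2, while `by_indices` keeps the given order `[2, 0]`.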

For the selector, one can use the built-in selectors in dabstract.dataset.select, a lambda function, a custom function, or indices. For example:

  1. random subsampling with:

    $  SelectAbstract(data, dabstract.dataset.select.random_subsample(ratio=0.5))
    
  2. select based on a key and a particular value:

    $  SelectAbstract(data, dabstract.dataset.select.subsample_by_str(...))  # pass the key and the value to select on
    
  3. use the lambda function such as:

    $  SelectAbstract(data, (lambda x,k: x['data']['subdb'][k]))
    
  4. directly use indices:

    $  indices = np.array([0, 1, 2, 3, 4])
    $  SelectAbstract(data, indices)
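One of the options listed above is a custom function. A minimal sketch follows, relying only on the (data, index) calling convention described earlier. The hypothetical `RandomSubsample` class keeps each example independently with probability `ratio`, so the kept fraction only approximates `ratio`. It is an illustration, not the implementation of `dabstract.dataset.select.random_subsample`.

```python
import random
from typing import Any


class RandomSubsample:
    """Hypothetical custom selector: keep each example with probability `ratio`."""

    def __init__(self, ratio: float = 0.5, seed: int = 0):
        if not 0.0 <= ratio <= 1.0:
            raise ValueError("ratio must lie in [0, 1]")
        self.ratio = ratio
        self.seed = seed

    def __call__(self, data: Any, k: int) -> bool:
        # Derive a per-index RNG so the decision for index k does not depend on call order
        rng = random.Random(self.seed * 1_000_003 + k)
        return rng.random() < self.ratio


data = list(range(100))
selector = RandomSubsample(ratio=0.5, seed=42)
kept = [k for k in range(len(data)) if selector(data, k)]
```

Seeding per index makes the selection reproducible even if the indices are evaluated in a different order, for example by several workers. An instance like this could presumably be passed wherever a Callable selector is accepted.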
    

If no ‘eval_data’ is given, the selector is evaluated on ‘data’. If ‘eval_data’ is given, the selector is evaluated on ‘eval_data’ instead.
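The role of ‘eval_data’ can be sketched as follows: the selector decides on ‘eval_data’, and the resulting indices pick examples from ‘data’. The function `select_with_eval` is hypothetical. It assumes that ‘eval_data’ is aligned index-by-index with ‘data’, for example metadata describing each audio clip.

```python
from typing import Any, Callable, List, Optional, Sequence


def select_with_eval(
    data: Sequence,
    selector: Callable[[Any, int], Any],
    eval_data: Optional[Sequence] = None,
) -> List[Any]:
    """Hypothetical sketch: evaluate the selector on eval_data (or data), pick from data."""
    source = data if eval_data is None else eval_data
    if len(source) != len(data):
        # Assumption: eval_data is aligned index-by-index with data
        raise ValueError("eval_data must have the same length as data")
    indices = [k for k in range(len(source)) if selector(source, k)]
    return [data[k] for k in indices]


def is_train(x: Sequence, k: int) -> bool:
    # Keep entries whose metadata marks them as training data
    return x[k]["subdb"] == "train"


clips = ["clip0", "clip1", "clip2"]
meta = [{"subdb": "train"}, {"subdb": "test"}, {"subdb": "train"}]
train_clips = select_with_eval(clips, is_train, eval_data=meta)
```

Here `train_clips` holds `clip0` and `clip2`. Calling `select_with_eval(meta, is_train)` without ‘eval_data’ evaluates and selects on the same sequence.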

The SelectAbstract contains the following methods:

.get - return an entry from SelectAbstract
.keys - return the list of keys

Each method is documented in full in its own docstring.

Parameters
data : Iterable

input data to perform selection on; the selector is also evaluated on it if eval_data is None

selector : List[int] OR Callable OR numbers.Integral

selection criterion

eval_data : Any

if eval_data is not None, the selector is evaluated on eval_data; otherwise it is evaluated on data (default = None)

kwargs : Dict

additional parameters to provide to the function if needed

Returns
SelectAbstract class
get(index: int, return_info: bool = False, *args: List, **kwargs: Dict) → Union[List, numpy.ndarray, Any]
Parameters
index : int

index to retrieve data from

return_info : bool

return a tuple (data, info) if True, else only data (default = False)

args : List

additional parameters to provide to the function if needed

kwargs : Dict

additional parameters to provide to the function if needed

Returns
List OR np.ndarray OR Any
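The `return_info` switch changes only the shape of the return value: a bare entry, or a `(data, info)` tuple. The toy class below is hypothetical and mimics that contract. The contents of its `info` dictionary (`source_index`) are invented for illustration and are not what SelectAbstract actually returns.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union


class ToySelection:
    """Hypothetical stand-in mimicking the get(index, return_info) contract."""

    def __init__(self, data: Sequence, indices: List[int]):
        self._data = data
        self._indices = list(indices)

    def get(self, index: int, return_info: bool = False) -> Union[Any, Tuple[Any, Dict]]:
        # `index` counts positions within the selection, not within the original data
        item = self._data[self._indices[index]]
        if return_info:
            # Illustrative info payload only; the real contents may differ
            return item, {"source_index": self._indices[index]}
        return item


sel = ToySelection(["a", "b", "c", "d"], [3, 1])
plain = sel.get(0)
item, info = sel.get(1, return_info=True)
```

Callers that pass `return_info=True` should unpack the tuple, as in the last line above.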
get_indices()
set_indices(selector, *args, **kwargs)