Split

dabstract.abstract.abstract.Split(data: Iterable, split_size: int = None, constraint: str = None, sample_len: int = None, sample_period: int = None, type: str = 'seconds', lazy: bool = True, workers: bool = 1, buffer_len: int = 3, *args: List, **kwargs: Dict) → Union[dabstract.abstract.abstract.SplitAbstract, dabstract.abstract.abstract.DataAbstract, numpy.ndarray, list]

Factory function to allow for choice between lazy and direct example splitting.

In both cases an instance of SplitAbstract is created. Unlike lazy splitting, direct splitting evaluates all examples immediately.

For more information on splitting, see the docstring of SplitAbstract().
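The difference between lazy and direct splitting can be pictured with a small pure-Python sketch. It does not use dabstract: LazySplitSketch and split_sketch are hypothetical names, and the choice to drop leftover samples that do not fill a segment is an assumption of this sketch, not documented behaviour.

```python
from typing import List, Sequence


class LazySplitSketch:
    """Hypothetical stand-in for a lazy splitter; not dabstract's implementation."""

    def __init__(self, data: Sequence[Sequence[float]], split_size: int):
        self.data = data
        self.split_size = split_size
        # Number of full segments per example (assumption: leftovers are dropped).
        self.counts = [len(example) // split_size for example in data]

    def __len__(self) -> int:
        return sum(self.counts)

    def get(self, index: int) -> List[float]:
        # Walk the per-example counts to find which example holds this segment.
        for example, count in zip(self.data, self.counts):
            if index < count:
                start = index * self.split_size
                return list(example[start:start + self.split_size])
            index -= count
        raise IndexError("segment index out of range")


def split_sketch(data, split_size, lazy=True):
    """Return a lazy wrapper, or evaluate every segment right away."""
    wrapper = LazySplitSketch(data, split_size)
    if lazy:
        return wrapper
    return [wrapper.get(i) for i in range(len(wrapper))]


examples = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13]]
lazy_view = split_sketch(examples, split_size=2, lazy=True)   # nothing sliced yet
direct = split_sketch(examples, split_size=2, lazy=False)     # all segments built
```

In the lazy case a segment is only sliced when get is called, whereas the direct case returns a plain list of all segments up front.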

Parameters

data (Iterable): iterable object to be split
split_size (int): split size in seconds or samples, depending on ‘type’
constraint (str): option ‘power2’ creates sizes that are a power of 2 (used for autoencoders)
sample_len (int): sample length (default = None)
sample_period (int): sample period (default = None)
type (str): unit of split_size (‘seconds’, ‘samples’) (default = ‘seconds’)
lazy (bool): apply lazily or not (default = True)
workers (int): number of workers used for loading the data (default = 1)
buffer_len (int): buffer_len of the pool (default = 3)
args (List): additional parameters to pass to the function if needed
kwargs (Dict): additional parameters to pass to the function if needed

Returns

SplitAbstract OR DataAbstract OR np.ndarray OR list
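The ‘power2’ constraint listed among the parameters can be illustrated with a short sketch. The documentation does not say whether the size is rounded up or down; rounding up to the next power of two is an assumption here, and next_power_of_two is a hypothetical helper, not part of dabstract.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # (n - 1).bit_length() is the exponent of the next power of two.
    return 1 << (n - 1).bit_length()


# Assumed rounding direction: up. A 1000-sample split becomes 1024 samples.
split_size_samples = 1000
constrained = next_power_of_two(split_size_samples)
```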

class dabstract.abstract.abstract.SplitAbstract(data: Iterable, split_size: int = None, constraint: str = None, sample_len: Union[int, List[int]] = None, sample_period: int = None, type: str = 'seconds')

Bases: dabstract.abstract.abstract.Abstract

The class is an abstract wrapper around an iterable that splits this iterable in a lazy manner. Splitting refers to dividing a particular example into multiple chunks, e.g. 60 s examples are divided into 1 s segments.

Splitting is based on the parameters split_size, constraint, sample_len, sample_period and type.

If type is set to ‘samples’, one has to define ‘sample_len’ and ‘split_size’. In that case ‘sample_len’ is the number of samples in one example and ‘split_size’ is the size of one segment. ‘sample_len’ can be an integer if all examples have the same size, or a list of integers if sizes differ between examples.

If type is set to ‘seconds’, one has to define ‘sample_len’, ‘split_size’ and ‘sample_period’. In this case these values are expressed in seconds rather than samples; ‘sample_period’ additionally specifies the sample period of the data so that it can be split properly.
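The two modes above can be sketched as a segment-count calculation. This is a minimal illustration, not dabstract code: segments_per_example and its n_examples argument are hypothetical, and both the rounding of seconds to samples and the dropping of incomplete trailing segments are assumptions.

```python
from typing import List, Union


def segments_per_example(
    sample_len: Union[float, List[float]],
    split_size: float,
    n_examples: int,
    sample_period: float = None,
    type: str = "seconds",
) -> List[int]:
    """Count full segments per example (hypothetical helper, not dabstract API)."""
    # A single value means every example has the same length.
    lengths = sample_len if isinstance(sample_len, list) else [sample_len] * n_examples

    if type == "seconds":
        if sample_period is None:
            raise ValueError("sample_period is required when type='seconds'")

        def to_samples(value: float) -> int:
            # Convert a duration in seconds to a sample count.
            return int(round(value / sample_period))
    elif type == "samples":
        to_samples = int
    else:
        raise ValueError("type must be 'seconds' or 'samples'")

    window = to_samples(split_size)
    # Assumption: trailing samples that do not fill a segment are dropped.
    return [to_samples(length) // window for length in lengths]


# 60 s examples at a sample period of 1/16000 s, split into 1 s segments.
by_seconds = segments_per_example(60, 1, n_examples=2, sample_period=1 / 16000)
# Examples of 1000 and 2500 samples, split into 500-sample segments.
by_samples = segments_per_example([1000, 2500], 500, n_examples=2, type="samples")
```

With these inputs each 60 s example yields 60 segments, and the two sample-based examples yield 2 and 5 segments respectively.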

The SplitAbstract contains the following methods:

.get - return entry from SplitAbstract
.keys - return attribute keys of data

The full explanation of each method is provided in that method's docstring.

Parameters

data (Iterable): iterable object to be split
split_size (int): split size in seconds or samples, depending on ‘type’
constraint (str): option ‘power2’ creates sizes that are a power of 2 (used for autoencoders)
sample_len (int or List[int]): sample length (default = None)
sample_period (int): sample period (default = None)
type (str): unit of split_size (‘seconds’, ‘samples’) (default = ‘seconds’)

Returns

SplitAbstract class

get(index: int, return_info: bool = False, *args: List, **kwargs: Dict) → Union[List, numpy.ndarray, Any]

Parameters

index (int): index to retrieve data from
return_info (bool): return tuple (data, info) if True, else data only (default = False). info contains the information that has been propagated through the chain of operations
args (List): additional parameters to pass to the function if needed
kwargs (Dict): additional parameters to pass to the function if needed

Returns

List OR np.ndarray OR Any

get_param()