Helpers

FolderDictSeqAbstract

class dabstract.dataset.helpers.FolderDictSeqAbstract(path: str, extension: str = '.wav', map_fct: Callable = None, file_info_save_path: bool = None, filepath: str = None, overwrite_file_info: bool = False, info: List[Dict] = None, **kwargs)

Bases: dabstract.abstract.abstract.DictSeqAbstract

Get meta information of the files in a directory and place them in a DictSeq

This class gathers meta information (e.g. sampling frequency, length) for the files in the provided directory and returns a FolderDictSeq containing the filenames, information and subfolders. A FolderDictSeq inherits from DictSeq and offers similar functionality, except that its active_keys are fixed to 'data'. In essence, a FolderDictSeq is a data container that shows the result of a walk through a folder. It also keeps track of information relevant to wav or numpy files. prepare_feat and add_split only work on data fields that have this structure.

Parameters

path (str): path to the directory to check
extension (str): only evaluate files with this extension
map_fct (Callable): mapping function y = f(x) to apply to the 'data'
filepath (str): if you already have the files you want to obtain information from, the directory tree search is skipped and this is used instead
file_info_save_path (str): location to save the information to; this function can be costly, so saving is useful
overwrite_file_info (bool): overwrite the file info file

Returns

DictSeqAbstract

dictseq containing file information as a list, formatted as:

output['filepath'] = list of paths to files
output['example'] = example string (i.e. filename without extension)
output['filename'] = filename
output['subdb'] = relative subdirectory (starting from 'path') to file
output['info'][file_id] = { 'output_shape': .., #output shape of the wav file
                            'fs': .., # sampling frequency
                            'time_step': .., # sample period
                            }

reset_active_keys() → None: Disables resetting of the active keys

set_active_keys(keys: List[str]) → None: Disables setting of the active keys

dataset_from_config

dabstract.dataset.helpers.dataset_from_config(config: Dict, overwrite_xval: bool = False) → Dataset

Create a dataset from configuration

This function creates a dataset class from a dictionary definition. It is advised to define your configuration in YAML, read it using dabstract.utils.load_yaml_config and use it as follows:

$  data = load_yaml_config(filename=path_to_dir, path=path_to_yaml, walk=True/False,
$  post_process=dataset_from_config, **kwargs)

For example configurations, see dabstract/examples/introduction, e.g. introduction/configs/dbs/EXAMPLE_anomaly.yaml.

The full format, with placeholders, is:
$ datasets:
$   - name: dataset0
$     parameters:
$       paths: ..
$       select: ..
$       split: ..
$       other: ..
$       test_only: 1/0
$   - name: dataset1
$     parameters:
$       paths: ..
$       select: ..
$       split: ..
$       other: ..
$       test_only: 1/0
$ select:
$ split:
$ xval:

For each dataset, name and parameters/paths are mandatory. select, split and test_only are optional and are used to subsample the dataset, to split it, or to mark it as test-only (1/0), respectively. select, split and xval are all defined in the following way:

$   name:
$   parameters:

name is the name (as a string) of a class that can be found either in dataset.select/dataset.xval respectively, or in the custom folder defined by os.environ["dabstract_CUSTOM_DIR"]. parameters is not mandatory.

For more information on the possibilities of select, split and xval, see dataset.dataset.add_select(), dataset.dataset.add_split() and dataset.dataset.set_xval().
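The layout described above can also be written as a plain Python dictionary, which is what a YAML loader would hand over. The following is a minimal sketch, not dabstract code: the dataset names, the paths, the xval name `my_xval` and the helper `missing_fields` are all hypothetical, and the check that `name` is required for select/split/xval is an assumption drawn from the statement that only `parameters` is optional.

```python
from typing import Any, Dict, List

# Hypothetical configuration mirroring the placeholder layout;
# dataset names, paths and the xval name are illustrative only.
config: Dict[str, Any] = {
    "datasets": [
        {"name": "dataset0",
         "parameters": {"paths": {"data": "/path/to/data0"}, "test_only": 0}},
        {"name": "dataset1",
         "parameters": {"paths": {"data": "/path/to/data1"}, "test_only": 1}},
    ],
    "xval": {"name": "my_xval"},  # 'parameters' may be omitted
}


def missing_fields(cfg: Dict[str, Any]) -> List[str]:
    """List mandatory fields that are absent (a check written for this sketch)."""
    problems: List[str] = []
    for i, ds in enumerate(cfg.get("datasets") or []):
        if "name" not in ds:
            problems.append(f"datasets[{i}].name")
        # An empty 'parameters:' entry in YAML loads as None, hence the 'or {}'.
        if "paths" not in (ds.get("parameters") or {}):
            problems.append(f"datasets[{i}].parameters.paths")
    for key in ("select", "split", "xval"):
        entry = cfg.get(key)
        if entry is not None and "name" not in entry:
            problems.append(f"{key}.name")
    return problems
```

Running `missing_fields(config)` on the dictionary above yields an empty list; a check of this kind can be useful before passing a configuration on, since a missing `name` or `paths` would otherwise only surface when the dataset is constructed.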

Parameters

config (Dict): dictionary configuration
overwrite_xval (bool): overwrite the xval file

dataset_factory

dabstract.dataset.helpers.dataset_factory(name: (<class 'str'>, ~Dataset, <class 'type'>) = None, paths: Dict[str, str] = None, xval: Optional[Dict[str, Union[str, int, Dict]]] = None, split: Optional[Dict[str, Union[str, int, Dict]]] = None, select: Optional[Dict[str, Union[str, int, Dict]]] = None, test_only: Optional[bool] = 0, **kwargs) → Dataset

Dataset factory

This function creates a dataset class from a name and parameters. Specifically, it searches by name for that particular database class in:

- the environment variable folder: os.environ["dabstract_CUSTOM_DIR"] = your_dir
- the dabstract.dataset.dbs folder

If name is a class object, then it is used to initialise the dataset with the given kwargs. This function is mostly used by dataset_from_config(). It is advised to import the desired dataset class directly instead of using dataset_factory, which is only handy for configuration-based experiments that need to load a class from a string. For example:

$  data = dataset_factory(name='DCASE2020Task1B',
$                         paths={'data': path_to_data,
$                                'meta': path_to_meta,
$                                'feat': path_to_feat})

One is advised to check the examples in dabstract/examples/introduction on how to work with datasets.
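The idea of accepting either a class name or a class object can be illustrated with a small, self-contained sketch. This is not how dabstract implements dataset_factory: `ExampleDataset`, `REGISTRY` and `sketch_factory` are hypothetical names, and a dictionary lookup stands in for the folder search described above.

```python
from typing import Any, Dict, Type, Union


class ExampleDataset:
    """Hypothetical stand-in for a dataset class; not part of dabstract."""

    def __init__(self, paths: Dict[str, str], test_only: bool = False, **kwargs: Any):
        self.paths = paths
        self.test_only = bool(test_only)


# Hypothetical registry; it replaces the folder search for this sketch only.
REGISTRY: Dict[str, Type] = {"ExampleDataset": ExampleDataset}


def sketch_factory(name: Union[str, Type], **kwargs: Any) -> Any:
    """Resolve `name` to a class (if it is a string) and instantiate it."""
    if isinstance(name, str):
        if name not in REGISTRY:
            raise ValueError(f"unknown dataset class: {name!r}")
        cls = REGISTRY[name]
    else:
        cls = name  # already a class object: use it directly
    return cls(**kwargs)


# Both calls build an ExampleDataset; only the way the class is named differs.
by_name = sketch_factory("ExampleDataset", paths={"data": "/path/to/data"})
by_class = sketch_factory(ExampleDataset, paths={"data": "/path/to/data"}, test_only=1)
```

The string form is what makes configuration files possible, since YAML can only carry the class name. In ordinary code, passing or importing the class directly avoids the lookup and fails earlier if the class does not exist.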

Parameters

select (Dict[str, Union[str, int, Dict]]): selector configuration
split (Dict[str, Union[str, int, Dict]]): split configuration
xval (Dict[str, Union[str, int, Dict]]): xval configuration
test_only (bool): use the dataset for testing only (test_only=1) or for both training and testing (test_only=0)
name (str/instance/object): name of the class (or the class itself)
paths (Dict[str, str]): dictionary containing paths to the data
kwargs: ToDo, not defined as this should be used only by dataset_from_config()

Returns

dataset (Dataset): dataset class

get_dir_info

dabstract.dataset.helpers.get_dir_info(path: str, extension: str = '.wav', file_info_save_path: bool = None, filepath: str = None, overwrite_file_info: bool = False, **kwargs) → Dict[str, List[Any]]

Get meta information of the files in a directory.

This function gathers meta information (e.g. sampling frequency, length) for the files in the provided directory and returns a dictionary containing the filenames, information and subfolders. This is mainly useful for obtaining a priori information about wav files so that they can be split lazily from disk.

Parameters

path (str): path to the directory to check
extension (str): only evaluate files with this extension
map_fct (Callable): mapping function y = f(x) to apply to the 'data'
filepath (str): if you already have the files you want to obtain information from, the directory tree search is skipped and this is used instead
file_info_save_path (str): location to save the information to; this function can be costly, so saving is useful
overwrite_file_info (bool): overwrite the file info file

Returns

dict

dict containing file information as a list, formatted as:

output['filepath'] = list of paths to files
output['example'] = example string (i.e. filename without extension)
output['filename'] = filename
output['subdb'] = relative subdirectory (starting from 'path') to file
output['info'][file_id] = { 'output_shape': .., #output shape of the wav file
                            'fs': .., # sampling frequency
                            'time_step': .., # sample period
                            }
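To make the output format concrete, here is a minimal standard-library sketch that builds a dictionary with the same keys for PCM wav files. It is not dabstract's implementation: the function name `sketch_dir_info` is hypothetical, it ignores saving and overwriting, and it assumes that `output_shape` is (frames, channels) and that `time_step` equals 1 / fs.

```python
import os
import wave
from typing import Any, Dict, List


def sketch_dir_info(path: str, extension: str = ".wav") -> Dict[str, List[Any]]:
    """Illustrative stand-in, NOT dabstract's get_dir_info implementation."""
    out: Dict[str, List[Any]] = {
        "filepath": [], "example": [], "filename": [], "subdb": [], "info": [],
    }
    for root, dirs, files in os.walk(path):
        dirs.sort()  # make the traversal order deterministic
        for filename in sorted(files):
            if not filename.endswith(extension):
                continue
            filepath = os.path.join(root, filename)
            # Only the header is read here, so the audio itself is not loaded.
            # Assumption: the files are PCM wav files readable by the wave module.
            with wave.open(filepath, "rb") as wav:
                fs = wav.getframerate()
                shape = (wav.getnframes(), wav.getnchannels())
            out["filepath"].append(filepath)
            out["example"].append(os.path.splitext(filename)[0])
            out["filename"].append(filename)
            out["subdb"].append(os.path.relpath(root, path))
            out["info"].append(
                {"output_shape": shape, "fs": fs, "time_step": 1.0 / fs}
            )
    return out
```

Because only headers are read, the lengths and sampling rates are known before any audio is loaded, which is the kind of a priori information that makes lazy splitting possible.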