Helpers
FolderDictSeqAbstract
class dabstract.dataset.helpers.FolderDictSeqAbstract(path: str, extension: str = '.wav', map_fct: Callable = None, file_info_save_path: bool = None, filepath: str = None, overwrite_file_info: bool = False, info: List[Dict] = None, **kwargs)
Bases: dabstract.abstract.abstract.DictSeqAbstract
Get meta information of the files in a directory and place them in a DictSeq.
This class gathers meta information (e.g. sampling frequency, length) of the files in the provided directory and returns a FolderDictSeq with the filenames, information and subfolders. A FolderDictSeq inherits from DictSeq and has similar functionality, except that its active_keys are fixed to 'data'. In essence, a FolderDictSeq is a data container that holds the result of a walk through a folder. It also keeps track of the information relevant to wav or numpy files. prepare_feat and add_split only work on data fields that have this structure.
- Parameters
- path: str
path to the directory to check
- extension: str
only evaluate files with that extension
- map_fct: Callable
add a mapping function y = f(x) to the 'data'
- filepath: str
in case you already have the files you want to obtain information from, the dir tree search is not done and this is used instead
- file_info_save_path: str
save the information to this location; gathering it can be costly, so saving is useful
- overwrite_file_info: bool
overwrite the file info file
- Returns
- DictSeqAbstract: DictSeqAbstract
dictseq containing file information as a list, formatted as:
output['filepath'] = list of paths to files
output['example'] = example string (i.e. filename without extension)
output['filename'] = filename
output['subdb'] = relative subdirectory (starting from 'path') to file
output['info'][file_id] = {
    'output_shape': ..,  # output shape of the wav file
    'fs': ..,  # sampling frequency
    'time_step': ..,  # sample period
}
reset_active_keys() → None
Disables reset of active keys.
set_active_keys(keys: List[str]) → None
Disables set of active keys.
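The file_info_save_path and overwrite_file_info parameters describe a compute-once, reload-later pattern. The sketch below illustrates that idea only: load_or_compute, expensive_scan and the use of pickle are assumptions made for the example, not dabstract's actual implementation or file format.

```python
import os
import pickle
import tempfile


def load_or_compute(compute, save_path=None, overwrite=False):
    """Hypothetical helper: reuse saved info unless overwrite is requested."""
    if save_path is not None and os.path.isfile(save_path) and not overwrite:
        # A saved copy exists and may be reused: skip the costly scan.
        with open(save_path, "rb") as f:
            return pickle.load(f)
    info = compute()
    if save_path is not None:
        # Store the result so the next call can load it from disk.
        with open(save_path, "wb") as f:
            pickle.dump(info, f)
    return info


calls = []


def expensive_scan():
    """Stand-in for a costly directory scan; records each invocation."""
    calls.append(1)
    return {"filename": ["x.wav"]}


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "file_info.pickle")
    first = load_or_compute(expensive_scan, cache)  # computed and saved
    second = load_or_compute(expensive_scan, cache)  # loaded from disk
    third = load_or_compute(expensive_scan, cache, overwrite=True)  # recomputed
```

In this sketch the scan runs on the first call and again only when overwrite=True is passed, which mirrors the documented intent of overwrite_file_info.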
dataset_from_config
dabstract.dataset.helpers.dataset_from_config(config: Dict, overwrite_xval: bool = False) → Dataset
Create a dataset from configuration.
This function creates a dataset class from a dictionary definition. It is advised to define your configuration in yaml, read it with dabstract.utils.load_yaml_config and use it as follows:
$ data = load_yaml_config(filename=path_to_dir, path=path_to_yaml, walk=True/False,
$                         post_process=dataset_from_config, **kwargs)
For example configurations, see dabstract/examples/introduction, e.g. introduction/configs/dbs/EXAMPLE_anomaly.yaml.
The full format, with placeholders, is:
$ datasets:
$   - name: dataset0
$     parameters:
$       paths: ..
$       select: ..
$       split: ..
$       other: ..
$       test_only: 1/0
$   - name: dataset1
$     parameters:
$       paths: ..
$       select: ..
$       split: ..
$       other: ..
$       test_only: 1/0
$ select:
$ split:
$ xval:
For each dataset, name and parameters/paths are mandatory. select, split and test_only are default options to subsample the dataset, split it, or mark it as test-only (1/0), respectively. select, split and xval are all defined in the following way:
$ name:
$ parameters:
name is a string naming a class that can be found either in dataset.select or dataset.xval, respectively, or in the custom folder defined by os.environ["dabstract_CUSTOM_DIR"]. parameters is not mandatory.
For more information on the possibilities of select, split and xval, see dataset.dataset.add_select(), dataset.dataset.add_split() and dataset.dataset.set_xval().
- Parameters
- config: Dict
dictionary configuration
- overwrite_xval: bool
overwrite xval file
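As an illustration of the placeholder format above, the same structure can be written directly as a Python dictionary. This is a sketch under assumptions: the path value, the xval name my_xval and the check_config helper are hypothetical and not part of dabstract; the helper only checks the entries the text calls mandatory.

```python
# Dictionary mirroring the documented yaml layout (all values are placeholders).
config = {
    "datasets": [
        {
            "name": "dataset0",
            "parameters": {
                "paths": {"data": "path/to/data"},  # hypothetical path
                "test_only": 0,
            },
        },
    ],
    "xval": {"name": "my_xval", "parameters": {}},  # hypothetical xval name
}


def check_config(config):
    """Hypothetical pre-flight check: list missing mandatory entries."""
    problems = []
    datasets = config.get("datasets")
    if not isinstance(datasets, list):
        return ["'datasets' must be a list"]
    for i, db in enumerate(datasets):
        if "name" not in db:
            problems.append(f"datasets[{i}]: missing 'name'")
        if "paths" not in db.get("parameters", {}):
            problems.append(f"datasets[{i}]: missing 'parameters'/'paths'")
    # select/split/xval are optional, but need a 'name' when present.
    for key in ("select", "split", "xval"):
        if key in config and "name" not in (config[key] or {}):
            problems.append(f"{key}: missing 'name'")
    return problems


problems = check_config(config)
```

A dictionary that passes such a check would then be handed to dataset_from_config(config); that call is left out here because it requires dabstract and real data paths.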
dataset_factory
dabstract.dataset.helpers.dataset_factory(name: (<class 'str'>, ~Dataset, <class 'type'>) = None, paths: Dict[str, str] = None, xval: Optional[Dict[str, Union[str, int, Dict]]] = None, split: Optional[Dict[str, Union[str, int, Dict]]] = None, select: Optional[Dict[str, Union[str, int, Dict]]] = None, test_only: Optional[bool] = 0, **kwargs) → Dataset
Dataset factory.
This function creates a dataset class from name and parameters. Specifically, it searches by name for that particular database class in:
- the environment variable folder: os.environ["dabstract_CUSTOM_DIR"] = your_dir
- the dabstract.dataset.dbs folder
If name is given as a class object, then that class is used to init the dataset with the given kwargs. This function is mostly used by dataset_from_config(). One is advised to import the desired dataset class directly instead of using dataset_factory, which is only handy for configuration-based experiments that need to load a class from a string. For example:
$ data = dataset_factory(name='DCASE2020Task1B',
$                        paths={'data': path_to_data,
$                               'meta': path_to_meta,
$                               'feat': path_to_feat})
One is advised to check the examples in dabstract/examples/introduction on how to work with datasets.
- Parameters
- select: Dict[str, Union[str, int, Dict]]
selector configuration
- split: Dict[str, Union[str, int, Dict]]
split configuration
- xval: Dict[str, Union[str, int, Dict]]
xval configuration
- test_only: bool
use the dataset for test (test_only=1) or both train and test (test_only=0)
- name: str/instance/object
name of the class (or the class directly)
- paths: dict[str]
dictionary containing paths to the data
- kwargs
not defined yet (to do), as this should be used only by load_from_config()
- Returns
- dataset: Dataset class
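The name handling described above (a string is looked up, a class is used directly) can be sketched without dabstract. Everything here is hypothetical: ExampleDataset, REGISTRY and sketch_factory are stand-ins, and an in-memory dictionary replaces the folder search that dataset_factory is documented to perform.

```python
import inspect


class ExampleDataset:
    """Hypothetical dataset class used only for this sketch."""

    def __init__(self, paths=None, test_only=0, **kwargs):
        self.paths = paths
        self.test_only = test_only


# In-memory stand-in for the folders that are searched by name.
REGISTRY = {"ExampleDataset": ExampleDataset}


def sketch_factory(name, **kwargs):
    """Resolve name to a class (if needed) and instantiate it with kwargs."""
    if inspect.isclass(name):
        # A class object was passed: use it directly.
        return name(**kwargs)
    if isinstance(name, str):
        if name not in REGISTRY:
            raise KeyError(f"unknown dataset name: {name!r}")
        return REGISTRY[name](**kwargs)
    raise TypeError("name must be a class or a string")


by_name = sketch_factory("ExampleDataset", paths={"data": "path/to/data"})
by_class = sketch_factory(ExampleDataset, test_only=1)
```

The string form is what makes configuration files possible; when the class is already importable in code, instantiating it directly is simpler, as the text advises.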
get_dir_info
dabstract.dataset.helpers.get_dir_info(path: str, extension: str = '.wav', file_info_save_path: bool = None, filepath: str = None, overwrite_file_info: bool = False, **kwargs) → Dict[str, List[Any]]
Get meta information of the files in a directory.
This function gathers meta information (e.g. sampling frequency, length) of the files in the provided directory and returns a dictionary with the filenames, information and subfolders. This is mainly useful for obtaining a priori information about wav files, so that they can be split lazily from disk.
- Parameters
- path: str
path to the directory to check
- extension: str
only evaluate files with that extension
- map_fct: Callable
add a mapping function y = f(x) to the 'data'
- filepath: str
in case you already have the files you want to obtain information from, the dir tree search is not done and this is used instead
- file_info_save_path: str
save the information to this location; gathering it can be costly, so saving is useful
- overwrite_file_info: bool
overwrite the file info file
- Returns
- dict: dict
dict containing file information as a list, formatted as:
output['filepath'] = list of paths to files
output['example'] = example string (i.e. filename without extension)
output['filename'] = filename
output['subdb'] = relative subdirectory (starting from 'path') to file
output['info'][file_id] = {
    'output_shape': ..,  # output shape of the wav file
    'fs': ..,  # sampling frequency
    'time_step': ..,  # sample period
}
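To make the documented output layout concrete, the sketch below builds a dictionary of the same shape using only the standard library. It is an approximation under assumptions, not dabstract's implementation: sketch_dir_info and write_silence are hypothetical names, only wav files are handled, and output_shape is assumed to be (frames, channels).

```python
import os
import tempfile
import wave


def write_silence(filepath, fs=8000, seconds=1):
    """Write a silent 16-bit mono wav file, used as test input."""
    with wave.open(filepath, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(fs)
        wav.writeframes(b"\x00\x00" * fs * seconds)


def sketch_dir_info(path, extension=".wav"):
    """Hypothetical stand-in: walk path and collect per-file wav information."""
    out = {"filepath": [], "example": [], "filename": [], "subdb": [], "info": []}
    for root, dirs, files in os.walk(path):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            if not name.endswith(extension):
                continue
            full = os.path.join(root, name)
            # Only the header is read here; the samples stay on disk.
            with wave.open(full, "rb") as wav:
                fs = wav.getframerate()
                shape = (wav.getnframes(), wav.getnchannels())
            out["filepath"].append(full)
            out["example"].append(os.path.splitext(name)[0])
            out["filename"].append(name)
            out["subdb"].append(os.path.relpath(root, path))
            out["info"].append(
                {"output_shape": shape, "fs": fs, "time_step": 1.0 / fs}
            )
    return out


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "a"))
    write_silence(os.path.join(tmp, "a", "x.wav"))
    demo = sketch_dir_info(tmp)
```

Because only file headers are inspected, a scan like this gives the lengths and sampling rates needed to plan splits without loading any audio into memory.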