Database Interface Overhaul
This merge request introduces a significant overhaul to the database_interface and selection_table modules, replacing them with the new hdf5_interface and annotation_handling modules. These changes are all in preparation for the upcoming create_db CLI.
Database Interface Overhaul:
- New Module: annotation_handling
  - This module replaces the selection_table module, introducing more streamlined and robust functionality:
    - Function standardize: Standardizes annotation tables, ensuring consistent column names and label mappings.
    - Function adjust_segment_interval: Adjusts the start and end times of segments while maintaining a specified duration.
    - Function generate_time_shifted_segments: Generates time-shifted annotation versions with a user-defined minimum overlap and step sizes.
    - Function create_random_segments: Creates random, non-overlapping segments from audio files, validated against annotations.
    - Function validate_segment: Validates whether a segment overlaps with existing annotations for a file, considering optional buffer margins.
  - Important Changes:
    - The standardize function, and ketos in general, will move away from multi-level indexing. Standardized annotation tables now default to single-level row indices, improving compatibility with pandas' standard practices.
- New Module: hdf5_interface
  - This module replaces the database_interface module and introduces functions for interacting with HDF5 files at a more granular level:
    - Function create_table: Creates or retrieves an HDF5 table, with automatic parent group creation if needed.
    - Function save_attributes: Saves metadata attributes to HDF5 leaf nodes like tables or arrays.
    - Function insert_representation_data: Inserts representation data (e.g., spectrograms or waveforms) into an HDF5 table.
    - Function generate_table_description: Dynamically creates a PyTables table description class based on a sample's shape.
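To make the annotation_handling changes listed earlier more concrete, here is a minimal sketch in plain pandas. It is not the ketos API: the (filename, annot_id) index layout, the column names, and the overlaps_annotation helper are assumptions made for illustration only. It shows a multi-level index being flattened to a single-level row index, followed by the kind of overlap check with a buffer margin that validate_segment is described as performing.

```python
import pandas as pd

# Hypothetical annotation table with a two-level (filename, annot_id) index.
# The index layout and column names are assumptions for this sketch.
annotations = pd.DataFrame(
    {
        "filename": ["a.wav", "a.wav", "b.wav"],
        "annot_id": [0, 1, 0],
        "start": [1.0, 5.0, 2.0],
        "end": [2.5, 6.0, 3.0],
        "label": [1, 0, 1],
    }
).set_index(["filename", "annot_id"])

# Flatten to a single-level row index: the former index levels
# become ordinary columns and rows are numbered 0..n-1.
flat = annotations.reset_index()


def overlaps_annotation(table, filename, start, end, buffer=0.0):
    """Return True if [start, end] overlaps any annotation of `filename`.

    Each annotation is widened by `buffer` seconds on both sides before
    the comparison. Hypothetical helper, not part of ketos.
    """
    rows = table[table["filename"] == filename]
    hit = (rows["start"] - buffer < end) & (rows["end"] + buffer > start)
    return bool(hit.any())


# A candidate segment just after the first annotation in a.wav.
clear_without_buffer = not overlaps_annotation(flat, "a.wav", 2.6, 3.0)
clear_with_buffer = not overlaps_annotation(flat, "a.wav", 2.6, 3.0, buffer=0.5)
```

With a single-level index, filtering by file is an ordinary column comparison rather than a lookup on an index level, which is the compatibility benefit the change is aiming for.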
In general, there is no longer a single "do it all" function like the old create_database. However, with the upcoming addition of the create_db command-line function, I think this scenario will be covered for users who do not have the technical background to work with the lower-level functions of hdf5_interface to create a new database.
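For a sense of what working at this lower level involves, the sketch below uses PyTables directly rather than hdf5_interface, whose exact signatures are not given in this description. The file name, the /train group, the specs table, the column names, and the describe_sample helper are all hypothetical. It walks through the same steps the new functions are described as covering: building a table description from a sample's shape, creating a table with its parent group, saving attributes, and inserting representation data.

```python
import os
import tempfile

import numpy as np
import tables


def describe_sample(sample_shape):
    """Build a table description whose 'data' column matches one sample's shape.

    Hypothetical helper; it returns a dict, which PyTables accepts in place
    of an IsDescription class.
    """
    return {
        "data": tables.Float32Col(shape=sample_shape, pos=0),
        "label": tables.Int32Col(pos=1),
    }


path = os.path.join(tempfile.mkdtemp(), "example_db.h5")
spectrogram = np.random.rand(4, 8).astype(np.float32)  # stand-in for real data

with tables.open_file(path, mode="w") as h5:
    # createparents=True creates the missing /train group automatically.
    table = h5.create_table(
        "/train", "specs", describe_sample(spectrogram.shape), createparents=True
    )

    # Store metadata as attributes on the table (a leaf node).
    table.attrs.sample_rate = 1000
    table.attrs.window = "hann"

    # Insert one representation together with its label.
    row = table.row
    row["data"] = spectrogram
    row["label"] = 1
    row.append()
    table.flush()

# Read everything back to confirm what was written.
with tables.open_file(path, mode="r") as h5:
    stored = h5.get_node("/train/specs")
    n_rows = stored.nrows
    first = stored[0]["data"]
    sample_rate = stored.attrs.sample_rate
```

Each step here is a separate call, which is the trade-off of dropping the single create_database entry point: more control over the file layout, at the cost of the caller assembling the pieces.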