Database Interface Overhaul

Bruno Padovese requested to merge database_interface_overhaul into development

This merge request overhauls the database_interface and selection_table modules, replacing them with the new hdf5_interface and annotation_handling modules. These changes are all in preparation for the upcoming create_db CLI.

Database Interface Overhaul:

  • New Module: annotation_handling

    • This module replaces the selection_table module, introducing more streamlined and robust functionality:
      • Function standardize: Standardizes annotation tables, ensuring consistent column names and label mappings.
      • Function adjust_segment_interval: Adjusts the start and end times of segments while maintaining a specified duration.
      • Function generate_time_shifted_segments: Generates time-shifted annotation versions with a user-defined minimum overlap and step sizes.
      • Function create_random_segments: Creates random, non-overlapping segments from audio files, validated against annotations.
      • Function validate_segment: Validates whether a segment overlaps with existing annotations for a file, considering optional buffer margins.

    Important Changes:

    • The standardize function, and ketos in general, will move away from multi-level indexing. Standardized annotation tables now default to single-level row indices, improving compatibility with standard pandas practices.
  • New Module: hdf5_interface

    • This module replaces the database_interface module and introduces functions for interacting with HDF5 files at a more granular level:
      • Function create_table: Creates or retrieves an HDF5 table, with automatic parent group creation if needed.
      • Function save_attributes: Saves metadata attributes to HDF5 leaf nodes like tables or arrays.
      • Function insert_representation_data: Inserts representation data (e.g., spectrograms or waveforms) into an HDF5 table.
      • Function generate_table_description: Dynamically creates a PyTables table description class based on a sample's shape.
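
To illustrate the indexing change noted under annotation_handling, the sketch below contrasts a two-level row index with its single-level form using plain pandas. The table contents, the column names (filename, annot_id, start, end, label), and the flatten_index helper are assumptions made for illustration; this is not the standardize implementation.

```python
import pandas as pd

# Hypothetical annotation table using a two-level (filename, annot_id) row index.
legacy = pd.DataFrame(
    {
        "filename": ["a.wav", "a.wav", "b.wav"],
        "annot_id": [0, 1, 0],
        "start": [0.5, 3.0, 1.2],
        "end": [1.5, 4.0, 2.0],
        "label": [1, 0, 1],
    }
).set_index(["filename", "annot_id"])


def flatten_index(table):
    """Return a copy with the index levels moved into columns and a default single-level index."""
    return table.reset_index()


flat = flatten_index(legacy)

# With a single-level index, ordinary boolean masks replace level-based lookups.
rows_for_a = flat[flat["filename"] == "a.wav"]
```

Code that previously selected rows with level-based lookups such as legacy.loc["a.wav"] would instead filter on the filename column, as in the last line.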

In general, there is no longer a single "do it all" function like the old create_database. However, I expect the upcoming create_db command-line tool to cover this scenario for users who lack the technical background to build a new database with the lower-level functions of hdf5_interface.
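
To give a sense of what working at this lower level involves, here is a minimal sketch written directly against PyTables rather than against hdf5_interface, whose exact signatures are not shown in this description. The helper names (make_description, write_samples), the /train/data path, the column layout, and the sample_rate attribute are all assumptions. It covers the same steps the module is described as handling: deriving a table description from a sample's shape, creating the table along with its parent group, inserting data, and saving an attribute.

```python
import os
import tempfile

import numpy as np
import tables  # PyTables


def make_description(sample):
    """Build a table description whose 'data' column matches the sample's shape."""
    return {
        "data": tables.Float32Col(shape=sample.shape, pos=0),
        "label": tables.UInt8Col(pos=1),
    }


def write_samples(path, samples, labels, where="/train", name="data"):
    """Create the table (and any missing parent groups) and append one row per sample."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table(
            where, name, make_description(samples[0]), createparents=True
        )
        row = table.row
        for sample, label in zip(samples, labels):
            row["data"] = sample
            row["label"] = label
            row.append()
        table.flush()
        # Metadata stored as an attribute on the leaf node (value is hypothetical).
        table.attrs.sample_rate = 1000


# Hypothetical data: three 4x8 "spectrograms" with integer labels.
samples = np.random.rand(3, 4, 8).astype(np.float32)
labels = [0, 1, 1]
path = os.path.join(tempfile.mkdtemp(), "example.h5")
write_samples(path, samples, labels)

# Read the table back to check what was stored.
with tables.open_file(path, mode="r") as h5:
    table = h5.get_node("/train/data")
    n_rows = table.nrows
    stored_shape = table[0]["data"].shape
    stored_labels = table.col("label").tolist()
    stored_rate = int(table.attrs.sample_rate)
```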
