Refactor standardize function for enhanced flexibility (!13) · Merge requests · MERIDIAN / ketos

Bruno Padovese requested to merge StandardizeLabelUpdate into development Feb 23, 2024

This merge request contains a couple of significant changes to the standardize function. While the overall functionality remains similar, I have simplified the function, removed a couple of parameters, deprecated others, and changed the functionality of the 'labels' parameter.

Removed the 'start_labels_at_1' parameter in favour of an enhanced 'labels' parameter.
Unified 'table' and 'path' parameters into a single 'annotations' parameter to simplify the function's interface. Added deprecation warnings for 'table' and 'path' to maintain backward compatibility. The 'annotations' parameter can receive either a pandas DataFrame, like the 'table' argument, or a path to a CSV file, like the 'path' argument.
Removed 'mapper' argument and deprecated '_create_label_dict' to encourage direct manipulation of DataFrame columns before calling standardize, streamlining the function’s operation.
Removed the missing_columns function in favour of just checking for missing columns in the standardize function code.
Improved label handling logic to dynamically adapt to provided 'labels' configurations, such as custom mappings and automatic integer mapping. The following options are possible:
- auto (default): the function will automatically map all the labels in the table to integers starting from 0.
- None: no mapping is done. This is useful if the user already has the label configuration they want.
- list: maps the labels in the list to integers starting from 0. Any remaining label is mapped to -1.
- dict: Full control of how the labels are mapped. Any label not in the dict is mapped to -1.
Updated documentation, tests and examples to reflect new functionality and argument handling.
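
To make the four 'labels' options listed above concrete, here is a minimal sketch of the mapping rules in plain Python. It is not the code from this merge request: the function names `build_label_mapping` and `apply_label_mapping` are hypothetical, and the choice to assign 'auto' integers in sorted order of the observed labels is an assumption, since the description only says the integers start from 0.

```python
def build_label_mapping(observed, labels="auto"):
    """Hypothetical helper: return a dict label -> int, or None for 'no mapping'."""
    if labels is None:
        # None: leave the labels exactly as the user provided them.
        return None
    if isinstance(labels, str):
        if labels != "auto":
            raise ValueError("the only supported string value is 'auto'")
        # auto: every observed label gets an integer starting from 0.
        # Assumption: integers are assigned in sorted order of the labels.
        return {lab: i for i, lab in enumerate(sorted(set(observed)))}
    if isinstance(labels, dict):
        # dict: the caller has full control over the mapping.
        return dict(labels)
    if isinstance(labels, (list, tuple)):
        # list: position in the list becomes the integer label.
        return {lab: i for i, lab in enumerate(labels)}
    raise TypeError("labels must be 'auto', None, a list or a dict")


def apply_label_mapping(observed, labels="auto"):
    """Map each observed label; labels missing from the mapping become -1."""
    mapping = build_label_mapping(observed, labels)
    if mapping is None:
        return list(observed)
    return [mapping.get(lab, -1) for lab in observed]


raw = ["whale", "boat", "whale", "noise"]
print(apply_label_mapping(raw))                           # [2, 0, 2, 1]
print(apply_label_mapping(raw, ["whale", "boat"]))        # [0, 1, 0, -1]
print(apply_label_mapping(raw, {"whale": 1, "boat": 0}))  # [1, 0, 1, -1]
print(apply_label_mapping(raw, None))                     # ['whale', 'boat', 'whale', 'noise']
```

With a pandas DataFrame, the same dictionary could be applied to the label column, for example with `Series.map` and a fallback of -1 for labels that are not in the mapping.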

This merge request does not handle the modifications discussed in #31.
