
Refactor standardize function for enhanced flexibility

Bruno Padovese requested to merge StandardizeLabelUpdate into development

This merge request contains several significant changes to the standardize function. While the overall functionality remains similar, I have simplified the function, removed some parameters, deprecated others, and changed the behaviour of the 'labels' parameter.

  • Removed the 'start_labels_at_1' parameter in favour of an enhanced 'labels' parameter.
  • Unified the 'table' and 'path' parameters into a single 'annotations' parameter to simplify the function's interface, and added deprecation warnings for 'table' and 'path' to maintain backward compatibility. The 'annotations' parameter accepts either a pandas DataFrame (like the 'table' argument) or a path to a CSV file (like the 'path' argument).
  • Removed the 'mapper' argument and deprecated '_create_label_dict' to encourage direct manipulation of DataFrame columns before calling standardize, which streamlines the function's operation.
  • Removed the missing_columns function in favour of checking for missing columns directly in the standardize function.
  • Improved the label-handling logic so it adapts to the provided 'labels' configuration, such as a custom mapping or automatic integer mapping. The following options are possible:
    • auto (default): the function will automatically map all the labels in the table to integers starting from 0.
    • None: no mapping is done. This is useful if the user already has the label configuration they want.
    • list: maps the labels in the list to integers starting from 0. Any remaining label is mapped to -1.
    • dict: Full control of how the labels are mapped. Any label not in the dict is mapped to -1.
  • Updated documentation, tests and examples to reflect new functionality and argument handling.
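To illustrate the unified 'annotations' parameter, the deprecated 'table'/'path' aliases, and the inline missing-column check, here is a minimal sketch. The helper name `resolve_annotations` and the required columns `("filename", "label")` are assumptions for illustration only; this is not the code in this merge request, and the real standardize may require a different set of columns.

```python
import warnings
from pathlib import Path

import pandas as pd

# Assumed required columns (hypothetical); the real standardize may differ.
REQUIRED_COLUMNS = ("filename", "label")


def resolve_annotations(annotations=None, table=None, path=None):
    """Hypothetical sketch: return a DataFrame from a DataFrame or a CSV path.

    'table' and 'path' are accepted as deprecated aliases for 'annotations'.
    """
    for name, value in (("table", table), ("path", path)):
        if value is not None:
            # Warn, but keep working, so existing callers are not broken.
            warnings.warn(
                f"'{name}' is deprecated; use 'annotations' instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            if annotations is None:
                annotations = value

    if isinstance(annotations, pd.DataFrame):
        df = annotations.copy()
    elif isinstance(annotations, (str, Path)):
        df = pd.read_csv(annotations)
    else:
        raise TypeError("annotations must be a pandas DataFrame or a path to a CSV file")

    # Check for missing columns inline instead of calling a separate helper.
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required column(s): {missing}")
    return df
```

A single polymorphic argument keeps the public signature small, while the warnings give existing callers a migration window before 'table' and 'path' are removed.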
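With the 'mapper' argument removed, label clean-up is expected to happen on the DataFrame itself before standardize is called. A sketch of that kind of preprocessing with plain pandas might look like the following; the original column names (`sound_file`, `call_type`), the label values, and the assumption that standardize expects `filename` and `label` columns are all hypothetical.

```python
import pandas as pd

annot = pd.DataFrame({
    "sound_file": ["a.wav", "b.wav", "c.wav"],
    "call_type": ["HB-song", "HB-social", "noise"],
})

# Rename columns to the names standardize is assumed to expect.
annot = annot.rename(columns={"sound_file": "filename", "call_type": "label"})

# Merge sub-types into one class before any label-to-integer mapping.
annot["label"] = annot["label"].replace({"HB-song": "humpback", "HB-social": "humpback"})
```

Doing this with ordinary pandas calls keeps the transformation visible in the user's own code rather than hidden behind a function argument.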
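The four 'labels' options can be sketched as a small standalone function. The name `map_labels` and its `column` argument are hypothetical, and the sketch assumes that "auto" assigns integers in sorted label order; the actual ordering used by standardize is not specified here.

```python
import pandas as pd


def map_labels(df, labels="auto", column="label"):
    """Hypothetical sketch of the 'labels' options; not the actual standardize code."""
    df = df.copy()
    if labels is None:
        # No mapping: keep the labels exactly as provided.
        return df
    if isinstance(labels, str) and labels == "auto":
        # Map every unique label to 0, 1, 2, ... (sorted order is an assumption).
        mapping = {lbl: i for i, lbl in enumerate(sorted(df[column].unique()))}
    elif isinstance(labels, list):
        # Map listed labels to 0, 1, 2, ... in the order given.
        mapping = {lbl: i for i, lbl in enumerate(labels)}
    elif isinstance(labels, dict):
        # Use the caller's mapping as-is.
        mapping = dict(labels)
    else:
        raise ValueError("labels must be 'auto', None, a list, or a dict")
    # Labels not covered by the mapping become -1.
    df[column] = df[column].map(mapping).fillna(-1).astype(int)
    return df


annot = pd.DataFrame({"label": ["whale", "noise", "boat"]})
print(map_labels(annot, labels=["whale", "boat"])["label"].tolist())  # [0, -1, 1]
```

Under these assumptions, `labels={"whale": 1}` yields `[1, -1, -1]`, and `labels=None` leaves the strings untouched.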

This merge request does not handle the modifications discussed in #31.
