Revert the internal label mapping made in the BatchGenerator and JointBatchGen classes
With this merge request, I propose using the transform function to do the label mapping instead of doing it internally in the BatchGenerator. In general, I found the way we were doing the mapping very finicky, and more often than not I was disabling the map_label argument. In addition, it will save us hundreds of lines of code to maintain, with a solution that is, quite simply, a lot more elegant and powerful.
When I originally suggested the mapping changes, it was because I did not really understand the purpose of the transform function. That problem still exists today, for two main reasons as I see it:
- Poor documentation of the function/parameter. It tells you that you can do whatever you want in it, but provides little explanation of what this means or how to use it. There are few examples, and most of them are very basic.
- Existence of the default transform function in the NN classes. Users end up relying heavily on these default functions because they work for several use cases. This turns the convenience defaults into crutches: users don't really have to learn what the transform function does, or may even forget that it exists. Then, when a use case appears that could be handled by adding a few lines of code to that function, they come up with complicated solutions instead. An example of this is what I did with the label_mapping.
To address these issues, I created a small tutorial for the ketos-tutorials that focuses on the output_transform_function of the BatchGenerator class. In addition, we should consider removing the default transform function from the classes, as they ARE supposed to be customized.
Regardless, now that I understand it better, using the transform function to do the mapping is simply better than what we were doing before. Here are a few advantages:
- It is more transparent to the user. After all, they will be the ones configuring the function.
- The label mapping can be achieved with just two lines of code when using the transform function. This replaces the hundreds of lines that we had to add in both the BatchGenerator and JointBatchGen classes. See the tutorial for an example of label mapping.
- Makes the code simpler and easier to maintain
- Makes the BatchGenerator class simpler for the user, with fewer options. This is another aspect of ketos that I think is hard for new users: there are just too many functions and options, to the point that it is overwhelming.
Another advantage of using the transform function is that it gives the user more freedom over which label maps to which integer, and over any additional transformation they wish to do first. In fact, we don't even have to map string labels to integers in the standardize function, as this can also be done easily on the fly with the transform function.
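As a rough illustration of the idea (not the code from the tutorial), here is a minimal sketch of a transform function that maps string labels to integers on the fly. The names `LABEL_MAP` and `map_labels`, the label values, and the `(X, Y) -> (X, Y)` signature are assumptions made for this example; the exact signature expected by the BatchGenerator should be checked against the ketos documentation.

```python
# Hypothetical mapping chosen by the user: any label can go to any integer.
LABEL_MAP = {"noise": 0, "whale": 1}

def map_labels(X, Y):
    """Return the batch data unchanged and the labels mapped to integers.

    Assumes X is the batch data and Y is an iterable of string labels.
    """
    # The mapping itself is a single line; an unknown label raises a
    # KeyError, which makes mistakes in the mapping visible early.
    Y = [LABEL_MAP[str(label)] for label in Y]
    return X, Y

# Usage on a toy batch of three samples
X_batch = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
Y_batch = ["whale", "noise", "whale"]
X_out, Y_out = map_labels(X_batch, Y_batch)
```

In a real pipeline the mapped labels would probably be converted to an array (and possibly one-hot encoded) inside the same function, which is the kind of additional transformation mentioned above.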
Please review both this merge request and the tutorial one together as they complement one another. You can find it here.