Design universal data loader
Currently, much of the pipeline is still hardcoded for H.E.S.S. Improve DataLoading and general DataProcessing. -> Opens the door for DiskDatasets
Try to support the following experiments:
- H.E.S.S. / CTA (ctapipe like format)
- Auger (design for tricky Univ. HDF5 set)
- SWGO (XML-based detector config --> handled by pyswgo --> insert as config to astro_dl)
Implement Recipe
- Solid concept: connection of features and positions
- Flexible concept for detector geometries for each feature?
- Enable sharing of geometries
- Support changing detector geometries?
- Renaming of features
- Solid representation of data (graph / multigraphs / image(s) / trace(s) / (scalars))
- Handling of zeros in data (dynamic graphs (remove sensors), dynamic detectors (remove graphs))
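
One possible shape for the Recipe, as a rough sketch only: features reference a geometry by name, so several features can share one geometry, and a feature can be renamed on load. All class, method, and key names here (`Geometry`, `Feature`, `Recipe`, `drop_silent_sensors`, `"dl1/image"`) are hypothetical and not part of the existing code base.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

import numpy as np


@dataclass
class Geometry:
    """Sensor positions of one detector layout (hypothetical container)."""
    name: str
    positions: np.ndarray  # shape (n_sensors, n_dims)


@dataclass
class Feature:
    """One array in the input file, optionally tied to a geometry."""
    key: str                        # name of the array inside the file
    geometry: Optional[str] = None  # geometry name, None for scalars
    rename: Optional[str] = None    # name exposed to the pipeline

    @property
    def name(self) -> str:
        return self.rename or self.key


@dataclass
class Recipe:
    geometries: Dict[str, Geometry] = field(default_factory=dict)
    features: Dict[str, Feature] = field(default_factory=dict)

    def add_geometry(self, geometry: Geometry) -> None:
        self.geometries[geometry.name] = geometry

    def add_feature(self, key, geometry=None, rename=None) -> Feature:
        # Refuse features that point to a geometry nobody registered.
        if geometry is not None and geometry not in self.geometries:
            raise KeyError(f"unknown geometry: {geometry!r}")
        feature = Feature(key, geometry, rename)
        self.features[feature.name] = feature
        return feature

    def positions_of(self, feature_name):
        """Return the sensor positions linked to a feature (None for scalars)."""
        geometry = self.features[feature_name].geometry
        return None if geometry is None else self.geometries[geometry].positions


def drop_silent_sensors(positions, signal):
    """Dynamic-graph handling of zeros: keep only sensors with a signal."""
    active = signal != 0
    return positions[active], signal[active]
```

Referencing geometries by name keeps the positions array stored once, and swapping a detector layout only means registering a different `Geometry` under the same name.
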
Need to support the following data formats:
- pure HDF5 tree
- HDF5 with pandas
- NPZ
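
The three formats could be hidden behind one entry point that returns a flat `{name: array}` dict. The following is a minimal sketch, not a settled design: `load_arrays`, the reader functions, and the format labels are hypothetical names. It assumes `h5py` and `pandas` (with PyTables) are installed for the HDF5 paths, and that pandas-written files contain DataFrames only. The pandas variant has to be requested explicitly, since the file suffix alone cannot distinguish it from a pure HDF5 tree.

```python
from pathlib import Path

import numpy as np


def read_npz(path):
    """NPZ: one array per stored name."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}


def read_hdf5_tree(path):
    """Pure HDF5 tree: walk all groups and collect every dataset."""
    import h5py  # imported lazily so NPZ-only setups do not need it

    arrays = {}
    with h5py.File(path, "r") as handle:
        def collect(name, obj):
            if isinstance(obj, h5py.Dataset):
                arrays[name] = obj[()]
        handle.visititems(collect)
    return arrays


def read_hdf5_pandas(path):
    """HDF5 written by pandas: one array per DataFrame column (assumes DataFrames)."""
    import pandas as pd  # lazy import, see above

    arrays = {}
    with pd.HDFStore(path, mode="r") as store:
        for key in store.keys():
            frame = store[key]
            for column in frame.columns:
                arrays[f"{key.strip('/')}/{column}"] = frame[column].to_numpy()
    return arrays


READERS = {"npz": read_npz, "hdf5": read_hdf5_tree, "hdf5_pandas": read_hdf5_pandas}
SUFFIXES = {".npz": "npz", ".h5": "hdf5", ".hdf5": "hdf5"}


def load_arrays(path, fmt=None):
    """Load a file into a flat dict; infer the format from the suffix if not given."""
    if fmt is None:
        suffix = Path(path).suffix.lower()
        if suffix not in SUFFIXES:
            raise ValueError(f"cannot infer format from suffix {suffix!r}")
        fmt = SUFFIXES[suffix]
    return READERS[fmt](path)
```

Usage would look like `load_arrays("run.npz")` or `load_arrays("events.h5", fmt="hdf5_pandas")`; both file names are placeholders.
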
Implement DataLoader
- Shape detection
- File scanning (look for broken files)
- n_samples scanning
- Data Stacking
- cuts based on observables (np functions)
- pre-transforms (numpy-based)
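
The DataLoader steps above could fit together roughly as follows. This is a sketch under simplifying assumptions: it only handles NPZ input, takes the first axis of every array as the sample axis, and uses hypothetical function names (`scan_files`, `load_stacked`). Cuts are plain callables that return a boolean mask, and pre-transforms are NumPy functions applied per feature.

```python
import numpy as np


def scan_files(paths, key):
    """Scan files once: collect n_samples, detect the sample shape, flag broken files."""
    n_samples, broken, shape = {}, [], None
    for path in paths:
        try:
            with np.load(path) as data:
                array = data[key]
        except Exception:
            # Unreadable file or missing key: treat the file as broken.
            broken.append(path)
            continue
        if shape is None:
            shape = array.shape[1:]  # shape of a single sample
        elif array.shape[1:] != shape:
            broken.append(path)  # inconsistent with the first good file
            continue
        n_samples[path] = array.shape[0]
    return n_samples, shape, broken


def load_stacked(paths, keys, cut=None, transforms=None):
    """Load `keys` from all files, apply cut and pre-transforms, stack along axis 0."""
    transforms = transforms or {}
    chunks = {key: [] for key in keys}
    for path in paths:
        with np.load(path) as data:
            batch = {key: data[key] for key in keys}
        if cut is not None:
            mask = cut(batch)  # boolean mask over the samples of this file
            batch = {key: values[mask] for key, values in batch.items()}
        for key, values in batch.items():
            transform = transforms.get(key)
            chunks[key].append(transform(values) if transform else values)
    return {key: np.concatenate(parts, axis=0) for key, parts in chunks.items()}
```

A call might look like `load_stacked(good_files, ["image", "energy"], cut=lambda b: b["energy"] > 1.0, transforms={"image": np.log1p})`, where the feature names are placeholders. Applying the cut before the transform avoids transforming samples that are discarded anyway.
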