Currently, `load_data.load_biological_data` loads a set of CSV files into a dictionary of dataframes containing data used to:
- compute the mean `sigma_bs` and average weight
- compute number and weight proportions for reapportioning biomass
Relevant code from `echopop/echopop/workflow/feat_hake.py` (lines 169 to 209 at commit 82970fb):
```python
from typing import Any, Dict

# `load_data` and `DATA_ROOT` are assumed to be defined earlier in feat_hake.py (not part of this excerpt)

# ==================================================================================================
# Load in the biological data
# ---------------------------
BIODATA_SHEET_MAP: Dict[str, str] = {
    "catch": "biodata_catch",
    "length": "biodata_length",
    "specimen": "biodata_specimen",
}
SUBSET_DICT: Dict[Any, Any] = {
    "ships": {
        160: {
            "survey": 201906
        },
        584: {
            "survey": 2019097,
            "haul_offset": 200
        }
    },
    "species_code": [22500]
}
FEAT_TO_ECHOPOP_BIODATA_COLUMNS = {
    "frequency": "length_count",
    "haul": "haul_num",
    "weight_in_haul": "weight",
}
BIODATA_LABEL_MAP: Dict[Any, Dict] = {
    "sex": {
        1: "male",
        2: "female",
        3: "unsexed"
    }
}
#
dict_df_bio = load_data.load_biological_data(
    biodata_filepath=DATA_ROOT / "Biological/1995-2023_biodata_redo.xlsx",
    biodata_sheet_map=BIODATA_SHEET_MAP,
    column_name_map=FEAT_TO_ECHOPOP_BIODATA_COLUMNS,
    subset_dict=SUBSET_DICT,
    biodata_label_map=BIODATA_LABEL_MAP
)
```
These CSV files are generated from database queries, so it would be useful to skip the CSV step and retrieve the data directly from database views. This would streamline the workflow and make it robust against arbitrary changes to the CSV files.
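As a rough illustration of this direction, here is a minimal sketch of a view-based loader that returns the same dictionary-of-dataframes shape as the current loader. Everything in it is hypothetical: the function name `load_biological_data_from_views`, the view name `v_biodata_specimen`, and the assumption that each view exposes a `species_code` column. SQLite stands in for the actual database, whose backend isn't specified here. `pandas.read_sql_query` also accepts SQLAlchemy connectables, so the pattern should carry over, although the `?` placeholder style is specific to `sqlite3` and other drivers may expect a different parameter style. Only the species filter from `SUBSET_DICT` is reproduced; ship/survey subsetting is left out.

```python
import sqlite3
from typing import Any, Dict, List

import pandas as pd


def load_biological_data_from_views(
    conn: sqlite3.Connection,
    view_map: Dict[str, str],
    column_name_map: Dict[str, str],
    label_map: Dict[str, Dict[Any, str]],
    species_codes: List[int],
) -> Dict[str, pd.DataFrame]:
    """Hypothetical database-view counterpart of `load_biological_data`.

    `view_map` plays the role of BIODATA_SHEET_MAP: it maps dataset keys
    ("catch", "length", "specimen") to view names instead of sheet names.
    """
    dict_df_bio: Dict[str, pd.DataFrame] = {}
    placeholders = ", ".join("?" for _ in species_codes)
    for key, view_name in view_map.items():
        # Identifiers cannot be bound as query parameters, so view names must
        # come from trusted configuration; the species filter is bound safely.
        query = f"SELECT * FROM {view_name} WHERE species_code IN ({placeholders})"
        df = pd.read_sql_query(query, conn, params=species_codes)
        # Rename columns to the names echopop expects (as the CSV loader does).
        df = df.rename(columns=column_name_map)
        # Replace coded values (e.g. sex 1/2/3) with their labels.
        for col, mapping in label_map.items():
            if col in df.columns:
                df[col] = df[col].map(mapping)
        dict_df_bio[key] = df
    return dict_df_bio


# Usage with an in-memory SQLite database standing in for the survey database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE specimen (haul INTEGER, species_code INTEGER, sex INTEGER, length REAL);
    INSERT INTO specimen VALUES (1, 22500, 1, 45.0), (1, 22500, 2, 50.5), (2, 99999, 1, 30.0);
    CREATE VIEW v_biodata_specimen AS SELECT * FROM specimen;
    """
)
dict_df_bio = load_biological_data_from_views(
    conn,
    view_map={"specimen": "v_biodata_specimen"},
    column_name_map={"haul": "haul_num"},
    label_map={"sex": {1: "male", 2: "female", 3: "unsexed"}},
    species_codes=[22500],
)
print(dict_df_bio["specimen"])
```

Keeping the output shape identical to the CSV loader means downstream steps (mean `sigma_bs`, number and weight proportions) would not need to know which source was used.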
We should still keep the current CSV-based biological data loading function, so that we can easily write a comparison function to double-check that the database-view function produces the same results.
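A comparison function could start as simply as the sketch below. `compare_biodata` is a hypothetical name, and the sketch assumes both loaders return a dict keyed by dataset name ("catch", "length", "specimen") after column renaming and label mapping. Rows are sorted before comparing because a database query does not guarantee row order. `check_dtype=False` relaxes the dtype check so that, for example, an integer column that one source reads as float is not flagged on dtype alone.

```python
from typing import Dict, List

import pandas as pd


def compare_biodata(
    dict_df_csv: Dict[str, pd.DataFrame],
    dict_df_db: Dict[str, pd.DataFrame],
) -> List[str]:
    """Return human-readable discrepancies (an empty list if the loaders agree)."""
    problems: List[str] = []
    # Datasets that only one loader produced.
    only_one = set(dict_df_csv) ^ set(dict_df_db)
    if only_one:
        problems.append(f"datasets present in only one source: {sorted(only_one)}")
    for key in sorted(set(dict_df_csv) & set(dict_df_db)):
        df_csv, df_db = dict_df_csv[key], dict_df_db[key]
        col_diff = set(df_csv.columns) ^ set(df_db.columns)
        if col_diff:
            problems.append(f"{key}: column mismatch {sorted(col_diff)}")
            continue
        # Use a common column order and sort rows on every column, since row
        # order from the database is not guaranteed to match the CSV order.
        cols = sorted(df_csv.columns)
        a = df_csv[cols].sort_values(cols).reset_index(drop=True)
        b = df_db[cols].sort_values(cols).reset_index(drop=True)
        try:
            pd.testing.assert_frame_equal(a, b, check_dtype=False)
        except AssertionError as err:
            problems.append(f"{key}: {err}")
    return problems
```

For example, `compare_biodata(dict_df_bio, dict_df_bio_db)`, where `dict_df_bio_db` is the (hypothetical) output of a view-based loader, would return an empty list when both sources agree. Returning a list rather than raising on the first mismatch makes it easier to see every dataset that diverges in a single run.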