Add the functionality to ingest biological data directly from database views #389

@leewujung

Currently, load_data.load_biological_data loads a set of CSV files into a dictionary of dataframes. These dataframes are used to:

  • compute mean sigma_bs and average weight
  • compute number and weight proportions for reapportioning biomass

# ==================================================================================================
# Load in the biological data
# ---------------------------
from typing import Any, Dict

# Maps each biodata dataset name to its sheet name in the input file
BIODATA_SHEET_MAP: Dict[str, str] = {
    "catch": "biodata_catch",
    "length": "biodata_length",
    "specimen": "biodata_specimen",
}
# Restricts the data to specific ships, surveys, and species
SUBSET_DICT: Dict[Any, Any] = {
    "ships": {
        160: {
            "survey": 201906
        },
        584: {
            "survey": 2019097,
            "haul_offset": 200
        }
    },
    "species_code": [22500]
}
# Renames input columns to the column names Echopop expects
FEAT_TO_ECHOPOP_BIODATA_COLUMNS = {
    "frequency": "length_count",
    "haul": "haul_num",
    "weight_in_haul": "weight",
}
# Replaces coded values with readable labels
BIODATA_LABEL_MAP: Dict[Any, Dict] = {
    "sex": {
        1: "male",
        2: "female",
        3: "unsexed"
    }
}
#
dict_df_bio = load_data.load_biological_data(
    biodata_filepath=DATA_ROOT / "Biological/1995-2023_biodata_redo.xlsx",
    biodata_sheet_map=BIODATA_SHEET_MAP,
    column_name_map=FEAT_TO_ECHOPOP_BIODATA_COLUMNS,
    subset_dict=SUBSET_DICT,
    biodata_label_map=BIODATA_LABEL_MAP
)

These CSV files are generated from database queries, so it would be useful to cut out the CSV step and read the data directly from database views. This would streamline the workflow and make it more robust against arbitrary changes in the CSV files.
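As a rough starting point, a view-based loader could look something like the sketch below. Everything here is an assumption for illustration: the function name load_biological_data_from_views, the idea that each dataset maps to one view, and the use of a DB-API connection (sqlite3 in the usage example) are not part of the current code, and the real database and view names are not given in this issue.

```python
import sqlite3
from typing import Dict, Optional

import pandas as pd


def load_biological_data_from_views(
    conn,
    view_map: Dict[str, str],
    column_name_map: Optional[Dict[str, str]] = None,
) -> Dict[str, pd.DataFrame]:
    """Hypothetical loader: read one database view per dataset into a dataframe.

    `view_map` maps a dataset name (e.g. "catch") to a view name. The view
    names are assumed to come from a trusted constant, not from user input,
    because they are interpolated into the query text.
    """
    dict_df: Dict[str, pd.DataFrame] = {}
    for name, view in view_map.items():
        df = pd.read_sql_query(f'SELECT * FROM "{view}"', conn)
        if column_name_map:
            # Same renaming step the CSV loader applies via column_name_map
            df = df.rename(columns=column_name_map)
        dict_df[name] = df
    return dict_df
```

A usage example against an in-memory SQLite database, standing in for the real one:

```python
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE length_raw (haul INTEGER, frequency INTEGER)")
conn.execute("INSERT INTO length_raw VALUES (1, 5)")
conn.execute("CREATE VIEW biodata_length AS SELECT * FROM length_raw")

dict_df_bio = load_biological_data_from_views(
    conn,
    view_map={"length": "biodata_length"},
    column_name_map={"haul": "haul_num", "frequency": "length_count"},
)
```

Subsetting and label mapping (subset_dict, biodata_label_map) are left out of the sketch; they could either be pushed into the view definitions or applied to the dataframes afterwards, as the CSV path does today.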

We should still keep the current CSV biological data loading function, so that we can easily write a comparison function to double-check that the database view function produces the same results.
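The comparison could be as simple as the sketch below, which checks two dictionaries of dataframes key by key. The function name compare_biodata is hypothetical, and the sketch assumes both loaders return rows in the same order; if they do not, each dataframe would need to be sorted on a stable key before comparing.

```python
from typing import Dict, List

import pandas as pd


def compare_biodata(
    csv_dfs: Dict[str, pd.DataFrame],
    view_dfs: Dict[str, pd.DataFrame],
) -> List[str]:
    """Return a list of human-readable differences (empty if the inputs match)."""
    problems: List[str] = []
    for key in sorted(set(csv_dfs) | set(view_dfs)):
        if key not in csv_dfs or key not in view_dfs:
            problems.append(f"{key}: present in only one source")
            continue
        try:
            pd.testing.assert_frame_equal(
                csv_dfs[key].reset_index(drop=True),
                view_dfs[key].reset_index(drop=True),
                check_dtype=False,  # a database may return different numeric dtypes
                check_like=True,    # ignore column order
            )
        except AssertionError as err:
            problems.append(f"{key}: {err}")
    return problems
```

Returning a list of messages rather than raising on the first mismatch makes it easier to see every dataset that differs in a single run.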

cc @aliciabillings-noaa @ElizabethMPhillips
