Add the functionality to ingest biological data directly from database views

Currently, `load_data.load_biological_data` loads a set of CSV files into a dictionary of dataframes containing the data used to:
- compute mean sigma_bs and average weight
- compute number and weight proportions for reapportioning biomass

https://github.com/OSOceanAcoustics/echopop/blob/82970fb2afeb53daddd3248c302b569a04cc24bd/echopop/workflow/feat_hake.py#L169-L209

These CSV files are generated from database queries, so it would be useful to cut out the CSV step and read the data directly from database views. This would streamline the workflow and make it more robust against arbitrary changes to the CSV files.
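A minimal sketch of what a view-based loader could look like, assuming the views are reachable through a DB-API connection and that there is one view per biological dataset. The function name `load_biological_data_from_views`, the `BIODATA_VIEW_MAP` view names, and the in-memory SQLite tables are all hypothetical stand-ins (the real database and schema are not described in this issue); only the column mapping is taken from the existing workflow.

```python
import sqlite3
from typing import Dict, Optional

import pandas as pd

# Hypothetical view names: the actual database schema is not specified here.
BIODATA_VIEW_MAP: Dict[str, str] = {
    "catch": "biodata_catch_view",
    "length": "biodata_length_view",
    "specimen": "biodata_specimen_view",
}

# Column mapping copied from the existing CSV-based workflow.
FEAT_TO_ECHOPOP_BIODATA_COLUMNS = {
    "frequency": "length_count",
    "haul": "haul_num",
    "weight_in_haul": "weight",
}


def load_biological_data_from_views(
    conn,
    view_map: Dict[str, str],
    column_name_map: Optional[Dict[str, str]] = None,
) -> Dict[str, pd.DataFrame]:
    """Read each database view into a dataframe, keyed like the CSV loader output."""
    dict_df: Dict[str, pd.DataFrame] = {}
    for key, view in view_map.items():
        # View names are interpolated into SQL, so only accept plain identifiers.
        if not view.isidentifier():
            raise ValueError(f"Unexpected view name: {view!r}")
        df = pd.read_sql_query(f"SELECT * FROM {view}", conn)
        if column_name_map:
            df = df.rename(columns=column_name_map)
        dict_df[key] = df
    return dict_df


# Stand-in database: an in-memory SQLite file with toy tables and views.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE catch (haul INTEGER, species_code INTEGER, weight_in_haul REAL);
    INSERT INTO catch VALUES (1, 22500, 120.5), (2, 22500, 80.0);
    CREATE TABLE length (haul INTEGER, sex INTEGER, length REAL, frequency INTEGER);
    INSERT INTO length VALUES (1, 1, 42.0, 3), (1, 2, 45.0, 5);
    CREATE TABLE specimen (haul INTEGER, sex INTEGER, length REAL, weight REAL);
    INSERT INTO specimen VALUES (1, 1, 42.0, 0.55);
    CREATE VIEW biodata_catch_view AS SELECT * FROM catch;
    CREATE VIEW biodata_length_view AS SELECT * FROM length;
    CREATE VIEW biodata_specimen_view AS SELECT * FROM specimen;
    """
)

dict_df_bio_db = load_biological_data_from_views(
    conn,
    view_map=BIODATA_VIEW_MAP,
    column_name_map=FEAT_TO_ECHOPOP_BIODATA_COLUMNS,
)
conn.close()
```

Returning the same dictionary-of-dataframes shape as the current loader would let downstream steps stay unchanged. Subsetting and label mapping (the `subset_dict` and `biodata_label_map` arguments of the existing function) are left out of this sketch.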

We should still keep the current CSV-based biological data loading function, so that we can easily write a comparison function to double-check that the database view function produces the same results.
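One possible shape for such a comparison function, as a sketch only: it assumes both loaders return a dictionary of dataframes with the same keys, and it treats row and column order as irrelevant. The names `compare_biodata` and `_normalize` are hypothetical, and the toy dataframes below stand in for real loader output.

```python
from typing import Dict

import pandas as pd


def _normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Sort columns and rows so that ordering differences are not reported."""
    cols = sorted(df.columns)
    return df[cols].sort_values(cols).reset_index(drop=True)


def compare_biodata(
    dict_csv: Dict[str, pd.DataFrame],
    dict_db: Dict[str, pd.DataFrame],
    rtol: float = 1e-9,
) -> Dict[str, str]:
    """Return {dataset name: mismatch description}; empty when everything matches."""
    mismatches: Dict[str, str] = {}
    for key in sorted(set(dict_csv) | set(dict_db)):
        if key not in dict_csv or key not in dict_db:
            mismatches[key] = "present in only one of the two sources"
            continue
        try:
            pd.testing.assert_frame_equal(
                _normalize(dict_csv[key]),
                _normalize(dict_db[key]),
                check_dtype=False,  # CSV and database readers may infer different dtypes
                rtol=rtol,
            )
        except AssertionError as err:
            mismatches[key] = str(err)
    return mismatches


# Toy data: same content, different row and column order.
csv_like = {"catch": pd.DataFrame({"haul_num": [1, 2], "weight": [120.5, 80.0]})}
db_like = {"catch": pd.DataFrame({"weight": [80.0, 120.5], "haul_num": [2, 1]})}
# Toy data with one altered weight value.
db_bad = {"catch": pd.DataFrame({"weight": [80.0, 121.0], "haul_num": [2, 1]})}

matching = compare_biodata(csv_like, db_like)
differing = compare_biodata(csv_like, db_bad)
```

Collecting mismatches per dataset, rather than raising on the first one, makes it easier to see at a glance which of the catch, length, and specimen tables disagree.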

cc @aliciabillings-noaa @ElizabethMPhillips 

	# ==================================================================================================
	# Load in the biological data
	# ---------------------------
	BIODATA_SHEET_MAP: Dict[str, str] = {
	    "catch": "biodata_catch",
	    "length": "biodata_length",
	    "specimen": "biodata_specimen",
	}
	SUBSET_DICT: Dict[Any, Any] = {
	    "ships": {
	        160: {
	            "survey": 201906
	        },
	        584: {
	            "survey": 2019097,
	            "haul_offset": 200
	        }
	    },
	    "species_code": [22500]
	}
	FEAT_TO_ECHOPOP_BIODATA_COLUMNS = {
	    "frequency": "length_count",
	    "haul": "haul_num",
	    "weight_in_haul": "weight",
	}
	BIODATA_LABEL_MAP: Dict[Any, Dict] = {
	    "sex": {
	        1: "male",
	        2: "female",
	        3: "unsexed"
	    }
	}

	# Load the biological data into a dictionary of dataframes
	dict_df_bio = load_data.load_biological_data(
	    biodata_filepath=DATA_ROOT / "Biological/1995-2023_biodata_redo.xlsx",
	    biodata_sheet_map=BIODATA_SHEET_MAP,
	    column_name_map=FEAT_TO_ECHOPOP_BIODATA_COLUMNS,
	    subset_dict=SUBSET_DICT,
	    biodata_label_map=BIODATA_LABEL_MAP
	)

Add the functionality to ingest biological data directly from database views #389
