[ENH] Re-implementation of Triangle.to_dict() #723

@genedan

Description

Triangle.to_dict() exists as a passthrough to pandas.DataFrame. However, the dict structure varies depending on whether the triangle is single-dimensional, multidimensional, or a pattern, and as far as I can tell the output cannot currently be ingested by the Triangle constructor without heavy modification. That is, I would like:

my_dict = my_triangle.to_dict()
triangle = Triangle(**my_dict)

to be possible. Furthermore, patterns exported with to_dict() aren't ingestible by cl.DevelopmentConstant, whose first argument accepts a dict. That is, I would like:

genins = cl.load_sample("genins")
dev = cl.Development().fit(genins)

pattern_dict = dev.ldf_.to_dict()
cl.DevelopmentConstant(patterns=pattern_dict, ...)

to be possible. Triangle.to_dict() is also undocumented, so currently the only way to figure out what it does is to run it. So far, I've reverse-engineered the top-level keys for the following cases:

  1. 1-D triangle: Origin period
  2. Multidimensional triangle: Column
  3. Pattern: Development period

This means there is a different hierarchy for each case, even though all three are instances of class Triangle. By creating our own implementation, documenting what the method accepts and outputs, and adding our own custom arguments, we can gain greater control over what it does and make the output more consistent with what users would expect.

Is your feature request aligned with the scope of the package?

  • Yes, absolutely!
  • No, but it's still worth discussing.
  • N/A (this request is not a codebase enhancement).

Describe the solution you'd like, or your current workaround.

A rough sketch of what I propose as output for a non-pattern Triangle would be:

{
    "data": {
        "grname": ["Ballstate", ...],
        "lob": ["wkcomp", ...],
        "origin": [1999, ...],
        "development": [1999, ...],
        "paid": [1234, ...],
        "reported": [4567, ...],
    },
    "origin": "origin",
    "development": "development",
    "columns": ["paid", "reported"],
    "index": ["grname", "lob"],
    "origin_format": ...,
    "development_format": ...,
    "cumulative": True,
    "array_backend": ...,
    "pattern": False,
    "trailing": True,
}

This would be ingestible by the Triangle constructor by unpacking the keyword arguments.
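
To make the round-trip contract concrete, here is a minimal stdlib-only sketch. MiniTriangle is a hypothetical stand-in, not the chainladder Triangle class, and the data values are made up for illustration. The only point it demonstrates is that when the exported keys match the constructor's keyword arguments one-to-one, unpacking the dict rebuilds an equal object.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class MiniTriangle:
    """Hypothetical stand-in for Triangle; NOT the chainladder class."""

    data: dict
    origin: str
    development: str
    columns: list
    index: list = field(default_factory=list)
    cumulative: bool = True
    pattern: bool = False
    trailing: bool = True

    def to_dict(self) -> dict:
        # Keys mirror the constructor's keyword arguments one-to-one,
        # so the result can be unpacked straight back into __init__.
        return asdict(self)


# Made-up illustration data in long (one row per cell) layout.
tri = MiniTriangle(
    data={
        "lob": ["wkcomp", "wkcomp"],
        "origin": [1999, 1999],
        "development": [1999, 2000],
        "paid": [100.0, 150.0],
    },
    origin="origin",
    development="development",
    columns=["paid"],
    index=["lob"],
)

exported = tri.to_dict()
restored = MiniTriangle(**exported)  # the round trip requested above
```

A real implementation would also have to decide how to serialize non-trivial arguments such as the array backend and date formats; this sketch sidesteps that by leaving them out.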

The pattern case is TBD, but one example currently looks like:

genins = cl.load_sample("genins")
dev = cl.Development().fit(genins)
dev.ldf_.to_dict()
{'12-24': {'(All)': 3.4906065479322863},
 '24-36': {'(All)': 1.7473326421004893},
 '36-48': {'(All)': 1.4574128360182361},
 '48-60': {'(All)': 1.1738517093997867},
 '60-72': {'(All)': 1.103823532244344},
 '72-84': {'(All)': 1.0862693644363943},
 '84-96': {'(All)': 1.0538743555048127},
 '96-108': {'(All)': 1.0765551783529383},
 '108-120': {'(All)': 1.017724725219544}}

Each key contains two pieces of information: the start and end development periods that the pattern applies to. Extracting either one requires parsing the string and taking the left or right side of the dash. By making the output more atomic, we can reduce the amount of data cleaning users have to do, especially if they are using this output in downstream pipelines.
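
For reference, this is roughly the cleanup a user has to write today. It is a sketch only: split_pattern_keys and the record field names ("start", "end", "ldf") are hypothetical, and it assumes every key has the numeric "start-end" form shown above, so a non-numeric key would raise a ValueError.

```python
def split_pattern_keys(pattern: dict, index_label: str = "(All)") -> list:
    """Turn {'12-24': {'(All)': 3.49}, ...} into one record per age interval.

    Hypothetical helper; assumes every key is '<int>-<int>'.
    """
    records = []
    for key, by_index in pattern.items():
        # Split the combined key into its two development ages.
        start, end = (int(part) for part in key.split("-"))
        records.append({"start": start, "end": end, "ldf": by_index[index_label]})
    return records


# A few entries copied from the dev.ldf_.to_dict() output above.
current = {
    "12-24": {"(All)": 3.4906065479322863},
    "24-36": {"(All)": 1.7473326421004893},
    "108-120": {"(All)": 1.017724725219544},
}

records = split_pattern_keys(current)

# Flat mapping keyed by starting age, assuming that is the shape a
# downstream consumer wants.
by_start_age = {r["start"]: r["ldf"] for r in records}
```

If to_dict() emitted the start and end ages as separate fields in the first place, none of this string parsing would be needed.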

Do you have any additional supporting notes?

No response

Labels

Effort > Moderate 🐕: Mid-sized tasks estimated to take a few days to a few weeks.
Impact > Significant 💠: High impact changes. Should only be done in response with community inputs.
