Cross-section implementation #720

@genedan

Description

I came across this cool pandas method, pandas.DataFrame.xs, which returns a cross-section of a DataFrame. For example:

import pandas as pd

d = {
    "num_legs": [4, 4, 2, 2],
    "num_wings": [0, 0, 2, 2],
    "class": ["mammal", "mammal", "mammal", "bird"],
    "animal": ["cat", "dog", "bat", "penguin"],
    "locomotion": ["walks", "walks", "flies", "walks"],
}
df = pd.DataFrame(data=d)
df = df.set_index(["class", "animal", "locomotion"])
df
                           num_legs  num_wings
class  animal  locomotion
mammal cat     walks              4          0
       dog     walks              4          0
       bat     flies              2          2
bird   penguin walks              2          2
df.xs("mammal")
                   num_legs  num_wings
animal locomotion
cat    walks              4          0
dog    walks              4          0
bat    flies              2          2
df.xs(("mammal", "dog", "walks"))
num_legs     4
num_wings    0
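
To make the key semantics concrete, here is a rough pure-Python sketch of the behavior shown above. It is not how pandas implements xs; the `xs` function and the `rows` mapping are hypothetical stand-ins, and only the leading-levels case is covered (no `level` or `axis` arguments).

```python
def xs(rows, key):
    """Toy cross-section over a mapping of full index tuples to row values.

    `key` is a single label or a tuple of labels matched against the
    leading index levels. Matched levels are dropped from the result.
    """
    if not isinstance(key, tuple):
        key = (key,)
    n = len(key)
    matched = {idx[n:]: row for idx, row in rows.items() if idx[:n] == key}
    if not matched:
        raise KeyError(key)
    # A full-length key identifies exactly one row, so return it directly.
    if () in matched:
        return matched[()]
    return matched


# Same animals as in the DataFrame above, keyed by (class, animal, locomotion).
rows = {
    ("mammal", "cat", "walks"): {"num_legs": 4, "num_wings": 0},
    ("mammal", "dog", "walks"): {"num_legs": 4, "num_wings": 0},
    ("mammal", "bat", "flies"): {"num_legs": 2, "num_wings": 2},
    ("bird", "penguin", "walks"): {"num_legs": 2, "num_wings": 2},
}

mammals = xs(rows, "mammal")                # keys lose the "class" level
dog = xs(rows, ("mammal", "dog", "walks"))  # a single row
```

The point of the sketch is that a partial key narrows the data and removes the levels it matched, which is the behavior that would carry over to a triangle's index.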

Since chainladder is built on multi-dimensional triangles, porting this feature over from pandas would provide an intuitive alternative to .loc and .iloc for slicing a triangle, without replacing either of them.

Is your feature request aligned with the scope of the package?

  • Yes, absolutely!
  • No, but it's still worth discussing.
  • N/A (this request is not a codebase enhancement).

Describe the solution you'd like, or your current workaround.

One existing way to extract a 1-D triangle from the clrd sample data is:

import chainladder as cl

clrd = cl.load_sample("clrd")
clrd.loc["Allstate Ins Co Grp"].iloc[-1]["CumPaidLoss"]
          12        24        36        48        60        72        84        96        108       120
1988  70571.0  155905.0  220744.0  251595.0  274156.0  287676.0  298499.0  304873.0  321808.0  325322.0
1989  66547.0  136447.0  179142.0  211343.0  231430.0  244750.0  254557.0  270059.0  273873.0       NaN
1990  52233.0  133370.0  178444.0  204442.0  222193.0  232940.0  253337.0  256788.0       NaN       NaN
1991  59315.0  128051.0  169793.0  196685.0  213165.0  234676.0  239195.0       NaN       NaN       NaN
1992  39991.0   89873.0  114117.0  133003.0  154362.0  159496.0       NaN       NaN       NaN       NaN
1993  19744.0   47229.0   61909.0   85099.0   87215.0       NaN       NaN       NaN       NaN       NaN
1994  20379.0   46773.0   88636.0   91077.0       NaN       NaN       NaN       NaN       NaN       NaN
1995  18756.0   84712.0   87311.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN
1996  42609.0   44916.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
1997    691.0       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN

The chainladder analogue would look something like:

clrd.xs(("Allstate Ins Co Grp", "wkcomp")).xs("CumPaidLoss", axis=1)

This should return the same 1-D triangle.

Do you have any additional supporting notes?

I wouldn't expect a perfect 1-1 translation of the pandas method. As a first step, I think we should aim to extract the desired 1-D triangle using chainladder.xs, focusing on just the key parameter for now.
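
As a starting point for discussion, here is a minimal sketch of what a key-only xs could look like. It assumes a toy container rather than chainladder's actual Triangle: `ToyTriangle`, its `data` layout, and the placeholder entries are all hypothetical, and the only arguments handled are `key` and `axis`.

```python
class ToyTriangle:
    """Hypothetical stand-in for a multi-dimensional triangle.

    This is NOT chainladder's Triangle. `data` maps
    (index_tuple, column_name) -> 2-D list (origin x development).
    """

    def __init__(self, data):
        self.data = dict(data)

    @property
    def index(self):
        # Unique index tuples, in insertion order.
        return list(dict.fromkeys(idx for idx, _ in self.data))

    @property
    def columns(self):
        # Unique column names, in insertion order.
        return list(dict.fromkeys(col for _, col in self.data))

    def xs(self, key, axis=0):
        if axis == 1:
            # Select a single column by name.
            kept = {k: v for k, v in self.data.items() if k[1] == key}
        else:
            # Match leading index levels and drop them from the result.
            key = key if isinstance(key, tuple) else (key,)
            n = len(key)
            kept = {
                (idx[n:], col): v
                for (idx, col), v in self.data.items()
                if idx[:n] == key
            }
        if not kept:
            raise KeyError(key)
        return ToyTriangle(kept)


toy = ToyTriangle({
    # Origins 1996-1997 at ages 12 and 24, taken from the output above.
    (("Allstate Ins Co Grp", "wkcomp"), "CumPaidLoss"): [[42609.0, 44916.0], [691.0, None]],
    # Placeholder values, for illustration only.
    (("Allstate Ins Co Grp", "wkcomp"), "IncurLoss"): [[1.0, 2.0], [3.0, None]],
    (("Allstate Ins Co Grp", "ppauto"), "CumPaidLoss"): [[4.0, 5.0], [6.0, None]],
})

result = toy.xs(("Allstate Ins Co Grp", "wkcomp")).xs("CumPaidLoss", axis=1)
```

In this sketch each xs call returns another ToyTriangle, so calls can be chained the same way as in the proposed clrd.xs(...).xs(..., axis=1) example.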

This is to spark the "Wow! This is just like pandas!" imagination in the user and encourage them to keep going. Once we get a rudimentary method going, we can refine it gradually as we identify more needed functionality.

Metadata

Labels

Effort > Moderate: Mid-sized tasks estimated to take a few days to a few weeks.
Impact > Moderate: User-visible but non-breaking change. Treated like a minor version bump (e.g., 0.6.5 → 0.7.0).
