[WIP] Add datasets from "Encoding high-cardinality string categorical variables" paper #4

TwsThomas · 2019-09-13T09:20:26Z

Adding the 24 datasets from the "Encoding high-cardinality string categorical variables" paper.

Datasets active on OpenML:

I dropped a few rows (in colleges and la_crime) where the label was missing (it seems OpenML required this).
In public_procurement, I replaced NaN with -1 in the label award_value_euro.
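The two cleaning steps described above (dropping rows with a missing label, or keeping them and filling the label with -1) can be sketched with pandas. This is a minimal illustration, not the upload script from this PR: the toy frames and the generic column names `text` and `label` are hypothetical; only `award_value_euro` comes from the comment.

```python
import numpy as np
import pandas as pd


def drop_missing_labels(df: pd.DataFrame, label: str) -> pd.DataFrame:
    """Drop every row whose label is NaN (the colleges / la_crime approach)."""
    return df.dropna(subset=[label]).reset_index(drop=True)


def fill_missing_labels(df: pd.DataFrame, label: str, sentinel: float = -1) -> pd.DataFrame:
    """Keep every row, replacing NaN labels with a sentinel (the award_value_euro approach)."""
    out = df.copy()  # leave the caller's frame untouched
    out[label] = out[label].fillna(sentinel)
    return out


# Hypothetical toy data standing in for the real datasets.
crime = pd.DataFrame({"text": ["theft", "burglary", "assault"],
                      "label": [34.0, np.nan, 51.0]})
procurement = pd.DataFrame({"text": ["city a", "city b"],
                            "award_value_euro": [1200.0, np.nan]})

crime_clean = drop_missing_labels(crime, "label")
procurement_clean = fill_missing_labels(procurement, "award_value_euro")
print(crime_clean)
print(procurement_clean)
```

A -1 sentinel keeps every row, but it is only unambiguous if -1 can never be a real award value; anyone using the uploaded data would still need to treat -1 as "missing" rather than as a genuine amount.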

GaelVaroquaux · 2019-09-13T14:57:48Z

Would the code here be simplified once those datasets make it to OpenML?

TwsThomas · 2019-09-16T07:55:27Z

Would the code here be simplified once those datasets make it to OpenML?

Yes!
This code is just to record the upload process.
Once the datasets are on OpenML, loading one will be a one-liner like:
sklearn.datasets.fetch_openml(data_id=xxx)
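For illustration, the one-liner above could be wrapped as follows. This is a sketch assuming a scikit-learn version whose `fetch_openml` supports `as_frame`; the helper name `load_openml_dataset` is hypothetical, and the actual dataset IDs (the `xxx` above) were not given in this thread.

```python
def load_openml_dataset(data_id: int):
    """Fetch one OpenML dataset as pandas objects (hypothetical helper).

    Requires scikit-learn and network access on the first call; with the
    default cache=True, later calls read from the local scikit-learn cache.
    """
    # Imported lazily so this helper can be defined without scikit-learn installed.
    from sklearn.datasets import fetch_openml

    bunch = fetch_openml(data_id=data_id, as_frame=True)
    # bunch.data is a DataFrame of features, bunch.target the label column(s);
    # bunch.frame holds both together.
    return bunch.data, bunch.target
```

Call it with the OpenML ID of an uploaded dataset, e.g. `X, y = load_openml_dataset(data_id=...)`, substituting the real ID once it is known.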

GaelVaroquaux · 2019-09-16T13:33:46Z

This code is just to record the upload process.

Of course! Great, super useful!!

GaelVaroquaux · 2019-09-28T00:31:23Z

Should we merge this with what you have, and open another PR?

TwsThomas · 2019-10-01T12:11:48Z

Should we merge this with what you have, and open another PR?

Yes. I'll open a new one to upload the last datasets.

TwsThomas added 6 commits August 14, 2019 17:51

add severals dataset

3268170

wip

2f6e953

wip

6c8b378

iter

74ec403

from master to branch new_dataset

9c568a4

iter

ad0eff9

TwsThomas mentioned this pull request Sep 13, 2019

[WIP] Add datasets from "Encoding high-cardinality string categorical variables" paper #3

Closed

25 tasks

iter with dragostore

4b43208

TwsThomas added 3 commits October 4, 2019 11:26

remove few rows chan label was missing, -1 in award euro if misssing

9291aff

iter

0002eed

iter

d465eb2

TwsThomas mentioned this pull request May 14, 2020

[WIP] fix bug in examples skrub-data/skrub#119

Merged
