Test TracIn's effectiveness in text classification

Hello,

I adapted the code from https://github.com/frederick0329/TracIn/blob/master/imagenet/resnet50_imagenet_proponents_opponents.ipynb
to text classification.

The primary goal of my task is to rank the training samples by their positive or negative impact on the clean validation set. The core metric for my task can be accuracy or cross-entropy loss. Quite straightforward. The training set has roughly 100-200 samples and the validation set contains no more than 100, so this is a low-data regime.
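For reference, the checkpoint approximation of TracIn (TracInCP) scores a training sample z against a validation sample z' as the sum, over saved checkpoints, of the learning rate times the dot product of their loss gradients; a positive score marks a proponent and a negative one an opponent. Because this is defined through gradients, the sketch below assumes cross-entropy loss rather than accuracy, which is not differentiable. It is a minimal NumPy sketch, not the notebook's code: the per-checkpoint gradient arrays, the learning-rate list, and the function names are hypothetical inputs you would produce yourself.

```python
import numpy as np


def tracin_cp_scores(train_grads, val_grads, learning_rates):
    """TracInCP-style pairwise influence (sketch).

    train_grads[i]: array (n_train, d), per-example loss gradients at checkpoint i
    val_grads[i]:   array (n_val, d), per-example loss gradients at checkpoint i
    learning_rates[i]: learning rate in effect at checkpoint i (assumed known)
    Returns an (n_train, n_val) matrix; entry [j, v] > 0 means training sample j
    is a proponent of validation sample v, < 0 means an opponent.
    """
    return sum(
        lr * (g_train @ g_val.T)
        for g_train, g_val, lr in zip(train_grads, val_grads, learning_rates)
    )


def rank_training_samples(scores):
    """Sum each training sample's influence over the whole validation set and
    return indices ordered from most helpful to most harmful, plus the totals."""
    total = scores.sum(axis=1)
    order = np.argsort(-total)
    return order, total
```

Summing over the validation axis turns per-pair influence into one score per training sample, which is the ranking described above; mislabeled samples would be expected near the bottom.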

The validation set has no labeling errors. It is clean.

The labels include politics, business, tech, entertainment, etc. It is just a public news topic classification task: AG NEWS.
 
As for the classifier, similar to the ResNet in your image example, I use CMLM from TensorFlow Hub to encode every sample as a 1024-dimensional sentence embedding. The classifier on top is therefore quite simple: a single-layer network.
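With fixed embeddings and a single dense layer, per-example gradients have a closed form, so autodiff is not strictly needed. Assuming logits = x @ W + b (W shaped (1024, n_classes), like a Keras Dense kernel) with softmax cross-entropy, the gradients are grad_W = outer(x, p - onehot(y)) and grad_b = p - onehot(y). The gradient dot product between two samples therefore factorizes as (x . x' + 1) * (err . err'). This is a sketch under those assumptions; the helper names are hypothetical and not taken from the linked script.

```python
import numpy as np


def softmax(logits):
    # Subtract the row max for numerical stability.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def output_error(X, y, W, b):
    """dLoss/dlogits = softmax(X @ W + b) - onehot(y), shape (n, k)."""
    probs = softmax(X @ W + b)
    onehot = np.eye(W.shape[1])[y]
    return probs - onehot


def pairwise_grad_dot(X1, y1, X2, y2, W, b):
    """<grad_{W,b} L(x1, y1), grad_{W,b} L(x2, y2)> for all pairs, shape (n1, n2).

    grad_W = outer(x, err) and grad_b = err, so the Frobenius inner product
    factorizes as (x1 . x2) * (err1 . err2) + (err1 . err2).
    """
    e1 = output_error(X1, y1, W, b)
    e2 = output_error(X2, y2, W, b)
    return (X1 @ X2.T + 1.0) * (e1 @ e2.T)
```

Scaling this matrix by each checkpoint's learning rate and summing over the saved checkpoints gives last-layer TracInCP scores. Comparing it against gradients from `tf.GradientTape` on a few samples is one way to check an implementation for bugs.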

Here is my implementation:
https://github.com/yananchen1989/topic_classification_augmentation/blob/main/cmlm_proponents_opponents.py

Finally, I use AUC to test the effectiveness: a high AUC indicates that samples without labeling noise get higher influence scores, while mislabeled samples get lower, negative scores.
However, the AUC is only 0.55. Quite woeful.
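The AUC described above is the probability that a randomly chosen clean sample receives a higher influence score than a randomly chosen mislabeled one. A small NumPy sketch of that rank-based (Mann-Whitney) definition follows, assuming a hypothetical boolean array `is_clean` that marks the samples without injected label noise; it should agree with `sklearn.metrics.roc_auc_score(is_clean, scores)`.

```python
import numpy as np


def rank_auc(scores, is_clean):
    """P(score of a random clean sample > score of a random mislabeled sample).

    Ties count as 1/2, i.e. the Mann-Whitney U statistic divided by
    n_clean * n_noisy. Quadratic in memory, which is fine for a few hundred samples.
    """
    scores = np.asarray(scores, dtype=float)
    is_clean = np.asarray(is_clean, dtype=bool)
    pos = scores[is_clean]   # clean samples (positive class)
    neg = scores[~is_clean]  # mislabeled samples (negative class)
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

An AUC of 0.5 is what a random ordering of clean and mislabeled samples would give, so 0.55 means the scores barely separate the two groups.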

I am not sure whether there is a bug in my implementation or whether I am not using TracIn appropriately.
 




Test TracIn's effectiveness in text classification #8

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Test TracIn's effectiveness in text classification #8

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions