park-jay/huggingface-models

huggingface-models

This repository contains materials for a project analyzing how datasets and models are reused on Hugging Face.

Abstract

This study empirically explores how Natural Language Processing (NLP) and Computer Vision (CV) datasets and models are reused in the Hugging Face community. We find that NLP tasks, such as zero-shot classification, sentence similarity, and feature extraction, require more diverse datasets on average than CV tasks. NLP datasets, however, are reused less frequently than CV datasets. CV models are also reused more frequently than NLP models to develop other models. In conclusion, NLP models draw on diverse datasets for training, whereas CV datasets and models are reused more often and layered together to develop other models. This study contributes to the understudied area of dataset and model reuse in computing and to the broader data reuse subfield of Information Science.
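The abstract compares two kinds of quantity: how many distinct datasets the models for a task draw on, and how often each dataset is reused across models. The sketch below shows one way such counts could be computed. It is not the study's actual pipeline: the record layout, the toy values, and the function names `distinct_datasets_per_task` and `dataset_reuse_counts` are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy records standing in for model metadata.
# These are NOT data from the study.
models = [
    {"id": "m1", "task": "feature-extraction", "datasets": ["d1", "d2"]},
    {"id": "m2", "task": "feature-extraction", "datasets": ["d2", "d3"]},
    {"id": "m3", "task": "image-classification", "datasets": ["d4"]},
    {"id": "m4", "task": "image-classification", "datasets": ["d4"]},
]


def distinct_datasets_per_task(records):
    """Map each task to the number of distinct datasets its models list."""
    seen = defaultdict(set)
    for record in records:
        seen[record["task"]].update(record["datasets"])
    return {task: len(datasets) for task, datasets in seen.items()}


def dataset_reuse_counts(records):
    """Count how many models list each dataset (each model counted once)."""
    counts = Counter()
    for record in records:
        counts.update(set(record["datasets"]))
    return counts


diversity = distinct_datasets_per_task(models)
reuse = dataset_reuse_counts(models)
print(diversity)
print(reuse.most_common())
```

In this toy input, the NLP-style task spreads across several datasets while the CV-style task keeps reusing one, which is the kind of contrast the abstract describes.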
