huggingface-models

This repository contains materials for a project analyzing the reuse of datasets and models on Hugging Face.

Abstract

This study empirically explores how Natural Language Processing (NLP) and Computer Vision (CV) datasets and models are reused in the Hugging Face community. We find that, on average, NLP tasks such as Zero-shot-classification, Sentence-similarity, and Feature-extraction draw on more diverse datasets than CV tasks. At the same time, NLP datasets were reused less frequently than CV datasets, and CV models were reused more often than NLP models as the basis for developing other models. In short, NLP models were trained on a wide variety of datasets, whereas CV datasets and models were reused more often and layered together to develop other models. This study contributes to the understudied area of dataset and model reuse in computing and to the broader data reuse subfield of Information Science.
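To make "dataset diversity" and "reuse frequency" concrete, here is a minimal sketch of how the two quantities might be computed from model metadata. It is an illustration under stated assumptions, not the analysis in the notebooks: the record layout (`task`, `datasets`), the sample values, and the function names `datasets_per_task` and `dataset_reuse_counts` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: each model declares one task and the datasets it
# was trained on. Field names and values are assumptions for illustration.
models = [
    {"id": "model-a", "task": "sentence-similarity", "datasets": ["ds1", "ds2", "ds3"]},
    {"id": "model-b", "task": "sentence-similarity", "datasets": ["ds2", "ds4"]},
    {"id": "model-c", "task": "image-classification", "datasets": ["img1"]},
    {"id": "model-d", "task": "image-classification", "datasets": ["img1"]},
]


def datasets_per_task(records):
    """Map each task to the set of distinct datasets declared by its models."""
    by_task = defaultdict(set)
    for rec in records:
        by_task[rec["task"]].update(rec["datasets"])
    return dict(by_task)


def dataset_reuse_counts(records):
    """Count how many models declare each dataset."""
    counts = defaultdict(int)
    for rec in records:
        # Use a set so a dataset listed twice by one model counts once.
        for name in set(rec["datasets"]):
            counts[name] += 1
    return dict(counts)


# Diversity: number of distinct datasets seen per task.
diversity = {task: len(names) for task, names in datasets_per_task(models).items()}
# Reuse: number of models that declare each dataset.
reuse = dataset_reuse_counts(models)

print(diversity)
print(reuse)
```

In this toy data the NLP task touches four distinct datasets, each used by few models, while the CV task touches one dataset that both of its models share. That is the shape of the pattern the abstract describes, though the real measurements depend on the data collected from the Hugging Face API.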

Files

poster
README.md
hf-api-data-collection.ipynb
hf-api-rqs.ipynb

park-jay/huggingface-models

Folders and files

Latest commit

History

Repository files navigation

huggingface-models

Abstract

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages