Specializing General-purpose LLM Embeddings for Implicit Hate Speech Detection across Datasets

This repository contains the code for the paper Specializing General-purpose LLM Embeddings for Implicit Hate Speech Detection across Datasets published in the Proceedings of the 2nd International Workshop on Diffusion of Harmful Content on Online Web (DHOW '25).

In this work, we show that simply fine-tuning embedding models on various hate speech detection datasets achieves state-of-the-art performance. We also investigate the use of emotion features and context generated with a Llama 2 model to enhance BERT-based models for hate speech detection. To integrate these with the text input, we explore several feature fusion approaches (concatenation, adaptive fusion, mixture of experts, and shared learnable query), but find no significant improvement over simple concatenation.

Please find below the instructions on how to run all benchmarks.

Setup

We use uv to manage dependencies. It creates an isolated environment from pyproject.toml and uses it to run the scripts. Please find the installation instructions here.

Clone the repository:

git clone https://github.com/idiap/implicit-hsd.git
cd implicit-hsd
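
No separate installation step should be needed: uv run builds the isolated environment from pyproject.toml the first time a script is invoked. If you prefer to create the environment up front, the standard uv sync command does this explicitly:

uv sync            # optional: pre-build the environment from pyproject.toml
uv run gte_run.py  # otherwise the environment is resolved automatically on first run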

Data preparation

Once you have installed uv, the next step is to prepare the datasets. Please follow the instructions in the data.md file inside the data/ folder. Note that model training will not work unless the data has been correctly downloaded and processed.

Running the benchmarks

Once the data is prepared, you can run the following setups:

  • General text embedding (gte) models.
  • BERT-based models with various feature fusion strategies (BERT fusion): concatenation, adaptive fusion, mixture of experts, and shared learnable query.

General text embedding (gte) models

List of supported models:

Please refer to the official pages of the respective models for details on usage and licensing.

Please note that the jasper and nvembed models must not be used for any commercial purpose. Moreover, llama3 can be used under the Llama 3 Community License Agreement, and gemma3 can be used under the Gemma Terms of Use.

List of supported datasets:

Please refer to the official dataset pages for details on licensing. Also see the instructions in the data/data.md file.

To launch a run with the default configuration, simply run uv run gte_run.py.
For finer control over training, we support:

  • specifying the models with the -m, --models flag
  • specifying the datasets with the -d, --datasets flag

For example: uv run gte_run.py -m e5 stella -d ihc_pure SBIC
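
If you prefer to queue several configurations one after another, a plain shell loop over these flags also works; the model and dataset names below are simply the ones from the example above:

for model in e5 stella; do
    uv run gte_run.py -m "$model" -d ihc_pure SBIC
done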

BERT-based models

Launching emotion and context generation

For BERT-based models, we only support emotion and context generation for the Implicit Hate Corpus (IHC) dataset.

After preparing the data, you should see the training, testing, and validation splits inside the ihc_pure folder. Use cd .. to return to the parent directory, and then run uv run generate_emo_context.py to extract the emotion features and generate the context. This script adds the emotion vector and the context generated by Llama 2 to the IHC dataset.
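
In short, assuming the data-preparation step left you one level below the repository root (so that cd .. brings you back to it), the two commands are:

cd ..
uv run generate_emo_context.py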

We use the Emotion Analysis in English model to generate emotion features (e.g., class probabilities). To generate the context, we use a Llama 2 model. Please refer to their corresponding pages for licensing details.
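
As a rough illustration of what the emotion features look like, the sketch below turns a sentence into a vector of per-class emotion probabilities using a Hugging Face text-classification pipeline. The model identifier is only a placeholder, not necessarily the checkpoint used by generate_emo_context.py; refer to that script for the actual setup.

# Minimal sketch: map a sentence to an emotion probability vector.
# The model id is a placeholder; see generate_emo_context.py for the checkpoint
# actually used in this repository.
from transformers import pipeline

emotion_pipe = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",  # placeholder emotion model
    top_k=None,  # return a score for every emotion class, not just the top one
)

results = emotion_pipe(["I can't believe they said that."])
# Sort by label so the feature vector has a fixed, reproducible ordering.
emotion_vector = [entry["score"] for entry in sorted(results[0], key=lambda e: e["label"])]
print(emotion_vector)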

Fusion

The BERT-based model with concatenation can be trained/tested with the command uv run run.py. Additionally, the --run_ablation flag can be set to run the model with and without the emotion and context features: uv run run.py --run_ablation.

The BERT-based models with other fusion strategies can be selected with the --use_model argument, setting its value to concat for the concatenation model, query for the shared learnable query model, moe for the mixture of experts model, or adaptive for the adaptive fusion model.
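
For reference, here is a minimal sketch of the simplest strategy, concatenation fusion: the [CLS] representation of the post is concatenated with a pooled representation of the generated context and the precomputed emotion vector, then fed to a linear classifier. The dimensions, names, and module layout below are assumptions for illustration only; see run.py and the fusion model definitions in this repository for the actual implementations.

# Illustrative concatenation-fusion classifier (not the repository's exact code).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class ConcatFusionClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", emotion_dim=7, num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Text [CLS] + context [CLS] + emotion probabilities -> one fused vector.
        self.classifier = nn.Linear(2 * hidden + emotion_dim, num_labels)

    def forward(self, text_inputs, context_inputs, emotion_vec):
        text_cls = self.encoder(**text_inputs).last_hidden_state[:, 0]    # (B, H)
        ctx_cls = self.encoder(**context_inputs).last_hidden_state[:, 0]  # (B, H)
        fused = torch.cat([text_cls, ctx_cls, emotion_vec], dim=-1)       # (B, 2H + E)
        return self.classifier(fused)                                     # (B, num_labels)


tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = ConcatFusionClassifier()
text = tokenizer(["example post"], return_tensors="pt", padding=True)
context = tokenizer(["example generated context"], return_tensors="pt", padding=True)
emotions = torch.rand(1, 7)  # placeholder emotion probabilities
logits = model(text, context, emotions)

As noted above, the more elaborate fusion strategies (adaptive fusion, mixture of experts, shared learnable query) did not yield a significant improvement over this simple concatenation.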

Citation

Vassiliy Cheremetiev, Quang Long Ho Ngo, Chau Ying Kot, Alina Elena Baia, and Andrea Cavallaro. 2025. Specializing General-purpose LLM Embeddings for Implicit Hate Speech Detection across Datasets. In Proceedings of the 2nd International Workshop on Diffusion of Harmful Content on Online Web (DHOW '25). Association for Computing Machinery, New York, NY, USA, 23–36. https://doi.org/10.1145/3746275.3762209

Bibtex

@inproceedings{10.1145/3746275.3762209,
  author = {Cheremetiev, Vassiliy and Ngo, Quang Long Ho and Kot, Chau Ying and Baia, Alina Elena and Cavallaro, Andrea},
  title = {Specializing General-purpose LLM Embeddings for Implicit Hate Speech Detection across Datasets},
  year = {2025},
  isbn = {9798400720574},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  url = {https://doi.org/10.1145/3746275.3762209},
  doi = {10.1145/3746275.3762209},
  booktitle = {Proceedings of the 2nd International Workshop on Diffusion of Harmful Content on Online Web},
  pages = {23--36},
  numpages = {14},
  location = {Ireland},
  series = {DHOW '25}
}
