Identifying AI’s Environmental Risks: Using NLP to Analyze Public Consultation Feedback on the AI Act

This repository contains code and resources for analyzing public consultation feedback on the AI Act, focusing on environmental and climate-related concerns identified by stakeholders. The analysis involves data scraping, cleaning, preprocessing, and textual analysis to extract insights from feedback submitted by various stakeholders.

The workflow combines data scraping with Natural Language Processing (NLP) techniques (sentence extraction, keyword filtering, and textual analysis) to identify recurring themes and patterns in the feedback data.

The research note is included in this repository as ai_act_research_note.pdf.

Repository Structure

data/

Contains raw and processed datasets.

notebooks/

01_scrapping.ipynb

Scrapes public consultation feedback data from the European Commission's website.

02_data_cleaning.ipynb

Cleans and standardizes the scraped text data.
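
As a rough illustration of what cleaning and standardizing scraped text can involve, here is a minimal sketch using only the standard library. The function names `clean_feedback` and `clean_all` are hypothetical and the specific steps are assumptions; the notebook itself may apply different or additional rules.

```python
import html
import re
import unicodedata

_TAG_RE = re.compile(r"<[^>]+>")   # leftover HTML tags such as <p> or <br>
_WS_RE = re.compile(r"\s+")        # any run of whitespace, including newlines


def clean_feedback(text: str) -> str:
    """Return a normalized copy of one scraped feedback entry (illustrative only)."""
    text = html.unescape(text)                  # decode entities, e.g. "&amp;" -> "&"
    text = _TAG_RE.sub(" ", text)               # replace markup with a space
    text = unicodedata.normalize("NFKC", text)  # unify Unicode forms (e.g. non-breaking spaces)
    text = _WS_RE.sub(" ", text)                # collapse whitespace runs to a single space
    return text.strip()


def clean_all(entries):
    """Clean every entry, dropping empty results and exact duplicates (order preserved)."""
    seen = set()
    cleaned = []
    for entry in entries:
        text = clean_feedback(entry)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

Calling `clean_all` on a list of raw strings returns the deduplicated, normalized entries, which could then be loaded into a pandas DataFrame for the later notebooks.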

03_preprocessing.ipynb

Performs Exploratory Data Analysis, extracts environmental risk mentions, generates word clouds, trains a Word2Vec model, and visualizes word embeddings.
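
Extracting environmental risk mentions can be pictured as sentence splitting followed by keyword filtering. The sketch below is one simple way to do that with regular expressions. The keyword list `ENV_KEYWORDS` and the function `extract_env_sentences` are hypothetical, and the regex-based sentence splitter is a stand-in: the notebook may rely on nltk for tokenization and on a different set of terms.

```python
import re

# Hypothetical keyword stems for illustration; the notebook's actual terms may differ.
ENV_KEYWORDS = ("environment", "climate", "energy", "emission", "carbon", "sustainab")

# Naive sentence boundary: whitespace that follows ".", "!" or "?".
_SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")

# Match any word that starts with one of the stems, case-insensitively.
_KEYWORD_RE = re.compile(r"\b(?:" + "|".join(ENV_KEYWORDS) + r")\w*", re.IGNORECASE)


def extract_env_sentences(text: str):
    """Return the sentences of `text` that contain at least one keyword stem."""
    sentences = _SENTENCE_SPLIT.split(text.strip())
    return [s for s in sentences if _KEYWORD_RE.search(s)]
```

Matching on stems keeps the list short while still catching variants such as "environmental" or "sustainability". A naive splitter like this one will mishandle abbreviations, which is one reason a dedicated tokenizer is usually preferable on real consultation text.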

04_analysis.ipynb

Applies sentence embeddings, UMAP dimensionality reduction, and KMeans clustering to group environmental risk feedback into semantic clusters.
Summarizes cluster content using a pre-trained summarization model (BART) to highlight key insights.
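
Between clustering and summarization, the sentences assigned to each cluster have to be gathered and, because summarization models accept a limited input length, possibly split into smaller pieces. The sketch below shows one way this glue step could look. It assumes cluster labels are already available as a list of integers (for example from KMeans); the helper names and the `max_words` default are hypothetical, and word counts are only a rough proxy for model tokens.

```python
from collections import defaultdict


def group_by_cluster(sentences, labels):
    """Map each cluster label to the list of sentences assigned to it."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    clusters = defaultdict(list)
    for sentence, label in zip(sentences, labels):
        clusters[label].append(sentence)
    return dict(clusters)


def chunk_sentences(sentences, max_words=400):
    """Pack sentences into chunks of at most `max_words` words.

    A single sentence longer than the limit is kept whole in its own chunk.
    The default of 400 is an arbitrary illustrative value.
    """
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the limit.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk returned by `chunk_sentences` could then be passed to a summarization model, with the per-chunk summaries combined into one description per cluster.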

output/

Stores outputs generated from the analysis notebooks.

Technologies Used

Data Collection and Scraping: selenium
Data Cleaning and Preprocessing: pandas, numpy, re
Natural Language Processing (NLP): nltk, Word2Vec, transformers, langdetect, langid
Exploratory Data Analysis (EDA): CountVectorizer
Machine Learning Models:
- Clustering: KMeans
- Dimensionality Reduction: umap-learn
- Sentence Embeddings: Word2Vec, transformers
- Summarization: torch, transformers
Visualization: matplotlib, seaborn, wordcloud
