
skrub-data/skrub-tutorials


Introduction to the course

This is the website for the Inria Academy course on the skrub package: it contains all the material used for the course, including the datasets and exercises used during the session.

Beta warning

If you are reading this, you are attending the Beta version of this course. This is not the final version: both the presentation and the content of the book may change based on the feedback collected after the session.

Structure of the course

The course covers the main features of skrub, from data exploration to pipeline construction, with the notable exclusion of the Data Ops.

Each chapter includes a section that describes how a specific feature may assist in building a machine learning pipeline, along with practical code examples.

Some chapters include exercises that let participants practice the features they cover. These exercises are available in content/exercises, and also at the end of the corresponding lesson in content/notebooks.

The book is split into sections, and each section includes a "final quiz" on the topics covered up to that point.

Preparation and setup

First, clone the GitHub repository of this book to access the exercises. A JupyterLite version will be made available in the future.

Setting up a local environment

Finding the material

Any of the setup methods below lets you open a JupyterLab or Jupyter Notebook instance at the root of the repository. There, you will find all the course material as notebooks in content/notebooks, and the exercises alone in content/exercises.

All the datasets are included in the repository, so cloning it makes them available to the notebooks.

Using pixi

The easiest way to set up the environment is to install and use pixi. Follow the platform-specific installation instructions on the pixi website, then open a terminal window at the root of the cloned repository.

Run

pixi install

to create the environment, followed by

pixi run lab

to start a JupyterLab instance.

Using pip

Create and activate the environment:

python -m venv skrub-tutorial
source skrub-tutorial/bin/activate

Install the required dependencies using the requirements.txt file:

pip install -r requirements.txt

Start a JupyterLab instance:

jupyter lab

Using conda

An environment.yaml file is provided to create a conda environment.

Create and activate the environment with

conda env create -f environment.yaml
conda activate skrub-tutorial

Then, start a JupyterLab instance:

jupyter lab

Using uv

Create the environment and install the dependencies, using pyproject.toml as the requirements file:

uv venv 
uv pip install -r pyproject.toml

Activate the environment, which uv creates in the .venv folder:

source .venv/bin/activate

Start a JupyterLab instance:

jupyter lab 
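Whichever method you used, a quick check can confirm that the environment is ready before opening the notebooks. This is a sketch, assuming the environment installs both skrub and JupyterLab (as the setup commands above imply); it uses only the standard-library importlib.metadata module and the standard `jupyter lab --version` flag. Run it inside the activated environment, or prefix each command with `pixi run` when using pixi.

```shell
# Print the installed skrub version; this errors out if skrub is not installed
python -c "from importlib.metadata import version; print('skrub', version('skrub'))"
# Print the JupyterLab version to confirm it lives in the same environment
jupyter lab --version
```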

About

This repository contains material used for tutorials, courses and MOOCs on skrub.
