diff --git a/Gemma/[Gemma_2]Finetune_with_LORA.ipynb b/Gemma/[Gemma_2]Finetune_with_LORA.ipynb new file mode 100644 index 0000000..7866259 --- /dev/null +++ b/Gemma/[Gemma_2]Finetune_with_LORA.ipynb @@ -0,0 +1,1159 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# Copyright 2025 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Using PEFT (parameter efficient fine-tuning technique) with huggingface to fine-tune Gemma model\n", + " \n", + "\n", + " \n", + " \n", + " \n", + "
\n", + " \n", + " \"Google
Open in Colab\n", + "
\n", + "
\n", + " \n", + " \"GitHub
View on GitHub\n", + "
\n", + "
\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "| Author(s) |\n", + "| --- |\n", + "| [Shivam Ghuge](https://github.com/Shiv-am-04) |" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Overview\n", + "\n", + "This notebook demonstrates how to fine-tune LLM and SLM using the PEFT (parameter efficient finetuning technique) which is LORA (Low Rank Adaptation) in our case.\n", + "\n", + "### Objective\n", + "\n", + "The Goal is to use fine-tune the model in the environment where we have less compute resources like smaller GPUs, less RAM and less storage. We are fine-tuning google's open source gemma2 model using LORA technique.\n", + "\n", + "**We will cover the following steps:**\n", + "\n", + "1. ***Loading Model*** : We are using huggingface to load the model in the notebook using 4-bit quantization, which leads to a smaller model size, lower memory usage, faster inference speed, and reduced energy consumption.\n", + "\n", + "2. ***Configure BitsAnsBytes*** : Using bitsandbytes config to load the model from huggingface in 4-bit.\n", + "\n", + "3. ***Prepare the Dataset*** : Download the SQl dataset from huggingface and convert it to Huggingface Dataset.\n", + "\n", + "4. ***Perform fine-tuning*** : Using LORA to do the fine-tuning of the model on the dataset\n", + "\n", + "5. ***Deploy*** : Push the model to the huggingface hub from where we can use it.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "qprM8Fl5L2xh" + }, + "source": [ + "#### ***Install PEFT (parameter efficient fine tuning), bitsandbytes and other required packages***\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install peft bitsandbytes transformers accelerate datasets trl google" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# import tensorflow\n", + "import torch\n", + "from datasets import load_dataset\n", + "from transformers import AutoTokenizer,AutoModelForCausalLM,BitsAndBytesConfig,TrainingArguments,logging\n", + "from trl import SFTTrainer\n", + "from peft import LoraConfig" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Yb5PfLChPL6f" + }, + "source": [ + "#### ***BitsAndBytes Configuration***\n" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "### bitsandbytes parameters ###\n", + "\n", + "# The bitsandbytes library is a lightweight Python wrapper around CUDA custom functions, particularly designed for 8-bit optimizers,matrix multiplication (LLM.int8()), and 8-bit and 4-bit quantization functions\n", + "\n", + "bnb4bit_compute_dtype = 'float16'\n", + "\n", + "# Quantization type (fp4 or nf4)\n", + "# fp4 : A standard, 4-bit floating-point format that uses a 1-bit sign, a 2-bit exponent, and a 1-bit mantissa.\n", + "# nf4 : Same as fp4 but it is normalized 4-bit and optimized for normally distributed data like the weights in large language model.\n", + "# This makes it more efficient for training and inference of LLM models.\n", + "bnb4bit_quant_type = 'nf4'\n", + "\n", + "use_nested_quant = False" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [], + "source": [ + "# fetch the value of bnb4bit_compute_dtype from the torch module.\n", + "\n", + "compute_dtype = getattr(torch,bnb4bit_compute_dtype)\n", + "\n", + "# getattr is a built-in Python function that retrieves an 
attribute from an object." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "bitsAndbytes_config = BitsAndBytesConfig(load_in_4bit=True,\n", + " bnb_4bit_compute_dtype=compute_dtype,\n", + " bnb_4bit_quant_type=bnb4bit_quant_type,\n", + " bnb_4bit_use_double_quant=False,\n", + " )" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ik_AmgzMPShr" + }, + "source": [ + "#### ***Loading gemma-2-2b model from huggingface***" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [], + "source": [ + "from google.colab import userdata\n", + "\n", + "access_token = userdata.get('HF_TOKEN')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model_name = 'google/gemma-2-2b'\n", + "\n", + "tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True)\n", + "\n", + "tokenizer.pad_token = tokenizer.eos_token\n", + "tokenizer.padding_side = 'right'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model = AutoModelForCausalLM.from_pretrained(model_name,\n", + " quantization_config=bitsAndbytes_config,\n", + " device_map='auto', # device_map is where to load the entire model (0:gpu,'auto':whichever available)\n", + " attn_implementation = 'eager', # type of self-attention technique\n", + " token=access_token)\n", + "\n", + "\n", + "# Disables the use of caching during model inference.\n", + "model.config.use_cache = False\n", + "# Caching stores intermediate results to speed up future computations. Turning it off might be necessary if caching leads to high memory consumption\n", + "# or isn't beneficial for our task.\n", + "\n", + "# Sets the degree of tensor parallelism for pretraining.\n", + "model.config.pretraining_tp = 1\n", + "# Tensor parallelism splits the model tensors across multiple devices (e.g., GPUs) to speed up training. A value of 1 means no tensor splitting" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2.2 GB\n" + ] + } + ], + "source": [ + "print(f\"{model.get_memory_footprint()/1e9:,.1f} GB\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# help(AutoModelForCausalLM)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DCKqDhOLs_tZ" + }, + "source": [ + "***Generating before fine-tuning***" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "question = 'there is a table name Employee containing two columns employee_id and salary. Give me only sql query to fetch the highest and lowest salary along with employee id'\n", + "device = 'cuda'\n", + "input_ = tokenizer.encode(question,return_tensors='pt').to(device)\n", + "response = model.generate(input_).to('cuda')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "there is a table name Employee containing two columns employee_id and salary. 
Give me only sql query to fetch the highest and lowest salary along with employee id.\n", + "\n", + "SELECT MAX(salary) AS max_salary, MIN(salary) AS min_\n" + ] + } + ], + "source": [ + "response = tokenizer.decode(response[0],skip_special_tokens=True)\n", + "print(response)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mdm7c8klZZnw" + }, + "source": [ + "***PEFT***\n", + "\n", + "***Parameter-Efficient Fine-Tuning, is a technique used to adapt pre-trained language models (LLMs) for specific tasks by only training a small subset of the model's parameters. This is a much more efficient and less resource-intensive alternative to traditional fine-tuning, which would update every parameter in a large model.***\n", + "\n", + "***By freezing most of the original model's weights and training a small number of new or existing parameters, PEFT methods achieve comparable performance while saving significant computational power and memory.***\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "H7mqtR45tFfH" + }, + "source": [ + "#### ***Tuning Phase***" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "model" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [], + "source": [ + "Target_modules = ['q_proj','k_proj','v_proj','o_proj']" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [], + "source": [ + "### QLORA hyperparameters ###\n", + "\n", + "lora_learning_rate = 1e-4\n", + "lora_rank = 8\n", + "lora_dropout = 0.2\n", + "lora_alpha = 16 # double of lora rank\n", + "\n", + "# even using QLORA lora config is required because LORA low rank optimization is applied after quantization and alpha should be double the rank" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "peft_config = LoraConfig(r=lora_rank,\n", + " lora_alpha=lora_alpha,\n", + " lora_dropout=lora_dropout, # A regularization technique used during training to prevent overfitting of the small, trainable LoRA matrices.\n", + " bias='none',\n", + " task_type='CAUSAL_LM', # CAUSAL_LM are those model that generates text by predicting the next word (or token) in a sequence based only on the words that have come before it\n", + " target_modules=Target_modules)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IPTZ3jKIKUQ6" + }, + "source": [ + "***Data Preparation***" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "\n", + "splits = {'train': 'data/train-00000-of-00001-36a24700f19484dc.parquet', 'validation': 'data/validation-00000-of-00001-fa01d04c056ac579.parquet'}\n", + "df_train = pd.read_parquet(\"hf://datasets/lamini/spider_text_to_sql/\" + splits[\"train\"])\n", + "df_test = pd.read_parquet(\"hf://datasets/lamini/spider_text_to_sql/\" + splits[\"validation\"])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "df = pd.merge(df_train,df_test,how ='outer')" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "def remove(row):\n", + " return row.split('\\n\\n')[-1].replace('[/INST]','')" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "df['input'] = df['input'].apply(remove)" + ] + }, + { + "cell_type": 
"code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "data = []\n", + "for txt,query in zip(df['input'],df['output']):\n", + " template = f\" {txt.split(':')[-1]} , {query}\"\n", + " data.append(template)" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(8034, 2)" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "8034" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(data)" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "# we are only training on 2000 for quick training\n", + "\n", + "data_for_training = data[:2000]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "data_for_training" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "from datasets import Dataset\n", + "import pandas as pd\n", + "\n", + "pd_data = pd.DataFrame(data_for_training,columns=['text'])\n", + "hf_dataset = Dataset.from_pandas(pd_data)" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Dataset({\n", + " features: ['text'],\n", + " num_rows: 2000\n", + "})" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "hf_dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-z1aVME3KPPU" + }, + "source": [ + "***Training Phase***" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "### training configuration ###\n", + "\n", + "output_dir = \"./results\"\n", + "\n", + "# Number of training epochs\n", + "num_train_epochs = 1\n", + "\n", + "# Batch size per GPU for training\n", + "train_batch_size_perGPU = 1\n", + "\n", + "# Batch size per GPU for evaluation\n", + "eval_batch_size_perGPU = 1\n", + "\n", + "# Number of update steps to accumulate the gradients for if our setup can manage it, keeping it simple with 1 works fine\n", + "gradient_accumulation_steps = 1\n", + "\n", + "# Enable gradient checkpointing\n", + "gradient_checkpointing = True\n", + "\n", + "# Maximum gradient normal (gradient clipping)\n", + "max_grad_norm = 0.3\n", + "\n", + "# Optimizer to use\n", + "optimizer_ = \"paged_adamw_32bit\"\n", + "\n", + "# learning rate (AdamW optimizer), lower learning rates tend to provide more stable and gradual learning.\n", + "learning_rate = 2e-4\n", + "\n", + "# Weight decay to apply to all layers except bias/LayerNorm weights\n", + "weight_decay = 0.001\n", + "\n", + "# Learning rate schedule\n", + "lr_scheduler_type = \"cosine\"\n", + "\n", + "# Number of training steps (overrides num_train_epochs)\n", + "max_steps = -1\n", + "\n", + "# Ratio of steps for a linear warmup (from 0 to learning rate) (optional)\n", + "warmup_ratio = 0.03" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "training_args = TrainingArguments(output_dir=output_dir,\n", + " num_train_epochs=num_train_epochs,\n", + " per_device_train_batch_size=train_batch_size_perGPU,\n", + " 
per_device_eval_batch_size=eval_batch_size_perGPU,\n", + " gradient_accumulation_steps=gradient_accumulation_steps,\n", + " optim=optimizer_,\n", + " save_steps=0,\n", + " logging_steps=25,\n", + " learning_rate=learning_rate,\n", + " weight_decay=weight_decay,\n", + " fp16=False,\n", + " bf16=True,\n", + " max_grad_norm=max_grad_norm,\n", + " max_steps=max_steps,\n", + " # warmup_ratio=warmup_ratio,\n", + " group_by_length=True, # Group sequences into batches with same length\n", + " lr_scheduler_type=lr_scheduler_type,\n", + " report_to=\"tensorboard\"\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "trainer = SFTTrainer(model=model,\n", + " args=training_args,\n", + " peft_config=peft_config,\n", + " train_dataset=hf_dataset,\n", + " processing_class=tokenizer,\n", + " )" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "torch.cuda.empty_cache()" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "148" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import gc\n", + "\n", + "gc.collect()" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'pad_token_id': 1}.\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " [2000/2000 24:59, Epoch 1/1]\n", + "
\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
StepTraining Loss
251.581900
501.224500
750.967200
1001.154800
1250.941500
1501.078100
1750.981500
2001.025800
2250.897000
2501.017000
2750.802200
3000.902100
3250.936700
3501.037900
3750.816500
4001.083700
4250.743400
4500.984600
4750.811000
5000.966900
5250.802600
5501.042400
5750.708000
6000.915200
6250.666700
6500.913100
6750.663700
7000.863700
7250.641200
7500.883200
7750.694100
8000.772600
8250.588200
8500.949100
8750.674700
9000.875200
9250.635000
9500.783500
9750.659000
10000.886500
10250.688900
10500.840300
10750.745900
11000.714500
11250.716300
11500.885100
11750.694600
12000.854000
12250.749700
12500.850900
12750.694900
13000.824100
13250.651900
13500.749800
13750.611700
14000.906700
14250.509800
14500.784100
14750.634900
15000.904100
15250.637300
15500.861400
15750.621300
16000.782500
16250.556800
16500.869100
16750.620900
17000.778200
17250.521900
17500.923000
17750.616000
18000.840300
18250.520800
18500.806700
18750.694200
19000.875100
19250.623100
19500.775000
19750.632900
20000.787400

" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "TrainOutput(global_step=2000, training_loss=0.8115992393493653, metrics={'train_runtime': 1502.325, 'train_samples_per_second': 1.331, 'train_steps_per_second': 1.331, 'total_flos': 1577734916802048.0, 'train_loss': 0.8115992393493653, 'epoch': 1.0})" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trainer.train()" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "application/javascript": "\n (async () => {\n const url = new URL(await google.colab.kernel.proxyPort(6006, {'cache': true}));\n url.searchParams.set('tensorboardColab', 'true');\n const iframe = document.createElement('iframe');\n iframe.src = url;\n iframe.setAttribute('width', '100%');\n iframe.setAttribute('height', '800');\n iframe.setAttribute('frameborder', 0);\n document.body.appendChild(iframe);\n })();\n ", + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "%load_ext tensorboard\n", + "%tensorboard --logdir results/runs" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [], + "source": [ + "# save model to the local folder\n", + "\n", + "trainer.model.save_pretrained('finetuned_model')" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "del model\n", + "del trainer\n", + "gc.collect()\n", + "gc.collect()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "eI0I71t5Jfxw" + }, + "source": [ + "#### ***Merging Weights of Lora Config with Base model and Pushing to huggingfacehub models***" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from peft import PeftModel\n", + "\n", + "base_model = AutoModelForCausalLM.from_pretrained(\n", + " model_name,\n", + " low_cpu_mem_usage=True,\n", + " return_dict=True,\n", + " torch_dtype=torch.float16,\n", + " device_map='auto',\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [], + "source": [ + "model = PeftModel.from_pretrained(base_model,r'/content/finetuned_model') # This path is only for google colab\n", + "model = model.merge_and_unload()\n", + "\n", + "# reloading tokenizer\n", + "tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True)\n", + "tokenizer.pad_token = tokenizer.eos_token\n", + "tokenizer.padding_side = 'right'" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import locale\n", + "\n", + "locale.preferred_encoding = lambda: \"UTF-8\"" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "name = \"shiv-am-04/gemma2-2b-SQL\"\n", + "\n", + "! huggingface-cli login\n", + "\n", + "model.push_to_hub(name, check_pr=True)\n", + "\n", + "tokenizer.push_to_hub(name,check_pr=True)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11.4" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}