Gemma2 Finetuning with LORA #264
base: main
Conversation
Summary of Changes

Hello @Shiv-am-04, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces a new Jupyter notebook that provides a comprehensive guide to fine-tuning the Gemma-2-2b large language model using the Parameter-Efficient Fine-Tuning (PEFT) technique, specifically LORA. The primary goal is to enable efficient model adaptation for a text-to-SQL task in environments with limited computational resources, leveraging 4-bit quantization for reduced memory usage. The notebook covers the entire workflow from model loading and configuration to dataset preparation, training, and final deployment to the HuggingFace Hub.

Highlights
Code Review
The pull request introduces a new Jupyter notebook for fine-tuning the Gemma2 model using the LORA technique. The notebook covers model loading, data preparation, fine-tuning, and pushing the model to Hugging Face Hub. The overall structure is clear, but there are several areas where code clarity, efficiency, and best practices could be improved, particularly regarding variable naming, comment accuracy, and configurability. A significant issue is the use of an interactive login command, which can interrupt automated workflows.
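For context while reading the review, here is a minimal sketch of the workflow the notebook implements (4-bit quantization plus a LoRA adapter on Gemma-2-2b). The model ID, LoRA hyperparameters, and variable names below are illustrative assumptions, not the notebook's exact code.

```python
# Sketch: load Gemma-2-2b in 4-bit and attach a LoRA adapter with PEFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "google/gemma-2-2b"  # assumed base model ID

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # 4-bit quantization for limited GPUs
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # dtype used for computation
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,                                  # illustrative LoRA rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```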
| "source": [ | ||
| "name = \"shiv-am-04/gemma2-2b-SQL\"\n", | ||
| "\n", | ||
| "! huggingface-cli login\n", |
The ! huggingface-cli login command is interactive and will prompt the user for input, which can interrupt automated execution of the notebook. It's recommended to use huggingface_hub.login(token=access_token) instead, as the access_token is already retrieved from userdata (line 186) and can be passed directly, enabling non-interactive login.
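A short sketch of the suggested change; the Colab secret name "HF_TOKEN" is a hypothetical placeholder for however the notebook actually stores the token.

```python
# Non-interactive login, replacing `! huggingface-cli login`.
from google.colab import userdata
from huggingface_hub import login

access_token = userdata.get("HF_TOKEN")  # hypothetical secret name
login(token=access_token)
```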
| "outputs": [], | ||
| "source": [ | ||
| "### training configuration ###\n", | ||
| "\n", |
| "train_batch_size_perGPU = 1\n", | ||
| "\n", |
A per_device_train_batch_size of 1 is very small and can lead to inefficient GPU utilization and slower convergence. While it helps with memory constraints, it's often not the most efficient for training speed. Consider increasing this value if hardware resources allow, possibly in conjunction with gradient_accumulation_steps.
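A sketch of that trade-off with illustrative values only (the notebook's actual settings may differ):

```python
# Keep the per-device batch size small for memory, but accumulate gradients so
# the effective batch size per optimizer step is larger: 2 * 4 = 8 samples.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="finetuned_model",       # illustrative; match the notebook's output_dir
    per_device_train_batch_size=2,      # raise if GPU memory allows
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    bf16=True,
)
```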
| "model = AutoModelForCausalLM.from_pretrained(model_name,\n", | ||
| " quantization_config=bitsAndbytes_config,\n", | ||
| " device_map='auto', # device_map is where to load the entire model (0:gpu,'auto':whichever available)\n", | ||
| " attn_implementation = 'eager', # type of self-attention technique\n", |
Using attn_implementation = 'eager' can be less performant than more optimized implementations like flash_attention_2 or sdpa (if supported by the hardware). For fine-tuning LLMs, optimizing attention can significantly improve training speed and efficiency. Consider exploring more efficient attention mechanisms if performance is a concern.
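A minimal sketch of the alternative, reusing the notebook's model_name and bitsAndbytes_config names (assumed to be defined in earlier cells). Note that some Gemma-2 setups recommend eager attention because of logit soft-capping, so this is worth verifying against the installed transformers version.

```python
# Request a faster attention kernel where the hardware and packages support it.
# "sdpa" uses PyTorch's scaled_dot_product_attention; "flash_attention_2"
# additionally requires the flash-attn package and a compatible GPU.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bitsAndbytes_config,
    device_map="auto",
    attn_implementation="sdpa",   # or "flash_attention_2" if available
)
```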
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "Target_modules = ['q_proj','k_proj','v_proj','o_proj']" |
| "\n", | ||
| "# learning rate (AdamW optimizer), lower learning rates tend to provide more stable and gradual learning.\n", |
| " bf16=True,\n", | ||
| " max_grad_norm=max_grad_norm,\n", | ||
| " max_steps=max_steps,\n", | ||
| " # warmup_ratio=warmup_ratio,\n", |
The warmup_ratio variable is defined earlier (line 574) but is commented out when passed to TrainingArguments. This creates an unused variable and an inconsistency. Either remove the warmup_ratio definition if it's not intended to be used, or uncomment the line in TrainingArguments to apply it during training.
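A short sketch of the second option; the values below are placeholders, and the notebook's own hyperparameter values should be substituted.

```python
# Wire the previously defined warmup_ratio into TrainingArguments instead of
# leaving it commented out.
from transformers import TrainingArguments

warmup_ratio = 0.03   # fraction of total steps spent ramping up the learning rate

training_args = TrainingArguments(
    output_dir="finetuned_model",
    max_steps=500,
    max_grad_norm=0.3,
    warmup_ratio=warmup_ratio,
    bf16=True,
)
```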
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "model = PeftModel.from_pretrained(base_model,r'/content/finetuned_model') # This path is only for google colab\n", |
The path r'/content/finetuned_model' is hardcoded and specific to the Colab environment. This reduces the portability of the notebook. It would be better to use a relative path or derive it from the output_dir variable defined earlier to make the notebook more adaptable to different execution environments.
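A sketch of the suggestion, assuming output_dir holds the training output directory and base_model is the quantized model loaded earlier (both names are taken from the notebook's earlier cells).

```python
# Derive the adapter path from the training output directory rather than
# hardcoding the Colab-specific /content path.
import os
from peft import PeftModel

adapter_path = os.path.abspath(output_dir)   # output_dir defined in the training cell
model = PeftModel.from_pretrained(base_model, adapter_path)
```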
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "name = \"shiv-am-04/gemma2-2b-SQL\"\n", |
| "\n", | ||
| "# The bitsandbytes library is a lightweight Python wrapper around CUDA custom functions, particularly designed for 8-bit optimizers,matrix multiplication (LLM.int8()), and 8-bit and 4-bit quantization functions\n", | ||
| "\n", | ||
| "bnb4bit_compute_dtype = 'float16'\n", |
The comment here describes fp4 as a quantization type, but bnb4bit_compute_dtype is set to 'float16', which is a compute data type. This creates a discrepancy between the variable's value and the explanatory comment. Please clarify or correct the comment to accurately reflect float16 as a compute dtype.
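To make the distinction concrete, here is a minimal sketch separating the two settings; the parameter values are illustrative, not necessarily the notebook's.

```python
# bnb_4bit_quant_type controls how the weights are stored ("nf4" or "fp4"),
# while bnb_4bit_compute_dtype controls the dtype used for the actual matmuls,
# which is what 'float16' refers to in the notebook.
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # storage format of the quantized weights
    bnb_4bit_compute_dtype=torch.float16,    # compute dtype
    bnb_4bit_use_double_quant=True,
)
```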