
Commit eab8271

cleaned up readme

1 parent 6c8454a commit eab8271

File tree

2 files changed: +352 −0 lines changed


20-mistral/README.md

Lines changed: 348 additions & 0 deletions
@@ -0,0 +1,348 @@
# Building with Mistral Models

## Introduction

This lesson will cover:

- Exploring the different Mistral models
- Understanding the use cases and scenarios for each model
- Code samples showing the unique features of each model
## The Mistral Models

In this lesson, we will explore 3 different Mistral models: **Mistral Large**, **Mistral Small** and **Mistral NeMo**.

Each of these models is available for free on the GitHub Models marketplace. The code in this notebook uses these models to run the samples. Here are more details on using GitHub Models to [prototype with AI models](https://docs.github.com/en/github-models/prototyping-with-ai-models?WT.mc_id=academic-105485-koreyst).
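Before running the samples, make sure a GitHub personal access token is exposed as `GITHUB_TOKEN` and that the Azure AI Inference client library is installed. A minimal setup sketch (the package name is inferred from the `azure.ai.inference` imports used throughout this lesson):

```bash
pip install azure-ai-inference
export GITHUB_TOKEN="<your-github-token>"
```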
## Mistral Large 2 (2407)

Mistral Large 2 is currently the flagship model from Mistral and is designed for enterprise use.

The model is an upgrade to the original Mistral Large, offering:

- A larger context window - 128k vs 32k
- Better performance on math and coding tasks - 76.9% average accuracy vs 60.4%
- Increased multilingual performance - languages include: English, French, German, Spanish, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, and Hindi

With these features, Mistral Large excels at:

- *Retrieval Augmented Generation (RAG)* - due to the larger context window
- *Function Calling* - this model has native function calling, which allows integration with external tools and APIs. These calls can be made either in parallel or sequentially (see the sketch below).
- *Code Generation* - this model excels at Python, Java, TypeScript and C++ generation
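Below is a minimal sketch of native function calling with the Azure AI Inference client used throughout this lesson. The `get_flight_info` tool is a hypothetical example defined only to illustrate the request shape; whether the model calls it, and with which arguments, is decided by the model at run time.

```python
import os
import json

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    SystemMessage,
    UserMessage,
    ChatCompletionsToolDefinition,
    FunctionDefinition,
)
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-large"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# A hypothetical tool the model can choose to call
get_flight_info = ChatCompletionsToolDefinition(
    function=FunctionDefinition(
        name="get_flight_info",
        description="Get flight information between two cities",
        parameters={
            "type": "object",
            "properties": {
                "origin_city": {"type": "string", "description": "City of departure"},
                "destination_city": {"type": "string", "description": "City of arrival"},
            },
            "required": ["origin_city", "destination_city"],
        },
    )
)

response = client.complete(
    messages=[
        SystemMessage(content="You are an assistant that helps users find flight information."),
        UserMessage(content="What flights are there from Seattle to Paris?"),
    ],
    tools=[get_flight_info],
    model=model_name,
)

# If the model decided to call the tool, inspect the structured call it produced
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        print(tool_call.function.name)
        print(json.loads(tool_call.function.arguments))
else:
    print(message.content)
```

If the model chooses to call the tool, `tool_calls` contains the structured call(s); your code would execute the function and send the result back to the model in a follow-up message.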
### RAG Example using Mistral Large 2

In this example, we use Mistral Large 2 to run a RAG pattern over a text document. The question is written in Korean and asks about the author's activities before college.

It uses the Cohere Embeddings model to create embeddings of the text document as well as the question. For this sample, the faiss Python package is used as a vector store.

The prompt sent to the Mistral model includes both the question and the retrieved chunks that are similar to the question. The model then provides a natural language response.
```bash
pip install faiss-cpu
```

```python
import requests
import numpy as np
import faiss
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential
from azure.ai.inference import EmbeddingsClient

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-large"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

# Download the source document (Paul Graham's essay)
response = requests.get('https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt')
text = response.text

# Split the document into fixed-size chunks
chunk_size = 2048
chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
print(len(chunks))

embed_model_name = "cohere-embed-v3-multilingual"

embed_client = EmbeddingsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token)
)

# Embed every chunk with the Cohere multilingual embeddings model
embed_response = embed_client.embed(
    input=chunks,
    model=embed_model_name
)

# Collect the embeddings into a single float32 matrix (faiss expects float32)
text_embeddings = []
for item in embed_response.data:
    text_embeddings.append(item.embedding)
text_embeddings = np.array(text_embeddings, dtype="float32")

# Build a flat L2 index over the chunk embeddings
d = text_embeddings.shape[1]
index = faiss.IndexFlatL2(d)
index.add(text_embeddings)

# Korean: "What were the two main things the author did before coming to college?"
question = "저자가 대학에 오기 전에 주로 했던 두 가지 일은 무엇이었나요??"

question_embedding = embed_client.embed(
    input=[question],
    model=embed_model_name
)

question_embeddings = np.array(question_embedding.data[0].embedding, dtype="float32")

# Retrieve the two chunks closest to the question
D, I = index.search(question_embeddings.reshape(1, -1), k=2)  # distances, indices
retrieved_chunks = [chunks[i] for i in I.tolist()[0]]

prompt = f"""
Context information is below.
---------------------
{retrieved_chunks}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""

chat_response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content=prompt),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name
)

print(chat_response.choices[0].message.content)
```

## Mistral Small

Mistral Small is another model in the Mistral family of models under the premier/enterprise category. As the name implies, this model is a Small Language Model (SLM). The advantages of using Mistral Small are that it is:

- Cost saving compared to Mistral LLMs like Mistral Large and NeMo - an 80% price drop
- Low latency - faster responses compared to Mistral's LLMs
- Flexible - it can be deployed across different environments with fewer restrictions on required resources

Mistral Small is great for:

- Text-based tasks such as summarization, sentiment analysis and translation
- Applications where frequent requests are made, due to its cost effectiveness
- Low-latency code tasks such as code review and code suggestions

## Comparing Mistral Small and Mistral Large

To show the differences in latency between Mistral Small and Mistral Large, run the cells below.

You should see a difference in response times of around 3-5 seconds. Also note the response lengths and style for the same prompt.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-small"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful coding assistant."),
        UserMessage(content="Can you write a Python function for the fizz buzz test?"),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name
)

print(response.choices[0].message.content)
```

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-large"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful coding assistant."),
        UserMessage(content="Can you write a Python function for the fizz buzz test?"),
    ],
    temperature=1.0,
    top_p=1.0,
    max_tokens=1000,
    model=model_name
)

print(response.choices[0].message.content)
```
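To put a number on the latency difference, you can wrap each call in a simple timer. This is a minimal sketch using the standard library's `time.perf_counter`; actual timings will vary by region and load, so run it a few times:

```python
import os
import time

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

def time_completion(model_name: str, prompt: str) -> float:
    """Run one chat completion and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    client.complete(
        messages=[
            SystemMessage(content="You are a helpful coding assistant."),
            UserMessage(content=prompt),
        ],
        max_tokens=1000,
        model=model_name,
    )
    return time.perf_counter() - start

prompt = "Can you write a Python function for the fizz buzz test?"
for model in ["Mistral-small", "Mistral-large"]:
    print(f"{model}: {time_completion(model, prompt):.2f} s")
```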
## Mistral NeMo

Compared to the other two models discussed in this lesson, Mistral NeMo is the only free model with an Apache 2.0 license.

It is viewed as an upgrade to the earlier open source LLM from Mistral, Mistral 7B.

Some other features of the NeMo model are:

- *More efficient tokenization:* This model uses the Tekken tokenizer rather than the more commonly used tiktoken. This allows for better performance across more languages and code.

- *Finetuning:* The base model is available for finetuning. This allows for more flexibility for use cases where finetuning may be needed.

- *Native function calling:* Like Mistral Large, this model has been trained on function calling, making it one of the first open source models to offer it.
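Since Mistral NeMo is also available through GitHub Models, you can call it with the same chat client used earlier. A minimal sketch; the deployment name `Mistral-nemo` is an assumption here, so check the GitHub Models catalog for the exact identifier:

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Mistral-nemo"  # assumed deployment name; verify in the GitHub Models catalog
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize the benefits of the Tekken tokenizer in two sentences."),
    ],
    model=model_name,
)

print(response.choices[0].message.content)
```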
### Comparing Tokenizers

In this sample, we will look at how Mistral NeMo handles tokenization compared to Mistral Large.

Both samples take the same prompt, but you should see that NeMo returns fewer tokens than Mistral Large.

```bash
pip install mistral-common
```
```python
# Import needed packages:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
    Function,
    Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load Mistral tokenizer

model_name = "open-mistral-nemo"

tokenizer = MistralTokenizer.from_model(model_name)

# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Count the number of tokens
print(len(tokens))
```

```python
# Import needed packages:
from mistral_common.protocol.instruct.messages import (
    UserMessage,
)
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.protocol.instruct.tool_calls import (
    Function,
    Tool,
)
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

# Load Mistral tokenizer

model_name = "mistral-large-latest"

tokenizer = MistralTokenizer.from_model(model_name)

# Tokenize a list of messages
tokenized = tokenizer.encode_chat_completion(
    ChatCompletionRequest(
        tools=[
            Tool(
                function=Function(
                    name="get_current_weather",
                    description="Get the current weather",
                    parameters={
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city and state, e.g. San Francisco, CA",
                            },
                            "format": {
                                "type": "string",
                                "enum": ["celsius", "fahrenheit"],
                                "description": "The temperature unit to use. Infer this from the users location.",
                            },
                        },
                        "required": ["location", "format"],
                    },
                )
            )
        ],
        messages=[
            UserMessage(content="What's the weather like today in Paris"),
        ],
        model=model_name,
    )
)
tokens, text = tokenized.tokens, tokenized.text

# Count the number of tokens
print(len(tokens))
```
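If you want a rough comparison against tiktoken (mentioned above as the more commonly used tokenizer), you can count tokens for the plain prompt text. A minimal sketch, assuming the `o200k_base` encoding as a representative tiktoken encoding; note that this tokenizes only the raw string, not the full chat request with tools, so the counts are not directly comparable to the numbers printed above:

```python
# pip install tiktoken
import tiktoken

prompt = "What's the weather like today in Paris"

encoding = tiktoken.get_encoding("o200k_base")  # representative tiktoken encoding
print(len(encoding.encode(prompt)))
```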

## Learning does not stop here, continue the journey

After completing this lesson, check out our [Generative AI Learning collection](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) to continue leveling up your Generative AI knowledge!

README.md

Lines changed: 4 additions & 0 deletions
@@ -82,11 +82,15 @@ Do you have suggestions or found spelling or code errors? [Raise an issue](https
| 16 | [Open Source Models and Hugging Face](./16-open-source-models/README.md?WT.mc_id=academic-105485-koreyst) | **Build:** An application using open source models available on Hugging Face | [Video](https://aka.ms/gen-ai-lesson16-gh?WT.mc_id=academic-105485-koreyst) | [Learn More](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) |
| 17 | [AI Agents](./17-ai-agents/README.md?WT.mc_id=academic-105485-koreyst) | **Build:** An application using an AI Agent Framework | [Video](https://aka.ms/gen-ai-lesson17-gh?WT.mc_id=academic-105485-koreyst) | [Learn More](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) |
| 18 | [Fine-Tuning LLMs](./18-fine-tuning/README.md?WT.mc_id=academic-105485-koreyst) | **Learn:** The what, why and how of fine-tuning LLMs | [Video](https://aka.ms/gen-ai-lesson18-gh?WT.mc_id=academic-105485-koreyst) | [Learn More](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) |
| 19 | [Building with SLMs](./19-slm/README.md?WT.mc_id=academic-105485-koreyst) | **Learn:** The benefits of building with Small Language Models | Video Coming Soon | [Learn More](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) |
| 20 | [Building with Mistral Models](./20-mistral/README.md?WT.mc_id=academic-105485-koreyst) | **Learn:** The features and differences of the Mistral Family Models | Video Coming Soon | [Learn More](https://aka.ms/genai-collection?WT.mc_id=academic-105485-koreyst) |

### 🌟 Special thanks

Special thanks to [**John Aziz**](https://www.linkedin.com/in/john0isaac/) for creating all of the GitHub Actions and workflows

[**Bernhard Merkle**](https://www.linkedin.com/in/bernhard-merkle-738b73/) for making key contributions to each lesson to improve the learner and code experience.

## 🎒 Other Courses

Our team produces other courses! Check out:
