This repository was archived by the owner on Mar 19, 2024. It is now read-only.

Commit b5b7d30

Celebio authored and facebook-github-bot committed
release 0.9.1
Summary: Documentation update with the release 0.9.1
Reviewed By: EdouardGrave
Differential Revision: D16120112
fbshipit-source-id: 55373d02b202bd35368a8307a1c904bcae3d739a
1 parent 979d8a9 commit b5b7d30

File tree (3 files changed, +18 −18 lines):

README.md
docs/supervised-tutorial.md
setup.py

README.md

Lines changed: 3 additions & 3 deletions
@@ -89,9 +89,9 @@ There is also the master branch that contains all of our most recent work, but c
 ### Building fastText using make (preferred)

 ```
-$ wget https://github.com/facebookresearch/fastText/archive/v0.2.0.zip
-$ unzip v0.2.0.zip
-$ cd fastText-0.2.0
+$ wget https://github.com/facebookresearch/fastText/archive/v0.9.1.zip
+$ unzip v0.9.1.zip
+$ cd fastText-0.9.1
 $ make
 ```
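For readers following the updated instructions, a minimal post-build sanity check (not part of this commit; it assumes `wget`, `unzip`, and `make` are available) could look like this:

```bash
# Download and unpack the 0.9.1 release, then build the command-line tool.
wget https://github.com/facebookresearch/fastText/archive/v0.9.1.zip
unzip v0.9.1.zip
cd fastText-0.9.1
make
# Running the binary with no arguments prints the list of supported commands
# (supervised, test, predict, skipgram, cbow, ...), which confirms the build worked.
./fasttext
```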

docs/supervised-tutorial.md

Lines changed: 14 additions & 14 deletions
@@ -3,7 +3,7 @@ id: supervised-tutorial
 title: Text classification
 ---

-Text classification is a core problem to many applications, like spam detection, sentiment analysis or smart replies. In this tutorial, we describe how to build a text classifier with the fastText tool.
+Text classification is a core problem to many applications, like spam detection, sentiment analysis or smart replies. In this tutorial, we describe how to build a text classifier with the fastText tool.

 ## What is text classification?

@@ -18,14 +18,14 @@ The first step of this tutorial is to install and build fastText. It only requir
 Let us start by downloading the [most recent release](https://github.com/facebookresearch/fastText/releases):

 ```bash
-$ wget https://github.com/facebookresearch/fastText/archive/v0.2.0.zip
-$ unzip v0.2.0.zip
+$ wget https://github.com/facebookresearch/fastText/archive/v0.9.1.zip
+$ unzip v0.9.1.zip
 ```

 Move to the fastText directory and build it:

 ```bash
-$ cd fastText-0.2.0
+$ cd fastText-0.9.1
 $ make
 ```

@@ -87,7 +87,7 @@ We are now ready to train our first classifier:
 Read 0M words
 Number of words: 14598
 Number of labels: 734
-Progress: 100.0% words/sec/thread: 75109 lr: 0.000000 loss: 5.708354 eta: 0h0m
+Progress: 100.0% words/sec/thread: 75109 lr: 0.000000 loss: 5.708354 eta: 0h0m
 ```

 The `-input` command line option indicates the file containing the training examples, while the `-output` option indicates where to save the model. At the end of training, a file `model_cooking.bin`, containing the trained classifier, is created in the current directory.
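For context, the train-and-evaluate cycle that this part of the tutorial walks through looks roughly as follows (a sketch; file names follow the tutorial, and the reported numbers will vary):

```bash
# Train a classifier on the labelled questions; this writes model_cooking.bin
# (and model_cooking.vec) into the current directory.
./fasttext supervised -input cooking.train -output model_cooking
# Report precision and recall at 1 on the held-out validation questions.
./fasttext test model_cooking.bin cooking.valid
# Predict a label for a sentence typed on standard input (Ctrl-D to exit).
./fasttext predict model_cooking.bin -
```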
@@ -155,7 +155,7 @@ Looking at the data, we observe that some words contain uppercase letter or punc
 ```bash
 >> cat cooking.stackexchange.txt | sed -e "s/\([.\!?,'/()]\)/ \1 /g" | tr "[:upper:]" "[:lower:]" > cooking.preprocessed.txt
 >> head -n 12404 cooking.preprocessed.txt > cooking.train
->> tail -n 3000 cooking.preprocessed.txt > cooking.valid
+>> tail -n 3000 cooking.preprocessed.txt > cooking.valid
 ```

 Let's train a new model on the pre-processed data:
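The split sizes above follow from the dataset containing 15,404 labelled questions (12,404 for training plus 3,000 for validation); a quick way to confirm the count before splitting (assuming the file has already been downloaded):

```bash
# Count the labelled examples to decide on the train/validation split.
wc -l cooking.stackexchange.txt
```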
@@ -165,9 +165,9 @@ Let's train a new model on the pre-processed data:
 Read 0M words
 Number of words: 9012
 Number of labels: 734
-Progress: 100.0% words/sec/thread: 82041 lr: 0.000000 loss: 5.671649 eta: 0h0m h-14m
+Progress: 100.0% words/sec/thread: 82041 lr: 0.000000 loss: 5.671649 eta: 0h0m h-14m

->> ./fasttext test model_cooking.bin cooking.valid
+>> ./fasttext test model_cooking.bin cooking.valid
 N 3000
 P@1 0.164
 R@1 0.0717
@@ -181,7 +181,7 @@ We observe that thanks to the pre-processing, the vocabulary is smaller (from 14
 By default, fastText sees each training example only five times during training, which is pretty small, given that our training set only have 12k training examples. The number of times each examples is seen (also known as the number of epochs), can be increased using the `-epoch` option:

 ```bash
->> ./fasttext supervised -input cooking.train -output model_cooking -epoch 25
+>> ./fasttext supervised -input cooking.train -output model_cooking -epoch 25
 Read 0M words
 Number of words: 9012
 Number of labels: 734
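The tutorial also tunes the learning rate alongside the epoch count; a combined invocation along those lines would be (a sketch, with `-lr 1.0` being the value the tutorial settles on):

```bash
# More passes over the data (-epoch 25) and a larger learning rate (-lr 1.0).
./fasttext supervised -input cooking.train -output model_cooking -lr 1.0 -epoch 25
./fasttext test model_cooking.bin cooking.valid
```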
@@ -241,7 +241,7 @@ Finally, we can improve the performance of a model by using word bigrams, instea
 Read 0M words
 Number of words: 9012
 Number of labels: 734
-Progress: 100.0% words/sec/thread: 75366 lr: 0.000000 loss: 3.226064 eta: 0h0m
+Progress: 100.0% words/sec/thread: 75366 lr: 0.000000 loss: 3.226064 eta: 0h0m

 >> ./fasttext test model_cooking.bin cooking.valid
 N 3000
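The output in the hunk above corresponds to adding word bigrams on top of the tuned epoch count and learning rate; a sketch of that invocation:

```bash
# Add word bigram features (-wordNgrams 2) to the tuned baseline.
./fasttext supervised -input cooking.train -output model_cooking -lr 1.0 -epoch 25 -wordNgrams 2
./fasttext test model_cooking.bin cooking.valid
```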
@@ -261,14 +261,14 @@ With a few steps, we were able to go from a precision at one of 12.4% to 59.9%.

 A 'unigram' refers to a single undividing unit, or token, usually used as an input to a model. For example a unigram can be a word or a letter depending on the model. In fastText, we work at the word level and thus unigrams are words.

-Similarly we denote by 'bigram' the concatenation of 2 consecutive tokens or words. Similarly we often talk about n-gram to refer to the concatenation any n consecutive tokens.
+Similarly we denote by 'bigram' the concatenation of 2 consecutive tokens or words. Similarly we often talk about n-gram to refer to the concatenation any n consecutive tokens.

 For example, in the sentence, 'Last donut of the night', the unigrams are 'last', 'donut', 'of', 'the' and 'night'. The bigrams are: 'Last donut', 'donut of', 'of the' and 'the night'.

-Bigrams are particularly interesting because, for most sentences, you can reconstruct the order of the words just by looking at a bag of n-grams.
+Bigrams are particularly interesting because, for most sentences, you can reconstruct the order of the words just by looking at a bag of n-grams.

 Let us illustrate this by a simple exercise, given the following bigrams, try to reconstruct the original sentence: 'all out', 'I am', 'of bubblegum', 'out of' and 'am all'.
-It is common to refer to a word as a unigram.
+It is common to refer to a word as a unigram.

 ## Scaling things up

@@ -279,7 +279,7 @@ Since we are training our model on a few thousands of examples, the training onl
 Read 0M words
 Number of words: 9012
 Number of labels: 734
-Progress: 100.0% words/sec/thread: 2199406 lr: 0.000000 loss: 1.718807 eta: 0h0m
+Progress: 100.0% words/sec/thread: 2199406 lr: 0.000000 loss: 1.718807 eta: 0h0m
 ```

 Training should now take less than a second.
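The very high words/sec figure in this hunk comes from the "Scaling things up" part of the tutorial, which switches to the hierarchical softmax loss; a sketch of that kind of invocation (the exact flag values here are illustrative and may differ from the tutorial's):

```bash
# Hierarchical softmax (-loss hs) trades a little accuracy for much faster
# training when there are many labels; smaller -dim and -bucket also help speed.
./fasttext supervised -input cooking.train -output model_cooking -lr 1.0 -epoch 25 \
    -wordNgrams 2 -bucket 200000 -dim 50 -loss hs
```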

setup.py

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@
 import platform
 import io

-__version__ = '0.9'
+__version__ = '0.9.1'
 FASTTEXT_SRC = "src"

 # Based on https://github.com/pybind/python_example
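Since the version bump lands in `setup.py`, the Python bindings can also be built from the same release archive; one way to do that (a sketch; it assumes `pip` and a C++ toolchain are available):

```bash
# Build and install the fastText Python module from the unpacked 0.9.1 release.
cd fastText-0.9.1
pip install .
```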
