---
id: supervised-tutorial
title: Text classification
---

Text classification is a core problem in many applications, such as spam detection, sentiment analysis or smart replies. In this tutorial, we describe how to build a text classifier with the fastText tool.

## What is text classification?

## Installing fastText

The first step of this tutorial is to install and build fastText. It only requires a working C++ compiler with good support of C++11.

Let us start by downloading the [most recent release](https://github.com/facebookresearch/fastText/releases):
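
In practice this means grabbing a release archive and compiling it; a minimal sketch (the version number below is illustrative only, so substitute the latest release):

```
>> wget https://github.com/facebookresearch/fastText/archive/v0.9.2.zip
>> unzip v0.9.2.zip
>> cd fastText-0.9.2
>> make
```

This builds the `fasttext` binary used throughout the rest of this tutorial.
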
The `-input` command line option indicates the file containing the training examples, while the `-output` option indicates where to save the model. At the end of training, a file `model_cooking.bin`, containing the trained classifier, is created in the current directory.
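
For reference, the kind of invocation being described looks like this (a sketch, assuming the training examples were saved to a file named `cooking.train`):

```
>> ./fasttext supervised -input cooking.train -output model_cooking
```
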
Looking at the data, we observe that some words contain uppercase letters or punctuation.
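
A crude normalization can be obtained directly from the command line; a sketch using standard Unix tools (the file names here are assumptions chosen to match the rest of this tutorial):

```
# separate punctuation from words, then lowercase everything
>> cat cooking.stackexchange.txt | sed -e "s/\([.\!?,'()]\)/ \1 /g" | tr "[:upper:]" "[:lower:]" > cooking.preprocessed.txt
```

Retraining on the pre-processed data and testing on the validation set gives:
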
```
>> ./fasttext test model_cooking.bin cooking.valid
N      3000
P@1    0.164
R@1    0.0717
```

We observe that thanks to the pre-processing, the vocabulary is smaller (from 14k words to 9k). The precision is also starting to go up.

By default, fastText sees each training example only five times during training, which is pretty small given that our training set only has 12k training examples. The number of times each example is seen (also known as the number of epochs) can be increased using the `-epoch` option:
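
For instance, retraining with 25 epochs (a sketch; 25 is an illustrative value rather than a magic number):

```
>> ./fasttext supervised -input cooking.train -output model_cooking -epoch 25
```

We can then test the new model:
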
```
>> ./fasttext test model_cooking.bin cooking.valid
N      3000
```

With a few steps, we were able to go from a precision at one of 12.4% to 59.9%.

A 'unigram' refers to a single indivisible unit, or token, usually used as an input to a model. For example, a unigram can be a word or a letter depending on the model. In fastText, we work at the word level and thus unigrams are words.

Similarly, we denote by 'bigram' the concatenation of two consecutive tokens or words. More generally, we talk about an 'n-gram' to refer to the concatenation of any n consecutive tokens.

For example, in the sentence 'Last donut of the night', the unigrams are 'last', 'donut', 'of', 'the' and 'night'. The bigrams are: 'last donut', 'donut of', 'of the' and 'the night'.

Bigrams are particularly interesting because, for most sentences, you can reconstruct the order of the words just by looking at a bag of n-grams.

Let us illustrate this with a simple exercise: given the bigrams 'all out', 'I am', 'of bubblegum', 'out of' and 'am all', try to reconstruct the original sentence.

It is common to refer to a word as a unigram.
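
In fastText, word n-grams are enabled at training time through the `-wordNgrams` option; for instance, a sketch that feeds the classifier bigrams in addition to unigrams (file names assumed from the earlier steps):

```
>> ./fasttext supervised -input cooking.train -output model_cooking -wordNgrams 2
```

Using bigrams lets the model capture some local word order that a plain bag of words throws away.
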
## Scaling things up

Since we are training our model on a few thousand examples, the training only takes a few seconds.
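
When moving to much larger datasets, a common speed-up is the hierarchical softmax, exposed through the `-loss hs` option; a sketch (treat the exact flag combination as an assumption to verify against the usage printed by `./fasttext supervised`):

```
>> ./fasttext supervised -input cooking.train -output model_cooking -loss hs
```
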