This repository was archived by the owner on May 3, 2023. It is now read-only.

Commit ecf69d1: Merge branch 'develop'

Committed by Markus Konrad. 2 parents: 81cf22b + dcaaee1.

18 files changed: 134 additions, 47 deletions.

.github/workflows/runtests.yml

Lines changed: 12 additions & 10 deletions
```diff
@@ -20,30 +20,32 @@ jobs:
       matrix:
         os: [ubuntu-latest, macos-latest, windows-latest]
         python-version: ["3.8", "3.9", "3.10"]
+        testsuite: ["minimal", "full"]
     steps:
       - uses: actions/checkout@v2
       - name: set up python ${{ matrix.python-version }}
         uses: actions/setup-python@v2
         with:
           python-version: ${{ matrix.python-version }}
           cache: 'pip'
-      - name: install system dependencies
-        if: runner.os != 'Windows'
+      - name: install system dependencies (linux)
+        if: runner.os == 'Linux'
         # only managed to install system dependencies on Linux runners
         run: |
-          if [ "$RUNNER_OS" == "Linux" ]; then
-            sudo apt update
-            sudo apt install libgmp-dev libmpfr-dev libmpc-dev
-          fi
+          sudo apt update
+          sudo apt install libgmp-dev libmpfr-dev libmpc-dev
       - name: install python dependencies
         run: |
           python -m pip install --upgrade pip
           pip install tox
       - name: run tox (linux)
         # since system dependencies could only be installed on Linux runners, we run the "full" suite only on Linux ...
         if: runner.os == 'Linux'
-        run: tox -e py-full -- --hypothesis-profile=ci
-      - name: run tox (macos or windows)
-        # ... on all other OS we run the "recommendedextra" suite
-        if: runner.os != 'Linux'
+        run: tox -e py-${{ matrix.testsuite }} -- --hypothesis-profile=ci
+      - name: run tox (macos or windows - minimal)
+        if: runner.os != 'Linux' && matrix.testsuite == 'minimal'
+        run: tox -e py-minimal -- --hypothesis-profile=ci
+      - name: run tox (macos or windows - recommendedextra)
+        # ... on all other OS we run the "recommendedextra" suite instead of the "full" suite
+        if: runner.os != 'Linux' && matrix.testsuite == 'full'
         run: tox -e py-recommendedextra -- --hypothesis-profile=ci
```
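The new `testsuite` matrix axis works together with the `if:` conditions on the three tox steps. The following Python sketch is not part of the repository. It models those conditions to show which tox environment each matrix job runs. It assumes the `runner.os` values GitHub reports for the three runner labels (`Linux`, `macOS`, `Windows`).

```python
from itertools import product
from typing import Dict, List, Tuple

# Matrix values as listed in runtests.yml above.
OS_LABELS = ["ubuntu-latest", "macos-latest", "windows-latest"]
PYTHON_VERSIONS = ["3.8", "3.9", "3.10"]
TESTSUITES = ["minimal", "full"]

# Assumed value of the `runner.os` context for each runner label.
RUNNER_OS = {"ubuntu-latest": "Linux", "macos-latest": "macOS", "windows-latest": "Windows"}


def tox_envs(runner_os: str, testsuite: str) -> List[str]:
    """Return the tox environments whose step `if:` condition is true for one job."""
    envs = []
    # step "run tox (linux)"
    if runner_os == "Linux":
        envs.append(f"py-{testsuite}")
    # step "run tox (macos or windows - minimal)"
    if runner_os != "Linux" and testsuite == "minimal":
        envs.append("py-minimal")
    # step "run tox (macos or windows - recommendedextra)"
    if runner_os != "Linux" and testsuite == "full":
        envs.append("py-recommendedextra")
    return envs


# One entry per matrix job: (runner label, Python version, test suite) -> tox envs
jobs: Dict[Tuple[str, str, str], List[str]] = {
    (os_label, pyver, suite): tox_envs(RUNNER_OS[os_label], suite)
    for os_label, pyver, suite in product(OS_LABELS, PYTHON_VERSIONS, TESTSUITES)
}

if __name__ == "__main__":
    for job, envs in jobs.items():
        print(job, envs)
```

Under these assumptions, each of the 3 × 3 × 2 = 18 jobs runs exactly one tox environment. Only the Linux jobs run `py-full`, because the system libraries are installed only on Linux runners.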

README.rst

Lines changed: 32 additions & 13 deletions
```diff
@@ -14,12 +14,37 @@ The documentation for tmtoolkit is available on `tmtoolkit.readthedocs.org <http
 the GitHub code repository is on
 `github.com/WZBSocialScienceCenter/tmtoolkit <https://github.com/WZBSocialScienceCenter/tmtoolkit>`_.
 
-.. note:: Since Feb 8 2022, the newest version 0.11.0 of tmtoolkit is available on PyPI. This version features a new API
-          for text processing and mining which is incompatible with prior versions. It's advisable to first read the
-          first three chapters of the `tutorial <https://tmtoolkit.readthedocs.io/en/latest/getting_started.html>`_
-          to get used to the new API. You should also re-install tmtoolkit in a new virtual environment or completely
-          remove the old version prior to upgrading. See the
-          `installation instructions <https://tmtoolkit.readthedocs.io/en/latest/install.html>`_.
+**Upgrade note:**
+
+Since Feb 8 2022, the newest version 0.11.0 of tmtoolkit is available on PyPI. This version features a new API
+for text processing and mining which is incompatible with prior versions. It's advisable to first read the
+first three chapters of the `tutorial <https://tmtoolkit.readthedocs.io/en/latest/getting_started.html>`_
+to get used to the new API. You should also re-install tmtoolkit in a new virtual environment or completely
+remove the old version prior to upgrading. See the
+`installation instructions <https://tmtoolkit.readthedocs.io/en/latest/install.html>`_.
+
+Requirements and installation
+-----------------------------
+
+**tmtoolkit works with Python 3.8 or newer (tested up to Python 3.10).**
+
+The tmtoolkit package is highly modular and tries to install as few dependencies as possible. For requirements and
+installation procedures, please have a look at the
+`installation section in the documentation <https://tmtoolkit.readthedocs.io/en/latest/install.html>`_. For short,
+the recommended way of installing tmtoolkit is to create and activate a
+`Python Virtual Environment ("venv") <https://docs.python.org/3/tutorial/venv.html>`_ and then install tmtoolkit with
+a recommended set of dependencies and a list of language models via the following:
+
+.. code-block:: text
+
+    pip install -U "tmtoolkit[recommended]"
+    # add or remove language codes in the list for installing the models that you need;
+    # don't use spaces in the list of languages
+    python -m tmtoolkit setup en,de
+
+Again, you should have a look at the detailed
+`installation instructions <https://tmtoolkit.readthedocs.io/en/latest/install.html>`_ in order to install additional
+packages that enable more features such as topic modeling.
 
 Features
 --------
@@ -93,14 +118,8 @@ Limits
 * all data must reside in memory, i.e. no streaming of large data from the hard disk (which for example
   `Gensim <https://radimrehurek.com/gensim/>`_ supports)
 
-Requirements and installation
-==============================
-
-For requirements and installation procedures, please have a look at the
-`installation section in the documentation <https://tmtoolkit.readthedocs.io/en/latest/install.html>`_.
-
 License
-=======
+-------
 
 Code licensed under `Apache License 2.0 <https://www.apache.org/licenses/LICENSE-2.0>`_.
 See `LICENSE <https://github.com/WZBSocialScienceCenter/tmtoolkit/blob/master/LICENSE>`_ file.
```
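The upgrade note asks users of versions older than 0.11.0 to reinstall into a fresh virtual environment, or to remove the old version before upgrading. Below is a minimal sketch of that check using only the standard library (`importlib.metadata`, available since Python 3.8). The helper names `installed_version` and `predates_new_api` are hypothetical and not part of tmtoolkit. The version parsing is deliberately simple: it only compares the leading major.minor numbers.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "tmtoolkit") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def predates_new_api(version_str: str) -> bool:
    """True if version_str is older than 0.11.0, the first release with the new, incompatible API."""
    m = re.match(r"(\d+)\.(\d+)", version_str)
    if m is None:
        raise ValueError(f"cannot parse version string: {version_str!r}")
    return (int(m.group(1)), int(m.group(2))) < (0, 11)


if __name__ == "__main__":
    current = installed_version()
    if current is None:
        print("tmtoolkit is not installed")
    elif predates_new_api(current):
        print(f"tmtoolkit {current} predates 0.11.0: remove it or use a new virtual environment")
    else:
        print(f"tmtoolkit {current} already uses the new API")
```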

conftest.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -7,4 +7,5 @@
 from hypothesis import settings, HealthCheck
 
 # profile for CI runs on GitHub machines, which may be slow from time to time so we disable the "too slow" HealthCheck
-settings.register_profile('ci', suppress_health_check=(HealthCheck.too_slow, ))
+# and set the timeout deadline very high (60 sec.)
+settings.register_profile('ci', suppress_health_check=(HealthCheck.too_slow, ), deadline=60000)
```
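Hypothesis interprets a numeric `deadline` as milliseconds, so `deadline=60000` means 60 seconds per example. The workflow activates the profile through the Hypothesis pytest plugin's `--hypothesis-profile=ci` option. The sketch below is not taken from the repository and needs Hypothesis installed. It registers the same profile, reads it back with `settings.get_profile`, and applies it to a made-up property test, `test_sort_idempotent`.

```python
from datetime import timedelta

from hypothesis import HealthCheck, given, settings, strategies as st

# Same registration as in the new conftest.py; the integer deadline is in milliseconds.
settings.register_profile('ci', suppress_health_check=(HealthCheck.too_slow, ), deadline=60000)

# Look up the registered profile without activating it globally.
ci = settings.get_profile('ci')


@settings(ci)  # use the 'ci' settings for this one (illustrative) property test
@given(st.lists(st.integers()))
def test_sort_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)


if __name__ == "__main__":
    print(ci.deadline == timedelta(seconds=60))
    test_sort_idempotent()
```

To make a profile the default without a command-line option, `settings.load_profile('ci')` can be called in `conftest.py` instead.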

doc/source/install.rst

Lines changed: 6 additions & 5 deletions
```diff
@@ -68,21 +68,22 @@ on the preferred package for topic modeling:
     # you may also select several topic modeling packages
     pip install -U "tmtoolkit[recommended,lda,sklearn,gensim]"
 
-The minimal installation will only install a base set of dependencies and will only enable the modules for BoW
+The **minimal** installation will only install a base set of dependencies and will only enable the modules for BoW
 statistics, token sequence operations, topic modeling and utility functions. You can install it as follows:
 
 .. code-block:: text
 
+    # alternative installation if you only want to install a minimum set of dependencies
     pip install -U tmtoolkit
 
-.. note::
-    The tmtoolkit package is about 7MB big, because it contains some example corpora.
+.. note:: The tmtoolkit package is about 7MB big, because it contains some example corpora.
 
-After that, you should initially run tmtoolkit's setup routine. This makes sure that all required data files are
+**After that, you should initially run tmtoolkit's setup routine.** This makes sure that all required data files are
 present and downloads them if necessary. You should specify a list of languages for which language models should be
 downloaded and installed. The list of available language models corresponds with the models provided by
 `SpaCy <https://spacy.io/usage/models#languages>`_ (except for "multi-language"). You need to specify the two-letter ISO
-language code for the language models that you want to install. E.g. in order to install models for English and German:
+language code for the language models that you want to install. **Don't use spaces in the list of languages.**
+E.g. in order to install models for English and German:
 
 .. code-block:: text
 
```
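The rule against spaces follows from how the shell builds the argument list. It splits the command line on whitespace, so `en, de` arrives as two arguments, `en,` and `de`, rather than one comma-separated list. The sketch below shows this with a hypothetical parser. `split_setup_args` is not tmtoolkit's code and assumes the language list is the single argument after `setup`. It uses `shlex.split` to mimic POSIX shell splitting.

```python
import shlex
from typing import List, Tuple


def split_setup_args(command_line: str) -> Tuple[List[str], List[str]]:
    """Hypothetical parser returning (language codes, leftover arguments).

    The command line is split into arguments the way a POSIX shell would split it,
    and the language list is assumed to be the single argument following "setup".
    """
    argv = shlex.split(command_line)
    i = argv.index("setup")
    languages = argv[i + 1].split(",")
    leftover = argv[i + 2:]
    return languages, leftover


if __name__ == "__main__":
    print(split_setup_args("python -m tmtoolkit setup en,de"))   # (['en', 'de'], [])
    print(split_setup_args("python -m tmtoolkit setup en, de"))  # (['en', ''], ['de'])
```

With the space, the list arrives as `en` plus an empty code, and `de` ends up as a stray extra argument.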
doc/source/version_history.rst

Lines changed: 5 additions & 0 deletions
```diff
@@ -3,6 +3,11 @@
 Version history
 ===============
 
+0.11.1 - 2022-02-10
+-------------------
+
+- show better error messages when dependencies for optional module ``corpus`` are not met
+- fix a SciPy deprecation warning
 
 0.11.0 - 2022-02-08
 -------------------
```
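The 0.11.1 entry mentions better error messages when the dependencies of the optional `corpus` module are missing. The files shown in this excerpt do not include that code. The following is therefore only a generic sketch of the usual pattern: catch the `ImportError` and re-raise it with an installation hint. The helper `require_optional` and its message text are hypothetical.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or fail with an actionable error message.

    Hypothetical helper: the message format and the pip "extra" named in it are
    illustrative assumptions, not tmtoolkit's actual implementation.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        raise ImportError(
            f"this module requires the package '{module_name}', which is not installed; "
            f'install it e.g. via: pip install -U "tmtoolkit[{extra}]"'
        ) from exc


if __name__ == "__main__":
    json_mod = require_optional("json", "recommended")  # stdlib module, always importable
    print(json_mod.__name__)
```

Chaining with `from exc` keeps the original import error visible in the traceback while putting the fix in front of the user.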

examples/README.md

Lines changed: 4 additions & 1 deletion
```diff
@@ -1,3 +1,6 @@
 # Examples
 
-This folder contains very few examples for *tmtoolkit*. The majority of examples is available as Jupyter Notebooks as part of the [documentation](https://tmtoolkit.readthedocs.io/). You may download these notebooks from the [documentation source](https://github.com/WZBSocialScienceCenter/tmtoolkit/tree/master/doc/source) and run them on your computer.
+This folder contains very few examples for *tmtoolkit*. The majority of examples is available as Jupyter Notebooks as
+part of the [documentation](https://tmtoolkit.readthedocs.io/). You may download these notebooks from
+the [documentation source](https://github.com/WZBSocialScienceCenter/tmtoolkit/tree/master/doc/source) and run them
+on your computer.
```

examples/benchmark_en_newsarticles.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -1,6 +1,14 @@
 """
 Benchmarking script that loads and processes English language test corpus with Corpus in parallel.
 
+This examples requires that you have installed tmtoolkit with the recommended set of packages and have installed an
+English language model for spaCy:
+
+    pip install -U "tmtoolkit[recommended]"
+    python -m tmtoolkit setup en
+
+For more information, see the installation instructions: https://tmtoolkit.readthedocs.io/en/latest/install.html
+
 To benchmark whole script with `time` from command line run:
 
     PYTHONPATH=.. /usr/bin/time -v python benchmark_en_newsarticles.py [NUMBER OF WORKERS]
```

examples/bundestag18_tfidf.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -9,6 +9,14 @@
 
 The data for the debates comes from offenesparlament.de, see https://github.com/Datenschule/offenesparlament-data.
 
+This examples requires that you have installed tmtoolkit with the recommended set of packages and have installed a
+German language model for spaCy:
+
+    pip install -U "tmtoolkit[recommended]"
+    python -m tmtoolkit setup de
+
+For more information, see the installation instructions: https://tmtoolkit.readthedocs.io/en/latest/install.html
+
 Markus Konrad <[email protected]>
 June 2019 / Feb. 2022
 """
```

examples/gensim_evaluation.py

Lines changed: 9 additions & 3 deletions
```diff
@@ -1,11 +1,17 @@
 """
 An example for topic modeling evaluation with gensim.
 
-Please note that this is just an example for showing how to perform Topic Model evaluation with Gensim. The
+Please note that this is just an example for showing how to perform topic model evaluation with Gensim. The
 preprocessing of the data is just done quickly and probably not the best way for the given data.
 
-**Important note for Windows users:**
-You need to wrap all of the following code in a `if __name__ == '__main__'` block (just as in `lda_evaluation.py`).
+This examples requires that you have installed tmtoolkit with the recommended set of packages plus Gensim and have
+installed a German language model for spaCy:
+
+    pip install -U "tmtoolkit[recommended,gensim]"
+    python -m tmtoolkit setup de
+
+For more information, see the installation instructions: https://tmtoolkit.readthedocs.io/en/latest/install.html
+
 """
 
 
```
examples/topicmod_lda.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -2,6 +2,14 @@
 An example for topic modeling with LDA with focus on the new plotting functions in `tmtoolkit.corpus.visualize` and
 in `tmtoolkit.topicmod.visualize`.
 
+This examples requires that you have installed tmtoolkit with the recommended set of packages plus "lda" and have
+installed an English language model for spaCy:
+
+    pip install -U "tmtoolkit[recommended,lda]"
+    python -m tmtoolkit setup en
+
+For more information, see the installation instructions: https://tmtoolkit.readthedocs.io/en/latest/install.html
+
 .. codeauthor:: Markus Konrad <[email protected]>
 """
 
```