NLCR/PoetGuesser

PoetGuesser

Python library tailored for authorship recognition of Czech poetry. The notebook test.ipynb contains a tutorial that applies the library to part of the text of "Cid v zrcadle španělských romancí" by Jaroslav Vrchlický (testdoc.txt).

How it works

Recognition is based on a feature set combining delexicalized linguistic features (frequencies of delexicalized tokens, token bigrams, or token trigrams) and versification features (frequencies of rhythmical bitstrings or rhythmical trigrams). The linguistic analysis is provided on the fly by the UDPipe API, and the versification analysis by the Ingram API.
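The frequency features described above can be illustrated with a minimal sketch. It is not PoetGuesser's own code: the function name `ngram_frequencies`, the `tok:`/`rhy:` key prefixes, and the sample data are hypothetical, and the sketch assumes that delexicalization means replacing words with part-of-speech tags and that each line's rhythm is encoded as a string of 1s (stressed) and 0s (unstressed). No UDPipe or Ingram call is made; the inputs are hard-coded.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def ngram_frequencies(items: Sequence[str], n: int) -> Dict[Tuple[str, ...], float]:
    """Return the relative frequency of every n-gram in a sequence."""
    ngrams = [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]
    if not ngrams:
        return {}
    counts = Counter(ngrams)
    return {gram: count / len(ngrams) for gram, count in counts.items()}


# Hypothetical delexicalized tokens (POS tags standing in for words).
tags = ["NOUN", "VERB", "ADJ", "NOUN", "VERB", "ADJ", "NOUN"]
# Hypothetical rhythmical bitstrings, one per verse line.
rhythm = ["10101010", "10101010", "10010010", "10101010"]

# Merge both feature families into one feature dictionary.
features: Dict[str, float] = {}
for n in (1, 2, 3):
    for gram, freq in ngram_frequencies(tags, n).items():
        features["tok:" + "_".join(gram)] = freq
for gram, freq in ngram_frequencies(rhythm, 1).items():
    features["rhy:" + gram[0]] = freq
```

Using relative rather than absolute frequencies keeps texts of different lengths comparable, which is one common choice for stylometric features.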

Key features

  • Built-in SVC model; any other sklearn model may be used instead
  • Pretrained models for a number of Czech poets and the most frequent poetic meters (because meter strongly affects vocabulary, recognition is always based on texts written in a single meter)
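As a rough illustration of how an sklearn classifier fits this kind of task, the sketch below trains a linear SVC on a handful of feature dictionaries. It uses plain scikit-learn only and does not show PoetGuesser's own interface; the feature names, values, and author labels are invented for the example, and all samples are assumed to come from texts in the same meter.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical training samples: one feature dictionary per text sample.
train_features = [
    {"tok:NOUN_VERB": 0.30, "rhy:10101010": 0.80},
    {"tok:NOUN_VERB": 0.28, "rhy:10101010": 0.75},
    {"tok:NOUN_VERB": 0.10, "rhy:10101010": 0.20},
    {"tok:NOUN_VERB": 0.12, "rhy:10101010": 0.25},
]
train_authors = ["poet_a", "poet_a", "poet_b", "poet_b"]

# Turn dictionaries into a dense matrix, standardize, then classify.
model = make_pipeline(
    DictVectorizer(sparse=False),
    StandardScaler(),
    SVC(kernel="linear"),
)
model.fit(train_features, train_authors)

# Classify an unseen (hypothetical) sample.
prediction = model.predict([{"tok:NOUN_VERB": 0.29, "rhy:10101010": 0.78}])
```

Because the classifier is the last step of a pipeline, replacing `SVC` with another sklearn estimator changes only that one line.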

Roles

  • Petr Plecháč - main developer

Dedication

National Library of the Czech Republic. Realized with the support of institutional research of the National Library of the Czech Republic, funded by the Ministry of Culture of the Czech Republic as part of the framework Long-term conceptual development of a research organization (DKRVO), 9: Digital Humanities. Trained on data from the Institute for Czech Literature of the Czech Academy of Sciences.
