Annotate text with POS tags and lemma information

treetagger.docker

This repository contains Docker images to build and ship ready-to-use TreeTagger instances.

You will not have to install TreeTagger manually on your system again.

What it is

A tool for annotating text with part-of-speech (POS) tags and lemma information.

Supported languages

17 languages are supported: Bulgarian, Dutch, English, Estonian, Finnish, French, Galician, German, Italian, Latin, Portuguese, Polish, Russian, Slovak, Spanish, Swahili, and Mongolian (parameter file only, no scripts).

Some of them also have alternative parameter files.

Tagging

Italian

Suppose you want to (tokenize and) tag an Italian text.

The script to use is tree-tagger-italian.

It expects UTF-8 encoded input files as arguments. If no files are specified, it reads from stdin.
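To tag a file on the host instead of stdin, the file has to be visible inside the container. A minimal sketch, assuming a bind mount at `/data` (an arbitrary mount point chosen for this example) and a hypothetical helper named `tag_italian_file`; the `DOCKER` variable is likewise a hypothetical override that lets you print the command instead of running it:

```shell
# Hypothetical helper: tag one local UTF-8 file with tree-tagger-italian.
# The file's directory is bind-mounted at /data (arbitrary choice) and the
# in-container path is passed to the script as its file argument.
# Set DOCKER=echo to print the docker arguments instead of running them.
tag_italian_file() {
  local file="$1"
  local dir base
  dir="$(cd "$(dirname "$file")" && pwd)" || return 1
  base="$(basename "$file")"
  "${DOCKER:-docker}" run --rm -v "$dir:/data" leodido/treetagger \
    tree-tagger-italian "/data/$base"
}

# Usage (testo.txt is a hypothetical UTF-8 file in the current directory):
#   tag_italian_file testo.txt
```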

echo 'Proviamo semplicemente a eseguire un test di prova.' | \
          docker run --rm -i leodido/treetagger tree-tagger-italian

Outputs:

Proviamo         VER:pres       provare
semplicemente    ADV            semplicemente
a                PRE            a
eseguire         VER:infi       eseguire
un               DET:indef      un
test             NOM            test
di               PRE            di
prova            NOM            prova
.                SENT           .
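If you only need the lemmas, the tagger output can be post-processed with standard tools. A minimal sketch, assuming the three columns (token, tag, lemma) are separated by tabs; `lemmas` is a hypothetical helper name:

```shell
# Hypothetical helper: read TreeTagger output on stdin and print the
# lemma (third tab-separated column) of every token line.
# Lines without exactly three fields (e.g. chunk markers) are skipped.
lemmas() {
  awk -F'\t' 'NF == 3 { print $3 }'
}

# Usage:
#   echo 'Proviamo semplicemente a eseguire un test di prova.' | \
#     docker run --rm -i leodido/treetagger tree-tagger-italian | lemmas
```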

Portuguese

Now, try with some Portuguese.

echo 'Qual é o seu nome?' | \
         docker run --rm -i leodido/treetagger tree-tagger-portuguese

Results:

Qual    PT0     qual
é       VM      ser
o       DA0     o
seu     DP3     seu
nome    NCMS    nome
?       Fit     ?

And so on for other supported languages.
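Both examples above follow the `tree-tagger-<language>` naming pattern, so a small wrapper can pick the script from a language name. A sketch, assuming the pattern holds for the other supported languages (Mongolian ships no scripts, so it would not work there); `tag_text` and the `DOCKER` override are hypothetical:

```shell
# Hypothetical wrapper: tag the given text with tree-tagger-<language>.
# Assumes the image provides a script with that name on its PATH.
# Set DOCKER=echo to print the docker arguments instead of running them.
tag_text() {
  local lang="$1"
  shift
  printf '%s\n' "$*" |
    "${DOCKER:-docker}" run --rm -i leodido/treetagger "tree-tagger-$lang"
}

# Usage:
#   tag_text portuguese 'Qual é o seu nome?'
#   tag_text italian 'Proviamo semplicemente a eseguire un test di prova.'
```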

Chunking

Suppose you want to tokenize, tag and annotate a German text with nominal and verbal chunks.

echo 'Das ist ein Test.' | \
        docker run --rm -i leodido/treetagger tagger-chunker-german

Which outputs:

<NC>
Das        PDS        die
</NC>
<VC>
ist        VAFIN      sein
</VC>
<NC>
ein        ART        eine
Test       NN         Test
</NC>
.          $.         .
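The chunk markers make it easy to pull out whole phrases. A minimal sketch that prints each nominal chunk on one line, assuming the markers sit on their own lines as shown above and token lines are tab-separated; `noun_chunks` is a hypothetical helper name:

```shell
# Hypothetical helper: read tagger-chunker output on stdin and print the
# tokens of each nominal chunk (<NC> ... </NC>) joined by spaces.
noun_chunks() {
  awk -F'\t' '
    /^<NC>/   { in_nc = 1; chunk = ""; next }
    /^<\/NC>/ { in_nc = 0; print chunk; next }
    in_nc     { chunk = (chunk == "") ? $1 : (chunk " " $1) }
  '
}

# Usage:
#   echo 'Das ist ein Test.' | \
#     docker run --rm -i leodido/treetagger tagger-chunker-german | noun_chunks
```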

Build

This image is tested, built and pushed using CircleCI.

See the repository for further information about TreeTagger, manual building, testing, and so on.

Credits

  • Helmut Schmid, University of Stuttgart, Germany - TreeTagger.

Last update: 28/05/2015

Docker Pull Command

docker pull leodido/treetagger

Owner: leodido