
Last pushed: 3 years ago
Short Description
wp2txt on Ubuntu (latest)
Full Description

wp2txt (https://github.com/yohasebe/wp2txt) is a tool that extracts plain text from Wikipedia dumps.

Run

$ mkdir -p /path/to/wikidump/text
$ # place the dump file (here jawiki-20150805-pages-articles.xml.bz2) in /path/to/wikidump
$ docker run -it -v /path/to/wikidump:/mnt/wikidump toshihikoyanase/wp2txt /bin/bash
# cd /mnt/wikidump
# wp2txt -i jawiki-20150805-pages-articles.xml.bz2 -o text/
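
The interactive session above can probably be collapsed into a single non-interactive command. The sketch below only prints that command so it can be inspected before running. It assumes that `wp2txt` is on the PATH when the container starts without a shell and that the image defines no conflicting entrypoint; neither is confirmed by this page. `DUMP_DIR`, `DUMP_FILE`, and `build_cmd` are hypothetical names used for illustration.

```shell
#!/bin/sh
# Sketch: build a one-shot docker command for wp2txt (not the author's documented method).
# DUMP_DIR and DUMP_FILE are placeholders; override them to match your setup.
DUMP_DIR="${DUMP_DIR:-/path/to/wikidump}"
DUMP_FILE="${DUMP_FILE:-jawiki-20150805-pages-articles.xml.bz2}"

build_cmd() {
  # --rm removes the container on exit; -w sets the working directory
  # inside the container, replacing the manual "cd /mnt/wikidump".
  # Assumption: wp2txt is reachable on PATH without an interactive shell.
  printf 'docker run --rm -v %s:/mnt/wikidump -w /mnt/wikidump toshihikoyanase/wp2txt wp2txt -i %s -o text/\n' \
    "$DUMP_DIR" "$DUMP_FILE"
}

# Print the command for review instead of executing it.
build_cmd
```

To run the printed command, pipe the script's output to `sh`. If `wp2txt` is not found that way, fall back to the interactive steps above.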
Owner
toshihikoyanase