Short Description
Extract a random sample from a Wikipedia XML dump
Full Description
If you need to extract samples of Wikipedia articles of different sizes in XML format, but you don't want to deal with XML parsing yourself, this little bad boy is just what you need.
docker run -i -v ~/folder-with-xml-dump/:/work idio/wikistats-split-wiki-dump /work/enwiki-20150602-pages-articles.xml 10 20 40
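Each number after the dump path requests one sample of that size, so the command above will produce the following files in ~/folder-with-xml-dump/ (mounted in the container as /work):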
enwiki-20150602-pages-articles.xml.sample-10
enwiki-20150602-pages-articles.xml.sample-20
enwiki-20150602-pages-articles.xml.sample-40
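To sanity-check the output, a small shell helper could count the articles in each sample file. This is a hypothetical sketch, not part of the image: the function name count_pages is illustrative. It assumes the samples keep the MediaWiki dump layout, with each opening <page> tag on its own line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with the image): print how many
# <page> elements each given file contains.
# Assumption: every opening <page> tag sits on its own line, as in the
# original Wikipedia dump, so counting matching lines counts articles.
count_pages() {
  local f n
  for f in "$@"; do
    # -F matches the literal string "<page>" (closing </page> tags do not match).
    # grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps going.
    n=$(grep -cF '<page>' "$f" || true)
    printf '%s: %s pages\n' "$f" "$n"
  done
}
```

For example: count_pages ~/folder-with-xml-dump/enwiki-20150602-pages-articles.xml.sample-*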
Owner
idio