Maltrieve
Maltrieve originated as a fork of mwcrawler. It retrieves malware directly from the sources as listed at a number of sites. Currently we crawl the following:
- Malware Domain List
- Malware URLs
- VX Vault
- Minotaur Analysis
- DAS MALWERK
Other improvements over mwcrawler include:
- Proxy support
- Multithreading for improved performance
- Logging of source URLs
- Multiple user agent support
- Better error handling
- Cuckoo Sandbox support
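To illustrate how multithreading and multiple user agents can work together, here is a minimal sketch in Python 3. It is not Maltrieve's actual implementation: the `fetch_all` function, the injected `fetch` callable, and the user agent strings are all hypothetical names chosen for this example.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

# Hypothetical user agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Mozilla/5.0 (X11; Linux x86_64)",
]


def fetch_all(urls, fetch, concurrency=5):
    """Call fetch(url, user_agent) for every URL on a pool of worker threads.

    Returns a list of (url, result_or_exception) pairs in input order.
    """
    # Assign user agents round-robin before any thread starts, so the
    # pairing of URL and agent is deterministic.
    agents = itertools.cycle(USER_AGENTS)
    jobs = [(url, next(agents)) for url in urls]

    def run(job):
        url, agent = job
        try:
            return url, fetch(url, agent)
        except Exception as exc:
            # One failing URL should not stop the rest of the crawl.
            return url, exc

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(run, jobs))
```

The `fetch` argument is whatever performs the HTTP request, for example a small function that wraps an HTTP client and applies the proxy settings. Passing it in keeps the threading logic testable without network access.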
Maltrieve requires the following dependencies:
- Python 2 plus header files (2.6 should be sufficient)
- BeautifulSoup version 4
With the exception of the Python header files, these can all be found in requirements.txt. On Debian-based distributions, run
sudo apt-get install python-dev
On Red Hat-based distributions, run
sudo yum install python-devel
After that, just run
pip install -e .
You may need to prepend that with sudo if not running in a virtual environment, but using such an environment is highly encouraged.
Alternately, avoid all of that by using the Docker image.
To run Maltrieve, use
maltrieve (if installed normally) or
python maltrieve.py (if just downloaded and run)
usage: maltrieve.py [-h] [-q] [-v] [--debug] [-p PROXY] [-d DUMPDIR]
                    [-i INPUTFILE] [-b BLACKLIST] [-w WHITELIST] [-P PRIORITY]
                    [-c URL] [-U USERAGENT] [--malshare MALSHARE] [-t TIMEOUT]
                    [-N CONCURRENCY] [-s]

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           Don't print results to console
  -v, --verbose         Log informational messages
  --debug               Log debugging messages
  -p PROXY, --proxy PROXY
                        Define HTTP proxy, e.g. socks5://localhost:9050
  -d DUMPDIR, --dumpdir DUMPDIR
                        Define dump directory for retrieved files
  -i INPUTFILE, --inputfile INPUTFILE
                        Text file with URLs to retrieve
  -b BLACKLIST, --blacklist BLACKLIST
                        Comma separated mimetype blacklist
  -w WHITELIST, --whitelist WHITELIST
                        Comma separated mimetype whitelist
  -P PRIORITY, --priority PRIORITY
                        Cuckoo sample priority
  -c URL, --cuckoo URL  Cuckoo API
  -U USERAGENT, --useragent USERAGENT
                        HTTP User agent
  --malshare MALSHARE   Malshare key
  -t TIMEOUT, --timeout TIMEOUT
                        HTTP request/response timeout (default 20)
  -N CONCURRENCY, --concurrency CONCURRENCY
                        HTTP request/response concurrency (default 5)
  -s, --sort_mime       Sort files by MIME type
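The comma-separated values taken by -b and -w can be pictured with a short Python 3 sketch. The function names below are hypothetical. The precedence shown (blacklist checked first, then whitelist) is an assumption made for illustration and may differ from what Maltrieve does internally.

```python
def parse_mime_list(value):
    """Split a comma-separated option value into a set of lowercase MIME types."""
    if not value:
        return set()
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def should_keep(mime_type, blacklist=None, whitelist=None):
    """Decide whether a retrieved file with this MIME type is kept.

    Assumed rules: a blacklisted type is always dropped; if a whitelist
    is given, only the types on it are kept; otherwise everything is kept.
    """
    mime_type = mime_type.strip().lower()
    if blacklist and mime_type in blacklist:
        return False
    if whitelist and mime_type not in whitelist:
        return False
    return True
```

For example, a blacklist value of text/html,text/plain would drop web pages and plain text while keeping executables.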
Automated Execution (Optional)
Cron can be used to automate the execution of Maltrieve. The following example is provided to help get you started. It will create a cron job that will run Maltrieve every day at 2:01 as a standard user. That said, we recommend enhancing this by creating a custom script for production environments.
As a user, execute
If installed normally, add the following to the end of the file.
01 02 * * * maltrieve <optional flags>
If downloaded to a folder and executed, add the following to the end of the file.
01 02 * * * cd </folder/location> && /usr/bin/python maltrieve.py <optional flags>
Red Hat systems will need to ensure that the user is added to the
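As one possible shape for the custom script recommended above, the following Python 3 sketch wraps a single Maltrieve run and logs its exit status. It assumes maltrieve is installed normally and on the PATH. The function names are hypothetical, and only flags from the usage text are used.

```python
import logging
import subprocess


def build_command(dumpdir, timeout=20, concurrency=5, sort_mime=False, quiet=True):
    """Assemble the maltrieve command line from the documented flags."""
    cmd = ["maltrieve", "-d", dumpdir, "-t", str(timeout), "-N", str(concurrency)]
    if sort_mime:
        cmd.append("-s")
    if quiet:
        cmd.append("-q")
    return cmd


def run_once(dumpdir, logfile, **options):
    """Run Maltrieve once and record the outcome in logfile."""
    logging.basicConfig(
        filename=logfile,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    cmd = build_command(dumpdir, **options)
    logging.info("starting: %s", " ".join(cmd))
    status = subprocess.call(cmd)  # blocks until maltrieve exits
    if status != 0:
        logging.error("maltrieve exited with status %d", status)
    return status
```

A cron entry could then invoke a script that calls run_once with a dump directory and log file of your choosing, in place of calling maltrieve directly.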
Released under GPL version 3. See the LICENSE file for full details.
We list all the bugs we know about (plus some things we know we need to add) at the GitHub issues page.
How you can help
Aside from pull requests, non-developers can open issues on GitHub. Things we'd really appreciate:
- Bug reports, preferably with error logs
- Suggestions of additional sources for malware lists
- Descriptions of how you use it and ways we can improve it for you
Check the contributing guide for details.