Public Repository

Last pushed: 2 years ago
Short Description
Small web mirroring Docker container appliance. Includes lighttpd to serve the mirrored copy.
Full Description

Small web mirroring Docker container appliance. It mirrors a web site with httrack 3.48-21 and serves the copy with lighttpd. The image is based on debian:8.

Example usage:
docker run -e web_source="http://www.sample.co.uk/devops/" -e port=8080 -e refresh=24 -e recursive=2 -e max_time=300 -d --expose=8080 -p 8080:8080 hkubota/webmirror

Variables:
web_source: Web address to mirror, either http://host.name.com/some/path/ or https://... No default.
port: Port the built-in web server listens on (8080 in the example).
recursive: Recursion depth. 1 means no recursion, 0 means unlimited. Default is 2.
refresh: Refresh the mirror every N hours. Default is 24.
max_time: Kill httrack after N seconds. Default is 300 (5 min).
other_flags: Any other flags to pass to httrack. Typical ones are -v for added verbosity, or -mN1,N2 where N1 and N2 are the maximum file sizes in bytes for non-HTML and HTML files, respectively.
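
As an illustration of other_flags, the sketch below prints a docker run command that asks httrack for verbose output and caps file sizes. The helper name webmirror_cmd and the size values are made up for this example, and it assumes the container hands other_flags to httrack as-is, so that several flags can be combined in one value. If in doubt, start with a single flag such as -v.

```shell
# Hypothetical helper: print (do not run) a docker run command for this image.
# Arguments: source URL, port, extra httrack flags.
webmirror_cmd() {
  printf 'docker run -d -e web_source="%s" -e port=%s -e other_flags="%s" -p %s:%s hkubota/webmirror\n' \
    "$1" "$2" "$3" "$2" "$2"
}

# -v: verbose; -m1000000,500000: at most 1000000 bytes per non-HTML file
# and 500000 bytes per HTML file (example values, not defaults).
webmirror_cmd "http://www.sample.co.uk/devops/" 8080 "-v -m1000000,500000"
```

Pipe the printed line to sh once it looks right.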

The example (partially) mirrors http://www.sample.co.uk/devops/ with 2 levels of recursion, an automatic refresh every 24 hours, and a timeout of 5 minutes (all of these are the default values). It serves the result via its web server running on port 8080.

The initial mirroring can take a while, depending on what you mirror. Once the web server has started, refreshes are done in the background.
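
The image's actual startup script is not shown here, so the following is only a rough sketch of how recursive, max_time, refresh and other_flags could map onto one mirroring pass. It assumes the pass is wrapped in timeout(1) and that the mirror is written to /var/www/html. Both are guesses, as are the function names mirror_cmd and refresh_seconds. The httrack options used are -r (depth) and -O (output path).

```shell
# Print (do not run) the command for one mirroring pass.
# Arguments: source URL, recursion depth, timeout in seconds, extra httrack flags.
# /var/www/html is a hypothetical output directory, not taken from the image.
mirror_cmd() {
  echo "timeout $3 httrack $1 -r$2 -O /var/www/html${4:+ $4}"
}

# Convert the refresh interval from hours to seconds, e.g. for sleep(1).
refresh_seconds() {
  echo $(( $1 * 3600 ))
}

mirror_cmd "http://www.sample.co.uk/devops/" 2 300 "-v"
echo "sleep $(refresh_seconds 24)"
```

With the example values this prints a 300-second timeout around httrack, followed by a sleep of 86400 seconds until the next refresh.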

Note: if you mirror http://some.web.site/some/directory, you'll access it via http://my.mirror/some/directory, and not http://my.mirror/
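
The path mapping in the note above can be written down as a small helper. This is just an illustration. The function name mirror_url is made up, and the second argument stands for whatever host (and port) your mirror is reachable at.

```shell
# Map a source URL to the URL under which the mirror serves it.
# Arguments: source URL, base URL of the mirror (hypothetical, e.g. http://my.mirror).
mirror_url() {
  rest=${1#*://}                 # drop the scheme: some.web.site/some/directory
  case $rest in
    */*) path=/${rest#*/} ;;     # keep everything after the host name
    *)   path=/ ;;               # bare host name, no path
  esac
  echo "${2%/}$path"             # avoid a double slash if the base ends in /
}

mirror_url "http://some.web.site/some/directory" "http://my.mirror"
# prints http://my.mirror/some/directory
```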

Caveat: You cannot run this forever. The logs will fill up after a while.
Also, it does not always copy web pages completely. It might hang or get into an endless loop.

Docker Pull Command
docker pull hkubota/webmirror
Owner
hkubota