Public | Automated Build

Last pushed: 2 years ago
Short Description
Python WayBack for web archive replay and url-rewriting HTTP/S web proxy
Full Description

pywb Docker setup

Docker setup for pywb Python Wayback.

pywb is a web archive replay and live web proxy rewriting system.

Running With Live Web Proxy

To run a container of this image:

docker run -it ikreymer/pywb

This will start pywb with a live web proxy configured at http://[DOCKER_HOST]:8080/live/
For example, the live rewrite of http://example.com/ can be viewed at http://[DOCKER_HOST]:8080/live/http://example.com/

Creating a Web Archive With Pywb

pywb supports creating collections with the wb-manager utility, which can be run from Docker as well.

For example, the following can be used to add a WARC from /path/to/mywarc/warc_file.warc.gz to a new collection called test
and store it in a Docker volume, mapped to local /path/to/my_collection/

# Init Collection 'test'
docker run -it -v /path/to/collection:/webarchive ikreymer/pywb wb-manager init test

# Add warc to collection 'test'
docker run -it -v /path/to/collection:/webarchive -v /path/to/mywarc/:/warcs/ ikreymer/pywb wb-manager init add /warcs/warc_file.warc.gz`

# run pywb
docker run -it -p 8080:8080 -v /path/to/collection:/webarchive ikreymer/pywb

The contents of the WARC can now be browsed by visiting http://[DOCKER_HOST]:8080/test/[url]

The mapping to /webarchive volume can be omitted if the collection need not be accessed outside of Docker.

Running with An Existing Web Archive

This image exposes the /webarchive volume which can be used to store all the pywb collections and config files.

It can be used to create a new archive as shown above or map an existing web archive for pywb.

Refer to pywb documentation for additional pywb usage info.

Docker Pull Command
Owner
ikreymer
Source Repository