# ytdlbot

A tool for collecting videos from YouTube & similar sites.

I use ytdlbot to collect music videos, photography videos, "nature relaxation"-style videos, and comedy bits I'll want to reference often.

A deployment of ytdlbot requires Docker and includes:

- `ytdlbot-api`, which accepts requests to add videos to your collections.
- `ytdlbot-processor`, which works through the download queue and files videos in the requested collection. This is the Docker container on which you'll set environment variables to customize how your media is filed.

Optionally, the deployment may use:

- `docker compose`, for easier deployment management

### ytdlbot-processor

The processor container respects the environment variables `PUID` and `PGID` to ensure saved videos and log files are owned by the correct user on the host system.

If you're using Netdata to collect metrics, you should also set the `NETDATA_GID` variable to the ID of the `netdata` group on the host system.

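A minimal sketch of passing these variables with `docker run` (the processor image name and the IDs shown are assumptions; substitute the values for your host):

```sh
# Find the IDs to pass in (replace "me" with the user who should own the media):
id -u me                              # value for PUID
id -g me                              # value for PGID
getent group netdata | cut -d: -f3    # value for NETDATA_GID

# Run the processor with those IDs (image name is an assumption):
docker run -d \
  -e PUID=1000 \
  -e PGID=1000 \
  -e NETDATA_GID=999 \
  cdzombak/ytdlbot-processor
```
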
### ytdlbot-api

The API container should simply be run as the same user as the processor container's `PUID`/`PGID` values, using Docker's native `--user` setting.

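For example (a sketch; the UID:GID pair is whatever you used for `PUID`/`PGID` above):

```sh
# Run the API as the same UID:GID used for PUID/PGID on the processor:
docker run -d --user 1000:1000 cdzombak/ytdlbot-api
```
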
The processor container will write log files containing timestamps. By default these are in UTC. You may set the `TZ` environment variable (for example, `TZ=America/Detroit`) to make these timestamps match your local system.

You'll need to pick a directory to store your video collections. Map this folder into both the API and processor Docker containers like the following:

```
"/mnt/storage/my-ytdlbot-media:/ytdlbot-media"
```

You may also pick a directory to store logs from the queue processor. If you want these logs persisted somewhere you can easily browse them, map the following volume on the processor Docker container:

```
"/home/me/ytdlbot-logs:/ytdlbot-logs"
```

Finally, a `/var/run/ytdlbot` mapping is required if you want to use Netdata for metrics. Map the following volume on the processor Docker container:

```
"/var/run/ytdlbot:/ytdlbot-run"
```

By default, videos in a collection are organized into folders based on uploader name. To disable this filing, set the environment variable `ORGANIZE_BY_UPLOADER=false` on the processor container.

If you expect your collection to be extremely large, you can shard your collection by uploader name by setting the environment variable `SHARD_BY_UPLOADER=true` on the processor container. This results in a folder structure like the following:

```
ytdlbot-media
- Music Videos
  - f
    - Foo Fighters
      - White Limo.mp4
      (...)
```

Sharding is performed by `dirshard` and can be configured via environment variables on the processor container. See the `dirshard` README for available environment variables.

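For instance, a sketch of enabling sharding on the processor (image name assumed, as above):

```sh
# Enable sharding of per-uploader folders within each collection:
docker run -d \
  -e SHARD_BY_UPLOADER=true \
  -v /mnt/storage/my-ytdlbot-media:/ytdlbot-media \
  cdzombak/ytdlbot-processor
```
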
The API container exposes the API on port `5000`. Map this to your desired port when running the container.

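For example (a sketch; `8080` is an arbitrary host port):

```sh
# Publish the API's port 5000 on host port 8080:
docker run -d --user 1000:1000 -p 8080:5000 cdzombak/ytdlbot-api
```
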
I recommend managing your deployment via Docker Compose; a sample `docker-compose.yml` is provided in this repo. A sample `Makefile` is also provided.

The queue processor runs its tasks using `runner`. You can enable notifications for failed operations via email, Discord, or Ntfy by setting the appropriate `RUNNER_` environment variables on the processor container.

See the `runner` README for details, and see the sample `docker-compose.yml` in this repo for an example using SMTP to send failure notifications.

I use and recommend Tailscale to control access to the ytdlbot API. This is simple to do. Assuming you've exposed the API from its Docker container on port 5000, running this command will make it accessible via HTTPS at `https://your-machine.your-tailnet.ts.net:8124`:

```sh
sudo tailscale serve --bg --https=8124 http://127.0.0.1:5000
```

The sample `Makefile` in this repo has an example of this.

I use Netdata to collect metrics and monitor my server. The `netdata` directory in this repo contains definitions for several charts tracking metrics about your ytdlbot media library. It also has an alarm you can install to be notified when a download failure occurs.

These charts assume you have the following volume mapped on the processor container:

```
"/var/run/ytdlbot:/ytdlbot-run"
```

To install the Netdata chart, clone this repository to your server, change into the `netdata` directory, and run `./install-chart.sh`. To additionally install the failure alarm, run `./install-alarm.sh`.

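Spelled out as shell commands (the clone URL and directory name are assumptions; substitute this repository's actual URL):

```sh
# Install the ytdlbot Netdata charts (and, optionally, the failure alarm):
git clone https://github.com/cdzombak/ytdlbot.git   # URL is an assumption
cd ytdlbot/netdata
./install-chart.sh   # installs the media-library charts
./install-alarm.sh   # optional: installs the download-failure alarm
```
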
Each folder inside the media directory represents a "collection." (Examples from my system are "Music Videos" and "Photography.")
When adding a video, the name of the target collection is required. (See the "curl" section below for an example.) If the collection does not exist, it's created automatically.

You can list the existing collections by calling the `/collections` API endpoint. This could be used to build a collection UI that allows selecting from the existing collections.

Collection names are case-sensitive if and only if the filesystem storing your media is case-sensitive.

ytdlbot intentionally implements only the server side of a video collection system, allowing you to build whatever clients integrate well into your workflows. However, a few sample ways to collect videos are provided below.

### Apple Shortcuts

This Shortcut receives a URL from the system share sheet, asks you which collection to add it to, and saves it.

### AppleScript

The following AppleScript, placed in `~/Library/Scripts/Applications/Firefox`, will add whatever's in the current tab to the "Photography" collection:

```applescript
-- Save the current clipboard so it can be restored after grabbing the URL
set previousClipboard to the clipboard

tell application "Firefox" to activate
delay 0.25
tell application "System Events"
    -- Cmd-L focuses the address bar, Cmd-C copies the URL, Esc dismisses the focus
    keystroke "l" using command down
    keystroke "c" using command down
    tell application "System Events"
        key code 53
    end tell
end tell
delay 0.5

set theUrl to the clipboard
set the clipboard to previousClipboard

-- POST the URL to the ytdlbot API's /add endpoint
set theJson to "{\"url\": \"" & theUrl & "\", \"collection\": \"Photography\"}"
set theResult to (do shell script "curl -X POST https://my-machine.my-tailnet.ts.net:8124/add -H 'Content-Type: application/json' -d '" & theJson & "'")

if theResult does not contain "URL accepted" then
    display dialog theJson & "
" & theResult
else
    do shell script "/opt/homebrew/bin/terminal-notifier -message 'URL accepted.' -title '✅'"
end if
```

### curl

```sh
curl \
  --header "Content-Type: application/json" \
  --request POST \
  --data '{"url": "https://www.youtube.com/watch?v=ZDOI0cq6GZM", "collection": "Web Videos"}' \
  http://localhost:5000/add
```

YouTube channels sometimes change names. You can use the `_disambiguations.json` file to map channel/uploader names to the names you want used for folders in your media directory.

Uploader names are case-sensitive if and only if the filesystem storing your media is case-sensitive.

A sample `_disambiguations.json` file, mapping the channel name `Lofi Girl` to the folder name `ChilledCow` within my collection:

```json
{
  "Lofi Girl": "ChilledCow"
}
```

The ytdlbot API has three endpoints. There is no authentication; if you need authentication/authorization, you should use an authenticating reverse proxy or Tailscale instead of exposing the API directly.

### /add

`POST /add` accepts a JSON body with two keys, `collection` and `url`. Sample JSON:

```json
{
  "collection": "Music Videos",
  "url": "https://www.youtube.com/watch?v=ebJ2brErERQ"
}
```

The response contains two keys, `status` and `message`. If the URL was successfully accepted, `status` will be `"ok"`.

### /collections

`GET /collections` returns JSON representing a list of the existing ytdlbot collections. The list is sorted. Sample output:

```json
{
  "collections": [
    "Music Videos",
    "Photography"
  ]
}
```

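For example, fetching the list from the shell (assuming the API is reachable at `localhost:5000`):

```sh
# List existing collections
curl http://localhost:5000/collections
```
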
### /health

If the service is up and running, `GET /health` returns HTTP 200 and the value `ytdlbot-api is online`.

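A quick way to probe it from the shell (again assuming `localhost:5000`; expect `200`):

```sh
# Print only the HTTP status code from /health
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:5000/health
```
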
## License

GNU GPLv3; see `LICENSE` in this repo.