Sosse - Open-source, enterprise-grade web search & crawling.
Discover Sosse, the Selenium Open Source Search Engine, built for powerful web archiving, crawling, and search. Explore all its features and capabilities on the official website.
Whether you're a developer, researcher, or data enthusiast, Sosse is ready to support your projects. Join the community on GitHub or GitLab to submit feature requests, report bugs, contribute code, or start a discussion.
Web Page Search: Search the content of web pages, including dynamically rendered ones, with advanced queries. (doc)
Recurring Crawling: Crawl pages at fixed intervals or adapt the rate based on content changes. (doc)
Web Page Archiving: Archive HTML content, adjust links for local use, download required assets, and support dynamic content. (doc)
Tags: Organize and filter crawled or archived pages using tags for better search and management. (doc)
File Downloads: Batch download binary files from web pages. (doc)
Webhooks: Integrate with external services using highly flexible webhooks. Connect to proprietary AI platforms (doc) or locally hosted solutions (doc) to enable advanced data extraction, summarization, auto-tagging, notifications, and more.
Atom Feeds: Generate content feeds for websites that don't have them, or receive updates when a new page containing a keyword is published. (doc)
Authentication: The crawler can authenticate to access private pages and retrieve content. (doc)
Permissions: Admins can configure crawlers and view statistics, while search is available to authenticated users or to anonymous visitors. (doc)
Search Features: Includes private search history (doc), external search engine shortcuts (doc), and more.
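As an illustration of the Atom Feeds item above, the following is a minimal sketch of how a client might read entries from any Atom 1.0 feed using only the Python standard library. It does not use a Sosse-specific API: the function name `parse_atom_entries` is hypothetical, and the URL of a feed generated by Sosse is not shown here and should be taken from the documentation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom 1.0 format (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def parse_atom_entries(xml_text):
    """Return a list of dicts (title, link, updated) for each feed entry.

    Hypothetical helper for illustration; it accepts the feed body as a
    string, so fetching the feed is left to the caller.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append(
            {
                "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
                # An entry may lack a <link>; fall back to an empty string.
                "link": link.get("href", "") if link is not None else "",
                "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
            }
        )
    return entries
```

Keeping the parser separate from the download step makes it easy to run against a saved feed file before pointing it at a live instance.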
Explore the documentation and check out some screenshots.
Sosse is written in Python and is distributed under the GNU AGPLv3 license. It uses browser-based crawling with Mozilla Firefox or Google Chromium alongside Selenium to index pages that rely on JavaScript. For faster crawling, Requests can also be used. Sosse uses PostgreSQL for data storage.
To quickly try the latest version with Docker:
docker run -p 8005:80 biolds/sosse:stable
Then, open http://127.0.0.1:8005/ and log in with the username admin and password admin.
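The container may need some time after `docker run` before the web interface answers. The sketch below polls the URL given above until it responds over HTTP. The function names `wait_until_up` and `_http_probe`, the timeout values, and the injectable `probe` parameter are assumptions made for illustration; none of them are part of Sosse.

```python
import time
import urllib.error
import urllib.request


def _http_probe(url):
    """Return True if the server at `url` answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except OSError:
        # Connection refused, reset, or timed out: not ready yet.
        return False


def wait_until_up(url, timeout=60.0, interval=2.0, probe=None):
    """Poll `url` until it answers or `timeout` seconds have elapsed.

    `probe` is a hypothetical hook that lets the check be replaced,
    for example in tests; it defaults to a plain HTTP request.
    """
    probe = probe or _http_probe
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_until_up("http://127.0.0.1:8005/")` returns `True` once the page responds, or `False` if nothing answers within the timeout.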
For persistence of Docker data or alternative installation methods, please refer to the installation guide.
Join the Discord server to get help, share ideas, or discuss Sosse!