myscale/myscaledb
Enable every developer to build production-grade GenAI applications with powerful and familiar SQL.
MyScaleDB is the SQL vector database that enables developers to build production-ready and scalable AI applications using familiar SQL. It is built on top of ClickHouse and optimized for AI applications and solutions, allowing developers to effectively manage and process massive volumes of data.
Key benefits of using MyScaleDB include:
MyScale Cloud provides fully managed MyScaleDB with premium features on billion-scale data [2]. Compared with specialized vector databases that use custom APIs, MyScale is more powerful, performant, and cost-effective while remaining simpler to use, which makes it suitable for a large community of programmers. Additionally, compared with integrated vector databases such as PostgreSQL with pgvector or Elasticsearch with vector extensions, MyScale consumes fewer resources and achieves better accuracy and speed for joint structured and vector queries, such as filtered searches.
See our documentation and blogs for more about MyScale’s unique features and advantages. Our open-source benchmark provides a detailed comparison with other vector database products.
ClickHouse is a popular open-source analytical database that excels at big data processing and analytics due to its columnar storage with advanced compression, skip indexing, and SIMD processing. Unlike transactional databases such as PostgreSQL and MySQL, which use row storage and are optimized mainly for transactional processing, ClickHouse has significantly faster analytical and data scanning speeds.
One of the key operations in combining structured and vector search is filtered search, which involves filtering by other attributes first and then performing vector search on the remaining data. Columnar storage and pre-filtering are crucial for ensuring high accuracy and high performance in filtered search, which is why we chose to build MyScaleDB on top of ClickHouse.
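To illustrate why pre-filtering matters for accuracy, here is a minimal pure-Python sketch. It is not MyScaleDB's implementation: it ranks a toy in-memory table by brute-force cosine distance, and the names rows, prefilter_search, and postfilter_search are hypothetical. Pre-filtering applies the attribute predicate first and ranks only the survivors, so it returns up to k matching rows. Post-filtering ranks everything first and filters afterward, so matching rows can be pushed out of the top k.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def prefilter_search(rows, predicate, query, k):
    """Filter by attribute first, then rank the remaining rows by distance."""
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(key=lambda r: cosine_distance(r["vector"], query))
    return candidates[:k]

def postfilter_search(rows, predicate, query, k):
    """Rank all rows first, keep the top k, then drop rows failing the filter."""
    nearest = sorted(rows, key=lambda r: cosine_distance(r["vector"], query))[:k]
    return [r for r in nearest if predicate(r)]

# Toy table: two English rows and two German rows (illustrative data only).
rows = [
    {"id": 1, "lang": "en", "vector": [1.0, 0.0]},
    {"id": 2, "lang": "de", "vector": [0.9, 0.1]},
    {"id": 3, "lang": "de", "vector": [0.8, 0.2]},
    {"id": 4, "lang": "en", "vector": [0.0, 1.0]},
]
query = [1.0, 0.0]

def is_english(row):
    return row["lang"] == "en"

pre = [r["id"] for r in prefilter_search(rows, is_english, query, k=2)]
post = [r["id"] for r in postfilter_search(rows, is_english, query, k=2)]
print("pre-filter ids:", pre)
print("post-filter ids:", post)
```

In this toy data the pre-filtered search returns both English rows, while the post-filtered search returns only one, because a German row occupies a slot in the unfiltered top 2.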
While we have modified ClickHouse's execution and storage engine in many ways to ensure fast and cost-effective SQL vector queries, many of the features (#37893, #38048, #37859, #56728, #58223) related to general SQL processing have been contributed back to the ClickHouse open source community.
The simplest way to use MyScaleDB is to create an instance on the MyScale Cloud service. You can start from a free pod supporting 5M 768D vectors. Sign up here and check out the MyScaleDB QuickStart for more instructions.
Using MyScaleDB Docker Image
To quickly get a MyScaleDB instance up and running, simply pull and run the latest Docker image:
docker run --name myscaledb --net=host myscale/myscaledb:1.4
Note: MyScaleDB's default configuration only allows access from localhost. When starting the container with docker run, you need to specify --net=host so that services running in the container can be reached from the current node.
This will start a MyScaleDB instance with the default user (named default) and no password. You can then connect to the database using clickhouse-client:
docker exec -it myscaledb clickhouse-client
Using Docker Compose
Use the following directory structure, which shows the location of the docker-compose.yaml file:
> tree myscaledb
myscaledb
├── docker-compose.yaml
└── volumes
    └── config
        └── users.d
            └── custom_users_config.xml
3 directories, 2 files
Configure the docker-compose.yaml file as follows; you can adjust it based on your specific requirements:
version: '3.7'
services:
  myscaledb:
    image: myscale/myscaledb:1.4
    tty: true
    ports:
      - '8123:8123'
      - '9000:9000'
      - '8998:8998'
      - '9363:9363'
      - '9116:9116'
    networks:
      myscaledb_network:
        ipv4_address: 10.0.0.2
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/data:/var/lib/clickhouse
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/log:/var/log/clickhouse-server
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/config/users.d/custom_users_config.xml:/etc/clickhouse-server/users.d/custom_users_config.xml
    deploy:
      resources:
        limits:
          cpus: "16.00"
          memory: 32Gb
networks:
  myscaledb_network:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 10.0.0.0/24
custom_users_config.xml
<clickhouse>
    <users>
        <default>
            <password></password>
            <networks>
                <ip>::1</ip>
                <ip>127.0.0.1</ip>
                <ip>10.0.0.0/24</ip>
            </networks>
            <profile>default</profile>
            <quota>default</quota>
            <access_management>1</access_management>
        </default>
    </users>
</clickhouse>
Note: The custom_users_config.xml configuration lets the default user access the database from the node where the database service is deployed with Docker Compose. To access the database service from other nodes, it is recommended to create a user that is allowed to connect from other IP addresses. For detailed settings, see: MyScaleDB Create User.
You can also customize the MyScaleDB configuration file. Copy the /etc/clickhouse-server directory from your myscaledb container to your local drive, modify the configuration, and add a directory mapping to the docker-compose.yaml file to make the configuration take effect:
- ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/config:/etc/clickhouse-server
cd myscaledb
docker compose up -d
docker exec -it myscaledb-myscaledb-1 clickhouse-client
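Besides clickhouse-client, the Compose file above publishes port 8123, which in ClickHouse is the HTTP interface. The following is a minimal sketch of querying it from Python's standard library, assuming MyScaleDB keeps ClickHouse's HTTP interface unchanged and that the default user with an empty password is reachable from where the script runs. The helper names build_query_request and run_query are hypothetical, not part of any MyScaleDB client.

```python
import urllib.request

def build_query_request(sql, host="localhost", port=8123, user="default", password=""):
    """Build a POST request that sends one SQL statement to the HTTP interface."""
    request = urllib.request.Request(
        f"http://{host}:{port}/",
        data=sql.encode("utf-8"),  # the query text travels in the POST body
        method="POST",
    )
    # ClickHouse accepts credentials in these headers; assumed to hold for MyScaleDB.
    request.add_header("X-ClickHouse-User", user)
    if password:
        request.add_header("X-ClickHouse-Key", password)
    return request

def run_query(sql, **connection_options):
    """Send the query and return the raw response body as text."""
    request = build_query_request(sql, **connection_options)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

With an instance running, run_query("SELECT version()") would send the statement and return the response as plain text. No network call happens until run_query is invoked.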
See Vector Search Documentation for how to create a SQL table with a vector index and perform vector search. It's recommended to specify TYPE SCANN when creating a vector index in open source MyScaleDB.
-- Create a table with body_vector of length 384
CREATE TABLE default.wiki_abstract
(
    `id` UInt64,
    `body` String,
    `title` String,
    `url` String,
    `body_vector` Array(Float32),
    CONSTRAINT check_length CHECK length(body_vector) = 384
)
ENGINE = MergeTree
ORDER BY id;
-- Insert data from parquet files on S3
INSERT INTO default.wiki_abstract SELECT * FROM s3('https://myscale-datasets.s3.ap-southeast-1.amazonaws.com/wiki_abstract_with_vector.parquet','Parquet');
-- Build a SCANN vector index with Cosine metric on the body_vector
ALTER TABLE default.wiki_abstract ADD VECTOR INDEX vec_idx body_vector TYPE SCANN('metric_type=Cosine');
-- Query the index build progress from the `vector_indices` table
-- Wait until the index progress becomes `Built`
SELECT * FROM system.vector_indices;
-- Perform vector search and return the top-5 results
SELECT
id,
title,
distance(body_vector, [-0.052, -0.0146, -0.0677, -0.0256, -0.0395, -0.0381, -0.025, 0.0911, -0.0429, -0.0592, 0.0017, -0.0358, -0.0464, -0.0189, -0.0192, 0.0544, -0.0022, -0.0292, -0.0474, -0.0286, 0.0746, -0.013, -0.0217, -0.0246, -0.0169, 0.0495, -0.0947, 0.0139, 0.0445, -0.0262, -0.0049, 0.0506, 0.004, 0.0276, 0.0063, -0.0643, 0.0059, -0.0229, -0.0315, 0.0549, 0.1427, 0.0079, 0.011, -0.0036, -0.0617, 0.0155, -0.0607, 0.0258, -0.0205, 0.0008, -0.0547, 0.0329, -0.0522, -0.0347, 0.0921, 0.0139, -0.013, 0.0716, -0.0165, 0.0257, -0.0071, 0.0084, -0.0653, 0.0091, 0.0544, -0.0192, -0.0169, -0.0017, -0.0304, 0.0427, -0.0389, 0.0921, -0.0622, -0.0196, 0.0025, 0.0214, 0.0259, -0.0493, -0.0211, -0.119, -0.0736, -0.1545, -0.0578, -0.0145, 0.0138, 0.0478, -0.0451, -0.0332, 0.0799, 0.0001, -0.0737, 0.0427, 0.0517, 0.0102, 0.0386, 0.0233, 0.0425, -0.0279, -0.0529, 0.0744, -0.0305, -0.026, 0.1229, -0.002, 0.0038, -0.0491, 0.0352, 0.0027, -0.056, -0.1044, 0.123, -0.0184, 0.1148, -0.0189, 0.0412, -0.0347, -0.0569, -0.0119, 0.0098, -0.0016, 0.0451, 0.0273, 0.0436, 0.0082, 0.0166, -0.0989, 0.0747, -0.0, 0.0306, -0.0717, -0.007, 0.0665, 0.0452, 0.0123, -0.0238, 0.0512, -0.0116, 0.0517, 0.0288, -0.0013, 0.0176, 0.0762, 0.1284, -0.031, 0.0891, -0.0286, 0.0132, 0.003, 0.0433, 0.0102, -0.0209, -0.0459, -0.0312, -0.0387, 0.0201, -0.027, 0.0243, 0.0713, 0.0359, -0.0674, -0.0747, -0.0147, 0.0489, -0.0092, -0.018, 0.0236, 0.0372, -0.0071, -0.0513, -0.0396, -0.0316, -0.0297, -0.0385, -0.062, 0.0465, 0.0539, -0.033, 0.0643, 0.061, 0.0062, 0.0245, 0.0868, 0.0523, -0.0253, 0.0157, 0.0266, 0.0124, 0.1382, -0.0107, 0.0835, -0.1057, -0.0188, -0.0786, 0.057, 0.0707, -0.0185, 0.0708, 0.0189, -0.0374, -0.0484, 0.0089, 0.0247, 0.0255, -0.0118, 0.0739, 0.0114, -0.0448, -0.016, -0.0836, 0.0107, 0.0067, -0.0535, -0.0186, -0.0042, 0.0582, -0.0731, -0.0593, 0.0299, 0.0004, -0.0299, 0.0128, -0.0549, 0.0493, 0.0, -0.0419, 0.0549, -0.0315, 0.1012, 0.0459, -0.0628, 0.0417, -0.0153, 0.0471, -0.0301, -0.0615, 
0.0137, -0.0219, 0.0735, 0.083, 0.0114, -0.0326, -0.0272, 0.0642, -0.0203, 0.0557, -0.0579, 0.0883, 0.0719, 0.0007, 0.0598, -0.0431, -0.0189, -0.0593, -0.0334, 0.02, -0.0371, -0.0441, 0.0407, -0.0805, 0.0058, 0.1039, 0.0534, 0.0495, -0.0325, 0.0782, -0.0403, 0.0108, -0.0068, -0.0525, 0.0801, 0.0256, -0.0183, -0.0619, -0.0063, -0.0605, 0.0377, -0.0281, -0.0097, -0.0029, -0.106, 0.0465, -0.0033, -0.0308, 0.0357, 0.0156, -0.0406, -0.0308, 0.0013, 0.0458, 0.0231, 0.0207, -0.0828, -0.0573, 0.0298, -0.0381, 0.0935, -0.0498, -0.0979, -0.1452, 0.0835, -0.0973, -0.0172, 0.0003, 0.09, -0.0931, -0.0252, 0.008, -0.0441, -0.0938, -0.0021, 0.0885, 0.0088, 0.0034, -0.0049, 0.0217, 0.0584, -0.012, 0.059, 0.0146, -0.0, -0.0045, 0.0663, 0.0017, 0.0015, 0.0569, -0.0089, -0.0232, 0.0065, 0.0204, -0.0253, 0.1119, -0.036, 0.0125, 0.0531, 0.0584, -0.0101, -0.0593, -0.0577, -0.0656, -0.0396, 0.0525, -0.006, -0.0149, 0.003, -0.1009, -0.0281, 0.0311, -0.0088, 0.0441, -0.0056, 0.0715, 0.051, 0.0219, -0.0028, 0.0294, -0.0969, -0.0852, 0.0304, 0.0374, 0.1078, -0.0559, 0.0805, -0.0464, 0.0369, 0.0874, -0.0251, 0.0075, -0.0502, -0.0181, -0.1059, 0.0111, 0.0894, 0.0021, 0.0838, 0.0497, -0.0183, 0.0246, -0.004, -0.0828, 0.06, -0.1161, -0.0367, 0.0475, 0.0317]) AS distance
FROM default.wiki_abstract
ORDER BY distance ASC
LIMIT 5;
We're committed to continuously improving and evolving MyScaleDB to meet the ever-changing needs of the AI industry. Join us on this exciting journey and be part of the revolution in AI data management!
Get the latest MyScaleDB news or updates
ReplacingMergeTree
MyScaleDB is licensed under the Apache License, Version 2.0. View a copy of the License file.
We give special thanks to these open-source projects, upon which we have developed MyScaleDB:
[1] See why metadata filtering is crucial for improving RAG accuracy here.
[2] The MSTG (Multi-scale Tree Graph) algorithm is provided through MyScale Cloud, achieving high data density with disk-based storage and better indexing and search performance on billion-scale vector data.