This Week in Databend #109
September 3, 2023 · 3 min read
PsiACE
Stay up to date with the latest weekly developments on Databend!
Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .
What's On In Databend
Stay connected with the latest news about Databend.
Understanding Cluster Key
By defining a Cluster Key, you can enhance query performance by clustering tables. In this case, data will be organized and grouped based on the Cluster Key, rather than relying solely on the order in which data is ingested. This optimizes the data retrieval logic for large tables and accelerates queries.
When using the COPY INTO
or REPLACE INTO
command to write data into a table that includes a cluster key, Databend will automatically initiate a re-clustering process, as well as a segment and block compact process.
Clustering or re-clustering a table takes time. Databend suggests defining cluster keys primarily for sizable tables that experience slow query performance.
If you are interested in learning more, please check out the resources listed below.
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Exploring Databend Local Mode
Databend Local Mode aims to provide a simple Databend environment where users can directly interact with Databend using SQL without the need to perform a full Databend deployment. This makes it convenient for developers to perform basic data processing using Databend.
databend-query local
starts a temporary databend-query process which functions as both client and server.The storage is located in a temporary directory and its lifecycle follows the process. Once the process ends, the resources are also destroyed. You can start multiple local processes on one server, and their resources are isolated from each other.
❯ alias databend-local="databend-query local"
❯ echo " select sum(a) from range(1, 100000) as t(a)" | databend-local
4999950000
❯ databend-local
databend-local:) select number %3 n, number %4 m, sum(number) from numbers(1000000000) group by n,m limit 3 ;
┌───────────────────────────────────┐
│ n │ m │ sum(number) │
│ UInt8 │ UInt8 │ UInt64 NULL │
├───────┼───────┼───────────────────┤
│ 0 │ 0 │ 41666666833333332 │
│ 1 │ 0 │ 41666666166666668 │
│ 2 │ 0 │ 41666666500000000 │
└───────────────────────────────────┘
0 row result in 1.669 sec. Processed 1 billion rows, 953.67 MiB (599.02 million rows/s, 4.46 GiB/s)
If you need to use Databend in a production environment, we recommend deploying a Databend cluster according to the official documentation or using Databend Cloud. databend-local is a good choice for developers to quickly leverage Databend's capabilities without the need to deploy a full Databend instance.
If you are interested in learning more, please check out the resources listed below:
Highlights
We have also made these improvements to Databend that we hope you will find helpful:
- Added initial support for
MERGE INTO
. - Implemented SQLsmith testing framework to support more accurate fuzzy testing.
- Read the document Docs | Setting Environment Variables for how to change Databend configurations through environment variables.
- Implemented
json_strip_nulls
andjson_typeof
functions. You can also read Docs | Semi-Structured Functions to check out Databend functions for semi-structured data processing.
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Optimizing the implementation of MERGE INTO
In PR #12350 | feat: support Merge-Into V1, Databend has initially supported the MERGE INTO
syntax.
Based on this, there are many optimizations worth considering, such as providing parallel and distributed implementations, reducing I/O, and simplifying data block splitting.
Issue #12595 | Feature: Merge Into Optimizations
Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.87-nightly...v1.2.96-nightly
🎉 Contributors 23 contributors
Thanks a lot to the contributors for their excellent work.
🎈Connect With Us
Databend is a cutting-edge, open-source cloud-native warehouse built with Rust, designed to handle massive-scale analytics.
Join the Databend Community to try, get help, and contribute!