This Week in Databend #98
June 18, 2023 · 3 min read
PsiACE
Stay up to date with the latest weekly developments on Databend!
Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .
What's On In Databend
Stay connected with the latest news about Databend.
Background Service
Databend Internal Storage FuseTable is similar to Apache Iceberg, a Log Structured Table that requires regular table compaction, re-clustering, and vacuuming to merge small data chunks. The process involves sorting the data by the cluster key or vacuuming unneeded branches.
Previously, different drivers were used for these implementations which added complexity to the infrastructure. Additional services had to be deployed and maintained to trigger driver events. To simplify this process, we have introduced a Background Service that allows Databend to run as a background one-shot job or daemon for running cron jobs. These jobs can trigger table maintenance tasks such as automatic compaction/vacuum/reclustering based on criteria without additional maintenance needed.
This implementation includes:
- Complete metasrv schema definition and background_job and background_tasks.
- APIs for updating and maintaining background_job and background_task state on meta-service.
- Simplified job scheduler implementation which support
one_shot
,interval
,cron
job type.
Background Service
requires Enterprise Edition. To inquire about upgrading, please contact Databend Support.
If you are interested in learning more, please check out the resources listed below:
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
IceLake - Pure Rust Iceberg Implementation
Iceberg, an open table format for analytics, lacks a mature Rust binding, making integration with databases like Databend difficult. IceLake aims to fill this gap and build an open ecosystem that:
- Users can read/write iceberg tables from ANY storage services like s3, gcs, azblob, hdfs and so on.
- ANY Databases can integrate with
icelake
to facilitate reading and writing of iceberg tables. - Provides NATIVE support transmute between
arrow
s. - Provides bindings so that other languages can work with iceberg tables powered by Rust core.
If you are interested in learning more, please check out the resources listed below:
Highlights
We have also made these improvements to Databend that we hope you will find helpful:
- Added support for MERGE JOIN.
- Add support for column position to CSV format.
- Read Docs | Computed Columns to understand how to use computed columns and the trade-offs when choosing which type to adopt.
- Read Docs | Subquery-Based Deletions to learn how to use subquery operators and comparison operators to achieve the desired deletion.
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Support VALIDATION_MODE
for COPY INTO
We hope to support the VALIDATION_MODE
for Databend's COPY INTO
statement, which can be used to validate the data that needs to be loaded (but not actually loaded) and return results based on specified validation options.
RETURN_ERRORS
: This mode validates the data and returns all errors.RETURN_<number>_ROWS
: This mode validates<number>
rows of data. If there are no errors, it returns the loaded information. If there are any errors encountered, it will throw errors.
Issue #11582 | Feature: copy support VALIDATION_MODE
Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.
New Contributors
We always open arms to everyone and can't wait to see how you'll help our community grow and thrive.
- @jonahgao made their first contribution in #11718. Fixed column types of MySQLClient.
- @akoshchiy made their first contribution in #11783. Updated
MACOSX_DEPLOYMENT_TARGET
value.
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.1.56-nightly...v1.1.64-nightly
🎉 Contributors 20 contributors
Thanks a lot to the contributors for their excellent work.
🎈Connect With Us
Databend is a cutting-edge, open-source cloud-native warehouse built with Rust, designed to handle massive-scale analytics.
Join the Databend Community to try, get help, and contribute!