This Week in Databend #113
October 1, 2023 · 3 min read
PsiACE
Stay up to date with the latest weekly developments on Databend!
Databend is a modern cloud data warehouse that serves your massive-scale analytics needs at low cost and with low complexity, as an open-source alternative to Snowflake. It is also available in the cloud: https://app.databend.com .
What's On In Databend
Stay connected with the latest news about Databend.
Loading to Table with Extra Columns
By default, COPY INTO loads data into a table by matching the order of fields in the file to the corresponding columns in the table. It's essential to ensure that the data aligns correctly between the file and the table.
If your table has more columns than the file, you can specify the columns into which you want to load data.
When working with the CSV format, if your table has more columns than the file and the additional columns are at the end of the table, you can load data using the `FILE_FORMAT` option `ERROR_ON_COLUMN_COUNT_MISMATCH`.
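For instance, both approaches might look like the following sketch. The table, stage, and file names are hypothetical, and the exact option spelling may vary by Databend version:

```sql
-- The table has three columns; the staged CSV file provides only the first two.
CREATE TABLE books (title VARCHAR, author VARCHAR, publish_date VARCHAR);

-- Option 1: name the target columns explicitly.
COPY INTO books (title, author)
    FROM @my_stage
    FILES = ('books.csv')
    FILE_FORMAT = (TYPE = 'CSV');

-- Option 2 (CSV only, extra columns at the end of the table):
-- tolerate the column-count mismatch instead of raising an error.
COPY INTO books
    FROM @my_stage
    FILES = ('books.csv')
    FILE_FORMAT = (TYPE = 'CSV', ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE);
```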
If you are interested in learning more, please check out the resources below:
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Introducing Read Policies to Parquet Reader
Using the `arrow-rs` APIs has a drawback: when we try to prefetch data for prewhere and topk push-downs, we can't reuse the deserialized blocks.
In order to improve the logic of row group reading and reuse the prefetched data blocks at the final output stage, we have done a lot of refactoring and introduced some read policies.
`NoPrefetchPolicy`
No prefetch stage at all. Databend reads, deserializes, and directly outputs the data blocks you need.
`PredicateAndTopkPolicy`
Databend prefetches the columns needed by prewhere and topk at the prefetch stage. It deserializes them into a `DataBlock` and evaluates them into a `RowSelection`. It then slices the `DataBlock` by batch size and stores the resulting blocks in an in-memory `VecDeque`.
At the final stage, Databend reads the remaining columns specified by the `RowSelection` and outputs `DataBlock`s in batches. It then merges them with the prefetched blocks and projects the results according to the `output_schema`.
`TopkOnlyPolicy`
It's similar to the `PredicateAndTopkPolicy`, but Databend only evaluates the topk at the prefetch stage.
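The prefetch-then-merge flow above can be sketched as a toy model in Python. The real implementation is Rust inside Databend's Parquet reader, so every name and structure here is illustrative only:

```python
from collections import deque

def slice_into_batches(block, batch_size):
    """Slice a prefetched "block" (here just a list of rows) into
    batch-sized chunks, stored in a deque so they can be consumed
    in order at the output stage."""
    return deque(block[i:i + batch_size]
                 for i in range(0, len(block), batch_size))

def read_with_prefetch(prewhere_rows, remaining_rows, batch_size):
    """Toy analogue of PredicateAndTopkPolicy: the prewhere/topk columns
    are prefetched and sliced up front, then each output batch merges a
    prefetched chunk with the remaining columns instead of
    re-deserializing them."""
    prefetched = slice_into_batches(prewhere_rows, batch_size)
    outputs = []
    for i in range(0, len(remaining_rows), batch_size):
        pre = prefetched.popleft()            # reuse the prefetched block
        rest = remaining_rows[i:i + batch_size]
        # "Merge" prefetched columns with the remaining columns row by row.
        outputs.append([p + r for p, r in zip(pre, rest)])
    return outputs
```

The point of the deque is that the expensive deserialization happens once, at prefetch time, and the output stage only slices and merges.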
If you are interested in learning more, please check out the resources below:
Highlights
We have also made these improvements to Databend that we hope you will find helpful:
- Added spill info to query log.
- Added support for unloading data into a compressed file with COPY INTO.
- Introduced the `GET /v1/background/:tenant/background_tasks` HTTP API for querying background tasks.
- Read Example 4: Filtering Files with Pattern to understand how to use Pattern to filter files.
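As an example of the new unloading support, writing query results to gzip-compressed files might look like the following sketch. The stage and table names are hypothetical, and option spellings may differ across Databend versions:

```sql
-- Unload query results into gzip-compressed CSV files in a stage.
COPY INTO @my_stage
FROM (SELECT * FROM books)
FILE_FORMAT = (TYPE = 'CSV', COMPRESSION = 'gzip');
```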
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Fixing issues detected by SQLsmith
In the last month, SQLsmith has discovered around 40 bugs in Databend. Databend Labs is actively working to fix these issues and improve system stability, even in uncommon scenarios. Your involvement in this effort, which may include tasks like type conversion or handling special values, is encouraged and can be facilitated by referring to past fixes.
Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.
New Contributors
We always welcome everyone with open arms and can't wait to see how you'll help our community grow and thrive.
- @zenus fixed an issue where a schema mismatch was not detected during the execution of `COPY INTO` in #13010.
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.128-nightly...v1.2.137-nightly
🎉 Contributors
Thanks a lot to the 22 contributors for their excellent work.
🎈 Connect With Us
Databend is a cutting-edge, open-source cloud-native warehouse built with Rust, designed to handle massive-scale analytics.
Join the Databend Community to try, get help, and contribute!