This Week in Databend #91
April 30, 2023 · 4 min read
PsiACE
Stay up to date with the latest weekly developments on Databend!
Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .
What's On In Databend
Stay connected with the latest news about Databend.
New datatype: BITMAP
Databend has added support for the bitmap datatype.
BITMAP is a type of compressed data structure that can be used to efficiently store and manipulate sets of boolean values. It is often used to accelerate count distinct.
> CREATE TABLE IF NOT EXISTS t1(id Int, v Bitmap) Engine = Fuse;
> INSERT INTO t1 (id, v) VALUES(1, to_bitmap('0, 1')),(2, to_bitmap('1, 2')),(3, to_bitmap('3, 4'));
> SELECT id, to_string(v) FROM t1;
┌──────────────────────┐
│   id  │ to_string(v) │
│ Int32 │ String       │
├───────┼──────────────┤
│     1 │ 0,1          │
│     2 │ 1,2          │
│     3 │ 3,4          │
└──────────────────────┘
Our implementation of the BITMAP data type utilizes RoaringTreemap, a compressed bitmap with u64 values. Using this data structure brought us improved performance and decreased memory usage in comparison to alternative bitmap implementations.
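To see why a bitmap helps with count distinct, here is a minimal Rust sketch. It is not Databend's implementation and does not use RoaringTreemap: `SimpleBitmap` and `count_distinct` are hypothetical names, and the bitmap is a plain uncompressed one, so it only illustrates the idea. Each value sets one bit, merging rows is a bitwise OR, and the distinct count is the number of set bits.

```rust
/// A plain, uncompressed bitmap over u64 values (illustrative only;
/// a Roaring bitmap adds compression on top of the same idea).
#[derive(Default)]
struct SimpleBitmap {
    words: Vec<u64>, // bit `v % 64` of word `v / 64` marks value `v` as present
}

impl SimpleBitmap {
    /// Build a bitmap from a slice of values; duplicates collapse naturally.
    fn from_values(values: &[u64]) -> Self {
        let mut bitmap = Self::default();
        for &v in values {
            bitmap.insert(v);
        }
        bitmap
    }

    /// Set the bit for `v`, growing the word vector if needed.
    fn insert(&mut self, v: u64) {
        let (word, bit) = ((v / 64) as usize, v % 64);
        if word >= self.words.len() {
            self.words.resize(word + 1, 0);
        }
        self.words[word] |= 1u64 << bit;
    }

    /// Merge another bitmap into this one with a word-wise OR.
    fn union_with(&mut self, other: &Self) {
        if other.words.len() > self.words.len() {
            self.words.resize(other.words.len(), 0);
        }
        for (w, o) in self.words.iter_mut().zip(&other.words) {
            *w |= *o;
        }
    }

    /// Number of distinct values stored, i.e. the number of set bits.
    fn count(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }
}

/// Count distinct values across many rows by OR-ing their bitmaps.
fn count_distinct(rows: &[SimpleBitmap]) -> u64 {
    let mut merged = SimpleBitmap::default();
    for row in rows {
        merged.union_with(row);
    }
    merged.count()
}
```

Applied to the three rows from the table above (`0,1`, `1,2`, `3,4`), `count_distinct` merges the bitmaps and counts five set bits, without sorting or hashing the individual values.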
If you are interested in learning more, please check out the resources listed below.
- PR #11097 | feat: add bitmap data type
- Website | Roaring Bitmaps
- Paper | Consistently faster and smaller compressed bitmaps with Roaring
Improving Hash Join Performance with New Hash Table Design
Our previous hash table implementation was optimized for aggregate functions, which significantly limited the performance of hash join operations. To improve hash join performance, we implemented a dedicated hash table optimized for joins. It allocates a fixed-size hash table based on the number of rows in the build stage and replaces the value type with a pointer that supports CAS (compare-and-swap) operations, which keeps memory usage under control without the need for Vec growth. The new implementation significantly improved performance. Check out the resources below for more information:
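The design described here can be sketched in a few lines of Rust. This is an illustrative approximation under stated assumptions, not Databend's actual code: `JoinHashTable`, `build_parallel`, the hash function, and the use of row indices in place of raw pointers are all hypothetical choices. The table is sized once from the build-side row count, and each bucket holds an atomic chain head that is updated with CAS, so no Vec grows during the build.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Sentinel for "no row"; links are stored as `row index + 1`.
const EMPTY: usize = 0;

/// Fixed-size chained hash table for the build side of a hash join.
struct JoinHashTable {
    buckets: Vec<AtomicUsize>, // head of each chain
    next: Vec<AtomicUsize>,    // next[row] = following row in the same chain
    keys: Vec<u64>,            // join key of each build-side row
}

impl JoinHashTable {
    /// Allocate everything up front from the number of build rows.
    fn new(keys: Vec<u64>) -> Self {
        let rows = keys.len();
        let capacity = (rows.max(1) * 2).next_power_of_two();
        JoinHashTable {
            buckets: (0..capacity).map(|_| AtomicUsize::new(EMPTY)).collect(),
            next: (0..rows).map(|_| AtomicUsize::new(EMPTY)).collect(),
            keys,
        }
    }

    /// Map a key to a bucket (simple multiplicative hash, an assumption).
    fn slot(&self, key: u64) -> usize {
        let hashed = key.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32;
        (hashed as usize) & (self.buckets.len() - 1)
    }

    /// Push `row` onto the front of its chain with a CAS retry loop.
    fn insert(&self, row: usize) {
        let head = &self.buckets[self.slot(self.keys[row])];
        let mut old = head.load(Ordering::Acquire);
        loop {
            self.next[row].store(old, Ordering::Relaxed);
            match head.compare_exchange_weak(old, row + 1, Ordering::AcqRel, Ordering::Acquire) {
                Ok(_) => return,
                Err(current) => old = current, // another thread won; retry
            }
        }
    }

    /// Return all build-side rows whose key equals `key`.
    fn probe(&self, key: u64) -> Vec<usize> {
        let mut matches = Vec::new();
        let mut link = self.buckets[self.slot(key)].load(Ordering::Acquire);
        while link != EMPTY {
            let row = link - 1;
            if self.keys[row] == key {
                matches.push(row);
            }
            link = self.next[row].load(Ordering::Acquire);
        }
        matches
    }
}

/// Build the table with several threads inserting disjoint row ranges.
fn build_parallel(keys: Vec<u64>, threads: usize) -> JoinHashTable {
    let table = JoinHashTable::new(keys);
    let rows = table.keys.len();
    let threads = threads.max(1);
    let chunk = (rows + threads - 1) / threads;
    thread::scope(|scope| {
        for t in 0..threads {
            let table = &table;
            scope.spawn(move || {
                let start = (t * chunk).min(rows);
                let end = (start + chunk).min(rows);
                for row in start..end {
                    table.insert(row);
                }
            });
        }
    });
    table
}
```

Because all memory is allocated in `new`, concurrent inserts only touch atomics that already exist. A probe for a key walks a single chain and returns the matching build-side row indices, in no particular order.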
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Rust Compilation Challenges and Solutions
Compiling a medium to large Rust program is not a breeze due to the accumulation of complex project dependencies and boilerplate code.
To address these challenges, the Databend team implemented several measures, including observability tools, configuration adjustments, caching, linker optimization, compile-related profiles, and refactoring.
If you are interested in learning more, please check out the resources listed below.
Highlights
Here are some noteworthy items from this week. Perhaps you will find something that interests you.
- Databend will participate in OSPP 2023 projects: OSPP2023 - Databend.
- Check out Docs | Developing with Databend using Rust for Rust application development with databend-driver.
- Learn to manage and query databases with ease using BendSQL, a powerful command-line tool for Databend. Check out Docs | BendSQL now!
- Check out Docs | Loading from a Stage and Docs | Loading from a Bucket to learn more about loading data from stages and object storage buckets.
- Introduced table-meta-inspector, a command-line tool for decoding new table metadata in Databend.
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Contributors Wanted for Function Development
We are currently working on improving our functions, and we need your help!
We have identified four areas that require attention, and we would be extremely grateful for any assistance that you can provide.
If you are interested in contributing to any of these areas, please refer to the following resources to learn more about how to write scalar and aggregate functions:
We appreciate any help that you can provide, and we look forward to working with you.
Issue #11220 | Tracking: functions
Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.1.14-nightly...v1.1.23-nightly
🎉 Contributors
Thanks a lot to the 25 contributors for their excellent work.
🎈Connect With Us
Databend is a cutting-edge, open-source cloud-native warehouse built with Rust, designed to handle massive-scale analytics.
Join the Databend Community to try, get help, and contribute!