This Week in Databend #93
May 14, 2023 · 3 min read
PsiACE
Stay up to date with the latest weekly developments on Databend!
Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .
The upgrade tool
meta-upgrade-09
will no longer be available in the release package. If you're using Databend 0.9 or an earlier version, you can seek help from the community.
What's On In Databend
Stay connected with the latest news about Databend.
Databend's Segment Caching Mechanism Now Boasts Improved Memory Usage
Databend's segment caching mechanism has received a significant upgrade that reduces its memory usage to 1.5/1000 of the previous usage in a test scenario.
The upgrade involves a different "representation" of cached segments, called CompactSegmentInfo
. This presentation consists mainly of two components:
- The decoded min/max indexes and other statistical information.
- The undecoded (and compressed) raw bytes of block-metas.
During segment pruning, if any segments are pruned, there is no need to decode the block-metas represented by raw bytes. If they are not pruned, then their raw bytes are decoded on-the-fly for block pruning and scanning purposes (and dropped if no longer needed).
If you are interested in learning more, please check out the resources listed below.
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Bind databend
into Python
Databend now offers a Python binding that allows users to execute SQL queries against Databend using Python even without deploying a Databend instance.
To use this functionality, simply import SessionContext
from databend
module and create an instance of it:
from databend import SessionContext
ctx = SessionContext()
You can then run SQL queries using the sql()
method on your session context object:
df = ctx.sql("select number, number + 1, number::String as number_p_1 from numbers(8)")
The resulting DataFrame can be converted to PyArrow or Pandas format using the to_py_arrow()
or to_pandas()
methods respectively:
df.to_pandas() # Or, df.to_py_arrow()
Feel free to integrate it with your data science workflow.
Highlights
Here are some noteworthy items recorded here, perhaps you can find something that interests you.
- Read the two new tutorials added to Transform Data During Load to learn how to perform arithmetic operations during loading and load data into a table with additional columns.
- Read Working with Stages to gain a deeper understanding and learn how to manage and use it effectively.
- Added functions:
date_format
,str_to_date
andstr_to_timestamp
.
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Add open-sharing
Binary to Databend Image
Open Sharing is a cheap and secure data sharing protocol for databend query on multi-cloud environments. Databend provides a binary called open-sharing
, which is a tenant-level sharing endpoint. You can read databend | sharing-endpoint - README.md to learn more information.
To facilitate the deployment of open-sharing
endpoint instances using K8s or Docker, it is recommended to add it to Databend's docker image.
Issue #11182 | Feature: added open-sharing binary in the databend-query image
Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.
New Contributors
We always open arms to everyone and can't wait to see how you'll help our community grow and thrive.
- @Mehrbod2002 made their first contribution in #11367. Added validation for
max_storage_io_requests
. - @DongHaowen made their first contribution in #11362. Specified database in benchmark.
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.1.30-nightly...v1.1.38-nightly
🎉 Contributors 24 contributors
Thanks a lot to the contributors for their excellent work.
🎈Connect With Us
Databend is a cutting-edge, open-source cloud-native warehouse built with Rust, designed to handle massive-scale analytics.
Join the Databend Community to try, get help, and contribute!