This Week in Databend #107
August 20, 2023 · 3 min read
PsiACE
Stay up to date with the latest weekly developments on Databend!
Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. An open-source alternative to Snowflake. Also available in the cloud: https://app.databend.com .
What's On In Databend
Stay connected with the latest news about Databend.
Understanding Connection Parameters
The connection parameters are a set of essential connection details required to establish a secure link to a supported external storage service, such as Amazon S3. These parameters are enclosed in parentheses and consist of key-value pairs separated by commas or spaces. They are commonly used in operations such as creating a stage, copying data into Databend, and querying staged files from external sources.
For example, the following statement creates an external stage on Amazon S3 with the connection parameters:
CREATE STAGE my_s3_stage
URL = 's3://load/files/'
CONNECTION = (
ACCESS_KEY_ID = '<your-access-key-id>',
SECRET_ACCESS_KEY = '<your-secret-access-key>'
);
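Once the stage is created, it can be used for the operations mentioned above. A minimal sketch, assuming the staged files are CSV and a hypothetical target table `my_table`:

```sql
-- Query the staged files directly:
SELECT * FROM @my_s3_stage;

-- Copy the staged data into a table
-- (my_table and the CSV format are assumptions):
COPY INTO my_table
FROM @my_s3_stage
FILE_FORMAT = (TYPE = CSV);
```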
If you are interested in learning more, please check out the resources listed below.
Adding Storage Parameters for Hive Catalog
Over the past week, Databend introduced storage parameters for the Hive Catalog, allowing the configuration of specific storage services. This means that the catalog no longer relies on the storage backend of the default catalog.
The following example shows how to create a Hive Catalog using MinIO as the underlying storage service:
CREATE CATALOG hive_ctl
TYPE = HIVE
CONNECTION =(
ADDRESS = '127.0.0.1:9083'
URL = 's3://warehouse/'
AWS_KEY_ID = 'admin'
AWS_SECRET_KEY = 'password'
ENDPOINT_URL = 'http://localhost:9000/'
)
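With the catalog in place, tables registered in the Hive Metastore can be queried through it. A sketch, assuming a hypothetical Hive database `default` containing a table `t1`:

```sql
-- Query a Hive table through the new catalog
-- (database and table names are hypothetical):
SELECT * FROM hive_ctl.default.t1;
```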
If you are interested in learning more, please check out the resources listed below.
- Issue #12407 | Feature: Add storage support for Hive catalog
- PR #12469 | feat: Add storage params in hive catalog
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Using `gitoxide` to Speed Up Git Dependency Downloads
`gitoxide` is a high-performance, modern Git implementation written in Rust. With Cargo's unstable `gitoxide` feature, the `gitoxide` crate can replace `git2` for various Git operations, achieving a several-fold performance improvement when downloading the crates index and git dependencies.
Databend has recently enabled this feature for `cargo {build | clippy | test}` in CI. You can also add the `-Zgitoxide` option to speed up builds during local development (unstable `-Z` flags require a nightly toolchain):
cargo -Zgitoxide=fetch,shallow-index,shallow-deps build
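Instead of passing the flag on every invocation, the same setting can be made persistent via Cargo's `[unstable]` config table; a sketch, assuming a nightly toolchain:

```toml
# .cargo/config.toml -- requires nightly cargo
[unstable]
gitoxide = "fetch,shallow-index,shallow-deps"
```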
If you are interested in learning more, please check out the resources listed below:
Highlights
We have also made these improvements to Databend that we hope you will find helpful:
- The `VALUES` clause can be used without being combined with `SELECT`.
- You can now set a default value when modifying the type of a column. See Docs | ALTER TABLE COLUMN for details.
- Databend can now automatically recluster a table after write operations such as `COPY INTO` and `REPLACE INTO`.
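The first two highlights can be sketched as follows (the table `t` and column `c` are hypothetical):

```sql
-- A standalone VALUES clause, no SELECT required:
VALUES (1, 'a'), (2, 'b');

-- Change a column's type and set a default value at the same time:
ALTER TABLE t MODIFY COLUMN c VARCHAR DEFAULT 'unknown';
```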
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Enhancing `infer_schema` for All File Locations
Currently, it is possible to query files using file locations or from stages in Databend.
select * from 'fs:///home/...';
select * from 's3://bucket/...';
select * from @stage;
However, the `infer_schema` function only works with staged files. For example:
select * from infer_schema(location=>'@stage/...');
When attempting to use `infer_schema` with other file locations, it leads to a panic:
select * from infer_schema(location =>'fs:///home/...'); -- this will panic.
So, the improvement involves extending `infer_schema` to cover all types of file paths, not just staged files. This will enhance system consistency and the usefulness of the `infer_schema` function.
- Issue #12458 | Feature: `infer_schema` support normal file path
Please let us know if you're interested in contributing to this feature, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.2.62-nightly...v1.2.74-nightly
🎉 Contributors
Thanks a lot to the 20 contributors for their excellent work.
🎈Connect With Us
Databend is a cutting-edge, open-source, cloud-native data warehouse built with Rust, designed to handle massive-scale analytics.
Join the Databend Community to try, get help, and contribute!