This Week in Databend #97
June 11, 2023 · 4 min read
PsiACE
Stay up to date with the latest weekly developments on Databend!
Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com .
What's On In Databend
Stay connected with the latest news about Databend.
Column Position
Databend now offers support for utilizing syntax like $N to represent column positions. For instance, $2 indicates the second column. Additionally, Databend allows the usage of column positions alongside column names in SQL statements. Here is a simple example:
CREATE TABLE IF NOT EXISTS t1(a int, b varchar);
INSERT INTO t1 values (1, 'a'), (2, 'b');
select $1, $2, a, b from t1;
┌─────────────────────────────────┐
│ $1 │ $2 │ a │ b │
│ Int32 │ String │ Int32 │ String │
├───────┼────────┼───────┼────────┤
│ 1 │ a │ 1 │ a │
│ 2 │ b │ 2 │ b │
└─────────────────────────────────┘
You can also use column positions when you SELECT FROM a staged NDJSON file. We are also actively working on extending this support to other formats. When using the COPY INTO statement to copy data from a stage, Databend matches the field names at the top level of the NDJSON file with the column names in the destination table, rather than relying on column positions.
SELECT $1 FROM @my_stage (FILE_FORMAT=>'ndjson')
COPY INTO my_table FROM (SELECT $1 SELECT @my_stage t) FILE_FORMAT = (type = NDJSON)
It is important to note that when using the SELECT statement for NDJSON in Databend, only $1 is allowed, representing the entire row and having the data type variant.
-- Select the entire row using column position:
SELECT $1 FROM @my_stage (FILE_FORMAT=>'ndjson')
--Select a specific field named "a" using column position:
SELECT $1:a FROM @my_stage (FILE_FORMAT=>'ndjson')
If you are interested in learning more, please check out the resources listed below:
- Issue | Feature: support $<col_position>
- Issue | Feature: copy/select from stage by pos
- PR | feat: support column position like $N
- PR | feat: select from stage support NDJson
Code Corner
Discover some fascinating code snippets or projects that showcase our work or learning journey.
Learn Databend Workflows - Typos Check
Databend now has a very complex workflow for handling code auditing, testing, benchmarking and release. Typos Check is undoubtedly the simplest part of it. Let's take a look at some of its contents together.
Like other workflows, we need to use actions/checkout to check out the code.
- uses: actions/checkout@v4
with:
clean: "true"
typos-cli
is a source code spell checker that finds and corrects spelling mistakes in source code. It is fast enough to run on monorepos and has low false positives, making it suitable for use on PRs.
- uses: baptiste0928/cargo-install@v1
with:
crate: typos-cli
args: --locked
cache-key: typos-check
We use baptiste0928/cargo-install
to install dependencies. It is essentially the same as using cargo install
in your GitHub workflows. Additionally, it allows for automatic caching of resulting binaries to speed up subsequent builds.
- name: do typos check with typos-cli
run: typos
One thing to note is that typos-cli
is the name of the crate, but the corresponding executable binary name is typos
.
If you are interested in learning more, please check out the resources listed below:
Highlights
We have also made these improvements to Databend that we hope you will find helpful:
- Added support for distributed Top-N.
- The lazy_topn_threshold setting is now active by default, with a default value of 1,000.
- For enhanced security measures, the ability to change the password has been added to the root user.
- Read Blog | Databend X Tableau to learn how to connect Databend for BI data analysis in Tableau.
- Read Docs | Integrating Databend as a Sink for Vector and Docs | Analyzing Nginx Access Logs with Databend to understand how to integrate Vector with Databend.
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
Add a Deduplication Label Field to the Rest API
To ensure that data ingestion is idempotent, Databend now supports deduplication of DML through the use of a deduplication label. You can find more information on this feature at Docs | Setting Commands - SET_VAR.
To facilitate cross-language driver integration, we could add a REST API field for the label.
Issue #11710 | Feature: support to bring deduplication label on stage attachment api
Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.
Changelog
You can check the changelog of Databend Nightly for details about our latest developments.
Full Changelog: https://github.com/datafuselabs/databend/compare/v1.1.55-nightly...v1.1.56-draft2
🎉 Contributors 17 contributors
Thanks a lot to the contributors for their excellent work.
🎈Connect With Us
Databend is a cutting-edge, open-source cloud-native warehouse built with Rust, designed to handle massive-scale analytics.
Join the Databend Community to try, get help, and contribute!