Retrieving Metadata
Why and What is Metadata?
Databend allows you to retrieve metadata from your data files using the INFER_SCHEMA function. This means you can extract column definitions from data files stored in internal or external stages. Retrieving metadata through the INFER_SCHEMA function provides a better understanding of the data structure, ensures data consistency, and enables automated data integration and analysis. The metadata for each column includes the following information:
- column_name: Indicates the name of the column.
- type: Indicates the data type of the column.
- nullable: Indicates whether the column allows null values.
- order_id: Represents the column's position in the table.
This feature is currently only available for the Parquet file format.
The syntax for INFER_SCHEMA is as follows. For more detailed information about this function, see INFER_SCHEMA.
INFER_SCHEMA(
LOCATION => '{ internalStage | externalStage }'
[ PATTERN => '<regex_pattern>']
)
Tutorial: Querying Column Definitions
In this tutorial, we will guide you through the process of uploading the sample file to an internal stage, querying the column definitions, and finally creating a table based on the staged file. Before you start, download and save the sample file books.parquet to a local folder.
- Create an internal stage named my_internal_stage:
CREATE STAGE my_internal_stage;
- Stage the sample file using BendSQL:
root@localhost:8000/default> PUT fs:///Users/eric/Documents/books.parquet @my_internal_stage
PUT fs:///Users/eric/Documents/books.parquet @my_internal_stage
┌───────────────────────────────────────────────┐
│ file │ status │
│ String │ String │
├─────────────────────────────────────┼─────────┤
│ /Users/eric/Documents/books.parquet │ SUCCESS │
└───────────────────────────────────────────────┘
- Query the column definitions from the staged sample file:
SELECT * FROM INFER_SCHEMA(location => '@my_internal_stage/books.parquet');
---
column_name|type |nullable|order_id|
-----------+-------+--------+--------+
title |VARCHAR| 0| 0|
author |VARCHAR| 0| 1|
date |VARCHAR| 0| 2|
- Create a table named mybooks based on the staged sample file:
CREATE TABLE mybooks AS SELECT * FROM @my_internal_stage/books.parquet;
Check the created table:
DESC mybooks;
---
Field |Type |Null|Default|Extra|
------+-------+----+-------+-----+
title |VARCHAR|NO |'' | |
author|VARCHAR|NO |'' | |
date |VARCHAR|NO |'' | |
SELECT * FROM mybooks;
---
title |author |date|
----------------------------+-------------------+----+
Transaction Processing |Jim Gray |1992|
Readings in Database Systems|Michael Stonebraker|2004|