Analyzing Hits Dataset with Databend
This usecase shows how to analyze the Hits dataset with Databend.
Step 1. Deploy Databend
Make sure you have installed Databend, if not please see:
Step 2. Load hits Datasets
2.1 Create a Databend User
Connect to Databend server with MySQL client:
mysql -h127.0.0.1 -uroot -P3307
Create a user:
CREATE USER user1 IDENTIFIED BY 'abc123';
Grant privileges for the user:
GRANT ALL ON *.* TO user1;
See also How To Create User.
2.2 Create hits Table
2.3 Load Data Into hits Table
mysql -h127.0.0.1 -P3307 -uroot
COPY
data into hits
table:
Load CSV files into Databend
COPY INTO hits FROM 'https://datasets.databend.org/hits_1m.tsv.gz' FILE_FORMAT=(type=TSV compression=AUTO);
Step 3. Queries
Execute Queries:
mysql -h127.0.0.1 -P3307 -uroot
SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits;
Example Queries:
Number | Query |
---|---|
Q1 | SELECT COUNT(*) FROM hits; |
Q2 | SELECT COUNT(*) FROM hits WHERE AdvEngineID <> 0; |
Q3 | SELECT SUM(AdvEngineID), COUNT(*), AVG(ResolutionWidth) FROM hits; |
Q4 | SELECT AVG(UserID) FROM hits; |
Q5 | SELECT COUNT(DISTINCT UserID) FROM hits; |
Q6 | SELECT COUNT(DISTINCT SearchPhrase) FROM hits; |
Q7 | SELECT MIN(EventDate), MAX(EventDate) FROM hits; |
Q8 | SELECT AdvEngineID, COUNT(*) FROM hits WHERE AdvEngineID <> 0 GROUP BY AdvEngineID ORDER BY COUNT(*) DESC; |
Q9 | SELECT RegionID, COUNT(DISTINCT UserID) AS u FROM hits GROUP BY RegionID ORDER BY u DESC LIMIT 10; |
Q10 | SELECT RegionID, SUM(AdvEngineID), COUNT(*) AS c, AVG(ResolutionWidth), COUNT(DISTINCT UserID) FROM hits GROUP BY RegionID ORDER BY c DESC LIMIT 10; |