GCP AlloyDB Blog Series Part 7 : Working with AlloyDB

In Part 1 of the AlloyDB blog series, I introduced AlloyDB for PostgreSQL. Then in Part 2, I covered the installation of AlloyDB Omni on a GCE VM. In Part 3, I showed how to connect to AlloyDB Omni using psql and pgAdmin. In Part 4, I loaded TPCH data into the AlloyDB Omni instance and checked out AlloyDB Omni's columnar engine. In Part 5, I compared AlloyDB Omni with PostgreSQL. In Part 6, I created an AlloyDB cluster on GCP and connected to it using several different approaches, including one from my laptop. In this Part 7, I will discuss more about how to use AlloyDB.

After creating the AlloyDB cluster in Part 6 of the blog series, I reused the pgdb instance from Part 4 as the PostgreSQL client to connect to the AlloyDB cluster, and I loaded the same TPCH 1G data into the cluster.

Add Database Flags

AlloyDB has many database flags. (Why not call them database parameters? Anyway, not a big deal to me, though it would make sense to call some of them cluster parameters and others database parameters.) Under Instances in the cluster, click EDIT PRIMARY, then add the following database flags. Some of the flags require a restart. One parameter, work_mem, is not in this list; I discuss it at the end of this blog.

google_columnar_engine.enabled=on
google_columnar_engine.enable_auto_columnarization=on
google_columnar_engine.auto_columnarization_schedule=EVERY 1 DAYS
google_db_advisor.enabled=on
alloydb.enable_pgaudit=on
pgaudit.log=all
pgaudit.log_level=info
alloydb.enable_pg_hint_plan=on
alloydb.enable_auto_explain=on
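
Once the instance restarts, a quick sanity check from psql confirms the flags took effect (a minimal sketch; custom GUCs like these also show up in pg_settings):

SHOW google_columnar_engine.enabled;
SHOW google_db_advisor.enabled;
-- list every columnar engine setting at once
SELECT name, setting FROM pg_settings WHERE name LIKE 'google_columnar_engine%';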

A few notes here about the flags used above:

google_columnar_engine.memory_size_in_mb flag

I did not set the columnar engine memory size (google_columnar_engine.memory_size_in_mb), as it will be automatically managed in a future release. By default, it is set to 30% of the instance RAM. To understand the memory usage of the columnar engine, run the queries below. The main_pool memory is what we mean when we talk about columnar engine memory usage, and the g_columnar_columns view shows which tables and columns are stored in the columnar engine.

tpch_db=> select * from g_columnar_memory_usage;
     memory_name     | memory_total | memory_available | memory_available_percentage | reserved_memory 
---------------------+--------------+------------------+-----------------------------+-----------------
 main_pool           |   5035261768 |       4808379160 |                          95 |      1007052353
 recommendation_pool |     14568176 |         14546776 |                          99 |         3059316
 work_pool           |     82837320 |         82686792 |                          99 |        29821435
 column_set          |      1048576 |          1047820 |                          99 |               0
(4 rows)

Time: 1.659 ms
tpch_db=> select database_name db, schema_name schema, relation_name as "table", column_name as "column", status, avg_dictionary_count row_count, dictionary_size_in_bytes total_size, to_char(last_accessed_time, 'YYYY-MM-DD') last_access, num_times_accessed freq from g_columnar_columns;
   db    | schema |  table   |     column      | status | row_count | total_size | last_access | freq 
---------+--------+----------+-----------------+--------+-----------+------------+-------------+------
 tpch_db | tpch1g | supplier | s_suppkey       | Usable |     10000 |      40000 | 2023-08-26  |   13
 tpch_db | tpch1g | supplier | s_nationkey     | Usable |        25 |        100 | 2023-08-26  |   13
 tpch_db | tpch1g | orders   | o_orderkey      | Usable |    214285 |    6000000 | 2023-08-26  |    9
 tpch_db | tpch1g | orders   | o_custkey       | Usable |     83012 |    2324360 | 2023-08-26  |    9
 tpch_db | tpch1g | orders   | o_totalprice    | Usable |    213528 |   13452273 | 2023-08-26  |    9
 tpch_db | tpch1g | orders   | o_orderdate     | Usable |      2406 |      67368 | 2023-08-26  |    9
 tpch_db | tpch1g | lineitem | l_orderkey      | Usable |     53592 |    6002396 |             |    0
 tpch_db | tpch1g | lineitem | l_partkey       | Usable |    131017 |   14673932 |             |    0
 tpch_db | tpch1g | lineitem | l_quantity      | Usable |        50 |       7000 |             |    0
 tpch_db | tpch1g | lineitem | l_extendedprice | Usable |    189204 |   47679597 |             |    0
 tpch_db | tpch1g | lineitem | l_discount      | Usable |        11 |       1540 |             |    0
 tpch_db | tpch1g | lineitem | l_shipdate      | Usable |      2523 |     282676 |             |    0
 tpch_db | tpch1g | lineitem | l_commitdate    | Usable |      2465 |     276116 |             |    0
 tpch_db | tpch1g | lineitem | l_receiptdate   | Usable |      2540 |     284528 |             |    0
 tpch_db | tpch1g | lineitem | l_shipmode      | Usable |         7 |       1764 |             |    0
(15 rows)

Time: 1.694 ms
tpch_db=> select database_name db, schema_name schema, relation_name as "table", swap_status, columns_count col_count, size, cached_size_bytes cached_size from g_columnar_units;
   db    | schema |  table   |  swap_status   | col_count |  size   | cached_size 
---------+--------+----------+----------------+-----------+---------+-------------
 tpch_db | tpch1g | supplier | In-memory Only |         2 |   73556 |       73556
 tpch_db | tpch1g | orders   | In-memory Only |         4 | 6727855 |     6727855
 tpch_db | tpch1g | orders   | In-memory Only |         4 | 6732107 |     6732107
 tpch_db | tpch1g | orders   | In-memory Only |         4 | 6730090 |     6730090
 tpch_db | tpch1g | orders   | In-memory Only |         4 | 6731324 |     6731324
 tpch_db | tpch1g | orders   | In-memory Only |         4 | 2427179 |     2427179
 tpch_db | tpch1g | orders   | In-memory Only |         4 | 6729989 |     6729989
 tpch_db | tpch1g | orders   | In-memory Only |         4 | 6728467 |     6728467
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6689252 |     6689252
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6690421 |     6690421
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6692161 |     6692161
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6692160 |     6692160
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6689679 |     6689679
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6686296 |     6686296
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6689006 |     6689006
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6690120 |     6690120
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6692390 |     6692390
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6689351 |     6689351
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6691490 |     6691490
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6687693 |     6687693
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6693173 |     6693173
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6689057 |     6689057
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6691971 |     6691971
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6689132 |     6689132
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6691352 |     6691352
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6687150 |     6687150
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 3265817 |     3265817
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6687438 |     6687438
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6690777 |     6690777
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6688847 |     6688847
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6690904 |     6690904
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6692248 |     6692248
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6686911 |     6686911
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6690797 |     6690797
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6689084 |     6689084
 tpch_db | tpch1g | lineitem | In-memory Only |         9 | 6692489 |     6692489
(36 rows)

Time: 1.511 ms

google_db_advisor.enabled flag

This flag controls whether AlloyDB's index advisor is enabled. The advisor tracks the queries running in the database and periodically analyzes them to decide whether to recommend new indexes that may improve query performance. Two AlloyDB internal views, google_db_advisor_recommended_indexes and google_db_advisor_workload_report, provide more detailed information.
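
A minimal way to inspect the advisor's output from psql, assuming the flag above is on and some workload has been tracked since:

-- indexes the advisor currently recommends
SELECT * FROM google_db_advisor_recommended_indexes;
-- the workload the advisor has analyzed
SELECT * FROM google_db_advisor_workload_report;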

enable_auto_columnarization flag

This flag lets the columnar engine automatically place data in the column store to improve performance, whether the workload is hybrid transactional and analytical processing (HTAP) or purely analytical. For now I keep the default schedule of 1 day. Run the following command in psql to check the columnar engine schedules.

tpch_db=> SELECT * FROM g_columnar_schedules;
         schedule_name          |    schedule    |           next_time           
--------------------------------+----------------+-------------------------------
 RECOMMEND_AND_POPULATE_COLUMNS | EVERY 1 DAYS   | 2023-08-27 19:31:40.967188+00
 DATABASE_ADVISOR               | EVERY 24 HOURS | 2023-08-27 19:31:40.967262+00
(2 rows)

If you want to run the auto-columnarization immediately, run the following:

SELECT google_columnar_engine_recommend();
SELECT database_name, schema_name, relation_name, column_name FROM g_columnar_recommended_columns;

To find out what is stored in the columnar engine, run the following query:

tpch_db=> select database_name db, schema_name schema, relation_name as "table", status, size, columnar_unit_count col_count from g_columnar_relations;
   db    | schema |  table   | status |   size    | col_count 
---------+--------+----------+--------+-----------+-----------
 tpch_db | tpch1g | supplier | Usable |     74053 |         1
 tpch_db | tpch1g | orders   | Usable |  42807658 |         7
 tpch_db | tpch1g | lineitem | Usable | 183898365 |        28
(3 rows)

Time: 1.355 ms

If you already know what you want in the columnar engine, you can use either of the two approaches below to add a whole table or specific columns manually.

-- add table nation to the columnar engine
SELECT google_columnar_engine_add('nation');
-- add the o_orderkey column of the orders table to the columnar engine
SELECT google_columnar_engine_add('orders', 'o_orderkey');
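
To undo a manual addition, there is a matching drop function; a sketch below, assuming google_columnar_engine_drop is available in your AlloyDB version:

-- remove table nation from the columnar engine
SELECT google_columnar_engine_drop('nation');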

For a complete list of database flags, please check out Google's AlloyDB documentation.

Enable Extensions

After the database flags above are set, enable the following extensions in the database:

CREATE EXTENSION IF NOT EXISTS google_columnar_engine;
CREATE EXTENSION IF NOT EXISTS pg_hint_plan;
CREATE EXTENSION IF NOT EXISTS pgaudit;
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE EXTENSION IF NOT EXISTS sslinfo;
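
To confirm the extensions are in place, you can list them from the catalog:

SELECT extname, extversion FROM pg_extension ORDER BY extname;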

Use Query Insights

Query Insights in AlloyDB is a very nice tool for reviewing query performance and detailed execution plans. For people like me who are used to Oracle OEM, this service is definitely a great plus for pinpointing where the performance issue is in the cluster. Here are some screenshots from Query Insights.

If the query is slow, click the query and check out the execution plan.

It offers two kinds of views: Query Plan and End to End. Query Plan gives you detailed information about each step of the execution, while End to End provides the total timing for each major category.

Review Performance Using Logs Explorer

Beyond Query Insights, the GCP Logs Explorer can provide additional information about query performance. To use Logs Explorer, choose AlloyDB instance as the Resource Type, then pick the time window to review the detailed logs from AlloyDB. Here is one sample screenshot.

The interesting observation from the above log is that some temporary files were created, with sizes ranging from around 13 MB to 30 MB. Temporary files are usually created when a query requires a sort operation and the work_mem parameter is not big enough.

When a sort requires more memory than this parameter allows, it has to write to temporary files on disk. Operations that sort include ORDER BY, DISTINCT, merge joins, and building the hash table during a hash join. If the sort can be performed within work_mem, PostgreSQL uses the quicksort algorithm. If there is not enough memory in work_mem, the external merge disk algorithm is used instead: the work is split between disk and memory, and the results are merged at the end. This is usually a costly operation, as physical I/O is required, so setting this parameter to the right value is very important for query performance.
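
A minimal way to see which sort method a query actually used is EXPLAIN ANALYZE; the query below is just an illustration against the TPCH lineitem table loaded earlier:

EXPLAIN (ANALYZE, BUFFERS)
SELECT l_orderkey FROM tpch1g.lineitem ORDER BY l_extendedprice;
-- With enough work_mem the plan reports "Sort Method: quicksort";
-- otherwise it reports "Sort Method: external merge  Disk: NNNNN kB".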

Although the work_mem parameter can be set to a value between 64 KB and 2 TB, the default is 4 MB. Because this is usually not enough for analytical workloads, 65536 (64 MB, as the unit is KB) is the commonly recommended value for work_mem. Can I set this parameter to a much bigger value? Yes, of course, but it may come with a cost, and it depends on the max_connections parameter: each database connection gets its own work_mem area, so with too many connections and a large work_mem you may hit out-of-memory issues during busy periods. Determining the right value for the workload is therefore important. Shane Borden has a nice blog about the work_mem parameter; check out his article Tuning the PostgreSQL “work_mem” Parameter.
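
One low-risk pattern, sketched below, is to raise work_mem only for the session running a heavy analytical query, so the instance-wide setting stays conservative:

-- raise work_mem for this session only
SET work_mem = '64MB';
-- ... run the analytical query here ...
-- return to the instance-wide setting
RESET work_mem;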

There is another reason why we see these temporary-file entries. By default, the log_temp_files parameter is set to 0, which means all temporary files, big or small, show up in the log. If you set this parameter to 20480, only temporary files of 20 MB or more will show up in the logs.
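
You can check the current threshold from psql; the value is interpreted in kilobytes:

SHOW log_temp_files;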

Final Thoughts

At this point, I have completed this blog series introducing AlloyDB and AlloyDB Omni. Hopefully the series helps people learn and use the AlloyDB products. I have also listed some good articles about AlloyDB below. Enjoy reading!

Verify Columnar Engine Usage with AlloyDB Operational Views by Anish Nair

Leverage Google Cloud Logging + Monitoring for Custom Cloud SQL for Postgres or AlloyDB Alerts by Shane Borden

Essential SQL for exploring AlloyDB Columnar Engine by Deepak Mahto

Enhancing Google Cloud AlloyDB Interaction with Custom psql Shortcuts by Deepak Mahto

An Initial Test of Google’s AlloyDB Columnar Engine by Simon Pane

GCP AlloyDB Blog Series Part 4 : Load TPCH Data into AlloyDB Omni

In Part 1 of the AlloyDB blog series, I introduced AlloyDB for PostgreSQL. Then in Part 2, I covered the installation of AlloyDB Omni on a GCE VM. In Part 3, I showed how to connect to AlloyDB Omni using psql and pgAdmin. In this Part 4, I will discuss how to load TPCH data into the AlloyDB Omni instance and check out AlloyDB Omni's columnar engine.

Generate TPCH Data

TPCH is a popular benchmark consisting of a dataset generation tool and a set of associated queries, and many organizations use it for benchmarking database performance. The dataset can be generated at different sizes, such as 1 GB, 100 GB, or much bigger. For my little VM, a 1 GB dataset is fine. The largest table, lineitem, contains 6 million rows in the 1 GB dataset. The following shows the row count for each table.

So, as you can see, a 1000 GB dataset would have 6 billion rows in the lineitem table and 1.5 billion rows in the orders table. It is possible to download a 1 GB dataset from somewhere on the internet, but I prefer to build the dataset myself. The tool that builds this dataset is called dbgen. Before I can use it, I need a few packages installed to compile the dbgen source code. Those packages were taken care of in Step 2 of the Part 2 blog, so I do not need to install them again. Just execute the following to download and compile the dbgen code.

cd
mkdir tpch
cd tpch
wget https://github.com/electrum/tpch-dbgen/archive/refs/heads/master.zip
unzip master.zip
cd tpch-dbgen-master/
echo "#define EOL_HANDLING 1" >> config.h # remove the tail '|'
make

You may see some redefinition warning messages; just ignore them. OK, let's generate the 1 GB dataset with one file per table. (To improve load performance for a large dataset, you could raise the -C value to split each table into multiple files and load them in parallel.)

./dbgen -s 1 -C 1 -f 
mkdir -p ../data/tpch1g
mv *.tbl ../data/tpch1g
ls -l ../data/tpch1g

Here is what the result looks like:

[alloyuser@omnidb tpch-dbgen-master]$ ./dbgen -s 1 -C 1 -f 
TPC-H Population Generator (Version 2.14.0)
Copyright Transaction Processing Performance Council 1994 - 2010
[alloyuser@omnidb tpch-dbgen-master]$ mkdir -p ../data/tpch1g
[alloyuser@omnidb tpch-dbgen-master]$ mv *.tbl ../data/tpch1g
[alloyuser@omnidb tpch-dbgen-master]$ ls -l ../data/tpch1g
total 1066456
-rw-r--r--. 1 alloyuser adm  24196144 Aug 20 20:31 customer.tbl
-rw-r--r--. 1 alloyuser adm 753862072 Aug 20 20:31 lineitem.tbl
-rw-r--r--. 1 alloyuser adm      2199 Aug 20 20:31 nation.tbl
-rw-r--r--. 1 alloyuser adm 170452161 Aug 20 20:31 orders.tbl
-rw-r--r--. 1 alloyuser adm 118184616 Aug 20 20:31 partsupp.tbl
-rw-r--r--. 1 alloyuser adm  23935125 Aug 20 20:31 part.tbl
-rw-r--r--. 1 alloyuser adm       384 Aug 20 20:31 region.tbl
-rw-r--r--. 1 alloyuser adm   1399184 Aug 20 20:31 supplier.tbl

Create TPCH Loading Scripts

Ok, the data files are ready. Let me work on the table DDL scripts.

cd /home/alloyuser/tpch
mkdir ddl
cd ddl
vi tpch_create_table.sql
vi tpch_create_index.sql
vi tpch_load_1g_data.sql

Create the tpch_create_table.sql, tpch_create_index.sql and tpch_load_1g_data.sql scripts with the following contents.

tpch_create_table.sql

-- This script requires the definition of cur_schema_name variable
select :'cur_schema_name' "Create Table : cur_schema_name";

-- nation
CREATE TABLE IF NOT EXISTS :"cur_schema_name".nation (
  "n_nationkey"  INT,
  "n_name"       CHAR(25),
  "n_regionkey"  INT,
  "n_comment"    VARCHAR(152)
  );

-- region
CREATE TABLE IF NOT EXISTS :"cur_schema_name".region (
  "r_regionkey"  INT,
  "r_name"       CHAR(25),
  "r_comment"    VARCHAR(152)
  );

-- supplier
CREATE TABLE IF NOT EXISTS :"cur_schema_name".supplier (
  "s_suppkey"     INT,
  "s_name"        CHAR(25),
  "s_address"     VARCHAR(40),
  "s_nationkey"   INT,
  "s_phone"       CHAR(15),
  "s_acctbal"     DECIMAL(15,2),
  "s_comment"     VARCHAR(101)
  );

-- customer
CREATE TABLE IF NOT EXISTS :"cur_schema_name".customer (
  "c_custkey"     INT,
  "c_name"        VARCHAR(25),
  "c_address"     VARCHAR(40),
  "c_nationkey"   INT,
  "c_phone"       CHAR(15),
  "c_acctbal"     DECIMAL(15,2),
  "c_mktsegment"  CHAR(10),
  "c_comment"     VARCHAR(117)
  );

-- part
CREATE TABLE IF NOT EXISTS :"cur_schema_name".part (
  "p_partkey"     INT,
  "p_name"        VARCHAR(55),
  "p_mfgr"        CHAR(25),
  "p_brand"       CHAR(10),
  "p_type"        VARCHAR(25),
  "p_size"        INT,
  "p_container"   CHAR(10),
  "p_retailprice" DECIMAL(15,2) ,
  "p_comment"     VARCHAR(23) 
);

-- partsupp
CREATE TABLE IF NOT EXISTS :"cur_schema_name".partsupp (
  "ps_partkey"     INT,
  "ps_suppkey"     INT,
  "ps_availqty"    INT,
  "ps_supplycost"  DECIMAL(15,2),
  "ps_comment"     VARCHAR(199)
  );

-- orders
CREATE TABLE IF NOT EXISTS :"cur_schema_name".orders (
  "o_orderkey"       INT,
  "o_custkey"        INT,
  "o_orderstatus"    CHAR(1),
  "o_totalprice"     DECIMAL(15,2),
  "o_orderdate"      DATE,
  "o_orderpriority"  CHAR(15),
  "o_clerk"          CHAR(15),
  "o_shippriority"   INT,
  "o_comment"        VARCHAR(79)
  );

-- lineitem
CREATE TABLE IF NOT EXISTS :"cur_schema_name".lineitem (
  "l_orderkey"          INT,
  "l_partkey"           INT,
  "l_suppkey"           INT,
  "l_linenumber"        INT,
  "l_quantity"          DECIMAL(15,2),
  "l_extendedprice"     DECIMAL(15,2),
  "l_discount"          DECIMAL(15,2),
  "l_tax"               DECIMAL(15,2),
  "l_returnflag"        CHAR(1),
  "l_linestatus"        CHAR(1),
  "l_shipdate"          DATE,
  "l_commitdate"        DATE,
  "l_receiptdate"       DATE,
  "l_shipinstruct"      CHAR(25),
  "l_shipmode"          CHAR(10),
  "l_comment"           VARCHAR(44)
  );

tpch_create_index.sql

-- This script requires the definition of cur_schema_name variable

select :'cur_schema_name' "Create Index : cur_schema_name";

create unique index c_ck on :"cur_schema_name".customer (c_custkey asc) ;
create index c_nk on :"cur_schema_name".customer (c_nationkey asc) ;
create unique index p_pk on :"cur_schema_name".part (p_partkey asc) ;
create unique index s_sk on :"cur_schema_name".supplier (s_suppkey asc) ;
create index s_nk on :"cur_schema_name".supplier (s_nationkey asc) ;
create index ps_pk on :"cur_schema_name".partsupp (ps_partkey asc) ;
create index ps_sk on :"cur_schema_name".partsupp (ps_suppkey asc) ;
create unique index ps_pk_sk on :"cur_schema_name".partsupp (ps_partkey asc, ps_suppkey asc) ;
create unique index ps_sk_pk on :"cur_schema_name".partsupp (ps_suppkey asc, ps_partkey asc) ;
create unique index o_ok on :"cur_schema_name".orders (o_orderkey asc) ;
create index o_ck on :"cur_schema_name".orders (o_custkey asc) ;
create index o_od on :"cur_schema_name".orders (o_orderdate asc) ;
create index l_ok on :"cur_schema_name".lineitem (l_orderkey asc) ;
create index l_pk on :"cur_schema_name".lineitem (l_partkey asc) ;
create index l_sk on :"cur_schema_name".lineitem (l_suppkey asc) ;
--create index l_ln on :"cur_schema_name".lineitem (l_linenumber asc) ;
create index l_sd on :"cur_schema_name".lineitem (l_shipdate asc) ;
create index l_cd on :"cur_schema_name".lineitem (l_commitdate asc) ;
create index l_rd on :"cur_schema_name".lineitem (l_receiptdate asc) ;
--create unique index l_ok_ln on :"cur_schema_name".lineitem (l_orderkey asc, l_linenumber asc) ;
--create unique index l_ln_ok on :"cur_schema_name".lineitem (l_linenumber asc, l_orderkey asc) ;
create index l_pk_sk on :"cur_schema_name".lineitem (l_partkey asc, l_suppkey asc) ;
create index l_sk_pk on :"cur_schema_name".lineitem (l_suppkey asc, l_partkey asc) ;
create unique index n_nk on :"cur_schema_name".nation (n_nationkey asc) ;
create index n_rk on :"cur_schema_name".nation (n_regionkey asc) ;
create unique index r_rk on :"cur_schema_name".region (r_regionkey asc) ;

tpch_load_1g_data.sql

\copy :"cur_schema_name".region     from '/home/alloyuser/tpch/data/tpch1g/region.tbl'        DELIMITER '|' CSV;
\copy :"cur_schema_name".nation     from '/home/alloyuser/tpch/data/tpch1g/nation.tbl'        DELIMITER '|' CSV;
\copy :"cur_schema_name".customer   from '/home/alloyuser/tpch/data/tpch1g/customer.tbl'    DELIMITER '|' CSV;
\copy :"cur_schema_name".supplier   from '/home/alloyuser/tpch/data/tpch1g/supplier.tbl'    DELIMITER '|' CSV;
\copy :"cur_schema_name".part       from '/home/alloyuser/tpch/data/tpch1g/part.tbl'            DELIMITER '|' CSV;
\copy :"cur_schema_name".partsupp   from '/home/alloyuser/tpch/data/tpch1g/partsupp.tbl'    DELIMITER '|' CSV;
\copy :"cur_schema_name".orders     from '/home/alloyuser/tpch/data/tpch1g/orders.tbl'        DELIMITER '|' CSV;
\copy :"cur_schema_name".lineitem   from '/home/alloyuser/tpch/data/tpch1g/lineitem.tbl'    DELIMITER '|' CSV;

Create Tables for TPCH and Load Data

Now we have both the data and the SQL scripts ready, and we can create the database, schema and tables using the commands below. Note: do not execute these scripts and commands from psql inside the Omni container, as the scripts and data live on the host VM, not in the container.

Create DB/Schema/Tables and Load Data

psql -h localhost -U postgres

CREATE EXTENSION IF NOT EXISTS google_columnar_engine;

CREATE USER tpchuser WITH PASSWORD 'alloy123' CREATEDB;
CREATE DATABASE tpch_db OWNER tpchuser;

\c tpch_db
CREATE SCHEMA tpch1g;
set search_path to tpch1g;

\set cur_schema_name 'tpch1g'
\set cur_user 'tpchuser'
\set root_path '/home/alloyuser/tpch'

select :'root_path' "root_path";
select :'cur_schema_name' "cur_schema_name";
select :'cur_user' "cur_user";

GRANT SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER ON ALL TABLES IN SCHEMA :"cur_schema_name" TO :"cur_user";
ALTER DEFAULT PRIVILEGES IN SCHEMA :"cur_schema_name" GRANT SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER ON TABLES TO :"cur_user";

\i :root_path/ddl/tpch_create_table.sql
\i :root_path/ddl/tpch_create_index.sql 
\q

The execution result is shown below:

postgres=# CREATE USER tpchuser WITH PASSWORD 'alloy123' CREATEDB;
CREATE ROLE
postgres=# CREATE DATABASE tpch_db OWNER tpchuser;
CREATE DATABASE
postgres=# \c tpch_db
psql (13.11, server 15.2)
WARNING: psql major version 13, server major version 15.
         Some psql features might not work.
You are now connected to database "tpch_db" as user "postgres".
tpch_db=# CREATE EXTENSION IF NOT EXISTS google_columnar_engine;
NOTICE:  extension "google_columnar_engine" already exists, skipping
CREATE EXTENSION
tpch_db=# \c tpch_db tpchuser
psql (13.11, server 15.2)
WARNING: psql major version 13, server major version 15.
         Some psql features might not work.
You are now connected to database "tpch_db" as user "tpchuser".
tpch_db=> \conninfo
You are connected to database "tpch_db" as user "tpchuser" on host "localhost" (address "::1") at port "5432".
tpch_db=> CREATE SCHEMA tpch1g;
CREATE SCHEMA
tpch_db=> set search_path to tpch1g;
SET
tpch_db=> \set cur_schema_name 'tpch1g'
tpch_db=> \set cur_user 'tpchuser'
tpch_db=> \set root_path '/home/alloyuser/tpch'
tpch_db=> select :'root_path' "root_path";
      root_path       
----------------------
 /home/alloyuser/tpch
(1 row)

tpch_db=> select :'cur_schema_name' "cur_schema_name";
 cur_schema_name 
-----------------
 tpch1g
(1 row)

tpch_db=> select :'cur_user' "cur_user";
 cur_user 
----------
 tpchuser
(1 row)

tpch_db=> GRANT SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER ON ALL TABLES IN SCHEMA :"cur_schema_name" TO :"cur_user";
GRANT
tpch_db=> ALTER DEFAULT PRIVILEGES IN SCHEMA :"cur_schema_name" GRANT SELECT, INSERT, UPDATE, DELETE, TRUNCATE, REFERENCES, TRIGGER ON TABLES TO :"cur_user";
ALTER DEFAULT PRIVILEGES
tpch_db=> \i :root_path/ddl/tpch_create_table.sql
 Create Table : cur_schema_name 
--------------------------------
 tpch1g
(1 row)

CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
tpch_db=> \i :root_path/ddl/tpch_create_index.sql 
 Create Index : cur_schema_name 
--------------------------------
 tpch1g
(1 row)

CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX
CREATE INDEX

tpch_db=> \i :root_path/ddl/tpch_load_1g_data.sql
COPY 5
COPY 25
COPY 150000
COPY 10000
COPY 200000
COPY 800000
COPY 1500000
COPY 6001215
tpch_db=> \q
[alloyuser@omnidb ~]$ 

If for some reason you want to change the tables to a different owner, you can run the following commands to generate the SQL statements that change the table owner.

psql -h localhost -U postgres
\set cur_schema_name 'tpch1g'
\set cur_user 'anotheruser'
\set root_path '/home/alloyuser/tpch'

select :'root_path' "root_path";
select :'cur_schema_name' "cur_schema_name";
select :'cur_user' "cur_user";

\c tpch_db
\pset tuples_only
\o :root_path/ddl/tpch_change_table_owner.sql
select 'ALTER TABLE ' || schemaname || '.' || tablename || ' OWNER TO ' || :'cur_user' || ';' 
from pg_tables
where schemaname = :'cur_schema_name';
\o 
\i :root_path/ddl/tpch_change_table_owner.sql
\q
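
An alternative, sketched below, avoids the intermediate script file by looping over pg_tables in a single DO block (same effect; the target user anotheruser is just a placeholder):

DO $$
DECLARE
  t record;
BEGIN
  FOR t IN SELECT schemaname, tablename FROM pg_tables WHERE schemaname = 'tpch1g' LOOP
    -- format with %I quotes identifiers safely
    EXECUTE format('ALTER TABLE %I.%I OWNER TO %I', t.schemaname, t.tablename, 'anotheruser');
  END LOOP;
END $$;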

Execute TPCH Queries

Now the data is ready, so let's do some checking with one TPCH query: Query 6, the Forecasting Revenue Change Query. This query quantifies the amount of revenue increase that would have resulted from eliminating certain company-wide discounts in a given percentage range in a given year. Asking this type of "what if" question can be used to look for ways to increase revenue. I will start with the enable_columnar_scan parameter off, and run the query twice to make sure caching has no major impact.

[alloyuser@omnidb ddl]$ psql -h localhost -U tpchuser -d tpch_db
psql (13.11, server 15.2)
WARNING: psql major version 13, server major version 15.
         Some psql features might not work.
Type "help" for help.

tpch_db=> show google_columnar_engine.enable_columnar_scan;
 google_columnar_engine.enable_columnar_scan 
---------------------------------------------
 on
(1 row)

tpch_db=> SET google_columnar_engine.enable_columnar_scan=off;
SET
tpch_db=> show google_columnar_engine.enable_columnar_scan;
 google_columnar_engine.enable_columnar_scan 
---------------------------------------------
 off
(1 row)

tpch_db=> set search_path to tpch1g,public;
SET
tpch_db=> show search_path;
  search_path   
----------------
 tpch1g, public
(1 row)

tpch_db=> \timing on
Timing is on.
tpch_db=> select
        sum(l_extendedprice * l_discount) as revenue
from
        lineitem
where
        l_shipdate >= date '1995-01-01'
        and l_shipdate < date '1995-01-01' + interval '1' year
        and l_discount between 0.09 - 0.01 and 0.09 + 0.01
        and l_quantity < 24
limit 1;
    revenue     
----------------
 185724059.4734
(1 row)

Time: 284.483 ms
tpch_db=> select
        sum(l_extendedprice * l_discount) as revenue
from
        lineitem
where
        l_shipdate >= date '1995-01-01'
        and l_shipdate < date '1995-01-01' + interval '1' year
        and l_discount between 0.09 - 0.01 and 0.09 + 0.01
        and l_quantity < 24
limit 1;
    revenue     
----------------
 185724059.4734
(1 row)

Time: 274.554 ms

Having run the query twice, the timing is in the 270–280 ms range with enable_columnar_scan off. OK, let's run it again with enable_columnar_scan on.

tpch_db=> SET google_columnar_engine.enable_columnar_scan=on;
SET
Time: 0.200 ms
tpch_db=> select                                             
        sum(l_extendedprice * l_discount) as revenue
from
        lineitem
where
        l_shipdate >= date '1995-01-01'
        and l_shipdate < date '1995-01-01' + interval '1' year
        and l_discount between 0.09 - 0.01 and 0.09 + 0.01
        and l_quantity < 24
limit 1;
    revenue     
----------------
 185724059.4734
(1 row)

Time: 278.193 ms

Interesting. The timing is about the same as with the parameter off. Let me do another trick with google_columnar_engine_recommend and see the result.

tpch_db=> SELECT * FROM g_columnar_columns;
 database_name | schema_name | relation_name | column_name | column_type | column_format | compression_level | status | avg_dictionary_
count | avg_encoded_values_count | dictionary_size_in_bytes | encoded_values_size_in_bytes | size_in_bytes | uncompressed_dictionary_si
ze_in_bytes | uncompressed_encoded_values_size_in_bytes | uncompressed_size_in_bytes | last_accessed_time | num_times_accessed 
---------------+-------------+---------------+-------------+-------------+---------------+-------------------+--------+----------------
------+--------------------------+--------------------------+------------------------------+---------------+---------------------------
------------+-------------------------------------------+----------------------------+--------------------+--------------------
(0 rows)

Time: 0.420 ms
tpch_db=> SELECT google_columnar_engine_recommend();

                          google_columnar_engine_recommend                          
------------------------------------------------------------------------------------
 (3000,"tpch_db.tpch1g.lineitem(l_discount,l_extendedprice,l_quantity,l_shipdate)")
(1 row)

Time: 26422.687 ms (00:26.423)
tpch_db=> 
tpch_db=> SELECT * FROM g_columnar_columns;
 database_name | schema_name | relation_name |   column_name   | column_type | column_format | compression_level | status | avg_diction
ary_count | avg_encoded_values_count | dictionary_size_in_bytes | encoded_values_size_in_bytes | size_in_bytes | uncompressed_dictionar
y_size_in_bytes | uncompressed_encoded_values_size_in_bytes | uncompressed_size_in_bytes | last_accessed_time | num_times_accessed 
---------------+-------------+---------------+-----------------+-------------+---------------+-------------------+--------+------------
----------+--------------------------+--------------------------+------------------------------+---------------+-----------------------
----------------+-------------------------------------------+----------------------------+--------------------+--------------------
 tpch_db       | tpch1g      | lineitem      | l_quantity      | numeric     |             0 |                 1 | Usable |            
       50 |                   214329 |                     7000 |                      6005695 |       6019863 |                       
           7000 |                                   6005695 |                    6019863 |                    |                  0
 tpch_db       | tpch1g      | lineitem      | l_extendedprice | numeric     |             0 |                 1 | Usable |            
   189204 |                   214329 |                 47679597 |                     24020092 |      71707065 |                       
       47679597 |                                  24020092 |                   71707065 |                    |                  0
 tpch_db       | tpch1g      | lineitem      | l_discount      | numeric     |             0 |                 1 | Usable |            
       11 |                   214329 |                     1540 |                      6005695 |       6014403 |                       
           1540 |                                   6005695 |                    6014403 |                    |                  0
 tpch_db       | tpch1g      | lineitem      | l_shipdate      | date        |             0 |                 1 | Usable |            
     2523 |                   214329 |                   282676 |                     12010494 |      12299890 |                       
         282676 |                                  12010494 |                   12299890 |                    |                  0
(4 rows)

Time: 1.787 ms
tpch_db=> 
tpch_db=> select                           
        sum(l_extendedprice * l_discount) as revenue
from
        lineitem
where
        l_shipdate >= date '1995-01-01'
        and l_shipdate < date '1995-01-01' + interval '1' year
        and l_discount between 0.09 - 0.01 and 0.09 + 0.01
        and l_quantity < 24
limit 1;
    revenue     
----------------
 185724059.4734
(1 row)

Time: 28.820 ms
tpch_db=> select
        sum(l_extendedprice * l_discount) as revenue
from
        lineitem
where
        l_shipdate >= date '1995-01-01'
        and l_shipdate < date '1995-01-01' + interval '1' year
        and l_discount between 0.09 - 0.01 and 0.09 + 0.01
        and l_quantity < 24
limit 1;
    revenue     
----------------
 185724059.4734
(1 row)

Time: 26.608 ms

Wow, the result is much better: it drops to less than 30 ms, about 10 times faster. The key reason is that I ran the SELECT google_columnar_engine_recommend() statement, which does some magic to put certain data into the columnar store. You may have noticed I used the g_columnar_columns view. It is one of about 30 views providing stats on the AlloyDB columnar engine and the index advisor, and it gives detailed information about the columns that are in the columnar store. If a column is frequently used, it is likely to stay in the columnar store, which has a significant impact on query performance. As shown above, the view initially had no rows; after I ran the google_columnar_engine_recommend statement, it had four rows, and that is what caused the performance improvement in the same query. I will have a separate blog discussing the columnar views in AlloyDB.

So far I have shown the basics of AlloyDB Omni. One common question is: how does this compare with a PostgreSQL database of the same configuration? In the next blog, I will create a GCE VM with an identical configuration and load the same TPCH data, and we will see the performance difference between AlloyDB Omni and PostgreSQL. Stay tuned.