3.测试数据和查询语句的产生步骤
主要步骤参照本系列第一篇文章《Oracle 11g R2企业版评测》。然后需要针对db2的特性作修改。
Db2对SQL语句的特殊要求主要有2点:
(1)取前若干行的语法。传统的db2语法不支持rownum伪列,需要修改为分析函数rownumber()over()或用fetch first n rows only子句,其中n是整数,9.7版本开始也支持Oracle的rownum写法。需要按前面描述的步骤启动兼容Oracle模式。
(2)日期间隔的表达式。不支持interval 'n' year/month等写法,要改为 n year/month。
4.数据压缩测试
对表进行压缩有2种方法,一种是在创建表的时候指定压缩属性,再加载数据,另一种是对已经包含数据的非压缩表进行修改压缩属性的操作,然后重新整理表。另外db2也提供了inspect功能,可以预测一个表压缩后的大小,供人作决定前参考。
--方法1,对空压缩表加载数据
SQL> alter table tpch.part compress yes;
DB250000I: 成功地完成该命令。
-sh-3.2$ db2 "select npages from syscat.tables where TABNAME='PART'"
NPAGES
--------------------
9194
1 record(s) selected.
-sh-3.2$ db2 "truncate table tpch.part"
DB20000I The SQL command completed successfully.
-sh-3.2$ db2 "RUNSTATS ON TABLE tpch.part"
DB20000I The RUNSTATS command completed successfully.
-sh-3.2$ db2 "select npages from syscat.tables where TABNAME='PART'"
NPAGES
--------------------
0
1 record(s) selected.
-sh-3.2$ date;db2 load from /user1/ibm/part.tbl of del MODIFIED BY COLDELX7C MESSAGES /tmp/msg.tmp insert into tpch.part ;date
2011年 05月 10日 星期二 21:48:43 CST
Number of rows committed = 2000000
2011年 05月 10日 星期二 21:48:46 CST
-sh-3.2$ db2 "RUNSTATS ON TABLE tpch.part"
DB20000I The RUNSTATS command completed successfully.
--占用空间由9194个数据页(每页32K)下降到3403页
-sh-3.2$ db2 "select npages from syscat.tables where TABNAME='PART'"
NPAGES
--------------------
3403
1 record(s) selected.
--方法2,对已经包含数据的表重整
-sh-3.2$ db2 "select npages*32 , cast(tabname as varchar(20)) from syscat.tables where TABSCHEMA='TPCH'"
1 2
-------------------- --------------------
1234144 PARTSUPP
-sh-3.2$ db2 "alter table tpch.partsupp compress yes"
DB20000I The SQL command completed successfully.
-sh-3.2$ time db2 "REORG table tpch.partsupp resetdictionary"
DB20000I The REORG command completed successfully.
-sh-3.2$ time db2 "runstats on table tpch.partsupp"
DB20000I The RUNSTATS command completed successfully.
--占用空间从1.2GB下降到438MB
-sh-3.2$ db2 "select npages*32 , cast(tabname as varchar(20)) from syscat.tables where TABSCHEMA='TPCH'"
1 2
-------------------- --------------------
438016 PARTSUPP
--预测表压缩效果
db2 => inspect ROWCOMPESTIMATE TABLE name supplier schema tpch results a.1;
DB20000I The INSPECT command completed successfully.
db2 => quit;
DB20000I The QUIT command completed successfully.
-sh-3.2$ db2inspf ~/sqllib/db2dump/a.1 /tmp/a.2
-sh-3.2$ ll /tmp/a.2
-rw-r----- 1 db2inst1 db2grp 924 05-06 22:02 /tmp/a.2
-sh-3.2$ cat /tmp/a.2
DATABASE: SAMPLE
VERSION : SQL09071
2011-05-06-21.57.28.679301
Action: ROWCOMPESTIMATE TABLE
Schema name: TPCH
Table name: SUPPLIER
Tablespace ID: 3 Object ID: 8
Result file name: a.1
Table phase start (ID Signed: 8, Unsigned: 8; Tablespace ID: 3) : TPCH.SUPPLIER
Data phase start. Object: 8 Tablespace: 3
Row compression estimate results:
Percentage of pages saved from compression: 45
Percentage of bytes saved from compression: 45
Compression dictionary size: 36224 bytes.
Expansion dictionary size: 32768 bytes.
Data phase end.
Table phase end.
Processing has completed. 2011-05-06-21.57.46.316025
-sh-3.2$
--压缩前字节数
[db2inst1@aix:/home/db2inst1]#>db2 "select npages*32 , cast(tabname as varchar(20)) from syscat.tables where TABSCHEMA='TPCH'"
1 2
-------------------- --------------------
32 REGION
294208 PART
15808 SUPPLIER
262304 CUSTOMER
1234144 PARTSUPP
32 NATION
1768256 ORDERS
8604032 LINEITEM
8 record(s) selected.
--压缩后
1 2
-------------------- --------------------
32 REGION
96000 PART
8768 SUPPLIER
138752 CUSTOMER
425856 PARTSUPP
32 NATION
752576 ORDERS
3396160 LINEITEM
SQL> alter table tpch.part compress yes;
DB250000I: 成功地完成该命令。
-sh-3.2$ db2 "select npages from syscat.tables where TABNAME='PART'"
NPAGES
--------------------
9194
1 record(s) selected.
-sh-3.2$ db2 "truncate table tpch.part"
DB20000I The SQL command completed successfully.
-sh-3.2$ db2 "RUNSTATS ON TABLE tpch.part"
DB20000I The RUNSTATS command completed successfully.
-sh-3.2$ db2 "select npages from syscat.tables where TABNAME='PART'"
NPAGES
--------------------
0
1 record(s) selected.
-sh-3.2$ date;db2 load from /user1/ibm/part.tbl of del MODIFIED BY COLDELX7C MESSAGES /tmp/msg.tmp insert into tpch.part ;date
2011年 05月 10日 星期二 21:48:43 CST
Number of rows committed = 2000000
2011年 05月 10日 星期二 21:48:46 CST
-sh-3.2$ db2 "RUNSTATS ON TABLE tpch.part"
DB20000I The RUNSTATS command completed successfully.
--占用空间由9194个数据页(每页32K)下降到3403页
-sh-3.2$ db2 "select npages from syscat.tables where TABNAME='PART'"
NPAGES
--------------------
3403
1 record(s) selected.
--方法2,对已经包含数据的表重整
-sh-3.2$ db2 "select npages*32 , cast(tabname as varchar(20)) from syscat.tables where TABSCHEMA='TPCH'"
1 2
-------------------- --------------------
1234144 PARTSUPP
-sh-3.2$ db2 "alter table tpch.partsupp compress yes"
DB20000I The SQL command completed successfully.
-sh-3.2$ time db2 "REORG table tpch.partsupp resetdictionary"
DB20000I The REORG command completed successfully.
-sh-3.2$ time db2 "runstats on table tpch.partsupp"
DB20000I The RUNSTATS command completed successfully.
--占用空间从1.2GB下降到438MB
-sh-3.2$ db2 "select npages*32 , cast(tabname as varchar(20)) from syscat.tables where TABSCHEMA='TPCH'"
1 2
-------------------- --------------------
438016 PARTSUPP
--预测表压缩效果
db2 => inspect ROWCOMPESTIMATE TABLE name supplier schema tpch results a.1;
DB20000I The INSPECT command completed successfully.
db2 => quit;
DB20000I The QUIT command completed successfully.
-sh-3.2$ db2inspf ~/sqllib/db2dump/a.1 /tmp/a.2
-sh-3.2$ ll /tmp/a.2
-rw-r----- 1 db2inst1 db2grp 924 05-06 22:02 /tmp/a.2
-sh-3.2$ cat /tmp/a.2
DATABASE: SAMPLE
VERSION : SQL09071
2011-05-06-21.57.28.679301
Action: ROWCOMPESTIMATE TABLE
Schema name: TPCH
Table name: SUPPLIER
Tablespace ID: 3 Object ID: 8
Result file name: a.1
Table phase start (ID Signed: 8, Unsigned: 8; Tablespace ID: 3) : TPCH.SUPPLIER
Data phase start. Object: 8 Tablespace: 3
Row compression estimate results:
Percentage of pages saved from compression: 45
Percentage of bytes saved from compression: 45
Compression dictionary size: 36224 bytes.
Expansion dictionary size: 32768 bytes.
Data phase end.
Table phase end.
Processing has completed. 2011-05-06-21.57.46.316025
-sh-3.2$
--压缩前字节数
[db2inst1@aix:/home/db2inst1]#>db2 "select npages*32 , cast(tabname as varchar(20)) from syscat.tables where TABSCHEMA='TPCH'"
1 2
-------------------- --------------------
32 REGION
294208 PART
15808 SUPPLIER
262304 CUSTOMER
1234144 PARTSUPP
32 NATION
1768256 ORDERS
8604032 LINEITEM
8 record(s) selected.
--压缩后
1 2
-------------------- --------------------
32 REGION
96000 PART
8768 SUPPLIER
138752 CUSTOMER
425856 PARTSUPP
32 NATION
752576 ORDERS
3396160 LINEITEM
从上面表的占用空间可见,对于TPC-H数据,因为dbgen生成的数据比较随机,又是符合第3范式的,冗余较少,压缩的效果不太明显。除了很小的表至少占用一个数据页没有压缩外,平均压缩为原来的1/3,和gzip的压缩率差不多,已经很不错了。