技术开发 频道

主流行式数据库评测:DB2 9.7企业版

  3.测试数据和查询语句的产生步骤

  主要步骤参照本系列第一篇文章《Oracle 11g R2企业版评测》。然后需要针对db2的特性作修改。

  Db2对SQL语句的特殊要求主要有2点:

  (1)取前若干行的语法。传统的db2语法不支持rownum伪列,需要修改为分析函数rownumber()over()或用fetch first n rows only子句,其中n是整数,9.7版本开始也支持Oracle的rownum写法。需要按前面描述的步骤启动兼容Oracle模式。

  (2)日期间隔的表达式。不支持interval 'n' year/month等写法,要改为 n year/month。

  4.数据压缩测试

  对表进行压缩有2种方法,一种是在创建表的时候指定压缩属性,再加载数据,另一种是对已经包含数据的非压缩表进行修改压缩属性的操作,然后重新整理表。另外db2也提供了inspect功能,可以预测一个表压缩后的大小,供人作决定前参考。

--方法1,对空压缩表加载数据
SQL> alter table tpch.part compress yes;

DB250000I: 成功地完成该命令。

-sh-3.2$ db2 "select npages from syscat.tables where TABNAME='PART'"

NPAGES              
--------------------
                9194

  
1 record(s) selected.
-sh-3.2$ db2 "truncate table tpch.part"
DB20000I  The SQL command completed successfully.

-sh-3.2$ db2 "RUNSTATS ON TABLE tpch.part"
DB20000I  The RUNSTATS command completed successfully.
-sh-3.2$ db2 "select npages from syscat.tables where TABNAME='PART'"

NPAGES              
--------------------
                   0

  
1 record(s) selected.

-sh-3.2$ date;db2 load from  /user1/ibm/part.tbl of del MODIFIED BY COLDELX7C MESSAGES /tmp/msg.tmp insert into tpch.part ;date
2011年 05月 10日 星期二
21:48:43 CST

Number of rows committed    = 2000000

2011年 05月 10日 星期二
21:48:46 CST

-sh-3.2$ db2 "RUNSTATS ON TABLE tpch.part"
DB20000I  The RUNSTATS command completed successfully.
--占用空间由9194个数据页(每页32K)下降到3403页
-sh-3.2$ db2 "select npages from syscat.tables where TABNAME='PART'"

NPAGES              
--------------------
                3403

  
1 record(s) selected.
--方法2,对已经包含数据的表重整
-sh-3.2$ db2 "select npages*32 , cast(tabname as varchar(20)) from syscat.tables where TABSCHEMA='TPCH'"

1                    2                  
-------------------- --------------------
             1234144 PARTSUPP            


-sh-3.2$ db2 "alter table tpch.partsupp compress yes"
DB20000I  The SQL command completed successfully.
-sh-3.2$ time db2 "REORG table tpch.partsupp resetdictionary"
DB20000I  The REORG command completed successfully.
-sh-3.2$ time db2 "runstats on table tpch.partsupp"
DB20000I  The RUNSTATS command completed successfully.

--占用空间从1.2GB下降到438MB
-sh-3.2$ db2 "select npages*32 , cast(tabname as varchar(20)) from syscat.tables where TABSCHEMA='TPCH'"

1                    2                  
-------------------- --------------------
              438016 PARTSUPP            


--预测表压缩效果


db2
=> inspect ROWCOMPESTIMATE TABLE name supplier schema tpch results a.1;
DB20000I  The INSPECT command completed successfully.
db2
=> quit;
DB20000I  The QUIT command completed successfully.

-sh-3.2$ db2inspf ~/sqllib/db2dump/a.1 /tmp/a.2
-sh-3.2$ ll /tmp/a.2
-rw-r----- 1 db2inst1 db2grp 924 05-06 22:02 /tmp/a.2
-sh-3.2$ cat /tmp/a.2
DATABASE: SAMPLE                                                              
VERSION : SQL09071                                                            
2011-05-06-21.57.28.679301                                                    
                                                                              

Action: ROWCOMPESTIMATE
TABLE
Schema name: TPCH    
Table name: SUPPLIER
Tablespace ID:
3  Object ID: 8
Result
file name: a.1

    
Table phase start (ID Signed: 8, Unsigned: 8; Tablespace ID: 3) : TPCH.SUPPLIER
    
      Data phase start. Object:
8  Tablespace: 3
      Row compression estimate results:
      Percentage
of pages saved from compression: 45
      Percentage
of bytes saved from compression: 45
      Compression dictionary size:
36224 bytes.
      Expansion dictionary size:
32768 bytes.
      Data phase
end.
    
Table phase end.
Processing has completed.
2011-05-06-21.57.46.316025
-sh-3.2$

--压缩前字节数
[db2inst1@aix:/home/db2inst1]#>db2 "select npages*32 , cast(tabname as varchar(20)) from syscat.tables where TABSCHEMA='TPCH'"

1                    2                  
-------------------- --------------------
                  32 REGION              
              
294208 PART                
              
15808 SUPPLIER            
              
262304 CUSTOMER            
            
1234144 PARTSUPP            
                  
32 NATION              
            
1768256 ORDERS              
            
8604032 LINEITEM            

  
8 record(s) selected.
--压缩后

1                    2                  
-------------------- --------------------
                  32 REGION              
              
96000 PART                
                
8768 SUPPLIER            
              
138752 CUSTOMER            
              
425856 PARTSUPP            
                  
32 NATION              
              
752576 ORDERS              
            
3396160 LINEITEM

   从上面表的占用空间可见,对于TPC-H数据,因为dbgen生成的数据比较随机,又是符合第3范式的,冗余较少,压缩的效果不太明显。除了很小的表至少占用一个数据页没有压缩外,平均压缩为原来的1/3,和gzip的压缩率差不多,已经很不错了。

0
相关文章