技术开发 频道

行式数据库评测:SQL Server2008 R2版

  4.数据压缩测试

  表和索引压缩是SQL Server 2008起才具有的功能,有行压缩和页压缩2种选项。只有在Enterprise 版和 Developer 版中才能使用压缩。

  对表进行压缩有2种方法,一种是在创建表的时候指定压缩属性,再加载数据,加载数据又分为从外部文本导入和从数据库另一个表复制二种;另一种是对已经包含数据的非压缩表进行修改压缩属性的操作,然后重新整理表。另外SQL Server也提供了一个存储过程,可以预测一个表压缩后的大小,供人作决定前参考。

  预测压缩率的存储过程sp_estimate_data_compression_savings的语法为:

sp_estimate_data_compression_savings
      
[ @schema_name = ] 'schema_name'  
     ,
[ @object_name = ] 'object_name'
    ,
[@index_id = ] index_id
     ,
[@partition_number = ] partition_number
    ,
[@data_compression = ] 'data_compression'

   其中5个参数分别是模式名、对象名,索引的 ID、分区号,要评估的压缩的类型(NONE、ROW和PAGE),不存在的参数用NULL表示。

D:\soft>osql -Usa -Pabcd1234
1> use tpch;
2> go
1> set statistics time on
2> go
1> select * into tpch.dbo.l  from lineitem --从压缩表复制到非压缩表
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 2 ms.

SQL Server Execution Times:
CPU time
= 236109 ms,  elapsed time = 239178 ms.
(
59986052 rows affected)
1> sp_spaceused 'L'
2> go

name        rows     reserved           data               index_size         unused
-----   ----------  ------------------ ------------------ ------------------ ----------
l        59986052   8280224 KB         8280184 KB         32 KB              8 KB

1> set statistics time off
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 0 ms.
1> drop table l;
2> go
1> set statistics time on
2> go
1>
2> select * into tpch.dbo.l  from master.dbo.lineitem --从非压缩表复制到非压缩表
3> go
SQL Server parse
and compile time:
CPU time
= 8 ms, elapsed time = 8 ms.

SQL Server Execution Times:
CPU time
= 107609 ms,  elapsed time = 109807 ms.
(
59986052 rows affected)
1> set statistics time off
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 0 ms.
1> sp_spaceused 'L'
2> go

name        rows     reserved           data               index_size         unused
-----   ----------  ------------------ ------------------ ------------------ ----------
l        59986052   8280216 KB         8280184 KB         24 KB              8 KB

1> drop table L;
2> go
1> select * into tpch.dbo.l  from master.dbo.lineitem where 1=2; --复制表结构
2> go
(
0 rows affected)
1> sp_spaceused 'L'
2> go

name        rows     reserved           data               index_size         unused
-----   ----------  ------------------ ------------------ ------------------ ----------
l              0    80 KB              8 KB               16 KB              56 KB

1> alter table tpch.dbo.l rebuild WITH (DATA_COMPRESSION =page); --修改为页压缩属性
2> go
1> set statistics time on
2> go
1> insert into tpch.dbo.l with (tablock) select * from master.dbo.lineitem; --用insert with (tablock) 大批量插入压缩表
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time
= 479813 ms,  elapsed time = 480804 ms.
(
59986052 rows affected)
1> set statistics time off
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 0 ms.
1> sp_spaceused 'L'
2> go

name        rows     reserved           data               index_size         unused
-----   ----------  ------------------ ------------------ ------------------ ----------
l        59986052   4404056 KB         4403976 KB         24 KB              56 KB


1> truncate table L;
2> go
1> set statistics time on
2> go
1> BULK INSERT L FROM 'd:\soft\lineitem.tbl' WITH ( FIELDTERMINATOR = '|',ROWTERMINATOR = '|\n'); --用BULK insert大批量插入压缩表
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 17 ms.

SQL Server Execution Times:
CPU time
= 1141437 ms,  elapsed time = 1223991 ms.
(
59986052 rows affected)
1> set statistics time off
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 0 ms.
1> truncate table L;
2> go
1> alter table tpch.dbo.l rebuild WITH (DATA_COMPRESSION =none);
2> go
1> set statistics time on
2> go
1> BULK INSERT L FROM 'd:\soft\lineitem.tbl' WITH ( FIELDTERMINATOR = '|',ROWTERMINATOR = '|\n'); --用BULK insert大批量插入非压缩表
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 15 ms.

SQL Server Execution Times:
CPU time
= 1032969 ms,  elapsed time = 1130748 ms.
(
59986052 rows affected)
1> set statistics time off
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 0 ms.
1>  sp_spaceused 'L'
2>  go

name        rows     reserved           data               index_size         unused
-----   ----------  ------------------ ------------------ ------------------ ----------
l        59986052   8282152 KB         8280184 KB         40 KB              1928 KB


1> alter table tpch.dbo.l rebuild WITH (DATA_COMPRESSION =page); --用rebuild从非压缩表转为压缩

2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time
= 475936 ms,  elapsed time = 60577 ms.
1> sp_spaceused 'L'
2> go
name        rows     reserved           data               index_size         unused
-----   ----------  ------------------ ------------------ ------------------ ----------
l        59986052   4404696 KB         4404392 KB         24 KB              280 KB

1> alter table tpch.dbo.l rebuild WITH (DATA_COMPRESSION =none);--用rebuild从压缩表转为非压缩
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:
CPU time
= 367218 ms,  elapsed time = 52398 ms.

--用sp_estimate_data_compression_savings估计按行压缩后的表大小
1> sp_estimate_data_compression_savings 'DBO' ,'L',NULL,NULL,'ROW'
2> go
object_name
        schema_name
        partition_number size_with_current_compression_setting(KB) size_with_requested_compression_setting(KB)
        sample_size_with_current_compression_setting(KB) sample_size_with_requested_compression_setting(KB)
--------------------------------------------------
L
        DBO
                      
1                                   4406080                                     5450120
                                                  
41088                                              50824
2> set statistics time on
3> go
1> alter table tpch.dbo.l rebuild WITH (DATA_COMPRESSION =row)
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 3 ms.

SQL Server Execution Times:
CPU time
= 341969 ms,  elapsed time = 45206 ms.
1> set statistics time off
2> go
SQL Server parse
and compile time:
CPU time
= 0 ms, elapsed time = 0 ms.
1> sp_spaceused 'L'
2> go

name        rows     reserved           data               index_size         unused
-----   ----------  ------------------ ------------------ ------------------ ----------
l        59986052   5567656 KB         5567368 KB         40 KB              248 KB

   从测试结果可知,同样的LINEITEM表数据,在数据库中复制最快,而其中从非压缩表到非压缩表又比其他方式快,只要109秒,读压缩表写入非压缩表多了解压的操作,慢了100多秒,读非压缩表插入压缩表多了压缩的操作,又多了200多秒。如果用bulk insert用时大约1100多秒,插入非压缩表也比插入压缩表快100多秒,差距没有前面数据库内操作明显。对已有数据的表进行压缩或解压缩rebuild操作的速度都很快,从CPU时间和实际时间的比较可知,这个操作是并行处理的,因此比其他操作优势明显。用页方式压缩比行方式的压缩率高,但压缩的速度是行方式高,sp_estimate_data_compression_savings的抽样估计还是比较准确的。

  压缩对不同的表效果不同,下面是压缩前后TPC-H数据的对比。

  表1 各种压缩类型的大小比较,单位:KB

一、数据库安装

  表2 各种压缩类型的大小比较,单位:以原始未压缩大小为1的比率

一、数据库安装

  从上面表的占用空间可见,对于TPC-H数据,因为dbgen生成的数据比较随机,又是符合第3范式的,冗余较少,压缩的效果不太明显。最大压缩约为原来的1/2,压缩率较低,有些表如PARTSUPP,基本上没有被压缩。而且,如果向具有压缩属性的表导入数据,无论采用什么方式,导入时间都大大长于导入非压缩表,因此,像TPCH这样大量数据的导入,还是首先导入非压缩表,再用alter table rebuild命令压缩比较合适。

  聚集主键增加的空间非常少,这与它的存储和数据在一起有关,非聚集主键的存储和聚集主键有较大的区别,它是和数据表分离的,因此占用空间要超过聚集主键。

0
相关文章