技术开发 频道

行式数据库PostgreSQL 9.04版本评测

  2.数据加载

  PostgreSQL的大容量文本数据文件导入/导出命令是COPY,COPY 在 PostgreSQL表和标准文件系统文件之间交换数据。 COPY TO 把一个表的所有内容都拷贝到一个文件,而 COPY FROM 从一个文件里拷贝数据到一个表里,也可以把数据附加到表中已经存在的内容里,可以有效地减少导入日志的产生,达到高速导入的目的。它的命令行格式是:

COPY tablename [ ( column [, ...] ) ]
    
FROM { 'filename' | STDIN }
    
[ [ WITH ]
          
[ BINARY ]
          
[ OIDS ]
          
[ DELIMITER [ AS ] 'delimiter' ]
          
[ NULL [ AS ] 'null string' ]
          
[ CSV [ HEADER ]
                
[ QUOTE [ AS ] 'quote' ]
                
[ ESCAPE [ AS ] 'escape' ]
                
[ FORCE NOT NULL column [, ...] ]

   通常用到的是文本格式,只要指定表名、文件名和列分隔符。下面举例说明。

  Tpch的数据以tbl.gz格式保存,因为copy命令支持命名管道,而又不支持在最后一列后加列分隔符,所以用命名管道同时解压缩和替换分隔符。

-bash-3.2$ ls -l /user1/app/oradata/tpch2/*tbl.gz
-rw-r--r-- 1 oracle oinstall   90373097 Feb  2 19:10 /user1/app/oradata/tpch2/customer.tbl.gz
-rw-r--r-- 1 oracle oinstall 2304949340 Feb  2 19:17 /user1/app/oradata/tpch2/lineitem.tbl.gz
-rw-r--r-- 1 oracle oinstall  498102977 Feb  2 19:19 /user1/app/oradata/tpch2/orders.tbl.gz
-rw-r--r-- 1 oracle oinstall  287463641 Feb  2 19:20 /user1/app/oradata/tpch2/partsupp.tbl.gz
-rw-r--r-- 1 oracle oinstall        270 Feb  2 19:20 /user1/app/oradata/tpch2/region.tbl.gz
-rw-r--r-- 1 oracle oinstall    5657118 Feb  2 19:20 /user1/app/oradata/tpch2/supplier.tbl.gz
-bash-3.2$ pwd
/opt/postgres/9.0/bin
-bash-3.2$ gzip --stdout -d /user1/app/oradata/tpch2/region.tbl.gz
0|AFRICA|lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to |
1|AMERICA|hs use ironic, even requests. s|
2|ASIA|ges. thinly even pinto beans ca|
3|EUROPE|ly final courts cajole furiously final excuse|
4|MIDDLE EAST|uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl|
--用sed命令删除行尾|
-bash-3.2$ gzip --stdout -d /user1/app/oradata/tpch2/region.tbl.gz |sed 's/|\r/\r/' > /user1/daa &
[2] 29069
-bash-3.2$ psql pgdb

pgdb=# \timing
Timing is on.
pgdb=# COPY region FROM '/user1/daa' WITH DELIMITER AS '|';
COPY 5
Time: 3.085 ms
pgdb=# select * from region;
r_regionkey  |  r_name                   |                                  r_comment
-------------+---------------------------+----------------------------------------------
           0 | AFRICA                    | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
           1 | AMERICA                   | hs use ironic, even requests. s
           2 | ASIA                      | ges. thinly even pinto beans ca
           3 | EUROPE                    | ly final courts cajole furiously final excuse
           4 | MIDDLE EAST               | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
(5 rows)

Time: 1.574 ms

   8个表用copy from命令导入的时间分别是12秒、556秒、100秒、46秒、16秒、1秒、0秒、0秒,总时间731秒。

  所有表格加载完成后,用VACUUM ANALYZE命令收集统计信息。

pgdb=# VACUUM ANALYZE;
VACUUM
Time:
76961.460 ms

   导入过程中,系统提示检查点发生太频繁,建议修改配置参数,但修改后该提示仍然存在。

pgdb=# copy partsupp FROM '/user1/daa' WITH DELIMITER AS '|';
LOG:  checkpoints are occurring too frequently (1 second apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".
LOG:  checkpoints are occurring too frequently (2 seconds apart)
HINT:  Consider increasing the configuration parameter "checkpoint_segments".


-bash-3.2$ ./pg_ctl -D /user1/postgresql/data stop
LOG:  received smart shutdown request
waiting
for server to shut down....LOG:  autovacuum launcher shutting down
LOG:  shutting down
LOG:  database system is shut down
done
server stopped
[1]+  Done                    ./postgres -D /user1/postgresql/data

-bash-3.2$ vi postgresql.conf

#
- Checkpoints -

#checkpoint_segments
= 30              # in logfile segments, min 1, 16MB each
#checkpoint_timeout
= 5min              # range 30s-1h
#checkpoint_completion_target
= 0.5     # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning
= 30s               # 0 disables
0
相关文章