2.数据加载
PostgreSQL的大容量文本数据文件导入/导出命令是COPY,COPY 在 PostgreSQL表和标准文件系统文件之间交换数据。 COPY TO 把一个表的所有内容都拷贝到一个文件,而 COPY FROM 从一个文件里拷贝数据到一个表里,也可以把数据附加到表中已经存在的内容里,可以有效地减少导入日志的产生,达到高速导入的目的。它的命令行格式是:
COPY tablename [ ( column [, ...] ) ]
FROM { 'filename' | STDIN }
[ [ WITH ]
[ BINARY ]
[ OIDS ]
[ DELIMITER [ AS ] 'delimiter' ]
[ NULL [ AS ] 'null string' ]
[ CSV [ HEADER ]
[ QUOTE [ AS ] 'quote' ]
[ ESCAPE [ AS ] 'escape' ]
[ FORCE NOT NULL column [, ...] ]通常用到的是文本格式,只要指定表名、文件名和列分隔符。下面举例说明。
Tpch的数据以tbl.gz格式保存,因为copy命令支持命名管道,而又不支持在最后一列后加列分隔符,所以用命名管道同时解压缩和替换分隔符。
-bash-3.2$ ls -l /user1/app/oradata/tpch2/*tbl.gz
-rw-r--r-- 1 oracle oinstall 90373097 Feb 2 19:10 /user1/app/oradata/tpch2/customer.tbl.gz
-rw-r--r-- 1 oracle oinstall 2304949340 Feb 2 19:17 /user1/app/oradata/tpch2/lineitem.tbl.gz
-rw-r--r-- 1 oracle oinstall 498102977 Feb 2 19:19 /user1/app/oradata/tpch2/orders.tbl.gz
-rw-r--r-- 1 oracle oinstall 287463641 Feb 2 19:20 /user1/app/oradata/tpch2/partsupp.tbl.gz
-rw-r--r-- 1 oracle oinstall 270 Feb 2 19:20 /user1/app/oradata/tpch2/region.tbl.gz
-rw-r--r-- 1 oracle oinstall 5657118 Feb 2 19:20 /user1/app/oradata/tpch2/supplier.tbl.gz
-bash-3.2$ pwd
/opt/postgres/9.0/bin
-bash-3.2$ gzip --stdout -d /user1/app/oradata/tpch2/region.tbl.gz
0|AFRICA|lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to |
1|AMERICA|hs use ironic, even requests. s|
2|ASIA|ges. thinly even pinto beans ca|
3|EUROPE|ly final courts cajole furiously final excuse|
4|MIDDLE EAST|uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl|
--用sed命令删除行尾|
-bash-3.2$ gzip --stdout -d /user1/app/oradata/tpch2/region.tbl.gz |sed 's/|\r/\r/' > /user1/daa &
[2] 29069
-bash-3.2$ psql pgdb
pgdb=# \timing
Timing is on.
pgdb=# COPY region FROM '/user1/daa' WITH DELIMITER AS '|';
COPY 5
Time: 3.085 ms
pgdb=# select * from region;
r_regionkey | r_name | r_comment
-------------+---------------------------+----------------------------------------------
0 | AFRICA | lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1 | AMERICA | hs use ironic, even requests. s
2 | ASIA | ges. thinly even pinto beans ca
3 | EUROPE | ly final courts cajole furiously final excuse
4 | MIDDLE EAST | uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
(5 rows)
Time: 1.574 ms8个表用copy from命令导入的时间分别是12秒、556秒、100秒、46秒、16秒、1秒、0秒、0秒,总时间731秒。
所有表格加载完成后,用VACUUM ANALYZE命令收集统计信息。
pgdb=# VACUUM ANALYZE;
VACUUM
Time: 76961.460 ms导入过程中,系统提示检查点发生太频繁,建议修改配置参数,但修改后该提示仍然存在。
pgdb=# copy partsupp FROM '/user1/daa' WITH DELIMITER AS '|';
LOG: checkpoints are occurring too frequently (1 second apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
LOG: checkpoints are occurring too frequently (2 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".

-bash-3.2$ ./pg_ctl -D /user1/postgresql/data stop
LOG: received smart shutdown request
waiting for server to shut down....LOG: autovacuum launcher shutting down
LOG: shutting down
LOG: database system is shut down
done
server stopped
[1]+ Done ./postgres -D /user1/postgresql/data
-bash-3.2$ vi postgresql.conf
# - Checkpoints -
#checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min # range 30s-1h
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s # 0 disables