2、外部表方式加载
外部表支持sqlldr引擎和数据泵,由于我们的数据是文本方式,故采用sqlldr引擎。
首先创建外部表,注意type为oracle_loader,PREPROCESSOR选择zcat,表示用zcat的结果插入。
SQL> create directory tpch_dir as '/user1/tpch';
Directory created.
Elapsed: 00:00:00.02
SQL> create directory zcat_dir as '/bin';
Directory created.
CREATE TABLE lineitem_ext (L_ORDERKEY NUMBER(10),
L_PARTKEY NUMBER(10),
L_SUPPKEY NUMBER(10),
L_LINENUMBER NUMBER(38),
L_QUANTITY NUMBER,
L_EXTENDEDPRICE NUMBER,
L_DISCOUNT NUMBER,
L_TAX NUMBER,
L_RETURNFLAG CHAR(1),
L_LINESTATUS CHAR(1),
L_SHIPDATE VARCHAR2(10),
L_COMMITDATE VARCHAR2(10),
L_RECEIPTDATE VARCHAR2(10),
L_SHIPINSTRUCT VARCHAR2(25),
L_SHIPMODE VARCHAR2(10),
L_COMMENT VARCHAR2(44))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
PREPROCESSOR zcat_dir:'zcat'
BADFILE 'bad_%a_%p.bad'
LOGFILE 'log_%a_%p.log'
FIELDS TERMINATED BY '|'
MISSING FIELD VALUES ARE NULL)
LOCATION ('lineitem.tbl.gz'))
PARALLEL 2
REJECT LIMIT 0
/
Table created.
Elapsed: 00:00:00.03
Directory created.
Elapsed: 00:00:00.02
SQL> create directory zcat_dir as '/bin';
Directory created.
CREATE TABLE lineitem_ext (L_ORDERKEY NUMBER(10),
L_PARTKEY NUMBER(10),
L_SUPPKEY NUMBER(10),
L_LINENUMBER NUMBER(38),
L_QUANTITY NUMBER,
L_EXTENDEDPRICE NUMBER,
L_DISCOUNT NUMBER,
L_TAX NUMBER,
L_RETURNFLAG CHAR(1),
L_LINESTATUS CHAR(1),
L_SHIPDATE VARCHAR2(10),
L_COMMITDATE VARCHAR2(10),
L_RECEIPTDATE VARCHAR2(10),
L_SHIPINSTRUCT VARCHAR2(25),
L_SHIPMODE VARCHAR2(10),
L_COMMENT VARCHAR2(44))
ORGANIZATION EXTERNAL (
TYPE oracle_loader
DEFAULT DIRECTORY tpch_dir
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
PREPROCESSOR zcat_dir:'zcat'
BADFILE 'bad_%a_%p.bad'
LOGFILE 'log_%a_%p.log'
FIELDS TERMINATED BY '|'
MISSING FIELD VALUES ARE NULL)
LOCATION ('lineitem.tbl.gz'))
PARALLEL 2
REJECT LIMIT 0
/
Table created.
Elapsed: 00:00:00.03
然后用直接路径将外部表数据插入实际要处理的表。由于日期列需要指定格式,用to_date参数处理。
insert /*+append*/into h_lineitem
select L_ORDERKEY,
L_PARTKEY,
L_SUPPKEY,
L_LINENUMBER,
L_QUANTITY,
L_EXTENDEDPRICE,
L_DISCOUNT,
L_TAX,
L_RETURNFLAG,
L_LINESTATUS,
to_date(L_SHIPDATE, 'YYYY-MM-DD'),
to_date(L_COMMITDATE, 'YYYY-MM-DD'),
to_date(L_RECEIPTDATE, 'YYYY-MM-DD'),
L_SHIPINSTRUCT,
L_SHIPMODE,
L_COMMENT
from lineitem_ext;
/
59986052 rows created.
Elapsed: 00:10:26.35
--利用多个文件创建另一个外部表,只修改location
CREATE TABLE lineitem_ext2 (L_ORDERKEY NUMBER(10),
…
LOCATION ('x00.gz','x01.gz','x02.gz','x03.gz'))
…
--用第二个外部表并行插入
insert /*+append parallel (a 8)*/into lineitem a
select /*+ parallel (b 8)*/L_ORDERKEY,
L_PARTKEY,
L_SUPPKEY,
L_LINENUMBER,
L_QUANTITY,
L_EXTENDEDPRICE,
L_DISCOUNT,
L_TAX,
L_RETURNFLAG,
L_LINESTATUS,
to_date(L_SHIPDATE, 'YYYY-MM-DD'),
to_date(L_COMMITDATE, 'YYYY-MM-DD'),
to_date(L_RECEIPTDATE, 'YYYY-MM-DD'),
L_SHIPINSTRUCT,
L_SHIPMODE,
L_COMMENT
from lineitem_ext2 b;
/
59986052 rows created.
Elapsed: 00:04:13.94
--用第一个外部表并行插入
insert /*+append parallel (a 8)*/into lineitem a
select /*+ parallel (b 8)*/L_ORDERKEY,
…
from lineitem_ext b;
/
59986052 rows created.
Elapsed: 00:08:59.89
select L_ORDERKEY,
L_PARTKEY,
L_SUPPKEY,
L_LINENUMBER,
L_QUANTITY,
L_EXTENDEDPRICE,
L_DISCOUNT,
L_TAX,
L_RETURNFLAG,
L_LINESTATUS,
to_date(L_SHIPDATE, 'YYYY-MM-DD'),
to_date(L_COMMITDATE, 'YYYY-MM-DD'),
to_date(L_RECEIPTDATE, 'YYYY-MM-DD'),
L_SHIPINSTRUCT,
L_SHIPMODE,
L_COMMENT
from lineitem_ext;
/
59986052 rows created.
Elapsed: 00:10:26.35
--利用多个文件创建另一个外部表,只修改location
CREATE TABLE lineitem_ext2 (L_ORDERKEY NUMBER(10),
…
LOCATION ('x00.gz','x01.gz','x02.gz','x03.gz'))
…
--用第二个外部表并行插入
insert /*+append parallel (a 8)*/into lineitem a
select /*+ parallel (b 8)*/L_ORDERKEY,
L_PARTKEY,
L_SUPPKEY,
L_LINENUMBER,
L_QUANTITY,
L_EXTENDEDPRICE,
L_DISCOUNT,
L_TAX,
L_RETURNFLAG,
L_LINESTATUS,
to_date(L_SHIPDATE, 'YYYY-MM-DD'),
to_date(L_COMMITDATE, 'YYYY-MM-DD'),
to_date(L_RECEIPTDATE, 'YYYY-MM-DD'),
L_SHIPINSTRUCT,
L_SHIPMODE,
L_COMMENT
from lineitem_ext2 b;
/
59986052 rows created.
Elapsed: 00:04:13.94
--用第一个外部表并行插入
insert /*+append parallel (a 8)*/into lineitem a
select /*+ parallel (b 8)*/L_ORDERKEY,
…
from lineitem_ext b;
/
59986052 rows created.
Elapsed: 00:08:59.89
我们看到,location单个外部文件的外部表,指定了并行插入仍然需要大约9分钟时间,比sqlldr单进程还慢,采用多个外部文件的外部表,并行插入时间4分钟多,也比sqlldr并行加载差很多,to_date的日期转换是有些影响,但由于加载引擎实际没有区别,也就难以超过sqlldr命令的效果。