postgresql 13.1 insert into select并行查询的实现

2023-07-06 22:38:03 235

本文信息基于PG13.1。

从PG9.6开始支持并行查询。PG11开始支持CREATETABLE…AS、SELECTINTO以及CREATEMATERIALIZEDVIEW的并行查询。

先说结论：

换用createtableas或者selectinto或者导入导出。

首先跟踪如下查询语句的执行计划：

selectcount(*)fromtestt1,test1t2wheret1.id=t2.id;
postgres=#explainanalyzeselectcount(*)fromtestt1,test1t2wheret1.id=t2.id;
QUERYPLAN
-------------------------------------------------------------------------------------------
FinalizeAggregate(cost=34244.16..34244.17rows=1width=8)(actualtime=683.246..715.324rows=1loops=1)
->Gather(cost=34243.95..34244.16rows=2width=8)(actualtime=681.474..715.311rows=3loops=1)
WorkersPlanned:2
WorkersLaunched:2
->PartialAggregate(cost=33243.95..33243.96rows=1width=8)(actualtime=674.689..675.285rows=1loops=3)
->ParallelHashJoin(cost=15428.00..32202.28rows=416667width=0)(actualtime=447.799..645.689rows=333333loops=3)
HashCond:(t1.id=t2.id)
->ParallelSeqScanontestt1(cost=0.00..8591.67rows=416667width=4)(actualtime=0.025..74.010rows=333333loops=3)
->ParallelHash(cost=8591.67..8591.67rows=416667width=4)(actualtime=260.052..260.053rows=333333loops=3)
Buckets:131072Batches:16MemoryUsage:3520kB
->ParallelSeqScanontest1t2(cost=0.00..8591.67rows=416667width=4)(actualtime=0.032..104.804rows=333333loops=3)
PlanningTime:0.420ms
ExecutionTime:715.447ms
(13rows)

可以看到走了两个Workers。

下边看一下insertintoselect：

postgres=#explainanalyzeinsertintovaselectcount(*)fromtestt1,test1t2wheret1.id=t2.id;
QUERYPLAN
-------------------------------------------------------------------------------------------
Insertonva(cost=73228.00..73228.02rows=1width=4)(actualtime=3744.179..3744.187rows=0loops=1)
->SubqueryScanon"*SELECT*"(cost=73228.00..73228.02rows=1width=4)(actualtime=3743.343..3743.352rows=1loops=1)
->Aggregate(cost=73228.00..73228.01rows=1width=8)(actualtime=3743.247..3743.254rows=1loops=1)
->HashJoin(cost=30832.00..70728.00rows=1000000width=0)(actualtime=1092.295..3511.301rows=1000000loops=1)
HashCond:(t1.id=t2.id)
->SeqScanontestt1(cost=0.00..14425.00rows=1000000width=4)(actualtime=0.030..421.537rows=1000000loops=1)
->Hash(cost=14425.00..14425.00rows=1000000width=4)(actualtime=1090.078..1090.081rows=1000000loops=1)
Buckets:131072Batches:16MemoryUsage:3227kB
->SeqScanontest1t2(cost=0.00..14425.00rows=1000000width=4)(actualtime=0.021..422.768rows=1000000loops=1)
PlanningTime:0.511ms
ExecutionTime:3745.633ms
(11rows)

可以看到并没有Workers的指示，没有启用并行查询。

即使开启强制并行，也无法走并行查询。

postgres=#setforce_parallel_mode=on;
SET
postgres=#explainanalyzeinsertintovaselectcount(*)fromtestt1,test1t2wheret1.id=t2.id;
QUERYPLAN
-------------------------------------------------------------------------------------------
Insertonva(cost=73228.00..73228.02rows=1width=4)(actualtime=3825.042..3825.049rows=0loops=1)
->SubqueryScanon"*SELECT*"(cost=73228.00..73228.02rows=1width=4)(actualtime=3824.976..3824.984rows=1loops=1)
->Aggregate(cost=73228.00..73228.01rows=1width=8)(actualtime=3824.972..3824.978rows=1loops=1)
->HashJoin(cost=30832.00..70728.00rows=1000000width=0)(actualtime=1073.587..3599.402rows=1000000loops=1)
HashCond:(t1.id=t2.id)
->SeqScanontestt1(cost=0.00..14425.00rows=1000000width=4)(actualtime=0.034..414.965rows=1000000loops=1)
->Hash(cost=14425.00..14425.00rows=1000000width=4)(actualtime=1072.441..1072.443rows=1000000loops=1)
Buckets:131072Batches:16MemoryUsage:3227kB
->SeqScanontest1t2(cost=0.00..14425.00rows=1000000width=4)(actualtime=0.022..400.624rows=1000000loops=1)
PlanningTime:0.577ms
ExecutionTime:3825.923ms
(11rows)

原因在官方文档有写：

Thequerywritesanydataorlocksanydatabaserows.Ifaquerycontainsadata-modifyingoperationeitheratthetoplevelorwithinaCTE,noparallelplansforthatquerywillbegenerated.Asanexception,thecommandsCREATETABLE…AS,SELECTINTO,andCREATEMATERIALIZEDVIEWwhichcreateanewtableandpopulateitcanuseaparallelplan.

解决方案有如下三种：

1.selectinto

postgres=#explainanalyzeselectcount(*)intovaafromtestt1,test1t2wheret1.id=t2.id;
QUERYPLAN
-------------------------------------------------------------------------------------------
FinalizeAggregate(cost=34244.16..34244.17rows=1width=8)(actualtime=742.736..774.923rows=1loops=1)
->Gather(cost=34243.95..34244.16rows=2width=8)(actualtime=740.223..774.907rows=3loops=1)
WorkersPlanned:2
WorkersLaunched:2
->PartialAggregate(cost=33243.95..33243.96rows=1width=8)(actualtime=731.408..731.413rows=1loops=3)
->ParallelHashJoin(cost=15428.00..32202.28rows=416667width=0)(actualtime=489.880..700.830rows=333333loops=3)
HashCond:(t1.id=t2.id)
->ParallelSeqScanontestt1(cost=0.00..8591.67rows=416667width=4)(actualtime=0.033..87.479rows=333333loops=3)
->ParallelHash(cost=8591.67..8591.67rows=416667width=4)(actualtime=266.839..266.840rows=333333loops=3)
Buckets:131072Batches:16MemoryUsage:3520kB
->ParallelSeqScanontest1t2(cost=0.00..8591.67rows=416667width=4)(actualtime=0.058..106.874rows=333333loops=3)
PlanningTime:0.319ms
ExecutionTime:783.300ms
(13rows)

2.createtableas

postgres=#explainanalyzecreatetablevbasselectcount(*)fromtestt1,test1t2wheret1.id=t2.id;
QUERYPLAN
-------------------------------------------------------------------------------------------
FinalizeAggregate(cost=34244.16..34244.17rows=1width=8)(actualtime=540.120..563.733rows=1loops=1)
->Gather(cost=34243.95..34244.16rows=2width=8)(actualtime=537.982..563.720rows=3loops=1)
WorkersPlanned:2
WorkersLaunched:2
->PartialAggregate(cost=33243.95..33243.96rows=1width=8)(actualtime=526.602..527.136rows=1loops=3)
->ParallelHashJoin(cost=15428.00..32202.28rows=416667width=0)(actualtime=334.532..502.793rows=333333loops=3)
HashCond:(t1.id=t2.id)
->ParallelSeqScanontestt1(cost=0.00..8591.67rows=416667width=4)(actualtime=0.018..57.819rows=333333loops=3)
->ParallelHash(cost=8591.67..8591.67rows=416667width=4)(actualtime=189.502..189.503rows=333333loops=3)
Buckets:131072Batches:16MemoryUsage:3520kB
->ParallelSeqScanontest1t2(cost=0.00..8591.67rows=416667width=4)(actualtime=0.023..77.786rows=333333loops=3)
PlanningTime:0.189ms
ExecutionTime:565.448ms
(13rows)

3.或者通过导入导出的方式，例如：

psql-hlocalhost-dpostgres-Upostgres-c"selectcount(*)fromtestt1,test1t2wheret1.id=t2.id"-oresult.csv-A-t-F","
psql-hlocalhost-dpostgres-Upostgres-c"COPYvaFROM'result.csv'WITH(FORMATCSV,DELIMITER',',HEADERFALSE,ENCODING'windows-1252')"

一些场景下也会比非并行快。

补充：POSTGRESQL:动态SQL语句中不能使用SELECTINTO？

我的数据库版本是PostgreSQL8.4.7。下面是出错的存储过程：

CREATEorReplaceFUNCTIONfunc_getnextid(
tablenamevarchar(240),
idnamevarchar(20)default'id')
RETURNSintegerAS$funcbody$
Declare
sqlstringvarchar(240);
currentIdinteger;
Begin
sqlstring:='selectmax("'||idname||'")intocurrentIdfrom"'||tablename||'";';
EXECUTEsqlstring;
ifcurrentIdisNULLorcurrentId=0then
return1;
else
returncurrentId+1;
endif;
End;
$funcbody$LANGUAGEplpgsq

执行后出现这样的错误：

SQLerror:

ERROR:EXECUTEofSELECT...INTOisnotimplemented

CONTEXT:PL/pgSQLfunction"func_getnextbigid"line6atEXECUTEstatement

改成这样的就对了：

CREATEorReplaceFUNCTIONfunc_getnextid(
tablenamevarchar(240),
idnamevarchar(20)default'id')
RETURNSintegerAS$funcbody$
Declare
sqlstringvarchar(240);
currentIdinteger;
Begin
sqlstring:='selectmax("'||idname||'")from"'||tablename||'";';
EXECUTEsqlstringintocurrentId;
ifcurrentIdisNULLorcurrentId=0then
return1;
else
returncurrentId+1;
endif;
End;
$funcbody$LANGUAGEplpgsql;

以上为个人经验，希望能给大家一个参考，也希望大家多多支持毛票票。如有错误或未考虑完全的地方，望不吝赐教。

postgresql 13.1 insert into select并行查询的实现

先说结论：

1.selectinto

2.createtableas

3.或者通过导入导出的方式，例如：

热门推荐

随机推荐