postgresql 13.1 insert into select并行查询的实现
本文信息基于PG13.1。
从PG9.6开始支持并行查询。PG11开始支持CREATETABLE…AS、SELECTINTO以及CREATEMATERIALIZEDVIEW的并行查询。
先说结论:
换用createtableas或者selectinto或者导入导出。
首先跟踪如下查询语句的执行计划:
selectcount(*)fromtestt1,test1t2wheret1.id=t2.id; postgres=#explainanalyzeselectcount(*)fromtestt1,test1t2wheret1.id=t2.id; QUERYPLAN ------------------------------------------------------------------------------------------- FinalizeAggregate(cost=34244.16..34244.17rows=1width=8)(actualtime=683.246..715.324rows=1loops=1) ->Gather(cost=34243.95..34244.16rows=2width=8)(actualtime=681.474..715.311rows=3loops=1) WorkersPlanned:2 WorkersLaunched:2 ->PartialAggregate(cost=33243.95..33243.96rows=1width=8)(actualtime=674.689..675.285rows=1loops=3) ->ParallelHashJoin(cost=15428.00..32202.28rows=416667width=0)(actualtime=447.799..645.689rows=333333loops=3) HashCond:(t1.id=t2.id) ->ParallelSeqScanontestt1(cost=0.00..8591.67rows=416667width=4)(actualtime=0.025..74.010rows=333333loops=3) ->ParallelHash(cost=8591.67..8591.67rows=416667width=4)(actualtime=260.052..260.053rows=333333loops=3) Buckets:131072Batches:16MemoryUsage:3520kB ->ParallelSeqScanontest1t2(cost=0.00..8591.67rows=416667width=4)(actualtime=0.032..104.804rows=333333loops=3) PlanningTime:0.420ms ExecutionTime:715.447ms (13rows)
可以看到走了两个Workers。
下边看一下insertintoselect:
postgres=#explainanalyzeinsertintovaselectcount(*)fromtestt1,test1t2wheret1.id=t2.id; QUERYPLAN ------------------------------------------------------------------------------------------- Insertonva(cost=73228.00..73228.02rows=1width=4)(actualtime=3744.179..3744.187rows=0loops=1) ->SubqueryScanon"*SELECT*"(cost=73228.00..73228.02rows=1width=4)(actualtime=3743.343..3743.352rows=1loops=1) ->Aggregate(cost=73228.00..73228.01rows=1width=8)(actualtime=3743.247..3743.254rows=1loops=1) ->HashJoin(cost=30832.00..70728.00rows=1000000width=0)(actualtime=1092.295..3511.301rows=1000000loops=1) HashCond:(t1.id=t2.id) ->SeqScanontestt1(cost=0.00..14425.00rows=1000000width=4)(actualtime=0.030..421.537rows=1000000loops=1) ->Hash(cost=14425.00..14425.00rows=1000000width=4)(actualtime=1090.078..1090.081rows=1000000loops=1) Buckets:131072Batches:16MemoryUsage:3227kB ->SeqScanontest1t2(cost=0.00..14425.00rows=1000000width=4)(actualtime=0.021..422.768rows=1000000loops=1) PlanningTime:0.511ms ExecutionTime:3745.633ms (11rows)
可以看到并没有Workers的指示,没有启用并行查询。
即使开启强制并行,也无法走并行查询。
postgres=#setforce_parallel_mode=on; SET postgres=#explainanalyzeinsertintovaselectcount(*)fromtestt1,test1t2wheret1.id=t2.id; QUERYPLAN ------------------------------------------------------------------------------------------- Insertonva(cost=73228.00..73228.02rows=1width=4)(actualtime=3825.042..3825.049rows=0loops=1) ->SubqueryScanon"*SELECT*"(cost=73228.00..73228.02rows=1width=4)(actualtime=3824.976..3824.984rows=1loops=1) ->Aggregate(cost=73228.00..73228.01rows=1width=8)(actualtime=3824.972..3824.978rows=1loops=1) ->HashJoin(cost=30832.00..70728.00rows=1000000width=0)(actualtime=1073.587..3599.402rows=1000000loops=1) HashCond:(t1.id=t2.id) ->SeqScanontestt1(cost=0.00..14425.00rows=1000000width=4)(actualtime=0.034..414.965rows=1000000loops=1) ->Hash(cost=14425.00..14425.00rows=1000000width=4)(actualtime=1072.441..1072.443rows=1000000loops=1) Buckets:131072Batches:16MemoryUsage:3227kB ->SeqScanontest1t2(cost=0.00..14425.00rows=1000000width=4)(actualtime=0.022..400.624rows=1000000loops=1) PlanningTime:0.577ms ExecutionTime:3825.923ms (11rows)
原因在官方文档有写:
Thequerywritesanydataorlocksanydatabaserows.Ifaquerycontainsadata-modifyingoperationeitheratthetoplevelorwithinaCTE,noparallelplansforthatquerywillbegenerated.Asanexception,thecommandsCREATETABLE…AS,SELECTINTO,andCREATEMATERIALIZEDVIEWwhichcreateanewtableandpopulateitcanuseaparallelplan.
解决方案有如下三种:
1.selectinto
postgres=#explainanalyzeselectcount(*)intovaafromtestt1,test1t2wheret1.id=t2.id; QUERYPLAN ------------------------------------------------------------------------------------------- FinalizeAggregate(cost=34244.16..34244.17rows=1width=8)(actualtime=742.736..774.923rows=1loops=1) ->Gather(cost=34243.95..34244.16rows=2width=8)(actualtime=740.223..774.907rows=3loops=1) WorkersPlanned:2 WorkersLaunched:2 ->PartialAggregate(cost=33243.95..33243.96rows=1width=8)(actualtime=731.408..731.413rows=1loops=3) ->ParallelHashJoin(cost=15428.00..32202.28rows=416667width=0)(actualtime=489.880..700.830rows=333333loops=3) HashCond:(t1.id=t2.id) ->ParallelSeqScanontestt1(cost=0.00..8591.67rows=416667width=4)(actualtime=0.033..87.479rows=333333loops=3) ->ParallelHash(cost=8591.67..8591.67rows=416667width=4)(actualtime=266.839..266.840rows=333333loops=3) Buckets:131072Batches:16MemoryUsage:3520kB ->ParallelSeqScanontest1t2(cost=0.00..8591.67rows=416667width=4)(actualtime=0.058..106.874rows=333333loops=3) PlanningTime:0.319ms ExecutionTime:783.300ms (13rows)
2.createtableas
postgres=#explainanalyzecreatetablevbasselectcount(*)fromtestt1,test1t2wheret1.id=t2.id; QUERYPLAN ------------------------------------------------------------------------------------------- FinalizeAggregate(cost=34244.16..34244.17rows=1width=8)(actualtime=540.120..563.733rows=1loops=1) ->Gather(cost=34243.95..34244.16rows=2width=8)(actualtime=537.982..563.720rows=3loops=1) WorkersPlanned:2 WorkersLaunched:2 ->PartialAggregate(cost=33243.95..33243.96rows=1width=8)(actualtime=526.602..527.136rows=1loops=3) ->ParallelHashJoin(cost=15428.00..32202.28rows=416667width=0)(actualtime=334.532..502.793rows=333333loops=3) HashCond:(t1.id=t2.id) ->ParallelSeqScanontestt1(cost=0.00..8591.67rows=416667width=4)(actualtime=0.018..57.819rows=333333loops=3) ->ParallelHash(cost=8591.67..8591.67rows=416667width=4)(actualtime=189.502..189.503rows=333333loops=3) Buckets:131072Batches:16MemoryUsage:3520kB ->ParallelSeqScanontest1t2(cost=0.00..8591.67rows=416667width=4)(actualtime=0.023..77.786rows=333333loops=3) PlanningTime:0.189ms ExecutionTime:565.448ms (13rows)
3.或者通过导入导出的方式,例如:
psql-hlocalhost-dpostgres-Upostgres-c"selectcount(*)fromtestt1,test1t2wheret1.id=t2.id"-oresult.csv-A-t-F"," psql-hlocalhost-dpostgres-Upostgres-c"COPYvaFROM'result.csv'WITH(FORMATCSV,DELIMITER',',HEADERFALSE,ENCODING'windows-1252')"
一些场景下也会比非并行快。
补充:POSTGRESQL:动态SQL语句中不能使用SELECTINTO?
我的数据库版本是PostgreSQL8.4.7。下面是出错的存储过程:
CREATEorReplaceFUNCTIONfunc_getnextid( tablenamevarchar(240), idnamevarchar(20)default'id') RETURNSintegerAS$funcbody$ Declare sqlstringvarchar(240); currentIdinteger; Begin sqlstring:='selectmax("'||idname||'")intocurrentIdfrom"'||tablename||'";'; EXECUTEsqlstring; ifcurrentIdisNULLorcurrentId=0then return1; else returncurrentId+1; endif; End; $funcbody$LANGUAGEplpgsq
执行后出现这样的错误:
SQLerror:
ERROR:EXECUTEofSELECT...INTOisnotimplemented
CONTEXT:PL/pgSQLfunction"func_getnextbigid"line6atEXECUTEstatement
改成这样的就对了:
CREATEorReplaceFUNCTIONfunc_getnextid( tablenamevarchar(240), idnamevarchar(20)default'id') RETURNSintegerAS$funcbody$ Declare sqlstringvarchar(240); currentIdinteger; Begin sqlstring:='selectmax("'||idname||'")from"'||tablename||'";'; EXECUTEsqlstringintocurrentId; ifcurrentIdisNULLorcurrentId=0then return1; else returncurrentId+1; endif; End; $funcbody$LANGUAGEplpgsql;
以上为个人经验,希望能给大家一个参考,也希望大家多多支持毛票票。如有错误或未考虑完全的地方,望不吝赐教。