postgresql 中的 like 查询优化方案
当时数量量比较庞大的时候,做模糊查询效率很慢,为了优化查询效率,尝试如下方法做效率对比
一、对比情况说明:
1、数据量100w条数据
2、执行sql
二、对比结果
explainanalyzeSELECT c_patent, c_applyissno, d_applyissdate, d_applydate, c_patenttype_dimn, c_newlawstatus, c_abstract FROM public.t_knowl_patent_zlxx_temp WHERE c_applicantLIKE'%本溪满族自治县连山关镇安平安养殖场%';
1、未建索时执行计划:
"Gather(cost=1000.00..83803.53rows=92width=1278)(actualtime=217.264..217.264rows=0loops=1) WorkersPlanned:2 WorkersLaunched:2 ->ParallelSeqScanont_knowl_patent_zlxx(cost=0.00..82794.33rows=38width=1278)(actualtime=212.355..212.355rows=0loops=3) Filter:((c_applicant)::text~~'%本溪满族自治县连山关镇安平安养殖场%'::text) RowsRemovedbyFilter:333333 Planningtime:0.272ms Executiontime:228.116ms"
2、btree索引
建索引语句
CREATEINDEXidx_public_t_knowl_patent_zlxx_applicantONpublic.t_knowl_patent_zlxx(c_applicantvarchar_pattern_ops);
执行计划
"Gather(cost=1000.00..83803.53rows=92width=1278)(actualtime=208.253..208.253rows=0loops=1) WorkersPlanned:2 WorkersLaunched:2 ->ParallelSeqScanont_knowl_patent_zlxx(cost=0.00..82794.33rows=38width=1278)(actualtime=203.573..203.573rows=0loops=3) Filter:((c_applicant)::text~~'%本溪满族自治县连山关镇安平安养殖场%'::text) RowsRemovedbyFilter:333333 Planningtime:0.116ms Executiontime:218.189ms"
但是如果将查询sql稍微改动一下,把like查询中的前置%去掉是这样的
IndexScanusingidx_public_t_knowl_patent_zlxx_applicantont_knowl_patent_zlxx_temp(cost=0.55..8.57rows=92width=1278)(actualtime=0.292..0.292rows=0loops=1) IndexCond:(((c_applicant)::text~>=~'本溪满族自治县连山关镇安平安养殖场'::text)AND((c_applicant)::text~<~'本溪满族自治县连山关镇安平安养殖圻'::text)) Filter:((c_applicant)::text~~'本溪满族自治县连山关镇安平安养殖场%'::text) Planningtime:0.710ms Executiontime:0.378ms
3、gin索引
创建索引语句(postgresql要求在9.6版本及以上)
createextensionpg_trgm; CREATEINDEXidx_public_t_knowl_patent_zlxx_applicantONpublic.t_knowl_patent_zlxxUSINGgin(c_applicantgin_trgm_ops);
执行计划
BitmapHeapScanont_knowl_patent_zlxx(cost=244.71..600.42rows=91width=1268)(actualtime=0.649..0.649rows=0loops=1) RecheckCond:((c_applicant)::text~~'%本溪满族自治县连山关镇安平安养殖场%'::text) ->BitmapIndexScanonidx_public_t_knowl_patent_zlxx_applicant(cost=0.00..244.69rows=91width=0)(actualtime=0.647..0.647rows=0loops=1) IndexCond:((c_applicant)::text~~'%本溪满族自治县连山关镇安平安养殖场%'::text) Planningtime:0.673ms Executiontime:0.740ms
三、结论
btree索引可以让后置%"abc%"的模糊匹配走索引,gin+gp_trgm可以让前后置%"%abc%"走索引。但是gin索引也有弊端,以下情况可能导致无法命中:
搜索字段少于3个字符时,不会命中索引,这是gin自身机制导致。
当搜索字段过长时,比如email检索,可能也不会命中索引,造成原因暂时未知。
补充:PostgreSQLLIKE查询效率提升实验
一、未做索引的查询效率
作为对比,先对未索引的查询做测试
EXPLAINANALYZEselect*fromgallery_mapwhereauthor='曹志耘'; QUERYPLAN ----------------------------------------------------------------------------------------------------------------- SeqScanongallery_map(cost=0.00..7002.32rows=1025width=621)(actualtime=0.011..39.753rows=1031loops=1) Filter:((author)::text='曹志耘'::text) RowsRemovedbyFilter:71315 Planningtime:0.194ms Executiontime:39.879ms (5rows) Time:40.599ms EXPLAINANALYZEselect*fromgallery_mapwhereauthorlike'曹志耘'; QUERYPLAN ----------------------------------------------------------------------------------------------------------------- SeqScanongallery_map(cost=0.00..7002.32rows=1025width=621)(actualtime=0.017..41.513rows=1031loops=1) Filter:((author)::text~~'曹志耘'::text) RowsRemovedbyFilter:71315 Planningtime:0.188ms Executiontime:41.669ms (5rows) Time:42.457ms EXPLAINANALYZEselect*fromgallery_mapwhereauthorlike'曹志耘%'; QUERYPLAN ----------------------------------------------------------------------------------------------------------------- SeqScanongallery_map(cost=0.00..7002.32rows=1028width=621)(actualtime=0.017..41.492rows=1031loops=1) Filter:((author)::text~~'曹志耘%'::text) RowsRemovedbyFilter:71315 Planningtime:0.307ms Executiontime:41.633ms (5rows) Time:42.676ms
很显然都会做全表扫描
二、创建btree索引
PostgreSQL默认索引是btree
CREATEINDEXix_gallery_map_authorONgallery_map(author); EXPLAINANALYZEselect*fromgallery_mapwhereauthor='曹志耘'; QUERYPLAN ------------------------------------------------------------------------------------------------------------------------------------- BitmapHeapScanongallery_map(cost=36.36..2715.37rows=1025width=621)(actualtime=0.457..1.312rows=1031loops=1) RecheckCond:((author)::text='曹志耘'::text) HeapBlocks:exact=438 ->BitmapIndexScanonix_gallery_map_author(cost=0.00..36.10rows=1025width=0)(actualtime=0.358..0.358rows=1031loops=1) IndexCond:((author)::text='曹志耘'::text) Planningtime:0.416ms Executiontime:1.422ms (7rows) Time:2.462ms EXPLAINANALYZEselect*fromgallery_mapwhereauthorlike'曹志耘'; QUERYPLAN ------------------------------------------------------------------------------------------------------------------------------------- BitmapHeapScanongallery_map(cost=36.36..2715.37rows=1025width=621)(actualtime=0.752..2.119rows=1031loops=1) Filter:((author)::text~~'曹志耘'::text) HeapBlocks:exact=438 ->BitmapIndexScanonix_gallery_map_author(cost=0.00..36.10rows=1025width=0)(actualtime=0.560..0.560rows=1031loops=1) IndexCond:((author)::text='曹志耘'::text) Planningtime:0.270ms Executiontime:2.295ms (7rows) Time:3.444ms EXPLAINANALYZEselect*fromgallery_mapwhereauthorlike'曹志耘%'; QUERYPLAN ----------------------------------------------------------------------------------------------------------------- SeqScanongallery_map(cost=0.00..7002.32rows=1028width=621)(actualtime=0.015..41.389rows=1031loops=1) Filter:((author)::text~~'曹志耘%'::text) RowsRemovedbyFilter:71315 Planningtime:0.260ms Executiontime:41.518ms (5rows) Time:42.430ms EXPLAINANALYZEselect*fromgallery_mapwhereauthorlike'%研究室'; QUERYPLAN ----------------------------------------------------------------------------------------------------------------- SeqScanongallery_map(cost=0.00..7002.32rows=2282width=621)(actualtime=0.064..52.824rows=2152loops=1) Filter:((author)::text~~'%研究室'::text) RowsRemovedbyFilter:70194 Planningtime:0.254ms Executiontime:53.064ms (5rows) Time:53.954ms
可以看到,等于、like的全匹配是用到索引的,like的模糊查询还是全表扫描
三、创建gin索引
CREATEEXTENSIONpg_trgm; CREATEINDEXix_gallery_map_authorONgallery_mapUSINGgin(authorgin_trgm_ops); EXPLAINANALYZEselect*fromgallery_mapwhereauthorlike'曹%'; QUERYPLAN ------------------------------------------------------------------------------------------------------------------------------------- BitmapHeapScanongallery_map(cost=19.96..2705.69rows=1028width=621)(actualtime=0.419..1.771rows=1031loops=1) RecheckCond:((author)::text~~'曹%'::text) HeapBlocks:exact=438 ->BitmapIndexScanonix_gallery_map_author(cost=0.00..19.71rows=1028width=0)(actualtime=0.312..0.312rows=1031loops=1) IndexCond:((author)::text~~'曹%'::text) Planningtime:0.358ms Executiontime:1.916ms (7rows) Time:2.843ms EXPLAINANALYZEselect*fromgallery_mapwhereauthorlike'%耘%'; QUERYPLAN ----------------------------------------------------------------------------------------------------------------- SeqScanongallery_map(cost=0.00..7002.32rows=1028width=621)(actualtime=0.015..51.641rows=1031loops=1) Filter:((author)::text~~'%耘%'::text) RowsRemovedbyFilter:71315 Planningtime:0.268ms Executiontime:51.957ms (5rows) Time:52.899ms EXPLAINANALYZEselect*fromgallery_mapwhereauthorlike'%研究室%'; QUERYPLAN ------------------------------------------------------------------------------------------------------------------------------------- BitmapHeapScanongallery_map(cost=31.83..4788.42rows=2559width=621)(actualtime=0.914..4.195rows=2402loops=1) RecheckCond:((author)::text~~'%研究室%'::text) HeapBlocks:exact=868 ->BitmapIndexScanonix_gallery_map_author(cost=0.00..31.19rows=2559width=0)(actualtime=0.694..0.694rows=2402loops=1) IndexCond:((author)::text~~'%研究室%'::text) Planningtime:0.306ms Executiontime:4.403ms (7rows) Time:5.227ms
gin_trgm索引的效果好多了
由于pg_trgm的索引是把字符串切成多个3元组,然后使用这些3元组做匹配,所以gin_trgm索引对于少于3个字符(包括汉字)的查询,只有前缀匹配会走索引
另外,还测试了btree_gin,效果和btree一样
注意:
gin_trgm要求数据库必须使用UTF-8编码
demo_v1#\ldemo_v1 Listofdatabases Name|Owner|Encoding|Collate|Ctype|Accessprivileges ---------+-----------+----------+-------------+-------------+------------------- demo_v1|wmpp_user|UTF8|en_US.UTF-8|en_US.UTF-8|
以上为个人经验,希望能给大家一个参考,也希望大家多多支持毛票票。如有错误或未考虑完全的地方,望不吝赐教。