Analyzing Large-Table Joins in PostgreSQL

This article looks at how PostgreSQL plans and executes joins between large tables. It builds four 10-million-row tables on a small host, then compares the execution plan of a four-way join before and after the tables are analyzed.

Database configuration
The host has 4 CPU cores and 4 GB of RAM. PostgreSQL runs with shared_buffers at 128MB and work_mem at 4MB.

Test data
Create four tables of 10 million rows each. Each table holds about 1 GB of data, roughly 8 times the size of shared_buffers.

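-- "if exists" lets the script run cleanly the first time, when the tables are not there yet
drop table if exists t_big_1;
drop table if exists t_big_2;
drop table if exists t_big_3;
drop table if exists t_big_4;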
create table t_big_1(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));
create table t_big_2(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));
create table t_big_3(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));
create table t_big_4(id int,c1 varchar(30),c2 varchar(30),c3 varchar(30));
insert into t_big_1 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;
insert into t_big_2 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;
insert into t_big_3 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;
insert into t_big_4 select x,rpad('c1'||x,30,'c1'),rpad('c2'||x,30,'c2'),rpad('c3'||x,30,'c3') from generate_series(1,10000000) as x;
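-- Check the memory settings in effect
show shared_buffers;
show effective_cache_size;
show work_mem;
-- Check the on-disk size of each table
select pg_size_pretty(pg_table_size('t_big_1'));
select pg_size_pretty(pg_table_size('t_big_2'));
select pg_size_pretty(pg_table_size('t_big_3'));
select pg_size_pretty(pg_table_size('t_big_4'));
-- Collect planner statistics (the first plan shown below was captured before this step)
analyze t_big_1,t_big_2,t_big_3,t_big_4;
-- Two-table join on a varchar column
explain verbose
select a.*
from t_big_1 a join t_big_2 b on a.c1 = b.c1;
-- Four-table join on the integer id column
explain verbose
select a.id,b.c1,c.c2,d.c3
from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 d
where a.id = b.id and b.id = c.id and c.id = d.id;
-- Four-table join on mixed columns: the query examined in the rest of this article
explain verbose
select a.id,b.c1,c.c2,d.c3
from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 d
where a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2;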
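
As a rough cross-check of the "about 1 GB per table" figure, the heap size can be estimated from the row layout. The sketch below is in Python rather than SQL so that it runs without a database. The overheads it uses (8 kB pages, a 24-byte page header, a 24-byte tuple header, 4-byte line pointers, 8-byte alignment, and a 1-byte length header on short varchar values) are assumptions about a default PostgreSQL build, not figures stated in the original test.

```python
# Assumed on-disk layout constants for a default PostgreSQL build.
PAGE_SIZE = 8192      # bytes per heap page
PAGE_HEADER = 24      # bytes of page header
LINE_POINTER = 4      # bytes per row pointer in the page
TUPLE_HEADER = 24     # bytes of per-row header (no null bitmap needed here)
MAXALIGN = 8          # rows are padded to a multiple of this


def align(n, boundary=MAXALIGN):
    """Round n up to the next multiple of boundary."""
    return (n + boundary - 1) // boundary * boundary


def heap_pages(rows, data_bytes):
    """Estimate how many heap pages `rows` rows of `data_bytes` each occupy."""
    tuple_bytes = align(TUPLE_HEADER + data_bytes)
    rows_per_page = (PAGE_SIZE - PAGE_HEADER) // (tuple_bytes + LINE_POINTER)
    return -(-rows // rows_per_page)  # integer ceiling division


# id int = 4 bytes; each varchar(30) is padded by rpad() to 30 ASCII bytes
# and is assumed to carry a 1-byte length header.
data_bytes = 4 + 3 * (30 + 1)
pages = heap_pages(10_000_000, data_bytes)

print(f"estimated pages per table: {pages:,}")
print(f"estimated size per table:  {pages * PAGE_SIZE / 2**30:.2f} GiB")
```

Under these assumptions the estimate comes to 163,935 pages, about 1.25 GiB. That is the same number of shared buffers that each sequential scan touches in the explain (analyze,buffers,verbose) output later in this article.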

Large-table join
Before analyzing the tables

[local:/data/run/pg12]:5120 pg12@testdb=# explain verbose
pg12@testdb-# select a.id,b.c1,c.c2,d.c3 
pg12@testdb-# from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 d
pg12@testdb-# where a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2;
                                                   QUERY PLAN                                                    
-----------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=164722831406.26..1096915306139605248.00 rows=73127676034285903872 width=238)
   Output: a.id, b.c1, c.c2, d.c3
   Merge Cond: ((b.c1)::text = (c.c1)::text)
   ->  Sort  (cost=58799667920.13..59102008117.66 rows=120936079012 width=82)
         Output: a.id, b.c1
         Sort Key: b.c1
         ->  Merge Join  (cost=2124653.55..1816202724.10 rows=120936079012 width=82)
               Output: a.id, b.c1
               Merge Cond: (a.id = b.id)
               ->  Sort  (cost=894232.27..906527.40 rows=4918050 width=4)
                     Output: a.id
                     Sort Key: a.id
                     ->  Seq Scan on public.t_big_1 a  (cost=0.00..213115.50 rows=4918050 width=4)
                           Output: a.id
               ->  Materialize  (cost=1230421.27..1255011.52 rows=4918050 width=82)
                     Output: b.c1, b.id
                     ->  Sort  (cost=1230421.27..1242716.40 rows=4918050 width=82)
                           Output: b.c1, b.id
                           Sort Key: b.id
                           ->  Seq Scan on public.t_big_2 b  (cost=0.00..213115.50 rows=4918050 width=82)
                                 Output: b.c1, b.id
   ->  Materialize  (cost=105923163486.13..106527843881.19 rows=120936079012 width=234)
         Output: c.c2, c.c1, d.c3
         ->  Sort  (cost=105923163486.13..106225503683.66 rows=120936079012 width=234)
               Output: c.c2, c.c1, d.c3
               Sort Key: c.c1
               ->  Merge Join  (cost=3066006.55..1817144077.10 rows=120936079012 width=234)
                     Output: c.c2, c.c1, d.c3
                     Merge Cond: ((c.c2)::text = (d.c2)::text)
                     ->  Sort  (cost=1533003.27..1545298.40 rows=4918050 width=156)
                           Output: c.c2, c.c1
                           Sort Key: c.c2
                           ->  Seq Scan on public.t_big_3 c  (cost=0.00..213115.50 rows=4918050 width=156)
                                 Output: c.c2, c.c1
                     ->  Materialize  (cost=1533003.27..1557593.52 rows=4918050 width=156)
                           Output: d.c3, d.c2
                           ->  Sort  (cost=1533003.27..1545298.40 rows=4918050 width=156)
                                 Output: d.c3, d.c2
                                 Sort Key: d.c2
                                 ->  Seq Scan on public.t_big_4 d  (cost=0.00..213115.50 rows=4918050 width=156)
                                       Output: d.c3, d.c2
(41 rows)

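Before the tables are analyzed, the plan uses merge joins throughout. With no statistics to work from, the planner estimates about 7.3e19 result rows, and the total estimated cost is an absurdly large number (about 1.1e18).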
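
The inflated row counts in this plan can be reproduced with simple arithmetic. The sketch below assumes that, without statistics, the planner treats every join column as having 200 distinct values, so an equality join keeps 1/200 of all row pairs. That constant is inferred here from the numbers in the plan, not taken from the original article; the other figures are copied from the plan output above.

```python
# Row estimates copied from the un-analyzed plan above.
ROWS_PER_TABLE_GUESS = 4_918_050                 # every Seq Scan node
PLAN_ROWS_TWO_WAY = 120_936_079_012              # each inner Merge Join
PLAN_ROWS_TOP = 73_127_676_034_285_903_872       # top-level Merge Join

# Assumption: with no statistics, each join column is treated as having
# 200 distinct values, giving an equality-join selectivity of 1/200.
ASSUMED_DISTINCT = 200


def join_rows(outer_rows, inner_rows, ndistinct=ASSUMED_DISTINCT):
    """Estimated size of an equi-join: |outer| * |inner| / ndistinct."""
    return outer_rows * inner_rows / ndistinct


# a JOIN b (and likewise c JOIN d): two un-analyzed tables.
two_way = join_rows(ROWS_PER_TABLE_GUESS, ROWS_PER_TABLE_GUESS)

# (a JOIN b) JOIN (c JOIN d): the two intermediate results joined together.
top = join_rows(round(two_way), round(two_way))

print(f"two-way join estimate: {two_way:,.1f}  (plan: {PLAN_ROWS_TWO_WAY:,})")
print(f"top-level estimate:    {top:.6e}  (plan: {PLAN_ROWS_TOP:.6e})")
print(f"rows actually returned after ANALYZE: {10_000_000:,}")
```

If the assumption holds, the error compounds at each join level: the id columns are in fact unique, so each join returns only 10 million rows, not hundreds of billions.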

After running ANALYZE

[local:/data/run/pg12]:5120 pg12@testdb=# explain (analyze,buffers,verbose)
select a.id,b.c1,c.c2,d.c3 
from t_big_1 a,t_big_2 b,t_big_3 c,t_big_4 d
where a.id = b.id and b.c1 = c.c1 and c.c2 = d.c2;
                                                                              QUERY PLAN                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=896126.19..2564935.91 rows=9999844 width=97) (actual time=393803.655..404902.025 rows=10000000 loops=1)
   Output: a.id, b.c1, c.c2, d.c3
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=266 read=655676 dirtied=486717 written=486414, temp read=473954 written=486068
   ->  Parallel Hash Join  (cost=895126.19..1563951.51 rows=4166602 width=97) (actual time=393672.896..398825.027 rows=3333333 loops=3)
         Output: a.id, b.c1, c.c2, d.c3
         Hash Cond: ((c.c2)::text = (d.c2)::text)
         Buffers: shared hit=266 read=655676 dirtied=486717 written=486414, temp read=473954 written=486068
         Worker 0: actual time=393629.565..399028.498 rows=3549817 loops=1
           Buffers: shared hit=118 read=218079 dirtied=161599 written=161495, temp read=162307 written=161880
         Worker 1: actual time=393585.994..399049.295 rows=3609509 loops=1
           Buffers: shared hit=119 read=217313 dirtied=161014 written=160913, temp read=163324 written=160736
         ->  Parallel Hash Join  (cost=592683.65..1070481.02 rows=4166681 width=66) (actual time=328335.871..378143.916 rows=3333333 loops=3)
               Output: a.id, b.c1, c.c2
               Hash Cond: ((b.c1)::text = (c.c1)::text)
               Buffers: shared hit=63 read=491773 dirtied=352782 written=352575, temp read=267125 written=274312
               Worker 0: actual time=328475.430..378240.528 rows=3325497 loops=1
                 Buffers: shared hit=25 read=164024 dirtied=117445 written=117373, temp read=88941 written=91448
               Worker 1: actual time=328084.038..377943.176 rows=3311112 loops=1
                 Buffers: shared hit=29 read=163900 dirtied=117550 written=117481, temp read=88747 written=91320
               ->  Parallel Hash Join  (cost=290238.33..609558.42 rows=4166681 width=35) (actual time=158380.042..198763.345 rows=3333333 loops=3)
                     Output: a.id, b.c1
                     Hash Cond: (a.id = b.id)
                     Buffers: shared hit=63 read=327838 dirtied=218847 written=218710, temp read=98317 written=100856
                     Worker 0: actual time=158518.764..199077.411 rows=3331104 loops=1
                       Buffers: shared hit=25 read=109394 dirtied=72893 written=72845, temp read=32790 written=33668
                     Worker 1: actual time=158520.409..198920.394 rows=3332824 loops=1
                       Buffers: shared hit=29 read=109323 dirtied=73002 written=72956, temp read=32934 written=33560
                     ->  Parallel Seq Scan on public.t_big_1 a  (cost=0.00..205601.81 rows=4166681 width=4) (actual time=239.830..75704.152 rows=3333333 loops=3)
                           Output: a.id
                           Buffers: shared read=163935 dirtied=109449 written=109391
                           Worker 0: actual time=239.584..75677.703 rows=3327794 loops=1
                             Buffers: shared read=54554 dirtied=36489 written=36468
                           Worker 1: actual time=240.355..75258.837 rows=3347802 loops=1
                             Buffers: shared read=54882 dirtied=36486 written=36467
                     ->  Parallel Hash  (cost=205601.81..205601.81 rows=4166681 width=35) (actual time=65812.428..65812.431 rows=3333333 loops=3)
                           Output: b.c1, b.id
                           Buckets: 65536  Batches: 256  Memory Usage: 3328kB
                           Buffers: shared hit=32 read=163903 dirtied=109398 written=109319, temp written=70136
                           Worker 0: actual time=65812.900..65812.904 rows=3345876 loops=1
                             Buffers: shared hit=11 read=54840 dirtied=36404 written=36377, temp written=23428
                           Worker 1: actual time=65812.873..65812.875 rows=3321816 loops=1
                             Buffers: shared hit=15 read=54441 dirtied=36516 written=36489, temp written=23320
                           ->  Parallel Seq Scan on public.t_big_2 b  (cost=0.00..205601.81 rows=4166681 width=35) (actual time=1.490..47839.237 rows=3333333 loops=3)
                                 Output: b.c1, b.id
                                 Buffers: shared hit=32 read=163903 dirtied=109398 written=109319
                                 Worker 0: actual time=1.464..47814.446 rows=3345876 loops=1
                                   Buffers: shared hit=11 read=54840 dirtied=36404 written=36377
                                 Worker 1: actual time=1.470..47104.413 rows=3321816 loops=1
                                   Buffers: shared hit=15 read=54441 dirtied=36516 written=36489
               ->  Parallel Hash  (cost=205601.81..205601.81 rows=4166681 width=62) (actual time=113720.080..113720.080 rows=3333333 loops=3)
                     Output: c.c2, c.c1
                     Buckets: 65536  Batches: 512  Memory Usage: 2432kB
                     Buffers: shared read=163935 dirtied=133935 written=133865, temp written=103856
                     Worker 0: actual time=113719.124..113719.124 rows=3332395 loops=1
                       Buffers: shared read=54630 dirtied=44552 written=44528, temp written=34648
                     Worker 1: actual time=113720.557..113720.558 rows=3329197 loops=1
                       Buffers: shared read=54577 dirtied=44548 written=44525, temp written=34576
                     ->  Parallel Seq Scan on public.t_big_3 c  (cost=0.00..205601.81 rows=4166681 width=62) (actual time=0.126..80608.068 rows=3333333 loops=3)
                           Output: c.c2, c.c1
                           Buffers: shared read=163935 dirtied=133935 written=133865
                           Worker 0: actual time=0.260..80737.065 rows=3332395 loops=1
                             Buffers: shared read=54630 dirtied=44552 written=44528
                           Worker 1: actual time=0.049..80943.448 rows=3329197 loops=1
                             Buffers: shared read=54577 dirtied=44548 written=44525
         ->  Parallel Hash  (cost=205601.02..205601.02 rows=4166602 width=62) (actual time=10279.722..10279.722 rows=3333333 loops=3)
               Output: d.c3, d.c2
               Buckets: 65536  Batches: 512  Memory Usage: 2400kB
               Buffers: shared hit=32 read=163903 dirtied=133935 written=133839, temp written=103004
               Worker 0: actual time=10222.812..10222.812 rows=3297904 loops=1
                 Buffers: shared hit=9 read=54055 dirtied=44154 written=44122, temp written=34236
               Worker 1: actual time=10222.839..10222.839 rows=3258559 loops=1
                 Buffers: shared hit=6 read=53413 dirtied=43464 written=43432, temp written=33504
               ->  Parallel Seq Scan on public.t_big_4 d  (cost=0.00..205601.02 rows=4166602 width=62) (actual time=0.163..7282.409 rows=3333333 loops=3)
                     Output: d.c3, d.c2
                     Buffers: shared hit=32 read=163903 dirtied=133935 written=133839
                     Worker 0: actual time=0.108..7244.071 rows=3297904 loops=1
                       Buffers: shared hit=9 read=54055 dirtied=44154 written=44122
                     Worker 1: actual time=0.034..7223.191 rows=3258559 loops=1
                       Buffers: shared hit=6 read=53413 dirtied=43464 written=43432
 Planning Time: 1.134 ms
 Execution Time: 405878.841 ms
(83 rows)
[local:/data/run/pg12]:5120 pg12@testdb=#

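After ANALYZE, the estimated cost drops back to a realistic value and the planner switches to parallel hash joins. Because work_mem is too small to hold the hash tables, PostgreSQL splits each one into batches (256 to 512 in this run), spills them to temporary files, and performs the join one batch at a time. This shows up in the plan as the "temp read" and "temp written" buffer counts.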
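
To get a feel for how much I/O the spill involved, the buffer counters can be converted from blocks to bytes. This is a minimal sketch that assumes the default 8 kB block size (the original does not state the build's block size); the counters are copied from the Gather node and the t_big_1 scan in the plan above.

```python
BLOCK_SIZE = 8192  # bytes per block; assumed PostgreSQL default


def blocks_to_gib(blocks, block_size=BLOCK_SIZE):
    """Convert a buffer count from EXPLAIN (BUFFERS) into GiB."""
    return blocks * block_size / 2**30


# Counters from the top-level Gather node of the analyzed plan.
gather_counters = {
    "shared read": 655_676,   # table blocks read into shared buffers
    "temp read": 473_954,     # hash batches read back from temp files
    "temp written": 486_068,  # hash batches spilled to temp files
}

for name, blocks in gather_counters.items():
    print(f"{name:>12}: {blocks:>8,} blocks = {blocks_to_gib(blocks):.2f} GiB")

# One table's sequential scan, for comparison (t_big_1: shared read=163935).
print(f"{'one table':>12}: {163_935:>8,} blocks = {blocks_to_gib(163_935):.2f} GiB")
```

Under that assumption the query read about 5.0 GiB of table blocks, wrote roughly 3.7 GiB to temporary files, and read about 3.6 GiB back, all with a 4MB work_mem.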

Article title: Analyzing Large-Table Joins in PostgreSQL
Article URL: http://chengdu.cdxwcx.cn/article/ihcdgp.html