檢查 PostgreSQL 中的連線效能

引言

在 SQL 中，常常需要從多個表格中獲取數據。
此時，使用聯接是常見的做法，但在成本上會如何呢？
為了確認這一點，我們將撰寫一個返回相同結果的聯接查詢和子查詢，並比較它們的執行計畫。
如果您不知道如何查看執行計畫，可以參考「在 PostgreSQL 中查看執行計畫」🙌

準備

首先，我們需要準備查詢所需的表格☝️

創建表格

-- 部門表
CREATE TABLE departments (
  id SERIAL PRIMARY KEY,
  name VARCHAR(50)
);

-- 員工表
CREATE TABLE employees (
  id SERIAL PRIMARY KEY,
  department_id INT,
  name VARCHAR(50),
  salary INT
);

插入數據

INSERT INTO departments (name) VALUES
  ('業務'),
  ('開發'),
  ('人事');

INSERT INTO employees (department_id, name, salary) VALUES
  (1, '佐藤', 300),
  (1, '鈴木', 400),
  (1, '高橋', 500),
  (2, '田中', 600),
  (2, '伊藤', 700),
  (2, '渡邊', 800),
  (3, '山本', 300),
  (3, '中村', 350),
  (3, '小林', 400);

使用聯接查詢和子查詢獲取相同結果

我們將在所有員工中找出薪資高於平均薪資的人。

聯接

SELECT e.name, e.salary
FROM employees e
JOIN departments d
  ON e.department_id = d.id
WHERE e.salary > (
  SELECT AVG(salary) FROM employees
);

子查詢

SELECT name, salary
FROM employees
WHERE salary > (
  SELECT AVG(salary) FROM employees
);

執行結果（聯接查詢 / 子查詢）

執行結果是相同的。

 name | salary 
------+--------
 高橋 |    500
 田中 |    600
 伊藤 |    700
 渡邊 |    800

執行計畫

聯接

                                                      QUERY PLAN                                                      
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=38.54..56.63 rows=170 width=122) (actual time=0.864..0.870 rows=4 loops=1)
   Hash Cond: (e.department_id = d.id)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=16.38..16.39 rows=1 width=32) (actual time=0.075..0.075 rows=1 loops=1)
           ->  Seq Scan on employees  (cost=0.00..15.10 rows=510 width=4) (actual time=0.003..0.005 rows=9 loops=1)
   ->  Seq Scan on employees e  (cost=0.00..17.65 rows=170 width=126) (actual time=0.475..0.478 rows=4 loops=1)
         Filter: ((salary)::numeric > $0)
         Rows Removed by Filter: 5
   ->  Hash  (cost=15.40..15.40 rows=540 width=4) (actual time=0.141..0.142 rows=3 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on departments d  (cost=0.00..15.40 rows=540 width=4) (actual time=0.111..0.113 rows=3 loops=1)
 Planning Time: 1.599 ms
 Execution Time: 1.426 ms
(13 rows)

子查詢

                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on employees  (cost=16.39..34.04 rows=170 width=122) (actual time=0.417..0.420 rows=4 loops=1)
   Filter: ((salary)::numeric > $0)
   Rows Removed by Filter: 5
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=16.38..16.39 rows=1 width=32) (actual time=0.067..0.067 rows=1 loops=1)
           ->  Seq Scan on employees employees_1  (cost=0.00..15.10 rows=510 width=4) (actual time=0.011..0.012 rows=9 loops=1)
 Planning Time: 1.073 ms
 Execution Time: 1.831 ms
(8 rows)

在這個例子中，聯接查詢稍微快一些呢！

使用聯接查詢和子查詢獲取相同結果 2

接下來，我們根據部門查找薪資高於平均薪資的人。

聯接

SELECT e.name, e.salary, e.department_id
FROM employees e
JOIN (
  SELECT department_id, AVG(salary) AS avg_salary
  FROM employees
  GROUP BY department_id
) avg_table
  ON e.department_id = avg_table.department_id
WHERE e.salary > avg_table.avg_salary;

子查詢

SELECT name, salary, department_id
FROM employees e
WHERE salary > (
  SELECT AVG(salary)
  FROM employees
  WHERE department_id = e.department_id
);

執行結果（聯接查詢 / 子查詢）

執行結果是相同的。

 name | salary | department_id 
------+--------+---------------
 高橋 |    500 |             1
 渡邊 |    800 |             2
 小林 |    400 |             3

執行計畫

聯接

                                                       QUERY PLAN                                                       
------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=22.65..39.10 rows=170 width=126) (actual time=0.340..0.345 rows=3 loops=1)
   Hash Cond: (e.department_id = employees.department_id)
   Join Filter: ((e.salary)::numeric > (avg(employees.salary)))
   Rows Removed by Join Filter: 6
   ->  Seq Scan on employees e  (cost=0.00..15.10 rows=510 width=126) (actual time=0.192..0.193 rows=9 loops=1)
   ->  Hash  (cost=20.15..20.15 rows=200 width=36) (actual time=0.112..0.113 rows=3 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  HashAggregate  (cost=17.65..20.15 rows=200 width=36) (actual time=0.107..0.110 rows=3 loops=1)
               Group Key: employees.department_id
               Batches: 1  Memory Usage: 40kB
               ->  Seq Scan on employees  (cost=0.00..15.10 rows=510 width=8) (actual time=0.004..0.005 rows=9 loops=1)
 Planning Time: 1.232 ms
 Execution Time: 0.705 ms
(13 rows)

子查詢

                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Seq Scan on employees e  (cost=0.00..8379.10 rows=170 width=126) (actual time=0.094..0.134 rows=3 loops=1)
   Filter: ((salary)::numeric > (SubPlan 1))
   Rows Removed by Filter: 6
   SubPlan 1
     ->  Aggregate  (cost=16.39..16.40 rows=1 width=32) (actual time=0.008..0.008 rows=1 loops=9)
           ->  Seq Scan on employees  (cost=0.00..16.38 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=9)
                 Filter: (department_id = e.department_id)
                 Rows Removed by Filter: 6
 Planning Time: 0.305 ms
 Execution Time: 0.237 ms
(10 rows)

這次的例子中，與 departments 表結合的查詢相比，簡單地引用 employees 表的子查詢執行時間更短！