在 SQL 中,常常需要從多個表格中獲取數據。
此時,使用聯接是常見的做法,但在成本上會如何呢?
為了確認這一點,我們將撰寫一個返回相同結果的聯接查詢和子查詢,並比較它們的執行計畫。
如果您不知道如何查看執行計畫,可以參考「在 PostgreSQL 中查看執行計畫」🙌
首先,我們需要準備查詢所需的表格☝️
-- 部門表
CREATE TABLE departments (
id SERIAL PRIMARY KEY,
name VARCHAR(50)
);
-- 員工表
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
department_id INT,
name VARCHAR(50),
salary INT
);
INSERT INTO departments (name) VALUES
('業務'),
('開發'),
('人事');
INSERT INTO employees (department_id, name, salary) VALUES
(1, '佐藤', 300),
(1, '鈴木', 400),
(1, '高橋', 500),
(2, '田中', 600),
(2, '伊藤', 700),
(2, '渡邊', 800),
(3, '山本', 300),
(3, '中村', 350),
(3, '小林', 400);
我們將在所有員工中找出薪資高於平均薪資的人。
SELECT e.name, e.salary
FROM employees e
JOIN departments d
ON e.department_id = d.id
WHERE e.salary > (
SELECT AVG(salary) FROM employees
);
SELECT name, salary
FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
);
執行結果是相同的。
name | salary
------+--------
高橋 | 500
田中 | 600
伊藤 | 700
渡邊 | 800
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Hash Join (cost=38.54..56.63 rows=170 width=122) (actual time=0.864..0.870 rows=4 loops=1)
Hash Cond: (e.department_id = d.id)
InitPlan 1 (returns $0)
-> Aggregate (cost=16.38..16.39 rows=1 width=32) (actual time=0.075..0.075 rows=1 loops=1)
-> Seq Scan on employees (cost=0.00..15.10 rows=510 width=4) (actual time=0.003..0.005 rows=9 loops=1)
-> Seq Scan on employees e (cost=0.00..17.65 rows=170 width=126) (actual time=0.475..0.478 rows=4 loops=1)
Filter: ((salary)::numeric > $0)
Rows Removed by Filter: 5
-> Hash (cost=15.40..15.40 rows=540 width=4) (actual time=0.141..0.142 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on departments d (cost=0.00..15.40 rows=540 width=4) (actual time=0.111..0.113 rows=3 loops=1)
Planning Time: 1.599 ms
Execution Time: 1.426 ms
(13 rows)
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Seq Scan on employees (cost=16.39..34.04 rows=170 width=122) (actual time=0.417..0.420 rows=4 loops=1)
Filter: ((salary)::numeric > $0)
Rows Removed by Filter: 5
InitPlan 1 (returns $0)
-> Aggregate (cost=16.38..16.39 rows=1 width=32) (actual time=0.067..0.067 rows=1 loops=1)
-> Seq Scan on employees employees_1 (cost=0.00..15.10 rows=510 width=4) (actual time=0.011..0.012 rows=9 loops=1)
Planning Time: 1.073 ms
Execution Time: 1.831 ms
(8 rows)
在這個例子中,聯接查詢稍微快一些呢!
接下來,我們根據部門查找薪資高於平均薪資的人。
SELECT e.name, e.salary, e.department_id
FROM employees e
JOIN (
SELECT department_id, AVG(salary) AS avg_salary
FROM employees
GROUP BY department_id
) avg_table
ON e.department_id = avg_table.department_id
WHERE e.salary > avg_table.avg_salary;
SELECT name, salary, department_id
FROM employees e
WHERE salary > (
SELECT AVG(salary)
FROM employees
WHERE department_id = e.department_id
);
執行結果是相同的。
name | salary | department_id
------+--------+---------------
高橋 | 500 | 1
渡邊 | 800 | 2
小林 | 400 | 3
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=22.65..39.10 rows=170 width=126) (actual time=0.340..0.345 rows=3 loops=1)
Hash Cond: (e.department_id = employees.department_id)
Join Filter: ((e.salary)::numeric > (avg(employees.salary)))
Rows Removed by Join Filter: 6
-> Seq Scan on employees e (cost=0.00..15.10 rows=510 width=126) (actual time=0.192..0.193 rows=9 loops=1)
-> Hash (cost=20.15..20.15 rows=200 width=36) (actual time=0.112..0.113 rows=3 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> HashAggregate (cost=17.65..20.15 rows=200 width=36) (actual time=0.107..0.110 rows=3 loops=1)
Group Key: employees.department_id
Batches: 1 Memory Usage: 40kB
-> Seq Scan on employees (cost=0.00..15.10 rows=510 width=8) (actual time=0.004..0.005 rows=9 loops=1)
Planning Time: 1.232 ms
Execution Time: 0.705 ms
(13 rows)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on employees e (cost=0.00..8379.10 rows=170 width=126) (actual time=0.094..0.134 rows=3 loops=1)
Filter: ((salary)::numeric > (SubPlan 1))
Rows Removed by Filter: 6
SubPlan 1
-> Aggregate (cost=16.39..16.40 rows=1 width=32) (actual time=0.008..0.008 rows=1 loops=9)
-> Seq Scan on employees (cost=0.00..16.38 rows=3 width=4) (actual time=0.002..0.003 rows=3 loops=9)
Filter: (department_id = e.department_id)
Rows Removed by Filter: 6
Planning Time: 0.305 ms
Execution Time: 0.237 ms
(10 rows)
這次的例子中,與 departments 表結合的查詢相比,簡單地引用 employees 表的子查詢執行時間更短!
比較執行計畫後,我們發現有些情況下聯接更快,而有些情況下子查詢更快。
避免不必要的聯接
子查詢只參考必要的表格,處理過程較少,效率更高。
結合多個表或聚合
利用聯接可以在一次掃描中高效處理。
總結來說,「哪一個更快」是依情況而定的,選擇最佳的寫法取決於所需的信息、必要的表格數量和數據量。
「聯接和子查詢哪個更快」並不能一概而論,
而是根據所需的資訊、必要的表格數量和數據量的不同而改變最佳解。
在開發時,意識到「這個查詢是否真的需要聯接?」「是否可以用子查詢解決?」是非常重要的。