Skip to content

Commit 04a92da

Browse files
authored
Blog: Querying MySQL Row Counts the Fast Way (#378)
1 parent 6bfcedd commit 04a92da

File tree

2 files changed

+331
-0
lines changed

2 files changed

+331
-0
lines changed
Lines changed: 212 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,212 @@
1+
---
2+
3+
title: Querying MySQL Row Counts the Fast Way
4+
date: 2025-10-06T00:34:00
5+
tags:
6+
- databases
7+
- mysql
8+
9+
---
10+
11+
`SELECT COUNT(*)` requires an expensive full index scan, which probably isn't what you want.
12+
13+
I'll start with what you _do_ want, and then explain alternatives after.
14+
15+
## InnoDB's persistent stats
16+
17+
It's highly likely that you're using InnoDB as the engine for your tables as it has been the default since [MySQL v5.5.5 (2010)](https://web.archive.org/web/20190123090733/https://dev.mysql.com/doc/refman/5.5/en/storage-engine-setting.html).
18+
19+
[MySQL v5.6.2 (2011)](https://downloads.mysql.com/docs/mysql-5.6-relnotes-en.pdf) added the ability to [persist optimizer statistics for InnoDB tables](https://dev.mysql.com/doc/refman/8.0/en/innodb-persistent-stats.html) across server restarts. This behavior is controlled by the [`innodb_stats_persistent`](https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_stats_persistent) setting, which is "ON" by default. Individual tables can override this setting with the [`STATS_PERSISTENT`](https://dev.mysql.com/doc/refman/8.0/en/create-table.html#create-table-options) table setting.
20+
21+
These table stats are persisted in the `mysql.innodb_table_stats` table, and can be queried like this:
22+
23+
```sql
24+
SELECT n_rows
25+
, last_update
26+
FROM mysql.innodb_table_stats
27+
WHERE database_name = :databaseName
28+
AND table_name = :tableName;
29+
```
30+
31+
_Note: you can also find stats about indexes in `mysql.innodb_index_stats`._
32+
33+
**Caveat 1: `mysql.innodb_table_stats.n_rows` is estimated based on sampling index pages.**
34+
35+
InnoDB samples an [`innodb_stats_persistent_sample_pages`](https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_stats_persistent_sample_pages) number of pages (default of "20") from the clustered index and then extrapolates. Individual tables can override this setting with the [`STATS_SAMPLE_PAGES`](https://dev.mysql.com/doc/refman/8.0/en/create-table.html#create-table-options) table setting.
36+
37+
_As a table's size grows, so does the inaccuracy of `n_rows`._
38+
39+
**Caveat 2: a table's stats may not update often.**
40+
41+
A table's `mysql.innodb_table_stats` only updates in these scenarios:
42+
43+
- "When a table undergoes changes to more than 10% of its rows" if the [`innodb_stats_auto_recalc`](https://dev.mysql.com/doc/refman/8.0/en/innodb-parameters.html#sysvar_innodb_stats_auto_recalc) setting is "ON", which it is by default
44+
- [`ANALYZE TABLE <table>`](https://dev.mysql.com/doc/refman/8.0/en/analyze-table.html) is run
45+
- [`OPTIMIZE TABLE <table>`](https://dev.mysql.com/doc/refman/8.0/en/optimize-table.html) is run
46+
- [`TRUNCATE TABLE <table>`](https://dev.mysql.com/doc/refman/8.0/en/truncate-table.html) is run
47+
- Most `ALTER TABLE` statements, including `ADD COLUMN` and `DROP COLUMN`
48+
- `CREATE INDEX` and `DROP INDEX`
49+
50+
_As a table's size grows, it will take more and more operations to hit that 10% threshold._
51+
52+
## Why not `SELECT COUNT(*)`?
53+
54+
Because it requires a full index scan. From the [official documentation](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_count):
55+
56+
> `InnoDB` processes `SELECT COUNT(*)` statements by traversing the smallest available secondary index unless an index or optimizer hint directs the optimizer to use a different index. If a secondary index is not present, `InnoDB` processes `SELECT COUNT(*)` statements by scanning the clustered index.
57+
58+
It's worth noting that an InnoDB table will always have a [clustered index](https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html) because that's how it stores row data, so `SELECT COUNT(*)` won't ever cause a full table scan.
59+
60+
Because different InnoDB transactions can have different row counts, the only way to return an accurate count is to fully enumerate an index _within_ the current transaction:
61+
62+
> For transactional storage engines such as `InnoDB`, storing an exact row count is problematic. Multiple transactions may be occurring at the same time, each of which may affect the count.
63+
>
64+
> `InnoDB` does not keep an internal count of rows in a table because concurrent transactions might "see" different numbers of rows at the same time. Consequently, `SELECT COUNT(*)` statements only count rows visible to the current transaction.
65+
66+
Full index scans can be exceptionally slow. To illustrate the problem, let's create a simple table and fill it with up to 20,000,000 rows of non-trivial size:
67+
68+
```sql
69+
CREATE TABLE messages
70+
(
71+
id SERIAL PRIMARY KEY,
72+
message TEXT NOT NULL
73+
);
74+
75+
INSERT INTO messages (message)
76+
SELECT 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer aliquam ornare velit, auctor tempus erat ultrices ut. Phasellus ac nibh ante. Morbi consectetur, lorem in pulvinar tincidunt, augue est cursus ipsum, sed dapibus neque sapien id libero. Donec id felis sem. Morbi quis mi turpis. Nam viverra felis ac ex convallis, in congue nunc ultrices. Curabitur rutrum, lorem sit amet vulputate ultricies, velit odio ultrices dui, sed volutpat lorem felis vitae nibh. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Aenean orci mi, consectetur sed turpis sed, consequat tempor nisi. Cras id venenatis mi. Sed cursus in eros sit amet interdum.'
77+
FROM (SELECT 1
78+
FROM (SELECT 0 AS n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) a
79+
, (SELECT 0 AS n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) b
80+
, (SELECT 0 AS n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) c
81+
, (SELECT 0 AS n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) d
82+
, (SELECT 0 AS n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) e
83+
, (SELECT 0 AS n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) f
84+
, (SELECT 0 AS n UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) g) temp;
85+
```
86+
87+
You can see how the time it takes to perform a full index scan grows propotionally with the table size:
88+
89+
```sql
90+
SELECT COUNT(*) FROM messages;
91+
```
92+
93+
```vega-lite
94+
{
95+
"data": {
96+
"values": [
97+
{"symbol":"SELECT COUNT(*)", "count":10, "seconds":0.00},
98+
{"symbol":"SELECT COUNT(*)", "count":100, "seconds":0.00},
99+
{"symbol":"SELECT COUNT(*)", "count":1000, "seconds":0.00},
100+
{"symbol":"SELECT COUNT(*)", "count":10000, "seconds":0.01},
101+
{"symbol":"SELECT COUNT(*)", "count":100000, "seconds":0.09},
102+
{"symbol":"SELECT COUNT(*)", "count":1000000, "seconds":0.65},
103+
{"symbol":"SELECT COUNT(*)", "count":5000000, "seconds":4.07},
104+
{"symbol":"SELECT COUNT(*)", "count":10000000, "seconds":19.93},
105+
{"symbol":"SELECT COUNT(*)", "count":20000000, "seconds":76.08}
106+
]
107+
},
108+
"encoding": {
109+
"detail": {"field":"symbol", "legend":null}
110+
},
111+
"layer": [
112+
{
113+
"mark": "line",
114+
"encoding": {
115+
"x": {
116+
"field": "count",
117+
"type": "quantitative",
118+
"scale": {
119+
"type": "pow",
120+
"exponent": 0.67
121+
},
122+
"title": "Rows"
123+
},
124+
"y": {
125+
"field": "seconds",
126+
"type": "quantitative",
127+
"title": "Time (sec)"
128+
}
129+
}
130+
},
131+
{
132+
"encoding": {
133+
"x": {
134+
"aggregate": {"argmax":"count"},
135+
"field": "count",
136+
"type": "quantitative"
137+
},
138+
"y": {
139+
"aggregate": {"argmax":"count"},
140+
"field": "seconds",
141+
"type": "quantitative"
142+
}
143+
},
144+
"layer": [
145+
{"mark":{"type":"circle"}},
146+
{
147+
"mark": {"type":"text", "align":"left", "dx":10},
148+
"encoding": {"text":{"field":"symbol", "type":"nominal"}}
149+
}
150+
]
151+
}
152+
]
153+
}
154+
```
155+
156+
_[MySQL's `OFFSET`](/blog/the-dangers-of-offset-with-mysql) can suffer from a similar exponential cost._
157+
158+
**You cannot avoid `SELECT COUNT(*)` if you have a requirement for exact or realtime row counts.**
159+
160+
## `COUNT(*)` vs. `COUNT(col)`
161+
162+
`COUNT(col)` counts the number of _non-null_ `col` values. If the column is indexed, then that index will be scanned, otherwise a full table scan will be necessary.
163+
164+
From the [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_count):
165+
166+
> `COUNT(*)` is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain `NULL` values.
167+
168+
So you should only need `COUNT(col)` under very specific circumstances.
169+
170+
## `COUNT(*)` vs. `COUNT(1)`
171+
172+
From the [MySQL documentation](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_count):
173+
174+
> `InnoDB` handles `SELECT COUNT(*)` and `SELECT COUNT(1)` operations in the same way. There is no performance difference.
175+
176+
## Why not `information_schema.tables`?
177+
178+
Because the table statistics columns in `information_schema.tables` are cached, up to a default of 24 hours (controlled by the [`information_schema_stats_expiry`](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_information_schema_stats_expiry) setting).
179+
180+
For InnoDB tables, `information_schema.tables` is populated from `mysql.innodb_table_stats`, so you should query from the source to avoid the cache. For non-InnoDB tables, you have to use `information_schema.tables` if you want to avoid the expensive `SELECT COUNT(*)`.
181+
182+
```sql
183+
-- Row count for every table
184+
SELECT table_schema
185+
, table_name
186+
, table_rows
187+
FROM information_schema.tables
188+
WHERE table_type = 'BASE TABLE'
189+
AND table_schema = :databaseName
190+
ORDER BY table_schema, table_name;
191+
```
192+
193+
```sql
194+
-- Row count for a specific table
195+
SELECT table_schema
196+
, table_name
197+
, table_rows
198+
FROM information_schema.tables
199+
WHERE table_schema = :databaseName
200+
AND table_name = :tableName;
201+
```
202+
203+
A table's `information_schema.tables` statistics are updated in these scenarios:
204+
205+
- The column's cache has expired
206+
- [`ANALYZE TABLE ...`](https://dev.mysql.com/doc/refman/8.0/en/analyze-table.html) is run
207+
208+
Setting the [`information_schema_stats_expiry`](https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_information_schema_stats_expiry) setting to "0" (zero) disables caching, causing queries against `information_schema.tables` to always retrieve the latest statistics from the storage engine (i.e. `mysql.innodb_table_stats`).
209+
210+
## Conclusion
211+
212+
`SELECT COUNT(*)` and similar queries can take an exceptionally long time on large tables. You should strongly consider using the persistent stats stored in [`mysql.innodb_table_stats`](https://dev.mysql.com/doc/refman/8.0/en/innodb-persistent-stats.html) if possible.

0 commit comments

Comments
 (0)