br/backup-and-restore-overview.md
Lines changed: 2 additions & 1 deletion
````diff
@@ -112,7 +112,8 @@ Backup and restore might go wrong when some TiDB features are enabled or disable

 | Feature | Issue | Solution |
 | ---- | ---- | ----- |
-|GBK charset|| BR of versions earlier than v5.4.0 does not support restoring `charset=GBK` tables. No version of BR supports recovering `charset=GBK` tables to TiDB clusters earlier than v5.4.0. |
+|GBK charset|| Before v5.4.0, BR does not support restoring tables with `charset=GBK`. In addition, no version of BR supports restoring tables with `charset=GBK` to TiDB clusters earlier than v5.4.0. |
+|GB18030 charset|| Before v9.0.0, BR does not support restoring tables with `charset=GB18030`. In addition, no version of BR supports restoring tables with `charset=GB18030` to TiDB clusters earlier than v9.0.0.|
 | Clustered index |[#565](https://github.com/pingcap/br/issues/565)| Make sure that the value of the `tidb_enable_clustered_index` global variable during restore is consistent with that during backup. Otherwise, data inconsistency might occur, such as `default not found` error and inconsistent data index. |
 | New collation |[#352](https://github.com/pingcap/br/issues/352)| Make sure that the value of the `new_collation_enabled` variable in the `mysql.tidb` table during restore is consistent with that during backup. Otherwise, inconsistent data index might occur and checksum might fail to pass. For more information, see [FAQ - Why does BR report `new_collations_enabled_on_first_bootstrap` mismatch?](/faq/backup-and-restore-faq.md#why-is-new_collation_enabled-mismatch-reported-during-restore). |
 | Global temporary tables || Make sure that you are using v5.3.0 or a later version of BR to back up and restore data. Otherwise, an error occurs in the definition of the backed global temporary tables. |
````
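The restore constraints in this table depend on table character sets, the cluster version, and two collation-related settings. A pre-restore check might look like the following sketch. It uses only standard `information_schema` columns, the `tidb_version()` function, and the settings named in the table. It assumes table-level character sets, so columns that override the table default are not listed, and the BR binary version still has to be checked separately.

```sql
-- On the backup cluster: list tables whose default collation belongs to GBK or GB18030.
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_COLLATION LIKE 'gbk%' OR TABLE_COLLATION LIKE 'gb18030%';

-- On the restore cluster: check the TiDB version
-- (v5.4.0 or later for GBK tables, v9.0.0 or later for GB18030 tables).
SELECT tidb_version();

-- On both clusters: these values should match between backup and restore.
SELECT @@global.tidb_enable_clustered_index;
SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME = 'new_collation_enabled';
```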
````diff
@@ -171,7 +174,7 @@ SHOW COLLATION WHERE Charset = 'utf8mb4';
 5 rows in set (0.001 sec)
 ```

-For details about the TiDB support of the GBK characterset, see [GBK](/character-set-gbk.md).
+For details about the GBK character set, see [The GBK Character Set](/character-set-gbk.md). For details about the GB18030 character set, see [The GB18030 Character Set](/character-set-gb18030.md).
````
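The `SHOW COLLATION WHERE Charset = 'utf8mb4';` statement named in this hunk header can be adapted to the two Chinese character sets. This is a sketch: the rows returned depend on the TiDB version (GB18030 requires v9.0.0 or later) and on whether the new collation framework is enabled, so no output is shown.

```sql
-- List the collations available for each Chinese character set.
SHOW COLLATION WHERE Charset = 'gbk';
SHOW COLLATION WHERE Charset = 'gb18030';
```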
````diff
@@ -518,7 +521,7 @@ For a TiDB cluster that is already initialized, you can check whether the new co
 SELECT VARIABLE_VALUE FROM mysql.tidb WHERE VARIABLE_NAME='new_collation_enabled';
 ```

-```sql
+```
 +----------------+
 | VARIABLE_VALUE |
 +----------------+
````
````diff
@@ -535,39 +538,39 @@ This new framework supports semantically parsing collations. TiDB enables the ne

 </CustomContent>

-Under the new framework, TiDB supports the `utf8_general_ci`, `utf8mb4_general_ci`, `utf8_unicode_ci`, `utf8mb4_unicode_ci`, `utf8mb4_0900_bin`, `utf8mb4_0900_ai_ci`, `gbk_chinese_ci`, and`gbk_bin` collations, which is compatible with MySQL.
+Under the new framework, TiDB supports the `utf8_general_ci`, `utf8mb4_general_ci`, `utf8_unicode_ci`, `utf8mb4_unicode_ci`, `utf8mb4_0900_bin`, `utf8mb4_0900_ai_ci`, `gbk_chinese_ci`, `gbk_bin`, `gb18030_chinese_ci`, and `gb18030_bin` collations, which are compatible with MySQL.

-When one of `utf8_general_ci`, `utf8mb4_general_ci`, `utf8_unicode_ci`, `utf8mb4_unicode_ci`, `utf8mb4_0900_ai_ci`and`gbk_chinese_ci` is used, the string comparison is case-insensitive and accent-insensitive. At the same time, TiDB also corrects the collation's `PADDING` behavior:
+When one of `utf8_general_ci`, `utf8mb4_general_ci`, `utf8_unicode_ci`, `utf8mb4_unicode_ci`, `utf8mb4_0900_ai_ci`, `gbk_chinese_ci`, or `gb18030_chinese_ci` is used, the string comparison is case-insensitive and accent-insensitive. At the same time, TiDB also corrects the collation's `PADDING` behavior:
````
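The case-insensitive comparison described in this hunk can be checked with a small table. This is a sketch: the table name `t_ci` is hypothetical, it assumes TiDB v9.0.0 or later with the new collation framework enabled, and the comments state what the text implies rather than captured output.

```sql
-- Hypothetical table using the case-insensitive GB18030 collation.
CREATE TABLE t_ci (a VARCHAR(10) CHARACTER SET gb18030 COLLATE gb18030_chinese_ci);
INSERT INTO t_ci VALUES ('A');

-- Under gb18030_chinese_ci, 'A' and 'a' compare as equal, so this is expected to return the row.
SELECT a FROM t_ci WHERE a = 'a';

-- Forcing the binary collation on the column compares code values, so no row is expected.
SELECT a FROM t_ci WHERE a COLLATE gb18030_bin = 'a';
```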
````diff
+summary: Learn the details of TiDB's support for the GB18030 character set.
+---
+
+# The GB18030 Character Set <span class="version-mark">New in v9.0.0</span>
+
+Starting from v9.0.0, TiDB supports the GB18030-2022 character set. This document describes TiDB's support for and compatibility with the GB18030 character set.
````
````diff
+This section describes the compatibility of the GB18030 character set in TiDB with MySQL.
+
+### Collation compatibility
+
+In MySQL, the default collation for the GB18030 character set is `gb18030_chinese_ci`. In TiDB, the default collation for GB18030 depends on the configuration parameter [`new_collations_enabled_on_first_bootstrap`](https://docs.pingcap.com/tidb/stable/tidb-configuration-file/#new_collations_enabled_on_first_bootstrap):
+
+- By default, `new_collations_enabled_on_first_bootstrap` is set to `true`, which means enabling the [new collation framework](/character-set-and-collation.md#new-framework-for-collations). In this case, the default collation for GB18030 is `gb18030_chinese_ci`.
+- If `new_collations_enabled_on_first_bootstrap` is set to `false`, the new framework for collations is disabled, and the default collation for GB18030 is `gb18030_bin`.
+
+Additionally, the `gb18030_bin` collation supported by TiDB differs from MySQL's `gb18030_bin` collation. TiDB converts GB18030 to `utf8mb4` and then performs binary sorting.
+
+After enabling the new framework for collations, if you check the collations for the GB18030 character set, you can see that TiDB's default collation for GB18030 is switched to `gb18030_chinese_ci`:
````
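To see which default collation a cluster picks, and how TiDB's `gb18030_bin` orders rows, you could run something like the following sketch. The table names `t_default` and `t_bin` are hypothetical. The ordering comment is derived from the statement that TiDB converts GB18030 to `utf8mb4` before binary sorting: `一` (U+4E00) is `E4 B8 80` in UTF-8 and `D2 BB` in GB18030, while `啊` (U+554A) is `E5 95 8A` in UTF-8 and `B0 A1` in GB18030, so the two byte orders disagree. The contrast with MySQL assumes that MySQL's `gb18030_bin` orders by GB18030 byte values.

```sql
-- Default collation: gb18030_chinese_ci with the new framework, gb18030_bin without it.
CREATE TABLE t_default (a VARCHAR(10) CHARACTER SET gb18030);
SELECT COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 't_default' AND COLUMN_NAME = 'a';

-- Binary ordering: comparing utf8mb4 bytes puts '一' (E4B880) before '啊' (E5958A).
-- Ordering by GB18030 bytes would instead put '啊' (B0A1) before '一' (D2BB).
CREATE TABLE t_bin (a VARCHAR(10) CHARACTER SET gb18030 COLLATE gb18030_bin);
INSERT INTO t_bin VALUES ('啊'), ('一');
SELECT a FROM t_bin ORDER BY a;
```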
````diff
+- TiDB supports GB18030-2022 characters, while MySQL supports GB18030-2005 characters. As a result, the encoding and decoding results for certain characters differ between the two systems.
+
+- For invalid GB18030 characters, such as `0xFE39FE39`, MySQL allows writing them to the database in hexadecimal form and stores them as `?`. In TiDB, reading or writing invalid GB18030 characters in strict mode returns an error; in non-strict mode, TiDB allows reading or writing invalid GB18030 characters but returns a warning.
+
+### Others
+
+- Currently, TiDB does not support using the `ALTER TABLE` statement to convert other character sets to `gb18030`, or to convert from `gb18030` to another character set.
+
+- TiDB does not support using the `_gb18030` character set introducer. For example:
+
+```sql
+CREATE TABLE t(a CHAR(10) CHARSET BINARY);
+Query OK, 0 rows affected (0.00 sec)
+INSERT INTO t VALUES (_gb18030'啊');
+ERROR 1115 (42000): Unsupported character introducer: 'gb18030'
+```
+
+- For binary characters in `ENUM` and `SET` types, TiDB currently treats them as using the `utf8mb4` character set.
+
+## Component compatibility
+
+- TiFlash, TiDB Data Migration (DM), and TiCDC currently do not support the GB18030 character set.
+
+- Before v9.0.0, Dumpling does not support exporting tables with `charset=GB18030`, and TiDB Lightning does not support importing tables with `charset=GB18030`.
+
+- Before v9.0.0, TiDB Backup & Restore (BR) does not support backing up or restoring tables with `charset=GB18030`. In addition, no version of BR supports restoring tables with `charset=GB18030` to TiDB clusters earlier than v9.0.0.
+
+## See also
+
+* [`SHOW CHARACTER SET`](/sql-statements/sql-statement-show-character-set.md)
````
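Because TiFlash, DM, and TiCDC do not support GB18030, and BR, Dumpling, and TiDB Lightning before v9.0.0 do not handle `charset=GB18030` tables, it can help to inventory affected columns before you set up replication, migration, or backup. The following query is a sketch over the standard `information_schema.COLUMNS` table. It works at column level, so it also catches columns that inherit GB18030 from the table default.

```sql
-- List every column stored in the GB18030 character set.
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE CHARACTER_SET_NAME = 'gb18030'
ORDER BY TABLE_SCHEMA, TABLE_NAME;
```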