Preface:
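As we all know, one of the differences between MySQL 8.0 and MySQL 5.7 is that the default character set changed from latin1 to utf8mb4. Beyond that, have character sets and collations changed in other ways in MySQL 8.0? Let's explore that in this article.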
The utf8mb4 character set
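In MySQL 8.0, utf8mb4 is the default character set. It is a full UTF-8 implementation that uses 1 to 4 bytes per character, so it can store any Unicode character, including emoji, special symbols, and characters from complex scripts.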
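Typical use cases for utf8mb4 include, but are not limited to:
Storing characters outside the range of utf8mb3 (code points above U+FFFF), such as some rarely used Chinese characters and newer Unicode additions.
Storing emoji, which require 4-byte encodings.
Supporting internationalized applications that must handle many languages and special characters.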
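utf8mb4 is a superset of utf8 and fully compatible with it. In MySQL, utf8 is an alias for utf8mb3, which stores at most 3 bytes per character, and every valid utf8mb3 byte sequence is also valid utf8mb4. In theory, therefore, converting existing utf8 (that is, utf8mb3) data to utf8mb4 should not cause problems for the data already stored.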
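The following Python sketch makes these byte counts concrete. Python serves only as an illustration here and does not talk to MySQL. The sketch measures how many bytes UTF-8 needs for a few characters. It then checks whether a string would fit in utf8mb3, based on the rule that utf8mb3 holds only characters that need at most 3 bytes, that is, characters in the Basic Multilingual Plane. The helper names `utf8_len` and `fits_utf8mb3` are hypothetical.

```python
# Illustrative sketch (not MySQL code): UTF-8 byte length per character,
# and whether a string would fit in MySQL's 3-byte utf8mb3.

def utf8_len(ch: str) -> int:
    """Bytes needed to encode one character in UTF-8 (1 to 4)."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if no character needs more than 3 bytes, i.e. every code
    point lies in the Basic Multilingual Plane (U+0000..U+FFFF)."""
    return all(utf8_len(ch) <= 3 for ch in text)


samples = {
    "LATIN A": "A",                  # U+0041
    "E WITH ACUTE": "\u00e9",        # U+00E9
    "CJK ZHONG": "\u4e2d",           # U+4E2D, a common Chinese character
    "GRINNING FACE": "\U0001F600",   # U+1F600, an emoji
    "CJK EXT-B": "\U00020BB7",       # U+20BB7, a rarely used Chinese character
}

for name, ch in samples.items():
    # ASCII-only output, e.g. "U+1F600 GRINNING FACE: 4 byte(s)"
    print(f"U+{ord(ch):04X} {name}: {utf8_len(ch)} byte(s)")

print(fits_utf8mb3("MySQL \u4e2d\u6587"))  # True: only BMP characters
print(fits_utf8mb3("hi \U0001F600"))        # False: the emoji needs 4 bytes
```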
```
# List the character sets supported by the server
# The Default collation column shows each character set's default collation; Maxlen is the maximum number of bytes per character
mysql> SHOW CHARACTER SET;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
...
| utf8mb3  | UTF-8 Unicode                   | utf8mb3_general_ci  |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_0900_ai_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.01 sec)

# Show the character set system variables
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------------+
| Variable_name            | Value                            |
+--------------------------+----------------------------------+
| character_set_client     | utf8mb4                          |
| character_set_connection | utf8mb4                          |
| character_set_database   | utf8mb4                          |
| character_set_filesystem | binary                           |
| character_set_results    | utf8mb4                          |
| character_set_server     | utf8mb4                          |
| character_set_system     | utf8mb3                          |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.01 sec)
```
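The Maxlen column gives a simple upper bound: N characters can need up to N × Maxlen bytes. The Python sketch below performs that multiplication for a VARCHAR(255) column. The Maxlen values come from the output above, except for latin1 (1 byte per character), whose row is elided there. The sketch counts character data only and ignores the 1- or 2-byte length prefix that VARCHAR also stores. The names `MAXLEN` and `max_data_bytes` are hypothetical.

```python
# Maxlen (maximum bytes per character) as reported by SHOW CHARACTER SET.
# latin1 is not shown in the truncated output above; its Maxlen is 1.
MAXLEN = {"ascii": 1, "latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def max_data_bytes(n_chars: int, charset: str) -> int:
    """Worst-case bytes for n_chars characters in `charset`
    (character data only, no VARCHAR length prefix)."""
    return n_chars * MAXLEN[charset]


for cs in ("latin1", "utf8mb3", "utf8mb4"):
    print(f"VARCHAR(255) {cs}: up to {max_data_bytes(255, cs)} bytes")
# VARCHAR(255) latin1: up to 255 bytes
# VARCHAR(255) utf8mb3: up to 765 bytes
# VARCHAR(255) utf8mb4: up to 1020 bytes
```

Converting a column from utf8mb3 to utf8mb4 raises this worst-case size, even though the stored bytes of existing BMP text stay the same.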
The utf8mb4_0900_ai_ci collation
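In MySQL 8.0, the default collation for utf8mb4 is utf8mb4_0900_ai_ci, whereas in MySQL 5.7 it was utf8mb4_general_ci. Let's compare the utf8mb4 collations available in the two versions. MySQL 8.0 lists 89 of them versus 26 in MySQL 5.7, and its SHOW COLLATION output adds a Pad_attribute column.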
```
# Collations available for utf8mb4 in MySQL 8.0
mysql> SHOW COLLATION WHERE Charset = 'utf8mb4';
+----------------------------+---------+-----+---------+----------+---------+---------------+
| Collation                  | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
+----------------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_0900_ai_ci         | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_ci         | utf8mb4 | 305 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_cs         | utf8mb4 | 278 |         | Yes      |       0 | NO PAD        |
| utf8mb4_bin                | utf8mb4 |  46 |         | Yes      |       1 | PAD SPACE     |
| utf8mb4_general_ci         | utf8mb4 |  45 |         | Yes      |       1 | PAD SPACE     |
| utf8mb4_german2_ci         | utf8mb4 | 244 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_swedish_ci         | utf8mb4 | 232 |         | Yes      |       8 | PAD SPACE     |
...
| utf8mb4_vi_0900_as_cs      | utf8mb4 | 300 |         | Yes      |       0 | NO PAD        |
| utf8mb4_zh_0900_as_cs      | utf8mb4 | 308 |         | Yes      |       0 | NO PAD        |
+----------------------------+---------+-----+---------+----------+---------+---------------+
89 rows in set (0.00 sec)

# Show the collation system variables
mysql> SHOW variables like 'coll%';
+----------------------+--------------------+
| Variable_name        | Value              |
+----------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database   | utf8mb4_0900_ai_ci |
| collation_server     | utf8mb4_0900_ai_ci |
+----------------------+--------------------+
3 rows in set (0.01 sec)

# Collations available for utf8mb4 in MySQL 5.7
admin@localhost [(none)] 16:03:33>SHOW COLLATION WHERE Charset = 'utf8mb4';
+------------------------+---------+-----+---------+----------+---------+
| Collation              | Charset | Id  | Default | Compiled | Sortlen |
+------------------------+---------+-----+---------+----------+---------+
| utf8mb4_general_ci     | utf8mb4 |  45 | Yes     | Yes      |       1 |
| utf8mb4_bin            | utf8mb4 |  46 |         | Yes      |       1 |
| utf8mb4_unicode_ci     | utf8mb4 | 224 |         | Yes      |       8 |
| utf8mb4_icelandic_ci   | utf8mb4 | 225 |         | Yes      |       8 |
| utf8mb4_latvian_ci     | utf8mb4 | 226 |         | Yes      |       8 |
| utf8mb4_romanian_ci    | utf8mb4 | 227 |         | Yes      |       8 |
| utf8mb4_slovenian_ci   | utf8mb4 | 228 |         | Yes      |       8 |
| utf8mb4_polish_ci      | utf8mb4 | 229 |         | Yes      |       8 |
| utf8mb4_estonian_ci    | utf8mb4 | 230 |         | Yes      |       8 |
| utf8mb4_spanish_ci     | utf8mb4 | 231 |         | Yes      |       8 |
| utf8mb4_swedish_ci     | utf8mb4 | 232 |         | Yes      |       8 |
| utf8mb4_turkish_ci     | utf8mb4 | 233 |         | Yes      |       8 |
| utf8mb4_czech_ci       | utf8mb4 | 234 |         | Yes      |       8 |
| utf8mb4_danish_ci      | utf8mb4 | 235 |         | Yes      |       8 |
| utf8mb4_lithuanian_ci  | utf8mb4 | 236 |         | Yes      |       8 |
| utf8mb4_slovak_ci      | utf8mb4 | 237 |         | Yes      |       8 |
| utf8mb4_spanish2_ci    | utf8mb4 | 238 |         | Yes      |       8 |
| utf8mb4_roman_ci       | utf8mb4 | 239 |         | Yes      |       8 |
| utf8mb4_persian_ci     | utf8mb4 | 240 |         | Yes      |       8 |
| utf8mb4_esperanto_ci   | utf8mb4 | 241 |         | Yes      |       8 |
| utf8mb4_hungarian_ci   | utf8mb4 | 242 |         | Yes      |       8 |
| utf8mb4_sinhala_ci     | utf8mb4 | 243 |         | Yes      |       8 |
| utf8mb4_german2_ci     | utf8mb4 | 244 |         | Yes      |       8 |
| utf8mb4_croatian_ci    | utf8mb4 | 245 |         | Yes      |       8 |
| utf8mb4_unicode_520_ci | utf8mb4 | 246 |         | Yes      |       8 |
| utf8mb4_vietnamese_ci  | utf8mb4 | 247 |         | Yes      |       8 |
+------------------------+---------+-----+---------+----------+---------+
26 rows in set (0.00 sec)
```
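The collation names describe their behavior. In utf8mb4_0900_ai_ci, "0900" refers to version 9.0.0 of the Unicode Collation Algorithm (UCA), "ai" means accent-insensitive and "ci" means case-insensitive. The suffixes "as" and "cs" mean accent-sensitive and case-sensitive. The Pad_attribute column in the 8.0 output adds one more difference. PAD SPACE collations such as utf8mb4_general_ci ignore trailing spaces when comparing, whereas NO PAD collations such as utf8mb4_0900_ai_ci treat them as significant.

The Python sketch below models only these three switches, and only for equality tests. It is a rough illustration with hypothetical helper names (`compare_key`, `equal`), not MySQL's UCA weight tables. Real collations use multi-level weights and treat many characters differently from this model.

```python
import unicodedata


def strip_accents(s: str) -> str:
    # NFD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT;
    # dropping combining marks leaves the base letters.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def compare_key(s: str, *, accent_sensitive: bool, case_sensitive: bool,
                pad_space: bool) -> str:
    """Hypothetical equality key modelling the ai/as, ci/cs and
    PAD SPACE / NO PAD switches (not ordering, not real UCA weights)."""
    if pad_space:
        s = s.rstrip(" ")      # PAD SPACE: trailing spaces are ignored
    if not accent_sensitive:
        s = strip_accents(s)   # ai: accents are ignored
    if not case_sensitive:
        s = s.casefold()       # ci: letter case is ignored
    return s


def equal(a: str, b: str, **flags: bool) -> bool:
    return compare_key(a, **flags) == compare_key(b, **flags)


# Rough analogues of collations listed above (illustrative, not exact)
AI_CI = dict(accent_sensitive=False, case_sensitive=False, pad_space=False)       # ~ utf8mb4_0900_ai_ci
AS_CI = dict(accent_sensitive=True, case_sensitive=False, pad_space=False)        # ~ utf8mb4_0900_as_ci
AS_CS = dict(accent_sensitive=True, case_sensitive=True, pad_space=False)         # ~ utf8mb4_0900_as_cs
GENERAL_CI = dict(accent_sensitive=False, case_sensitive=False, pad_space=True)   # ~ utf8mb4_general_ci

print(equal("Caf\u00e9", "cafe", **AI_CI))        # True: accent and case ignored
print(equal("Caf\u00e9", "cafe", **AS_CI))        # False: the accent now matters
print(equal("caf\u00e9", "CAF\u00c9", **AS_CI))   # True: only case differs
print(equal("abc", "ABC", **AS_CS))               # False: case matters
print(equal("abc ", "abc", **AI_CI))              # False: NO PAD keeps the trailing space
print(equal("abc ", "abc", **GENERAL_CI))         # True: PAD SPACE ignores it
```

The last two lines show the practical effect of the new default. A comparison such as 'abc ' = 'abc' that matched under the PAD SPACE default of MySQL 5.7 may stop matching under a NO PAD collation.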