You are here
Shinguz
 
    MariaDB and MySQL Character Set Conversion
- Introduction
- Character Sets
- Steps to convert Character Set to utf8mb4
- MySQL Pump
- MySQL Master/Slave Replication for Character Set conversion
- MySQL Shell, mysqlsh Upgrade Checker Utility
Recently we had a consulting engagement where we had to help the customer to migrate from latin1 Character Set to utf8mb4 Character Set. In the same MySQL consulting engagement we considered to upgrade from MySQL 5.6 to MySQL 5.7 as well [ Lit. ]. We decided to split the change in 2 parts: Upgrading to 5.7 in the first step and converting to uft8mb4 in the second step. There were various reasons for this decision:
- 2 smaller changes are easier to control then one big shot.
- We assume that in 5.7 we experience less problems with utf8mb4 because the trend given by MySQL was more towards utf8mb4 in 5.7 than in MySQL 5.6. So we hope to hit less problems and bugs.
For Upgrading see also MariaDB and MySQL Upgrade Problems
Remark: It makes possibly also sens to think about Collations before starting with the conversion!
Character SetsHistorically MariaDB and MySQL had the default Character Set latin1 (Latin-1 or ISO-8859-1) which was sufficient for most of the western hemisphere.
But as technology spreads and demands increase other cultures want to have their characters represented understandably as well. So Unicode standard was invented. And MariaDB and MySQL applied this standard as well.
The original MariaDB/MySQL utf8(mb3) implementation was not perfect or complete so they implemented utf8mb4 as a super set of utf8(mb3). So at least since MariaDB/MySQL version 5.5 latin1, utf8 and utf8mb4 are available. The current MySQL 5.7 utf8mb4 implementation should cover Unicode 9.0.0:
SQL> SELECT * FROM information_schema.character_sets WHERE character_set_name LIKE 'utf8%' OR character_set_name = 'latin1'; +--------------------+----------------------+----------------------+--------+ | CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN | +--------------------+----------------------+----------------------+--------+ | latin1 | latin1_swedish_ci | cp1252 West European | 1 | | utf8 | utf8_general_ci | UTF-8 Unicode | 3 | | utf8mb4 | utf8mb4_general_ci | UTF-8 Unicode | 4 | +--------------------+----------------------+----------------------+--------+The default Character Set up to MariaDB 10.4 and MySQL 5.7 was latin1. In MySQL 8.0 the default Character Set has changed to utf8mb4. There are no signs so far that MariaDB will take the same step:
SQL> status -------------- mysql Ver 8.0.16 for linux-glibc2.12 on x86_64 (MySQL Community Server - GPL) Connection id: 84 Current database: Current user: root@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 8.0.16 MySQL Community Server - GPL Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: utf8mb4 Db characterset: utf8mb4 Client characterset: utf8mb4 Conn. characterset: utf8mb4 UNIX socket: /var/run/mysqld/mysql-3332.sock Uptime: 3 days 47 min 13 secSo we see a general trend from latin1 to utf8(mb3) to utf8mb4 technically and business wise (aka globalization).
For the DBA this means sooner or later we have to think about a conversion of all tables of the whole database instance (all tables of all schemata) to utf8mb4!
Steps to convert Character Set to utf8mb4 Analyzing the ServerFirst of all one should analyze the system (O/S, database instance and client/application). On the server we can run the following command to verify the actual used and supported Character Set:
# locale LANG=en_GB.UTF-8 LANGUAGE= LC_CTYPE="en_GB.UTF-8" LC_NUMERIC="en_GB.UTF-8" LC_TIME="en_GB.UTF-8" LC_COLLATE="en_GB.UTF-8" LC_MONETARY="en_GB.UTF-8" LC_MESSAGES="en_GB.UTF-8" LC_PAPER="en_GB.UTF-8" LC_NAME="en_GB.UTF-8" LC_ADDRESS="en_GB.UTF-8" LC_TELEPHONE="en_GB.UTF-8" LC_MEASUREMENT="en_GB.UTF-8" LC_IDENTIFICATION="en_GB.UTF-8" LC_ALL=On the MariaDB/MySQL database instance we check the current server configuration and the session configuration with the following commands:
SQL> SHOW SESSION VARIABLES WHERE Variable_name LIKE 'character_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+-------------------+ | Variable_name | Value | +--------------------------+-------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | collation_connection | utf8_general_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +--------------------------+-------------------+ SQL> SHOW GLOBAL VARIABLES WHERE Variable_name LIKE 'character_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+-------------------+ | Variable_name | Value | +--------------------------+-------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | collation_connection | latin1_swedish_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +--------------------------+-------------------+These configuration variables are for Client/Server communication: character_set_client, character_set_connection and character_set_results. These for Server configuration: character_set_server and character_set_database (deprecated in MySQL 5.7). And these for System internals and File System access: character_set_system and character_set_filesystem.
Sometimes we see customers using the Logon Trigger init_connect to force clients for a specific Character Set:
SQL> SHOW GLOBAL VARIABLES LIKE 'init_connect'; +---------------+------------------+ | Variable_name | Value | +---------------+------------------+ | init_connect | SET NAMES latin1 | +---------------+------------------+The SET NAMES command sets the character_set_client, character_set_connection and character_set_results session variables. [ Lit. ]
Analyzing the Application and the ClientsSimilar steps to analyze the Application and Clients should be taken. We want to answer the following questions:
- Support of utf8 of Application/Client O/S (Windows)?
- Support of utf8 of Web Server (Apache (AddDefaultCharset utf-8), Nginx, IIS, ...)
- Version of programming language (Java, PHP (5.4 and newer?), ...)
- Version of MariaDB and MySQL Connectors (JDBC (5.1.47 and newer?), ODBC (5.3.11 and newer?), mysqli/mysqlnd (⋝7.0.19?, ⋝7.1.5?), ...)
- Application code (header('Content-Type: text/html; charset=utf-8');, <?xml version="1.0" encoding="UTF-8"?>, <meta http-equiv="Content-Type" content="text/html; charset=utf-8">, <form accept-charset="utf-8">, htmlspecialchars($str, ENT_NOQUOTES, "UTF-8"), $mysqli->set_charset('utf8mb4');, mbstring [ Lit. ], etc.
See also: Configuring Application Character Set and Collation
If you do not have your Application under control your DBA can help you to find out with the General Query Log (SET GLOBAL general_log = 1;) what is going on:
190815 19:03:00 12 Connect root@localhost on 12 Query select @@version_comment limit 1 12 Query SET NAMES latin1 12 Query SET NAMES utf8 190815 19:05:24 12 Quitor with some queries on the PERFORMANCE_SCHEMA:
-- Works since MySQL 5.6/MariaDB 10.0 SQL> SELECT t.thread_id, t.processlist_id, t.processlist_user, t.processlist_host, t.processlist_db , sca.attr_name, sca.attr_value FROM performance_schema.threads AS t JOIN performance_schema.session_connect_attrs AS sca ON sca.processlist_id = t.processlist_id WHERE t.processlist_user IS NOT NULL AND t.thread_id = 103 ; +-----------+----------------+------------------+------------------+----------------+----------------------------------+---------------------+ | thread_id | processlist_id | processlist_user | processlist_host | processlist_db | attr_name | attr_value | +-----------+----------------+------------------+------------------+----------------+----------------------------------+---------------------+ | 103 | 78 | replication | localhost | NULL | _os | linux-glibc2.12 | | 103 | 78 | replication | localhost | NULL | _client_name | libmysql | | 103 | 78 | replication | localhost | NULL | _pid | 29269 | | 103 | 78 | replication | localhost | NULL | program_name | mysqld | | 103 | 78 | replication | localhost | NULL | _platform | x86_64 | | 103 | 78 | replication | localhost | NULL | _client_replication_channel_name | NULL | | 103 | 78 | replication | localhost | NULL | _client_role | binary_log_listener | | 103 | 78 | replication | localhost | NULL | _client_version | 5.7.26 | +-----------+----------------+------------------+------------------+----------------+----------------------------------+---------------------+ -- Works since MySQL 5.7 only SELECT t.thread_id, t.processlist_id, t.processlist_user, t.processlist_host, t.processlist_db , vbt.variable_name, vbt.variable_value FROM performance_schema.threads AS t JOIN performance_schema.variables_by_thread AS vbt ON vbt.thread_id = t.thread_id WHERE t.processlist_user IS NOT NULL AND (vbt.variable_name like 'charac%' OR vbt.variable_name LIKE 'coll%') AND t.thread_id = 103 ; +-----------+----------------+------------------+------------------+----------------+--------------------------+-------------------+ | thread_id | processlist_id | processlist_user | processlist_host | processlist_db | variable_name | variable_value | +-----------+----------------+------------------+------------------+----------------+--------------------------+-------------------+ | 103 | 78 | replication | localhost | NULL | character_set_client | latin1 | | 103 | 78 | replication | localhost | NULL | character_set_connection | latin1 | | 103 | 78 | replication | localhost | NULL | character_set_database | latin1 | | 103 | 78 | replication | localhost | NULL | character_set_filesystem | binary | | 103 | 78 | replication | localhost | NULL | character_set_results | latin1 | | 103 | 78 | replication | localhost | NULL | character_set_server | latin1 | | 103 | 78 | replication | localhost | NULL | collation_connection | latin1_swedish_ci | | 103 | 78 | replication | localhost | NULL | collation_database | latin1_swedish_ci | | 103 | 78 | replication | localhost | NULL | collation_server | latin1_swedish_ci | +-----------+----------------+------------------+------------------+----------------+--------------------------+-------------------+Preparation of the Server Settings and the Application
To have a better control of the impact of some changes we decided to do some changes on the Application first:
- Application is setting the Character Set properly itself ($mysqli->set_charset('utf8mb4') [ Lit. ]). In the same step also sql_mode can be set by the application so we can use the defaults on server side in the future.
- Apache and PHP are configured to support UTF-8.
- After this step init_connect, character_set_server and character_set_database can be changed to utf8mb4 on the Server and --skip-character-set-client-handshake can be removed at the same time [ Lit. ]
First we checked and converted the default Character Set of the Schemata/Databases:
SQL> SELECT schema_name, default_character_set_name, default_collation_name FROM information_schema.schemata WHERE schema_name NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys') ; +-------------+----------------------------+------------------------+ | schema_name | default_character_set_name | default_collation_name | +-------------+----------------------------+------------------------+ | focmm | latin1 | latin1_swedish_ci | | test | latin1 | latin1_swedish_ci | | foodmart | latin1 | latin1_swedish_ci | | erp | latin1 | latin1_swedish_ci | | world | latin1 | latin1_swedish_ci | +-------------+----------------------------+------------------------+Converting the Schemata is done as follows:
SQL> ALTER DATABASE ... DEFAULT CHARACTER SET utf8mb4;which is a fast operation.
To convert the tables there are many different possibilities we considered:
- The easy one: ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4;
- The possibly faster one: dump/restore with sed 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8mb4/'
- The possibly fastest one: drop Secondary Indexes, dump/restore with sed, create Secondary Indexes again (mysqlpump).
- The automated one: With Perconas pt-online-schema-change [ Lit. ] or Facebooks OnlineSchemaChange OSC [ Lit. ]
- The most elegant but not supported one: Master/Slave Replication.
Already with converting the first table we run into a problem:
SQL> ALTER TABLE test CONVERT TO CHARACTER SET utf8mb4; ERROR 1709 (HY000): Index column size too large. The maximum column size is 767 bytes.This table had a Primary Key of a length of more than 255 bytes and this is not possible with the old InnoDB Antelope File Format:
SQL> SHOW GLOBAL VARIABLES WHERE Variable_name LIKE 'innodb_file_format%' OR Variable_name LIKE 'innodb_large%'; +--------------------------+----------+ | Variable_name | Value | +--------------------------+----------+ | innodb_file_format | Antelope | | innodb_file_format_check | ON | | innodb_file_format_max | Antelope | | innodb_large_prefix | OFF | +--------------------------+----------+So we have to find out first, which tables are still in old Antelope File Format:
SQL> SELECT table_schema , CASE WHEN row_format = 'Dynamic' THEN 'Barracuda' WHEN row_format = 'Compressed' THEN 'Barracuda' WHEN row_format = 'Compact' THEN 'Antelope' WHEN row_format = 'Redundant' THEN 'Antelope' END AS 'file_format' , COUNT(*) FROM information_schema.tables WHERE engine = 'InnoDB' AND table_schema NOT IN ('information_schema', 'sys', 'mysql') GROUP BY table_schema, file_format ; +--------------+-------------+----------+ | table_schema | file_format | count(*) | +--------------+-------------+----------+ | foodmart | Barracuda | 23 | | test | Barracuda | 1 | | world | Antelope | 2 | | world | Barracuda | 1 | +--------------+-------------+----------+Then we could covert the table correctly forcing the Barracuda File format:
SQL> ALTER TABLE test CONVERT TO CHARACTER SET utf8mb4, ROW_FORMAT=dynamic;Testing of new Character Set
The last but most important step is to test the changes. Here we recommend to do as many as possible different tests:
- MySQL CLI: mysql
- phpMyAdmin
- MySQL Workbench
- Other GUIs
- Your Application
Especially test careful data with umlauts (öäüÄÖÜß), foreign Characters (Turkish, Cyrillic, CJK Characters) and Emojis. Good Candidates are: Lastname, City or free text fields like Comment.
Some aid you also can get from here:
MySQL Pump
mysqlpump — A Database Backup Program
This utility is currently not available for MariaDB. If works for MySQL 5.7 and newer and does NOT support MySQL 5.6. So we cannot use it for the Upgrade Process from MySQL 5.6 to 5.7. Newest MySQL Releases contain Bug fixes and even new Features in mysqlpump so we can assume it is still supported and actively maintained. Recent releases contain some fixes for trivial bugs so we can assume mysqlpump is not widely used yet and not as mature yet as mysqldump. An alternative product would be MyDumper from Domas@Facebook (Lauchpad, GitHub).
Interesting features are:
- Parallel dumping of databases.
- Secondary Index restore separated from Table Restore.
Missing features are:
- Does not support MariaDB and MySQL 5.6 and earlier.
- No binary log position is stored with backup (--master-data) but only GTID? So we have to use/enable GTID based Replication.
First test:
# mysqlpump --user=root --all-databases --single-transaction --triggers --routines --events --hex-blob > /tmp/mysqlpump_backup.sql Dump progress: 1/4 tables, 0/87 rows Dump progress: 41/46 tables, 789469/6956116 rows Dump progress: 41/46 tables, 1608219/6956116 rows Dump progress: 45/46 tables, 2358412/6956116 rows Dump progress: 45/46 tables, 3437912/6956116 rows Dump progress: 45/46 tables, 4493662/6956116 rows Dump progress: 45/46 tables, 5177662/6956116 rows Dump progress: 45/46 tables, 5568662/6956116 rows Dump progress: 45/46 tables, 6216162/6956116 rows Dump progress: 45/46 tables, 6716662/6956116 rows Dump completed in 9860 milliseconds ALTER TABLE `foodmart`.`account` ADD KEY `i_account_parent` (`account_parent_id`); ALTER TABLE `foodmart`.`category` ADD KEY `i_category_parent` (`category_parent_id`); ALTER TABLE `foodmart`.`employee` ADD KEY `i_employee_department_id` (`department_id`); ALTER TABLE `foodmart`.`employee` ADD KEY `i_employee_store_id` (`store_id`); ALTER TABLE `foodmart`.`employee` ADD KEY `i_employee_supervisor_id` (`supervisor_id`); ALTER TABLE `foodmart`.`expense_fact` ADD KEY `i_expense_store_id` (`store_id`); ALTER TABLE `foodmart`.`expense_fact` ADD KEY `i_expense_account_id` (`account_id`); ALTER TABLE `foodmart`.`expense_fact` ADD KEY `i_expense_time_id` (`time_id`);Test against MySQL 5.6:
Server version is not compatible. Server version should be 5.7.8 or above.MySQL Master/Slave Replication for Character Set conversion
Replication of Columns Having Different Data Types should work since MySQL 5.1.21. General rule when Statement Based Replication is used: If the query would work executed directly on the Slave it should also work on the Slave with Replication! So utf8mb4 on the Slave should also be possible with SBR. In MySQL 5.5.3 Attribute Promotion and Attribute Demotion was introduced with LOSSY and NON_LOSSY slave_type_conversions. But: Replication between columns using different character sets is not supported!
Some tests:
MasterSQL> INSERT INTO test VALUES (NULL, 'öäüÄÖÜß', NULL); MasterSQL> SELECT * FROM test; +----+----------------+---------------------+ | id | data | ts | +----+----------------+---------------------+ | 1 | öäüÄÖÜß | 2019-08-15 19:48:48 | | 2 | öäüÄÖÜß | 2019-08-15 19:50:00 | | 3 | öäüÄÖÜß | 2019-08-15 19:52:29 | +----+----------------+---------------------+ SlaveSQL> SHOW SLAVE STATUS\G ... Last_SQL_Errno: 1677 Last_SQL_Error: Column 1 of table 'test.test' cannot be converted from type 'varchar(255(bytes))' to type 'varchar(1020(bytes) utf8mb4)' ... SlaveSQL> SELECT * FROM test; +----+----------------+---------------------+ | id | data | ts | +----+----------------+---------------------+ | 1 | öäüÄÖÜß | 2019-08-15 19:48:48 | | 2 | öäüÄÖÜß | 2019-08-15 19:50:00 | +----+----------------+---------------------+See also: Why is varchar(255) not varchar(255)? And:
MasterSQL> SET SESSION binlog_format=STATEMENT; MasterSQL> INSERT INTO test VALUES (NULL, 'öäüÄÖÜß', NULL); SlaveSQL> SELECT * FROM test; +----+----------------+---------------------+ | id | data | ts | +----+----------------+---------------------+ | 1 | öäüÄÖÜß | 2019-08-15 19:48:48 | | 2 | öäüÄÖÜß | 2019-08-15 19:50:00 | | 4 | öäüÄÖÜß | 2019-08-15 19:54:16 | +----+----------------+---------------------+MySQL Shell, mysqlsh Upgrade Checker Utility
Since MySQL 5.7 we have a new MySQL Utility, the MySQL Shell (mysqlsh). The Upgrade Checker Utility in MySQL Shell will replace mysql_upgrade in MySQL 8.0. So we wanted to know if it is of any use for converting Character Set to utf8mb4? You can use the Upgrade Checker Utility to check MySQL 5.7 server instances for compatibility errors and issues for upgrading. But: The Upgrade Checker Utility does not support checking MySQL Server instances 5.6 an earlier. MySQL Shell 8.0 is highly recommended for use with MySQL Server 8.0 and 5.7. Please upgrade to MySQL Shell 8.0. [ Lit. ]
# ./mysqlsh root@localhost:3319 JS > util.checkForServerUpgrade() The MySQL server at localhost:3319, version 5.6.42-log - MySQL Community Server (GPL), will now be checked for compatibility issues for upgrade to MySQL 8.0.17... Util.checkForServerUpgrade: Detected MySQL server version is 5.6.42, but this tool requires server to be at least at version 5.7 (LogicError) JS > util.checkForServerUpgrade('root@127.0.0.1:3309') The MySQL server at localhost:3309, version 5.7.26-log - MySQL Community Server (GPL), will now be checked for compatibility issues for upgrade to MySQL 8.0.17... ... 2) Usage of db objects with names conflicting with reserved keywords in 8.0 No issues found 3) Usage of utf8mb3 charset Warning: The following objects use the utf8mb3 character set. It is recommended to convert them to use utf8mb4 instead, for improved Unicode support. More information: https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html test.test.data - column's default character set: utf8 4) Table names in the mysql schema conflicting with new tables in 8.0 No issues found ... Errors: 0 Warnings: 1 Notices: 0 No fatal errors were found that would prevent an upgrade, but some potential issues were detected. Please ensure that the reported issues are not significant before upgrading. JS > util.checkForServerUpgrade('root@127.0.0.1:3309', {'targetVersion':'5.7.27'}) The MySQL server at 127.0.0.1:3309, version 5.7.26-log - MySQL Community Server (GPL), will now be checked for compatibility issues for upgrade to MySQL 5.7.27... Util.checkForServerUpgrade: This tool supports checking upgrade to MySQL server versions 8.0.11 to 8.0.17 (LogicError)Because latin1 tables are still supported in MySQL 8.0 the Upgrade Checker Utility does not complain or do anything for latin1 tables. So it is not of any use for the utf8mb4 conversion task except you already have some old utf8(mb3) tables.
Taxonomy upgrade extras: character setconversionupgradelatin1utf8utf8mb4replicationmysqlshmysqlpumpMariaDB and MySQL Character Set Conversion
- Introduction
- Character Sets
- Steps to convert Character Set to utf8mb4
- MySQL Pump
- MySQL Master/Slave Replication for Character Set conversion
- MySQL Shell, mysqlsh Upgrade Checker Utility
Recently we had a consulting engagement where we had to help the customer to migrate from latin1 Character Set to utf8mb4 Character Set. In the same MySQL consulting engagement we considered to upgrade from MySQL 5.6 to MySQL 5.7 as well [ Lit. ]. We decided to split the change in 2 parts: Upgrading to 5.7 in the first step and converting to uft8mb4 in the second step. There were various reasons for this decision:
- 2 smaller changes are easier to control then one big shot.
- We assume that in 5.7 we experience less problems with utf8mb4 because the trend given by MySQL was more towards utf8mb4 in 5.7 than in MySQL 5.6. So we hope to hit less problems and bugs.
For Upgrading see also MariaDB and MySQL Upgrade Problems
Remark: It makes possibly also sens to think about Collations before starting with the conversion!
Character SetsHistorically MariaDB and MySQL had the default Character Set latin1 (Latin-1 or ISO-8859-1) which was sufficient for most of the western hemisphere.
But as technology spreads and demands increase other cultures want to have their characters represented understandably as well. So Unicode standard was invented. And MariaDB and MySQL applied this standard as well.
The original MariaDB/MySQL utf8(mb3) implementation was not perfect or complete so they implemented utf8mb4 as a super set of utf8(mb3). So at least since MariaDB/MySQL version 5.5 latin1, utf8 and utf8mb4 are available. The current MySQL 5.7 utf8mb4 implementation should cover Unicode 9.0.0:
SQL> SELECT * FROM information_schema.character_sets WHERE character_set_name LIKE 'utf8%' OR character_set_name = 'latin1'; +--------------------+----------------------+----------------------+--------+ | CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION | MAXLEN | +--------------------+----------------------+----------------------+--------+ | latin1 | latin1_swedish_ci | cp1252 West European | 1 | | utf8 | utf8_general_ci | UTF-8 Unicode | 3 | | utf8mb4 | utf8mb4_general_ci | UTF-8 Unicode | 4 | +--------------------+----------------------+----------------------+--------+The default Character Set up to MariaDB 10.4 and MySQL 5.7 was latin1. In MySQL 8.0 the default Character Set has changed to utf8mb4. There are no signs so far that MariaDB will take the same step:
SQL> status -------------- mysql Ver 8.0.16 for linux-glibc2.12 on x86_64 (MySQL Community Server - GPL) Connection id: 84 Current database: Current user: root@localhost SSL: Not in use Current pager: stdout Using outfile: '' Using delimiter: ; Server version: 8.0.16 MySQL Community Server - GPL Protocol version: 10 Connection: Localhost via UNIX socket Server characterset: utf8mb4 Db characterset: utf8mb4 Client characterset: utf8mb4 Conn. characterset: utf8mb4 UNIX socket: /var/run/mysqld/mysql-3332.sock Uptime: 3 days 47 min 13 secSo we see a general trend from latin1 to utf8(mb3) to utf8mb4 technically and business wise (aka globalization).
For the DBA this means sooner or later we have to think about a conversion of all tables of the whole database instance (all tables of all schemata) to utf8mb4!
Steps to convert Character Set to utf8mb4 Analyzing the ServerFirst of all one should analyze the system (O/S, database instance and client/application). On the server we can run the following command to verify the actual used and supported Character Set:
# locale LANG=en_GB.UTF-8 LANGUAGE= LC_CTYPE="en_GB.UTF-8" LC_NUMERIC="en_GB.UTF-8" LC_TIME="en_GB.UTF-8" LC_COLLATE="en_GB.UTF-8" LC_MONETARY="en_GB.UTF-8" LC_MESSAGES="en_GB.UTF-8" LC_PAPER="en_GB.UTF-8" LC_NAME="en_GB.UTF-8" LC_ADDRESS="en_GB.UTF-8" LC_TELEPHONE="en_GB.UTF-8" LC_MEASUREMENT="en_GB.UTF-8" LC_IDENTIFICATION="en_GB.UTF-8" LC_ALL=On the MariaDB/MySQL database instance we check the current server configuration and the session configuration with the following commands:
SQL> SHOW SESSION VARIABLES WHERE Variable_name LIKE 'character_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+-------------------+ | Variable_name | Value | +--------------------------+-------------------+ | character_set_client | utf8 | | character_set_connection | utf8 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | utf8 | | character_set_server | latin1 | | character_set_system | utf8 | | collation_connection | utf8_general_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +--------------------------+-------------------+ SQL> SHOW GLOBAL VARIABLES WHERE Variable_name LIKE 'character_set\_%' OR Variable_name LIKE 'collation%'; +--------------------------+-------------------+ | Variable_name | Value | +--------------------------+-------------------+ | character_set_client | latin1 | | character_set_connection | latin1 | | character_set_database | latin1 | | character_set_filesystem | binary | | character_set_results | latin1 | | character_set_server | latin1 | | character_set_system | utf8 | | collation_connection | latin1_swedish_ci | | collation_database | latin1_swedish_ci | | collation_server | latin1_swedish_ci | +--------------------------+-------------------+These configuration variables are for Client/Server communication: character_set_client, character_set_connection and character_set_results. These for Server configuration: character_set_server and character_set_database (deprecated in MySQL 5.7). And these for System internals and File System access: character_set_system and character_set_filesystem.
Sometimes we see customers using the Logon Trigger init_connect to force clients for a specific Character Set:
SQL> SHOW GLOBAL VARIABLES LIKE 'init_connect'; +---------------+------------------+ | Variable_name | Value | +---------------+------------------+ | init_connect | SET NAMES latin1 | +---------------+------------------+The SET NAMES command sets the character_set_client, character_set_connection and character_set_results session variables. [ Lit. ]
Analyzing the Application and the ClientsSimilar steps to analyze the Application and Clients should be taken. We want to answer the following questions:
- Support of utf8 of Application/Client O/S (Windows)?
- Support of utf8 of Web Server (Apache (AddDefaultCharset utf-8), Nginx, IIS, ...)
- Version of programming language (Java, PHP (5.4 and newer?), ...)
- Version of MariaDB and MySQL Connectors (JDBC (5.1.47 and newer?), ODBC (5.3.11 and newer?), mysqli/mysqlnd (⋝7.0.19?, ⋝7.1.5?), ...)
- Application code (header('Content-Type: text/html; charset=utf-8');, <?xml version="1.0" encoding="UTF-8"?>, <meta http-equiv="Content-Type" content="text/html; charset=utf-8">, <form accept-charset="utf-8">, htmlspecialchars($str, ENT_NOQUOTES, "UTF-8"), $mysqli->set_charset('utf8mb4');, mbstring [ Lit. ], etc.
See also: Configuring Application Character Set and Collation
If you do not have your Application under control your DBA can help you to find out with the General Query Log (SET GLOBAL general_log = 1;) what is going on:
190815 19:03:00 12 Connect root@localhost on 12 Query select @@version_comment limit 1 12 Query SET NAMES latin1 12 Query SET NAMES utf8 190815 19:05:24 12 Quitor with some queries on the PERFORMANCE_SCHEMA:
-- Works since MySQL 5.6/MariaDB 10.0 SQL> SELECT t.thread_id, t.processlist_id, t.processlist_user, t.processlist_host, t.processlist_db , sca.attr_name, sca.attr_value FROM performance_schema.threads AS t JOIN performance_schema.session_connect_attrs AS sca ON sca.processlist_id = t.processlist_id WHERE t.processlist_user IS NOT NULL AND t.thread_id = 103 ; +-----------+----------------+------------------+------------------+----------------+----------------------------------+---------------------+ | thread_id | processlist_id | processlist_user | processlist_host | processlist_db | attr_name | attr_value | +-----------+----------------+------------------+------------------+----------------+----------------------------------+---------------------+ | 103 | 78 | replication | localhost | NULL | _os | linux-glibc2.12 | | 103 | 78 | replication | localhost | NULL | _client_name | libmysql | | 103 | 78 | replication | localhost | NULL | _pid | 29269 | | 103 | 78 | replication | localhost | NULL | program_name | mysqld | | 103 | 78 | replication | localhost | NULL | _platform | x86_64 | | 103 | 78 | replication | localhost | NULL | _client_replication_channel_name | NULL | | 103 | 78 | replication | localhost | NULL | _client_role | binary_log_listener | | 103 | 78 | replication | localhost | NULL | _client_version | 5.7.26 | +-----------+----------------+------------------+------------------+----------------+----------------------------------+---------------------+ -- Works since MySQL 5.7 only SELECT t.thread_id, t.processlist_id, t.processlist_user, t.processlist_host, t.processlist_db , vbt.variable_name, vbt.variable_value FROM performance_schema.threads AS t JOIN performance_schema.variables_by_thread AS vbt ON vbt.thread_id = t.thread_id WHERE t.processlist_user IS NOT NULL AND (vbt.variable_name like 'charac%' OR vbt.variable_name LIKE 'coll%') AND t.thread_id = 103 ; +-----------+----------------+------------------+------------------+----------------+--------------------------+-------------------+ | thread_id | processlist_id | processlist_user | processlist_host | processlist_db | variable_name | variable_value | +-----------+----------------+------------------+------------------+----------------+--------------------------+-------------------+ | 103 | 78 | replication | localhost | NULL | character_set_client | latin1 | | 103 | 78 | replication | localhost | NULL | character_set_connection | latin1 | | 103 | 78 | replication | localhost | NULL | character_set_database | latin1 | | 103 | 78 | replication | localhost | NULL | character_set_filesystem | binary | | 103 | 78 | replication | localhost | NULL | character_set_results | latin1 | | 103 | 78 | replication | localhost | NULL | character_set_server | latin1 | | 103 | 78 | replication | localhost | NULL | collation_connection | latin1_swedish_ci | | 103 | 78 | replication | localhost | NULL | collation_database | latin1_swedish_ci | | 103 | 78 | replication | localhost | NULL | collation_server | latin1_swedish_ci | +-----------+----------------+------------------+------------------+----------------+--------------------------+-------------------+Preparation of the Server Settings and the Application
To have a better control of the impact of some changes we decided to do some changes on the Application first:
- Application is setting the Character Set properly itself ($mysqli->set_charset('utf8mb4') [ Lit. ]). In the same step also sql_mode can be set by the application so we can use the defaults on server side in the future.
- Apache and PHP are configured to support UTF-8.
- After this step init_connect, character_set_server and character_set_database can be changed to utf8mb4 on the Server and --skip-character-set-client-handshake can be removed at the same time [ Lit. ]
First we checked and converted the default Character Set of the Schemata/Databases:
SQL> SELECT schema_name, default_character_set_name, default_collation_name FROM information_schema.schemata WHERE schema_name NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys') ; +-------------+----------------------------+------------------------+ | schema_name | default_character_set_name | default_collation_name | +-------------+----------------------------+------------------------+ | focmm | latin1 | latin1_swedish_ci | | test | latin1 | latin1_swedish_ci | | foodmart | latin1 | latin1_swedish_ci | | erp | latin1 | latin1_swedish_ci | | world | latin1 | latin1_swedish_ci | +-------------+----------------------------+------------------------+Converting the Schemata is done as follows:
SQL> ALTER DATABASE ... DEFAULT CHARACTER SET utf8mb4;which is a fast operation.
To convert the tables there are many different possibilities we considered:
- The easy one: ALTER TABLE ... CONVERT TO CHARACTER SET utf8mb4;
- The possibly faster one: dump/restore with sed 's/DEFAULT CHARSET=latin1/DEFAULT CHARSET=utf8mb4/'
- The possibly fastest one: drop Secondary Indexes, dump/restore with sed, create Secondary Indexes again (mysqlpump).
- The automated one: With Perconas pt-online-schema-change [ Lit. ] or Facebooks OnlineSchemaChange OSC [ Lit. ]
- The most elegant but not supported one: Master/Slave Replication.
Already with converting the first table we run into a problem:
SQL> ALTER TABLE test CONVERT TO CHARACTER SET utf8mb4; ERROR 1709 (HY000): Index column size too large. The maximum column size is 767 bytes.This table had a Primary Key of a length of more than 255 bytes and this is not possible with the old InnoDB Antelope File Format:
SQL> SHOW GLOBAL VARIABLES WHERE Variable_name LIKE 'innodb_file_format%' OR Variable_name LIKE 'innodb_large%'; +--------------------------+----------+ | Variable_name | Value | +--------------------------+----------+ | innodb_file_format | Antelope | | innodb_file_format_check | ON | | innodb_file_format_max | Antelope | | innodb_large_prefix | OFF | +--------------------------+----------+So we have to find out first, which tables are still in old Antelope File Format:
SQL> SELECT table_schema , CASE WHEN row_format = 'Dynamic' THEN 'Barracuda' WHEN row_format = 'Compressed' THEN 'Barracuda' WHEN row_format = 'Compact' THEN 'Antelope' WHEN row_format = 'Redundant' THEN 'Antelope' END AS 'file_format' , COUNT(*) FROM information_schema.tables WHERE engine = 'InnoDB' AND table_schema NOT IN ('information_schema', 'sys', 'mysql') GROUP BY table_schema, file_format ; +--------------+-------------+----------+ | table_schema | file_format | count(*) | +--------------+-------------+----------+ | foodmart | Barracuda | 23 | | test | Barracuda | 1 | | world | Antelope | 2 | | world | Barracuda | 1 | +--------------+-------------+----------+Then we could covert the table correctly forcing the Barracuda File format:
SQL> ALTER TABLE test CONVERT TO CHARACTER SET utf8mb4, ROW_FORMAT=dynamic;Testing of new Character Set
The last but most important step is to test the changes. Here we recommend to do as many as possible different tests:
- MySQL CLI: mysql
- phpMyAdmin
- MySQL Workbench
- Other GUIs
- Your Application
Especially test careful data with umlauts (öäüÄÖÜß), foreign Characters (Turkish, Cyrillic, CJK Characters) and Emojis. Good Candidates are: Lastname, City or free text fields like Comment.
Some aid you also can get from here:
MySQL Pump
mysqlpump — A Database Backup Program
This utility is currently not available for MariaDB. If works for MySQL 5.7 and newer and does NOT support MySQL 5.6. So we cannot use it for the Upgrade Process from MySQL 5.6 to 5.7. Newest MySQL Releases contain Bug fixes and even new Features in mysqlpump so we can assume it is still supported and actively maintained. Recent releases contain some fixes for trivial bugs so we can assume mysqlpump is not widely used yet and not as mature yet as mysqldump. An alternative product would be MyDumper from Domas@Facebook (Lauchpad, GitHub).
Interesting features are:
- Parallel dumping of databases.
- Secondary Index restore separated from Table Restore.
Missing features are:
- Does not support MariaDB and MySQL 5.6 and earlier.
- No binary log position is stored with backup (--master-data) but only GTID? So we have to use/enable GTID based Replication.
First test:
# mysqlpump --user=root --all-databases --single-transaction --triggers --routines --events --hex-blob > /tmp/mysqlpump_backup.sql Dump progress: 1/4 tables, 0/87 rows Dump progress: 41/46 tables, 789469/6956116 rows Dump progress: 41/46 tables, 1608219/6956116 rows Dump progress: 45/46 tables, 2358412/6956116 rows Dump progress: 45/46 tables, 3437912/6956116 rows Dump progress: 45/46 tables, 4493662/6956116 rows Dump progress: 45/46 tables, 5177662/6956116 rows Dump progress: 45/46 tables, 5568662/6956116 rows Dump progress: 45/46 tables, 6216162/6956116 rows Dump progress: 45/46 tables, 6716662/6956116 rows Dump completed in 9860 milliseconds ALTER TABLE `foodmart`.`account` ADD KEY `i_account_parent` (`account_parent_id`); ALTER TABLE `foodmart`.`category` ADD KEY `i_category_parent` (`category_parent_id`); ALTER TABLE `foodmart`.`employee` ADD KEY `i_employee_department_id` (`department_id`); ALTER TABLE `foodmart`.`employee` ADD KEY `i_employee_store_id` (`store_id`); ALTER TABLE `foodmart`.`employee` ADD KEY `i_employee_supervisor_id` (`supervisor_id`); ALTER TABLE `foodmart`.`expense_fact` ADD KEY `i_expense_store_id` (`store_id`); ALTER TABLE `foodmart`.`expense_fact` ADD KEY `i_expense_account_id` (`account_id`); ALTER TABLE `foodmart`.`expense_fact` ADD KEY `i_expense_time_id` (`time_id`);Test against MySQL 5.6:
Server version is not compatible. Server version should be 5.7.8 or above.MySQL Master/Slave Replication for Character Set conversion
Replication of Columns Having Different Data Types should work since MySQL 5.1.21. General rule when Statement Based Replication is used: If the query would work executed directly on the Slave it should also work on the Slave with Replication! So utf8mb4 on the Slave should also be possible with SBR. In MySQL 5.5.3 Attribute Promotion and Attribute Demotion was introduced with LOSSY and NON_LOSSY slave_type_conversions. But: Replication between columns using different character sets is not supported!
Some tests:
MasterSQL> INSERT INTO test VALUES (NULL, 'öäüÄÖÜß', NULL); MasterSQL> SELECT * FROM test; +----+----------------+---------------------+ | id | data | ts | +----+----------------+---------------------+ | 1 | öäüÄÖÜß | 2019-08-15 19:48:48 | | 2 | öäüÄÖÜß | 2019-08-15 19:50:00 | | 3 | öäüÄÖÜß | 2019-08-15 19:52:29 | +----+----------------+---------------------+ SlaveSQL> SHOW SLAVE STATUS\G ... Last_SQL_Errno: 1677 Last_SQL_Error: Column 1 of table 'test.test' cannot be converted from type 'varchar(255(bytes))' to type 'varchar(1020(bytes) utf8mb4)' ... SlaveSQL> SELECT * FROM test; +----+----------------+---------------------+ | id | data | ts | +----+----------------+---------------------+ | 1 | öäüÄÖÜß | 2019-08-15 19:48:48 | | 2 | öäüÄÖÜß | 2019-08-15 19:50:00 | +----+----------------+---------------------+See also: Why is varchar(255) not varchar(255)? And:
MasterSQL> SET SESSION binlog_format=STATEMENT; MasterSQL> INSERT INTO test VALUES (NULL, 'öäüÄÖÜß', NULL); SlaveSQL> SELECT * FROM test; +----+----------------+---------------------+ | id | data | ts | +----+----------------+---------------------+ | 1 | öäüÄÖÜß | 2019-08-15 19:48:48 | | 2 | öäüÄÖÜß | 2019-08-15 19:50:00 | | 4 | öäüÄÖÜß | 2019-08-15 19:54:16 | +----+----------------+---------------------+MySQL Shell, mysqlsh Upgrade Checker Utility
Since MySQL 5.7 we have a new MySQL Utility, the MySQL Shell (mysqlsh). The Upgrade Checker Utility in MySQL Shell will replace mysql_upgrade in MySQL 8.0. So we wanted to know if it is of any use for converting Character Set to utf8mb4? You can use the Upgrade Checker Utility to check MySQL 5.7 server instances for compatibility errors and issues for upgrading. But: The Upgrade Checker Utility does not support checking MySQL Server instances 5.6 an earlier. MySQL Shell 8.0 is highly recommended for use with MySQL Server 8.0 and 5.7. Please upgrade to MySQL Shell 8.0. [ Lit. ]
# ./mysqlsh root@localhost:3319 JS > util.checkForServerUpgrade() The MySQL server at localhost:3319, version 5.6.42-log - MySQL Community Server (GPL), will now be checked for compatibility issues for upgrade to MySQL 8.0.17... Util.checkForServerUpgrade: Detected MySQL server version is 5.6.42, but this tool requires server to be at least at version 5.7 (LogicError) JS > util.checkForServerUpgrade('root@127.0.0.1:3309') The MySQL server at localhost:3309, version 5.7.26-log - MySQL Community Server (GPL), will now be checked for compatibility issues for upgrade to MySQL 8.0.17... ... 2) Usage of db objects with names conflicting with reserved keywords in 8.0 No issues found 3) Usage of utf8mb3 charset Warning: The following objects use the utf8mb3 character set. It is recommended to convert them to use utf8mb4 instead, for improved Unicode support. More information: https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8mb3.html test.test.data - column's default character set: utf8 4) Table names in the mysql schema conflicting with new tables in 8.0 No issues found ... Errors: 0 Warnings: 1 Notices: 0 No fatal errors were found that would prevent an upgrade, but some potential issues were detected. Please ensure that the reported issues are not significant before upgrading. JS > util.checkForServerUpgrade('root@127.0.0.1:3309', {'targetVersion':'5.7.27'}) The MySQL server at 127.0.0.1:3309, version 5.7.26-log - MySQL Community Server (GPL), will now be checked for compatibility issues for upgrade to MySQL 5.7.27... Util.checkForServerUpgrade: This tool supports checking upgrade to MySQL server versions 8.0.11 to 8.0.17 (LogicError)Because latin1 tables are still supported in MySQL 8.0 the Upgrade Checker Utility does not complain or do anything for latin1 tables. So it is not of any use for the utf8mb4 conversion task except you already have some old utf8(mb3) tables.
Taxonomy upgrade extras: character setconversionupgradelatin1utf8utf8mb4replicationmysqlshmysqlpumpFromDual Ops Center for MariaDB and MySQL 0.9.2 has been released
FromDual has the pleasure to announce the release of the new version 0.9.2 of its popular FromDual Ops Center for MariaDB and MySQL focmm.
The FromDual Ops Center for MariaDB and MySQL (focmm) helps DBA's and System Administrators to manage MariaDB and MySQL database farms. Ops Center makes DBA and Admins life easier!
The main task of Ops Center is to support you in your daily MySQL and MariaDB operation tasks. More information about FromDual Ops Center you can find here.
DownloadThe new FromDual Ops Center for MariaDB and MySQL (focmm) can be downloaded from here. How to install and use focmm is documented in the Ops Center User Guide.
In the inconceivable case that you find a bug in the FromDual Ops Center for MariaDB and MySQL please report it to the FromDual bug tracker or just send us an email.
Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.
Installation of Ops Center 0.9.2A complete guide on how to install FromDual Ops Center you can find in the Ops Center User Guide.
Upgrade from 0.9.x to 0.9.2Upgrade from 0.9.x to 0.9.2 should happen automatically. Please do a backup of your Ops Center Instance before you upgrade! Please also check Upgrading.
Changes in Ops Center 0.9.2 Instance- Replication: Mariadb 10.4 problem with mysql.user table fixed.
- Restore: Clean-up work done.
- Restore: Point-in-Time-Recovery is implemented.
- Operations: Binary Log state added.
- Potential XSS hole fixed.
- Bootstrap CSS updated to 4.3.1
- Newest MyEnv Library added.
- Function array_pop replaced by array_shift.
- Tests against MariaDB 10.4 were successful. MariaDB 10.4 is officially supported now.
- Dependency bug in Debian control file fixed.
FromDual Ops Center for MariaDB and MySQL 0.9.2 has been released
FromDual has the pleasure to announce the release of the new version 0.9.2 of its popular FromDual Ops Center for MariaDB and MySQL focmm.
The FromDual Ops Center for MariaDB and MySQL (focmm) helps DBA's and System Administrators to manage MariaDB and MySQL database farms. Ops Center makes DBA and Admins life easier!
The main task of Ops Center is to support you in your daily MySQL and MariaDB operation tasks. More information about FromDual Ops Center you can find here.
DownloadThe new FromDual Ops Center for MariaDB and MySQL (focmm) can be downloaded from here. How to install and use focmm is documented in the Ops Center User Guide.
In the inconceivable case that you find a bug in the FromDual Ops Center for MariaDB and MySQL please report it to the FromDual bug tracker or just send us an email.
Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.
Installation of Ops Center 0.9.2A complete guide on how to install FromDual Ops Center you can find in the Ops Center User Guide.
Upgrade from 0.9.x to 0.9.2Upgrade from 0.9.x to 0.9.2 should happen automatically. Please do a backup of your Ops Center Instance before you upgrade! Please also check Upgrading.
Changes in Ops Center 0.9.2 Instance- Replication: Mariadb 10.4 problem with mysql.user table fixed.
- Restore: Clean-up work done.
- Restore: Point-in-Time-Recovery is implemented.
- Operations: Binary Log state added.
- Potential XSS hole fixed.
- Bootstrap CSS updated to 4.3.1
- Newest MyEnv Library added.
- Function array_pop replaced by array_shift.
- Tests against MariaDB 10.4 were successful. MariaDB 10.4 is officially supported now.
- Dependency bug in Debian control file fixed.
FromDual Ops Center for MariaDB and MySQL 0.9.2 has been released
FromDual has the pleasure to announce the release of the new version 0.9.2 of its popular FromDual Ops Center for MariaDB and MySQL focmm.
The FromDual Ops Center for MariaDB and MySQL (focmm) helps DBA's and System Administrators to manage MariaDB and MySQL database farms. Ops Center makes DBA and Admins life easier!
The main task of Ops Center is to support you in your daily MySQL and MariaDB operation tasks. More information about FromDual Ops Center you can find here.
DownloadThe new FromDual Ops Center for MariaDB and MySQL (focmm) can be downloaded from here. How to install and use focmm is documented in the Ops Center User Guide.
In the inconceivable case that you find a bug in the FromDual Ops Center for MariaDB and MySQL please report it to the FromDual bug tracker or just send us an email.
Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.
Installation of Ops Center 0.9.2A complete guide on how to install FromDual Ops Center you can find in the Ops Center User Guide.
Upgrade from 0.9.x to 0.9.2Upgrade from 0.9.x to 0.9.2 should happen automatically. Please do a backup of your Ops Center Instance before you upgrade! Please also check Upgrading.
Changes in Ops Center 0.9.2 Instance- Replication: Mariadb 10.4 problem with mysql.user table fixed.
- Restore: Clean-up work done.
- Restore: Point-in-Time-Recovery is implemented.
- Operations: Binary Log state added.
- Potential XSS hole fixed.
- Bootstrap CSS updated to 4.3.1
- Newest MyEnv Library added.
- Function array_pop replaced by array_shift.
- Tests against MariaDB 10.4 were successful. MariaDB 10.4 is officially supported now.
- Dependency bug in Debian control file fixed.
FromDual Backup and Recovery Manager for MariaDB and MySQL 2.2.1 has been released
FromDual has the pleasure to announce the release of the new version 2.2.1 of its popular Backup and Recovery Manager for MariaDB and MySQL (brman).
The new FromDual Backup and Recovery Manager can be downloaded from here. The FromDual Repositories were updated. How to install and use the Backup and Recovery Manager is describe in FromDual Backup and Recovery Manager (brman) installation guide.
In the inconceivable case that you find a bug in the FromDual Backup and Recovery Manager please report it to the FromDual Bugtracker or just send us an email.
Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.
Upgrade from 2.x to 2.2.1 shell> cd ${HOME}/product shell> tar xf /download/brman-2.2.1.tar.gz shell> rm -f brman shell> ln -s brman-2.2.1 brmanChanges in FromDual Backup and Recovery Manager 2.2.1
This release is a new minor release. It contains only bug fixes. We have tried to maintain backward-compatibility with the 1.2, 2.0 and 2.1 release series. But you should test the new release seriously!
You can verify your current FromDual Backup Manager version with the following command:
shell> fromdual_bman --version shell> bman --version shell> rman --versionGeneral
- No changes.
FromDual Backup Manager
- Workaround for MariaDB binlog purge problem added do binlog backup.
FromDual Recovery Manager
- PiTR did not find binary logs in case they were from a foreign instance.
FromDual brman Catalog
- No changes.
Subscriptions for commercial use of FromDual Backup and Recovery Manager you can get from from us.
Taxonomy upgrade extras: BackupRestoreRecoverypitrbrmanreleasebmanrmanFromDual Backup and Recovery ManagerFromDual Backup and Recovery Manager for MariaDB and MySQL 2.2.1 has been released
FromDual has the pleasure to announce the release of the new version 2.2.1 of its popular Backup and Recovery Manager for MariaDB and MySQL (brman).
The new FromDual Backup and Recovery Manager can be downloaded from here. The FromDual Repositories were updated. How to install and use the Backup and Recovery Manager is describe in FromDual Backup and Recovery Manager (brman) installation guide.
In the inconceivable case that you find a bug in the FromDual Backup and Recovery Manager please report it to the FromDual Bugtracker or just send us an email.
Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.
Upgrade from 2.x to 2.2.1 shell> cd ${HOME}/product shell> tar xf /download/brman-2.2.1.tar.gz shell> rm -f brman shell> ln -s brman-2.2.1 brmanChanges in FromDual Backup and Recovery Manager 2.2.1
This release is a new minor release. It contains only bug fixes. We have tried to maintain backward-compatibility with the 1.2, 2.0 and 2.1 release series. But you should test the new release seriously!
You can verify your current FromDual Backup Manager version with the following command:
shell> fromdual_bman --version shell> bman --version shell> rman --versionGeneral
- No changes.
FromDual Backup Manager
- Workaround for MariaDB binlog purge problem added do binlog backup.
FromDual Recovery Manager
- PiTR did not find binary logs in case they were from a foreign instance.
FromDual brman Catalog
- No changes.
Subscriptions for commercial use of FromDual Backup and Recovery Manager you can get from from us.
Taxonomy upgrade extras: BackupRestoreRecoverypitrbrmanreleasebmanrmanFromDual Backup and Recovery ManagerFromDual Backup and Recovery Manager for MariaDB and MySQL 2.2.1 has been released
FromDual has the pleasure to announce the release of the new version 2.2.1 of its popular Backup and Recovery Manager for MariaDB and MySQL (brman).
The new FromDual Backup and Recovery Manager can be downloaded from here. The FromDual Repositories were updated. How to install and use the Backup and Recovery Manager is describe in FromDual Backup and Recovery Manager (brman) installation guide.
In the inconceivable case that you find a bug in the FromDual Backup and Recovery Manager please report it to the FromDual Bugtracker or just send us an email.
Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.
Upgrade from 2.x to 2.2.1 shell> cd ${HOME}/product shell> tar xf /download/brman-2.2.1.tar.gz shell> rm -f brman shell> ln -s brman-2.2.1 brmanChanges in FromDual Backup and Recovery Manager 2.2.1
This release is a new minor release. It contains only bug fixes. We have tried to maintain backward-compatibility with the 1.2, 2.0 and 2.1 release series. But you should test the new release seriously!
You can verify your current FromDual Backup Manager version with the following command:
shell> fromdual_bman --version shell> bman --version shell> rman --versionGeneral
- No changes.
FromDual Backup Manager
- Workaround for MariaDB binlog purge problem added do binlog backup.
FromDual Recovery Manager
- PiTR did not find binary logs in case they were from a foreign instance.
FromDual brman Catalog
- No changes.
Subscriptions for commercial use of FromDual Backup and Recovery Manager you can get from from us.
Taxonomy upgrade extras: BackupRestoreRecoverypitrbrmanreleasebmanrmanFromDual Backup and Recovery ManagerFromDual Backup and Recovery Manager for MariaDB and MySQL 2.2.0 has been released
FromDual has the pleasure to announce the release of the new version 2.2.0 of its popular Backup and Recovery Manager for MariaDB and MySQL (brman).
The new FromDual Backup and Recovery Manager can be downloaded from here. The FromDual Repositories were updated. How to install and use the Backup and Recovery Manager is describe in FromDual Backup and Recovery Manager (brman) installation guide.
In the inconceivable case that you find a bug in the FromDual Backup and Recovery Manager please report it to the FromDual Bugtracker or just send us an email.
Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.
Upgrade from 2.x to 2.2.0 shell> cd ${HOME}/product shell> tar xf /download/brman-2.2.0.tar.gz shell> rm -f brman shell> ln -s brman-2.2.0 brmanChanges in FromDual Backup and Recovery Manager 2.2.0
This release is a new major release series. It contains some new features. We have tried to maintain backward-compatibility with the 1.2, 2.0 and 2.1 release series. But you should test the new release seriously!
You can verify your current FromDual Backup Manager version with the following command:
shell> fromdual_bman --version shell> bman --version shell> rman --versionGeneral
- Automated testing improved.
- Packaging optimized. Especially unnecessary dependencies removed.
- Code redundancy removed, made code more readable and partially code reorganized.
FromDual Backup Manager
- Requirement checks improved. Warnings do not spoil rc any more.
- Bug with --pass-through and --per-schema fixed (MGB).
- Backup size is now correctly reported to fpmmm. And message is written to the log.
- Timestamp format was wrong in start backup tag.
- Globals removed from function doPrivilegeBackup.
- Columns of table mysql.user quoted for MariaDB 10.4 compatibility.
- Test on fpmmm-cache-file fixed from dir to file.
- Some bugs in function binlogBackup simulation fixed.
- Too verbose debugging disabled.
- Sftp archiving did not fail or complain if directory does not exist.
- Better input validation for option --archivedestination.
- Function checkArguments renamed to checkBmanOptions.
- Wrong error message for servers without binary log removed.
- Skip gathering of binlog file and pos in case of binary log was not enabled.
FromDual Recovery Manager
- Missing --stop-instance option added to usage.
- Omitted option --policy PHP error fixed for logical and physical restore, this allows us to restore from conventional backup infrastructure as well.
- Tables slave_master_info and slave_relay_log_info in mysql schema are omitted in restore progress output.
- Error message for missing --backup-name option improved.
- Error handling and messages improved as preparation for PiTR.
- Binlog restore type removed and check made accordingly that it is not possible any more.
- PiTR pre Recover test implemented.
- Various tests for test automation improved.
- Binlog position discovery possible also for compressed files.
- Execution time information added.
- Option --debug added to rman.
- Point-in-Time-Recovery for logical and physical backup implemented.
- Instance starting for physical backup was implemented.
- Option --simulate was implemented everywhere.
- Option --disable-log-bin added and error handling improved.
- Option --config activated in rman.
- Fast PiTR is implemented.
- Only full physical restore is supported atm. This was not caught correctly.
- Schema PiTR also possible.
FromDual brman Catalog
- No changes.
Subscriptions for commercial use of FromDual Backup and Recovery Manager you can get from from us.
Taxonomy upgrade extras: BackupRestoreRecoverypitrbrmanreleasebmanrmanFromDual Backup and Recovery Manager for MariaDB and MySQL 2.2.0 has been released
FromDual has the pleasure to announce the release of the new version 2.2.0 of its popular Backup and Recovery Manager for MariaDB and MySQL (brman).
The new FromDual Backup and Recovery Manager can be downloaded from here. The FromDual Repositories were updated. How to install and use the Backup and Recovery Manager is describe in FromDual Backup and Recovery Manager (brman) installation guide.
In the inconceivable case that you find a bug in the FromDual Backup and Recovery Manager please report it to the FromDual Bugtracker or just send us an email.
Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.
Upgrade from 2.x to 2.2.0 shell> cd ${HOME}/product shell> tar xf /download/brman-2.2.0.tar.gz shell> rm -f brman shell> ln -s brman-2.2.0 brmanChanges in FromDual Backup and Recovery Manager 2.2.0
This release is a new major release series. It contains some new features. We have tried to maintain backward-compatibility with the 1.2, 2.0 and 2.1 release series. But you should test the new release seriously!
You can verify your current FromDual Backup Manager version with the following command:
shell> fromdual_bman --version shell> bman --version shell> rman --versionGeneral
- Automated testing improved.
- Packaging optimized. Especially unnecessary dependencies removed.
- Code redundancy removed, made code more readable and partially code reorganized.
FromDual Backup Manager
- Requirement checks improved. Warnings do not spoil rc any more.
- Bug with --pass-through and --per-schema fixed (MGB).
- Backup size is now correctly reported to fpmmm. And message is written to the log.
- Timestamp format was wrong in start backup tag.
- Globals removed from function doPrivilegeBackup.
- Columns of table mysql.user quoted for MariaDB 10.4 compatibility.
- Test on fpmmm-cache-file fixed from dir to file.
- Some bugs in function binlogBackup simulation fixed.
- Too verbose debugging disabled.
- Sftp archiving did not fail or complain if directory does not exist.
- Better input validation for option --archivedestination.
- Function checkArguments renamed to checkBmanOptions.
- Wrong error message for servers without binary log removed.
- Skip gathering of binlog file and pos in case of binary log was not enabled.
FromDual Recovery Manager
- Missing --stop-instance option added to usage.
- Omitted option --policy PHP error fixed for logical and physical restore, this allows us to restore from conventional backup infrastructure as well.
- Tables slave_master_info and slave_relay_log_info in mysql schema are omitted in restore progress output.
- Error message for missing --backup-name option improved.
- Error handling and messages improved as preparation for PiTR.
- Binlog restore type removed and check made accordingly that it is not possible any more.
- PiTR pre Recover test implemented.
- Various tests for test automation improved.
- Binlog position discovery possible also for compressed files.
- Execution time information added.
- Option --debug added to rman.
- Point-in-Time-Recovery for logical and physical backup implemented.
- Instance starting for physical backup was implemented.
- Option --simulate was implemented everywhere.
- Option --disable-log-bin added and error handling improved.
- Option --config activated in rman.
- Fast PiTR is implemented.
- Only full physical restore is supported atm. This was not caught correctly.
- Schema PiTR also possible.
FromDual brman Catalog
- No changes.
Subscriptions for commercial use of FromDual Backup and Recovery Manager you can get from from us.
Taxonomy upgrade extras: BackupRestoreRecoverypitrbrmanreleasebmanrmanFromDual Backup and Recovery Manager for MariaDB and MySQL 2.2.0 has been released
FromDual has the pleasure to announce the release of the new version 2.2.0 of its popular Backup and Recovery Manager for MariaDB and MySQL (brman).
The new FromDual Backup and Recovery Manager can be downloaded from here. The FromDual Repositories were updated. How to install and use the Backup and Recovery Manager is describe in FromDual Backup and Recovery Manager (brman) installation guide.
In the inconceivable case that you find a bug in the FromDual Backup and Recovery Manager please report it to the FromDual Bugtracker or just send us an email.
Any feedback, statements and testimonials are welcome as well! Please send them to feedback@fromdual.com.
Upgrade from 2.x to 2.2.0 shell> cd ${HOME}/product shell> tar xf /download/brman-2.2.0.tar.gz shell> rm -f brman shell> ln -s brman-2.2.0 brmanChanges in FromDual Backup and Recovery Manager 2.2.0
This release is a new major release series. It contains some new features. We have tried to maintain backward-compatibility with the 1.2, 2.0 and 2.1 release series. But you should test the new release seriously!
You can verify your current FromDual Backup Manager version with the following command:
shell> fromdual_bman --version shell> bman --version shell> rman --versionGeneral
- Automated testing improved.
- Packaging optimized. Especially unnecessary dependencies removed.
- Code redundancy removed, made code more readable and partially code reorganized.
FromDual Backup Manager
- Requirement checks improved. Warnings do not spoil rc any more.
- Bug with --pass-through and --per-schema fixed (MGB).
- Backup size is now correctly reported to fpmmm. And message is written to the log.
- Timestamp format was wrong in start backup tag.
- Globals removed from function doPrivilegeBackup.
- Columns of table mysql.user quoted for MariaDB 10.4 compatibility.
- Test on fpmmm-cache-file fixed from dir to file.
- Some bugs in function binlogBackup simulation fixed.
- Too verbose debugging disabled.
- Sftp archiving did not fail or complain if directory does not exist.
- Better input validation for option --archivedestination.
- Function checkArguments renamed to checkBmanOptions.
- Wrong error message for servers without binary log removed.
- Skip gathering of binlog file and pos in case of binary log was not enabled.
FromDual Recovery Manager
- Missing --stop-instance option added to usage.
- Omitted option --policy PHP error fixed for logical and physical restore, this allows us to restore from conventional backup infrastructure as well.
- Tables slave_master_info and slave_relay_log_info in mysql schema are omitted in restore progress output.
- Error message for missing --backup-name option improved.
- Error handling and messages improved as preparation for PiTR.
- Binlog restore type removed and check made accordingly that it is not possible any more.
- PiTR pre Recover test implemented.
- Various tests for test automation improved.
- Binlog position discovery possible also for compressed files.
- Execution time information added.
- Option --debug added to rman.
- Point-in-Time-Recovery for logical and physical backup implemented.
- Instance starting for physical backup was implemented.
- Option --simulate was implemented everywhere.
- Option --disable-log-bin added and error handling improved.
- Option --config activated in rman.
- Fast PiTR is implemented.
- Only full physical restore is supported atm. This was not caught correctly.
- Schema PiTR also possible.
FromDual brman Catalog
- No changes.
Subscriptions for commercial use of FromDual Backup and Recovery Manager you can get from from us.
Taxonomy upgrade extras: BackupRestoreRecoverypitrbrmanreleasebmanrmanSchulung MariaDB/MySQL für Fortgeschrittene im August 2019 in Köln
Sommerferien-Zeit – für all die Daheimgebliebenen dürfte es jetzt hoffentlich etwas ruhiger zu und her gehen. Zeit für eine Weiterbildung? Es bleibt nicht mehr viel Zeit, das Jahres-Schulungs-Budget aufzubrauchen!
Vom 19. bis 23. August führt FromDual wieder einmal die Schulung MariaDB und MySQL für Fortgeschrittene in Köln durch. Siehe auch unsere Schulungstermine.
Es hat noch Plätze frei! Anmelden kann man sich direkt bei unserem Schulungs-Partner, der GFU Cyrus AG.
Diese MariaDB/MySQL Weiterbildung richtet sich an alle DBAs, DevOps und System Administratoren, welche MariaDB und MySQL Datenbanken zu betreuen habe und gerne besser verstehen wollen, wie man das noch besser macht.
In dieser Schulung behandeln wir Backup, Restore und Point-in-Time-Recovery sowohl einer kleinen wie auch einer grossen Datenbank. Aufsetzen von hochverfügbaren MariaDB und MySQL Datenbanken mittels der Master/Slave Replikation sowie dem Galera Cluster inklusive Switch-Over-Techniken. Schliesslich und endlich beschäftigen wir uns auch noch zwei Tage mit Datenbank Performance Tuning (Hardware, O/S, DB Konfiguration, Schema Tuning und SQL Query Tuning, etc.).
Das Ganze ist mir zahlreichen Übungen versehen, damit man das gelernte auch gleich praktisch anwenden kann!
Die Schulung findet in deutscher Sprache statt.
Den detaillierten Inhalt dieser fünftägigen MySQL/MariaDB Schulung finden Sie hier.
Bei weiteren Fragen nehmen Sie bitte mit uns Kontakt auf.
Taxonomy upgrade extras: schulungtraininggaleraBackupRestorepoint-in-time-recoveryreplikationPerformance TuningSchulung MariaDB/MySQL für Fortgeschrittene im August 2019 in Köln
Sommerferien-Zeit – für all die Daheimgebliebenen dürfte es jetzt hoffentlich etwas ruhiger zu und her gehen. Zeit für eine Weiterbildung? Es bleibt nicht mehr viel Zeit, das Jahres-Schulungs-Budget aufzubrauchen!
Vom 19. bis 23. August führt FromDual wieder einmal die Schulung MariaDB und MySQL für Fortgeschrittene in Köln durch. Siehe auch unsere Schulungstermine.
Es hat noch Plätze frei! Anmelden kann man sich direkt bei unserem Schulungs-Partner, der GFU Cyrus AG.
Diese MariaDB/MySQL Weiterbildung richtet sich an alle DBAs, DevOps und System Administratoren, welche MariaDB und MySQL Datenbanken zu betreuen habe und gerne besser verstehen wollen, wie man das noch besser macht.
In dieser Schulung behandeln wir Backup, Restore und Point-in-Time-Recovery sowohl einer kleinen wie auch einer grossen Datenbank. Aufsetzen von hochverfügbaren MariaDB und MySQL Datenbanken mittels der Master/Slave Replikation sowie dem Galera Cluster inklusive Switch-Over-Techniken. Schliesslich und endlich beschäftigen wir uns auch noch zwei Tage mit Datenbank Performance Tuning (Hardware, O/S, DB Konfiguration, Schema Tuning und SQL Query Tuning, etc.).
Das Ganze ist mir zahlreichen Übungen versehen, damit man das gelernte auch gleich praktisch anwenden kann!
Die Schulung findet in deutscher Sprache statt.
Den detaillierten Inhalt dieser fünftägigen MySQL/MariaDB Schulung finden Sie hier.
Bei weiteren Fragen nehmen Sie bitte mit uns Kontakt auf.
Taxonomy upgrade extras: schulungtraininggaleraBackupRestorepoint-in-time-recoveryreplikationPerformance TuningSchulung MariaDB/MySQL für Fortgeschrittene im August 2019 in Köln
Sommerferien-Zeit – für all die Daheimgebliebenen dürfte es jetzt hoffentlich etwas ruhiger zu und her gehen. Zeit für eine Weiterbildung? Es bleibt nicht mehr viel Zeit, das Jahres-Schulungs-Budget aufzubrauchen!
Vom 19. bis 23. August führt FromDual wieder einmal die Schulung MariaDB und MySQL für Fortgeschrittene in Köln durch. Siehe auch unsere Schulungstermine.
Es hat noch Plätze frei! Anmelden kann man sich direkt bei unserem Schulungs-Partner, der GFU Cyrus AG.
Diese MariaDB/MySQL Weiterbildung richtet sich an alle DBAs, DevOps und System Administratoren, welche MariaDB und MySQL Datenbanken zu betreuen habe und gerne besser verstehen wollen, wie man das noch besser macht.
In dieser Schulung behandeln wir Backup, Restore und Point-in-Time-Recovery sowohl einer kleinen wie auch einer grossen Datenbank. Aufsetzen von hochverfügbaren MariaDB und MySQL Datenbanken mittels der Master/Slave Replikation sowie dem Galera Cluster inklusive Switch-Over-Techniken. Schliesslich und endlich beschäftigen wir uns auch noch zwei Tage mit Datenbank Performance Tuning (Hardware, O/S, DB Konfiguration, Schema Tuning und SQL Query Tuning, etc.).
Das Ganze ist mir zahlreichen Übungen versehen, damit man das gelernte auch gleich praktisch anwenden kann!
Die Schulung findet in deutscher Sprache statt.
Den detaillierten Inhalt dieser fünftägigen MySQL/MariaDB Schulung finden Sie hier.
Bei weiteren Fragen nehmen Sie bitte mit uns Kontakt auf.
Taxonomy upgrade extras: schulungtraininggaleraBackupRestorepoint-in-time-recoveryreplikationPerformance TuningWho else is using my memory - File System Cache analysis
See also our former articles:
- Do not underestimate performance impacts of swapping on NUMA database systems
- MariaDB and MySQL swap analysis
When we do analysis of MariaDB Database servers we also check the memory (RAM and Swap) available:
# free --kilo --wide total used free shared buffers cache available Mem: 16106252 4329952 703356 199008 307872 10765072 11042748 Swap: 31250428 528684 30721744The values for buffers and especially for cache can be sometimes quite big. In this case they use about 10 GiB. So let us have a look what these things called buffers and cache are, using our valuable RAM... When we check the man pages of free we will find:
# man free ... buffers Memory used by kernel buffers (Buffers in /proc/meminfo) cache Memory used by the page cache and slabs (Cached and Slab in /proc/meminfo) buff/cache Sum of buffers and cacheSo let us check a more fine grained information in /proc/meminfo which is an interface to the kernel data structures:
# cat /proc/meminfo | grep -e ^Cached -e Slab -e Buffers Buffers: 307872 kB Cached: 10155156 kB Slab: 609916 kBSame values! Then let us have a look at the man pages of proc what we can find about these values:
# man proc ... Buffers Relatively temporary storage for raw disk blocks that shouldn't get tremendously large (20MB or so). Cached In-memory cache for files read from the disk (the page cache). Doesn't include SwapCached. Slab In-kernel data structures cache.So it looks like we have a raw I/O Cache (called Buffer Cache) and a File System I/O Cache (called Page Cache). So how does this work? What is a raw I/O? And is a Files System I/O cached once (Cached) or twice (Cached and Buffers)?
When we dig a bit deeper we can find that prior to Linux Kernels 2.4 the two Caches were distinct. So that was a waste of memory (RAM). It seems like today this is not the case any more [1], [2], [3]. And man pages are a bit out of date or at least not very precise?
Analysing the Linux Page CacheA very good source when it comes to Linux Performance Tuning and Measuring is Brendan Gregg's Website. To measure Linux Page Cache Hit Ratio he provides a tool called cachestat which is part of the perf-tools collection on GitHub.
With cachestat we get a per second statistics of the Buffer Cache and the Page Cache (without Slabs), Cache Hits, Cache Misses, Dirty Buffer Entries in the Cache and a Cache Hit Ratio:
# sudo cachestat Counting cache functions... Output every 1 seconds. HITS MISSES DIRTIES RATIO BUFFERS_MB CACHE_MB 1419 8 0 99.4% 338 9406 1368 0 0 100.0% 338 9406 1391 0 0 100.0% 338 9406 8558 0 29 100.0% 338 9406 31870 0 163 100.0% 338 9406 1374 0 24 100.0% 338 9406 1388 0 0 100.0% 338 9406 1370 0 0 100.0% 338 9406 1388 0 0 100.0% 338 9406Brendan Gregg also mentions a tool called pcstat (on GitHub) by Al Tobey which gets Page Cache Statistics for Files. Unfortunately I had some problems building it on my Ubuntu 16.04 with Go version 1.6. So I built it on an Ubuntu 18.04 (Go 1.10) and copied it over to to Ubuntu 16.04):
# export GOPATH=/tmp/ # cd $GOPATH # go get golang.org/x/sys/unix # go get github.com/tobert/pcstat/pcstat # bin/pcstat $GOPATH/bin/pcstatThen I tried pcstat out against a MariaDB 10.4 instance. In the output we can see how big the files are in bytes, how many pages of 4 kib this corresponds to, how many of these 4 kib pages are cached and the percentage of pages cached:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 14642 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 13631488 | 3328 | 0 | 000.000 | +------------------------------------------------------+----------------+------------+-----------+---------+When we run pcstat over time with the famous watch command we can even see how the Page Cache is heating up:
# watch -d -n 1 'pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* ; free -w' +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 14642 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 2416 | 012.418 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 3165 | 004.829 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 5890 | 008.987 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 1 | 100.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 13631488 | 3328 | 1164 | 034.976 | +------------------------------------------------------+----------------+------------+-----------+---------+ total used free shared buffers cache available Mem: 16106252 4329952 703356 199008 307872 10765072 11042748 Swap: 31250428 528684 30721744An other tool which was discussed on Brendans Website was vmtouch - the Virtual Memory Toucher (on GitHub, Documentation). With vmtouch we can see for example how much of the directory /home/mysql/database/mariadb-104/data (datadir) is currently in cache:
# vmtouch -f /home/mysql/database/mariadb-104/data Files: 503 Directories: 9 Resident Pages: 29356/231060 114M/902M 12.7% Elapsed: 0.009668 secondsOr more fine grained how much of InnoDB System Files are currently in memory:
# vmtouch -f -v /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/ib_buffer_pool [ ] 0/4 /home/mysql/database/mariadb-104/data/ibdata1 [oOooooo ooooo ] 2416/19456 /home/mysql/database/mariadb-104/data/ib_logfile0 [o oOO] 3165/65536 /home/mysql/database/mariadb-104/data/ib_logfile1 [OOOOOOOOOOOOOOOOOOOOOo ] 23192/65536 /home/mysql/database/mariadb-104/data/ibtmp1 [ ] 0/3072 Files: 5 Directories: 0 Resident Pages: 28773/153604 112M/600M 18.7% Elapsed: 0.005499 secondsA further question to answer is: Can I see all files cached in the Page Cache? So it seem like this is not easily possible:
There is no efficient search mechanism for doing the reverse - getting a file name belonging to a data block would require reading all inodes and indirect blocks on the file system. If you need to know about every single file's blocks stored in the page cache, you would need to supply a list of all files on your file system(s) to fincore. But that again is likely to spoil the measurement as a large amount of data would be read traversing the directories and getting all inodes and indirect blocks - putting them into the page cache and evicting the very page cache data you were trying to examine. [5]Also in this article we can read about the Linux File Tools (linux-ftools) by Google. It seems to be a bit more complicated to make them work. So I let it be.
How is the Page Cache related to MariaDBAfter all this technical O/S discussion, how is Linux Page Cache related to your MariaDB Database? Your MariaDB Database caches Data and Indexes as well. For the InnoDB Storage Engine this is the InnoDB Buffer Pool and for the Aria Storage Engine this is the Aria Page Cache Buffer. So if your MariaDB Database caches pages and if your Linux O/S caches pages the probability is high they cache the same data twice and thus waste valuable RAM! Fortunately InnoDB is configurable in a way it does NOT cache InnoDB files in the Page Cache. This is controlled with the InnoDB Server Variable innodb_flush_method.
When we look at InnoDB Files which were opened in a "normal" way (default: innodb_flush_method = fsync) we get the following information about how the files were opened (man 2 open and [6]):
# lsof +fg ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG RW,LG,0x80000 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG RW,LG,0x80000 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG RW,LG,0x80000 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG RW,LG,0x80000 8,1 12582912 9175280 ./ibtmp1The interesting column here is the FILE-FLAG column which indicates (man lsof):
# man lsof FILE-FLAG when g or G has been specified to +f, this field contains the contents of the f_flag[s] member of the kernel file structure and the kernel's per-process open file flags (if available); `G' causes them to be displayed in hexadecimal; `g', as short-hand names; two lists may be displayed with entries separated by commas, the lists separated by a semicolon (`;'); the first list may contain short-hand names for f_flag[s] values from the following table: DIR direct LG large file RW read and write accessThe output is not so clear or completely understandable yet thus we want to have the open file flags in hexadecimal notation:
# lsof +fG ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG 0x88002;0x0 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG 0x88002;0x0 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG 0x88002;0x0 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG 0x88002;0x0 8,1 12582912 9175280 ./ibtmp1The Linux Kernel open file flags can be found here: fcntl.h. I have extracted the most relevant open file flags for our examination:
#define O_RDWR 00000002 (oct, 0x00002) #define O_DIRECT 00040000 (oct, 0x04000) /* direct disk access hint */ #define O_LARGEFILE 00100000 (oct, 0x08000) #define O_CLOEXEC 02000000 (oct, 0x80000) /* set close_on_exec */So we can see that these 4 InnoDB files where opened with O_RDWR (RW), O_LARGE_FILE (LG) and O_CLOEXEC (not available (yet?) in lsof translation output).
Now let us start the MariaDB Database with the server variable set to: innodb_flush_method = O_DIRECT and check how the files where opened:
# lsof +fg ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG RW,DIR,LG,0x80000 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG RW,LG,0x80000 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG RW,LG,0x80000 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG RW,DIR,LG,0x80000 8,1 12582912 9175280 ./ibtmp1 # lsof +fG ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG 0x8c002;0x0 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG 0x88002;0x0 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG 0x88002;0x0 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG 0x8c002;0x0 8,1 12582912 9175280 ./ibtmp1We can see a new flag DIR or 0x04000 which means the files where opened with O_DIRECT. But only the InnoDB Temporary Table Tablespace and the InnoDB System Tablespace but not the two InnoDB Transaction Logs.
Translation of hex to oct: 0x8c002 = 02140002.
But what does O_DIRECT mean? Looking at the open(2) man pages we can find:
# man 2 open O_DIRECT (since Linux 2.4.10) Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary meta‐ data are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT.So O_DIRECT is exactly what we want in this case: Bypassing the File System Page Cache to not cache the Database blocks twice!
To verify the impact we run pcstat again:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 16020 | 4 | 4 | 100.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 140 | 000.720 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 36844 | 056.219 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 65536 | 100.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 1 | 100.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 67108864 | 16384 | 13400 | 081.787 | +------------------------------------------------------+----------------+------------+-----------+---------+But... part of the InnoDB Tablespace files is still cached! Also checking the total amount of Buffers and Cache shows the same:
# free total used free shared buff/cache available Mem: 16106252 4401788 368200 456716 11336264 10691792 Swap: 31250428 1348440 29901988So restarting the MariaDB database does not purge the Page Cache! Note: This is important to notice because bypassing the Page Cache helps to not wasting valuable RAM but it makes Database restart much more costly because Page Cache does not help/support InnoDB Buffer Pool heating any more!
Then let us clear the Linux Page Cache and check the result:
# echo 1 > /proc/sys/vm/drop_caches # free -w total used free shared buffers cache available Mem: 16106252 4395892 10539864 441708 696 1169800 10882984 Swap: 31250428 1348428 29902000Checking with pcstat shows now that all InnoDB pages are wiped out of the Page Cache:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 16020 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 67108864 | 16384 | 0 | 000.000 | +------------------------------------------------------+----------------+------------+-----------+---------+And after a while running traffic on the test table we can see that InnoDB Transaction Log Files are cached again in the Page Cache but NOT the InnoDB Tablespace files:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 16020 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 3012 | 004.596 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 71303168 | 17408 | 0 | 000.000 | +------------------------------------------------------+----------------+------------+-----------+---------+Also with vmtouch we can see the difference:
./vmtouch -f -v /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* /home/mysql/database/mariadb-104/data/ib_buffer_pool [ ] 0/4 /home/mysql/database/mariadb-104/data/ibdata1 [ ] 0/19456 /home/mysql/database/mariadb-104/data/ib_logfile0 [o oOOOo ] 4252/65536 /home/mysql/database/mariadb-104/data/ib_logfile1 [ ] 0/65536 /home/mysql/database/mariadb-104/data/ibtmp1 [ ] 0/3072 /home/mysql/database/mariadb-104/data/test/test.frm [ ] 0/1 /home/mysql/database/mariadb-104/data/test/test.ibd [ ] 0/17408 Files: 7 Directories: 0 Resident Pages: 4252/171013 16M/668M 2.49% Elapsed: 0.003264 secondsAnd also cachestat shows the effect of a flushed Buffer Cache and Page Cache:
# ./cachestat Counting cache functions... Output every 1 seconds. HITS MISSES DIRTIES RATIO BUFFERS_MB CACHE_MB 677882 19 740 100.0% 67 1087 679213 10 700 100.0% 67 1087 677236 0 732 100.0% 67 1087 685673 11 932 100.0% 67 1088 677933 5 703 100.0% 67 1088Caution: Depending on your underlying I/O system it makes nevertheless sense to run your MariaDB Database with innodb_flush_method = fsync in certain cases! See also PostgreSQL behaviour.
Note: This information could also be interesting for PostgreSQL DBAs because they do redundant buffering with their shared_buffers (why plural? It is just one!?!) and the O/S Page Cache as well!
What is SlabBeside Buffer Cache and the Page Cache itself we have a third thing in the /proc/meminfo statistics listed as Slabs. So what are Slabs? Slab seems to be a specific memory management (allocation) mechanism. It is used for frequently used objects in the Linux Kernel (buffer heads, inodes, dentries, etc.) [7-15]. So it contains something like other Linux Kernel Buffers and Kernel Caches.
What kind of other Linux Kernel Buffer and Kernel Caches exists can be found with the following command:
# sudo cat /proc/slabinfo slabinfo - version: 2.1 # name : tunables : slabdata nf_conntrack_1 14183 15275 320 25 2 : tunables 0 0 0 : slabdata 611 611 0 ext4_groupinfo_4k 8575 8596 144 28 1 : tunables 0 0 0 : slabdata 307 307 0 i915_gem_vma 523 950 320 25 2 : tunables 0 0 0 : slabdata 38 38 0 UDPv6 120 120 1088 30 8 : tunables 0 0 0 : slabdata 4 4 0 tw_sock_TCPv6 2668 2668 280 29 2 : tunables 0 0 0 : slabdata 92 92 0 request_sock_TCPv6 24 72 328 24 2 : tunables 0 0 0 : slabdata 3 3 0 TCPv6 68 105 2112 15 8 : tunables 0 0 0 : slabdata 7 7 0 cfq_queue 391 442 232 17 1 : tunables 0 0 0 : slabdata 26 26 0 mqueue_inode_cache 72 72 896 18 4 : tunables 0 0 0 : slabdata 4 4 0 fuse_request 20 40 400 20 2 : tunables 0 0 0 : slabdata 2 2 0 fuse_inode 1 21 768 21 4 : tunables 0 0 0 : slabdata 1 1 0 fat_cache 102 408 40 102 1 : tunables 0 0 0 : slabdata 4 4 0 hugetlbfs_inode_cache 28 84 584 28 4 : tunables 0 0 0 : slabdata 3 3 0 squashfs_inode_cache 25 50 640 25 4 : tunables 0 0 0 : slabdata 2 2 0 jbd2_journal_handle 340 340 48 85 1 : tunables 0 0 0 : slabdata 4 4 0 jbd2_journal_head 2040 2040 120 34 1 : tunables 0 0 0 : slabdata 60 60 0 jbd2_revoke_table_s 260 512 16 256 1 : tunables 0 0 0 : slabdata 2 2 0 jbd2_revoke_record_s1152 1408 32 128 1 : tunables 0 0 0 : slabdata 11 11 0 ext4_inode_cache 208751 210840 1072 30 8 : tunables 0 0 0 : slabdata 7028 7028 0 ext4_free_data 320 448 64 64 1 : tunables 0 0 0 : slabdata 7 7 0 ext4_allocation_cont 128 128 128 32 1 : tunables 0 0 0 : slabdata 4 4 0 ext4_io_end 392 560 72 56 1 : tunables 0 0 0 : slabdata 10 10 0 ext4_extent_status 64412 77928 40 102 1 : tunables 0 0 0 : slabdata 764 764 0 dquot 144 160 256 16 1 : tunables 0 0 0 : slabdata 10 10 0 mbcache 226 292 56 73 1 : tunables 0 0 0 : slabdata 4 4 0 dio 273 350 640 25 4 : tunables 0 0 0 : slabdata 14 14 0 pid_namespace 42 42 2224 14 8 : tunables 0 0 0 : slabdata 3 3 0 ip4-frags 32 64 248 16 1 : tunables 0 0 0 : slabdata 4 4 0 RAW 396 396 896 18 4 : tunables 0 0 0 : slabdata 22 22 0 UDP 68 68 960 17 4 : tunables 0 0 0 : slabdata 4 4 0 tw_sock_TCP 10750 11136 280 29 2 : tunables 0 0 0 : slabdata 384 384 0 request_sock_TCP 96 96 328 24 2 : tunables 0 0 0 : slabdata 4 4 0 TCP 119 136 1920 17 8 : tunables 0 0 0 : slabdata 8 8 0 blkdev_queue 27 48 1336 24 8 : tunables 0 0 0 : slabdata 2 2 0 blkdev_requests 394 506 368 22 2 : tunables 0 0 0 : slabdata 23 23 0 blkdev_ioc 516 546 104 39 1 : tunables 0 0 0 : slabdata 14 14 0 user_namespace 104 104 304 26 2 : tunables 0 0 0 : slabdata 4 4 0 dmaengine-unmap-256 15 15 2112 15 8 : tunables 0 0 0 : slabdata 1 1 0 sock_inode_cache 1707 1950 640 25 4 : tunables 0 0 0 : slabdata 78 78 0 file_lock_cache 665 665 208 19 1 : tunables 0 0 0 : slabdata 35 35 0 net_namespace 40 40 7296 4 8 : tunables 0 0 0 : slabdata 10 10 0 shmem_inode_cache 3315 3432 656 24 4 : tunables 0 0 0 : slabdata 143 143 0 taskstats 96 96 328 24 2 : tunables 0 0 0 : slabdata 4 4 0 proc_inode_cache 6895 7072 624 26 4 : tunables 0 0 0 : slabdata 272 272 0 sigqueue 100 100 160 25 1 : tunables 0 0 0 : slabdata 4 4 0 bdev_cache 29 76 832 19 4 : tunables 0 0 0 : slabdata 4 4 0 kernfs_node_cache 43625 44982 120 34 1 : tunables 0 0 0 : slabdata 1323 1323 0 mnt_cache 518 546 384 21 2 : tunables 0 0 0 : slabdata 26 26 0 inode_cache 17519 17668 568 28 4 : tunables 0 0 0 : slabdata 631 631 0 dentry 424185 439992 192 21 1 : tunables 0 0 0 : slabdata 20952 20952 0 buffer_head 1112865 1112865 104 39 1 : tunables 0 0 0 : slabdata 28535 28535 0 vm_area_struct 53945 55300 200 20 1 : tunables 0 0 0 : slabdata 2765 2765 0 files_cache 260 299 704 23 4 : tunables 0 0 0 : slabdata 13 13 0 signal_cache 509 630 1088 30 8 : tunables 0 0 0 : slabdata 21 21 0 sighand_cache 346 405 2112 15 8 : tunables 0 0 0 : slabdata 27 27 0 task_struct 1189 1269 3584 9 8 : tunables 0 0 0 : slabdata 141 141 0 Acpi-Operand 5703 5824 72 56 1 : tunables 0 0 0 : slabdata 104 104 0 Acpi-Parse 1314 1314 56 73 1 : tunables 0 0 0 : slabdata 18 18 0 Acpi-State 204 204 80 51 1 : tunables 0 0 0 : slabdata 4 4 0 Acpi-Namespace 4077 4182 40 102 1 : tunables 0 0 0 : slabdata 41 41 0 anon_vma 19831 21522 80 51 1 : tunables 0 0 0 : slabdata 422 422 0 numa_policy 170 170 24 170 1 : tunables 0 0 0 : slabdata 1 1 0 radix_tree_node 321937 327740 584 28 4 : tunables 0 0 0 : slabdata 11705 11705 0 trace_event_file 3985 4002 88 46 1 : tunables 0 0 0 : slabdata 87 87 0 ftrace_event_field 86541 88570 48 85 1 : tunables 0 0 0 : slabdata 1042 1042 0 idr_layer_cache 533 555 2096 15 8 : tunables 0 0 0 : slabdata 37 37 0 kmalloc-8192 1246 1246 8192 4 8 : tunables 0 0 0 : slabdata 502 502 0 kmalloc-4096 658 720 4096 8 8 : tunables 0 0 0 : slabdata 90 90 0 kmalloc-2048 1955 2144 2048 16 8 : tunables 0 0 0 : slabdata 134 134 0 kmalloc-1024 44217 44384 1024 16 4 : tunables 0 0 0 : slabdata 2774 2774 0 kmalloc-512 3037 3808 512 16 2 : tunables 0 0 0 : slabdata 238 238 0 kmalloc-256 17465 20384 256 16 1 : tunables 0 0 0 : slabdata 1274 1274 0 kmalloc-192 27708 28665 192 21 1 : tunables 0 0 0 : slabdata 1365 1365 0 kmalloc-128 140581 143744 128 32 1 : tunables 0 0 0 : slabdata 4492 4492 0 kmalloc-96 168044 168378 96 42 1 : tunables 0 0 0 : slabdata 4009 4009 0 kmalloc-64 117533 123264 64 64 1 : tunables 0 0 0 : slabdata 1926 1926 0 kmalloc-32 80425 90368 32 128 1 : tunables 0 0 0 : slabdata 706 706 0 kmalloc-16 9513 11264 16 256 1 : tunables 0 0 0 : slabdata 44 44 0 kmalloc-8 6616 7168 8 512 1 : tunables 0 0 0 : slabdata 14 14 0 kmem_cache_node 320 320 64 64 1 : tunables 0 0 0 : slabdata 5 5 0 kmem_cache 208 208 256 16 1 : tunables 0 0 0 : slabdata 13 13 0If you want to see the most frequently used (hottest) Slabs you can see them top-like with slabtop. If you press c you can sort the Slabs by CACHE_SIZE:
# sudo slabtop Active / Total Objects (% used) : 2249113 / 2280136 (98.6%) Active / Total Slabs (% used) : 70256 / 70256 (100.0%) Active / Total Caches (% used) : 86 / 121 (71.1%) Active / Total Size (% used) : 597547.86K / 605445.30K (98.7%) Minimum / Average / Maximum Object : 0.01K / 0.26K / 18.56K OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 294308 289889 98% 0.57K 10511 28 168176K radix_tree_node 105030 104435 99% 1.05K 3501 30 112032K ext4_inode_cache 745446 745446 100% 0.10K 19114 39 76456K buffer_head 59984 59909 99% 1.00K 3749 16 59984K ecryptfs_inode_cache 47520 47157 99% 1.00K 2970 16 47520K kmalloc-1024 215166 214987 99% 0.19K 10246 21 40984K dentry 139744 138452 99% 0.12K 4367 32 17468K kmalloc-128 179508 179011 99% 0.09K 4274 42 17096K kmalloc-96 47140 45768 97% 0.20K 2357 20 9428K vm_area_struct 14700 14700 100% 0.55K 525 28 8400K inode_cache ...Literature
- [1] What is the major difference between the buffer cache and the page cache? Why were they separate entities in older kernels? Why were they merged later on? (2012-10-21)
- [2] What is the difference between Buffers and Cached columns in /proc/meminfo output? (2012-10-19)
- [3] The future of the page cache (2017-01-25)
- [4] Linux Page Cache Basics
- [5] Is it possible to list the files that are cached?
- [6] RFC: Clarifying Direct I/O Semantics
- [7] Slab allocation
- [8] Memory management
- [9] Overview of Linux Memory Management Concepts: Slabs
- [10] Memory management - Buddy allocator, Slab allocator
- [11] The SLUB allocator
- [12] Slab Allocator
- [13] What are the different caches reported by slabtop?
- [14] How /proc/slabinfo is not quite telling you what it looks like
- [15] Short users guide for SLUB
Who else is using my memory - File System Cache analysis
See also our former articles:
- Do not underestimate performance impacts of swapping on NUMA database systems
- MariaDB and MySQL swap analysis
When we do analysis of MariaDB Database servers we also check the memory (RAM and Swap) available:
# free --kilo --wide total used free shared buffers cache available Mem: 16106252 4329952 703356 199008 307872 10765072 11042748 Swap: 31250428 528684 30721744The values for buffers and especially for cache can be sometimes quite big. In this case they use about 10 GiB. So let us have a look what these things called buffers and cache are, using our valuable RAM... When we check the man pages of free we will find:
# man free ... buffers Memory used by kernel buffers (Buffers in /proc/meminfo) cache Memory used by the page cache and slabs (Cached and Slab in /proc/meminfo) buff/cache Sum of buffers and cacheSo let us check a more fine grained information in /proc/meminfo which is an interface to the kernel data structures:
# cat /proc/meminfo | grep -e ^Cached -e Slab -e Buffers Buffers: 307872 kB Cached: 10155156 kB Slab: 609916 kBSame values! Then let us have a look at the man pages of proc what we can find about these values:
# man proc ... Buffers Relatively temporary storage for raw disk blocks that shouldn't get tremendously large (20MB or so). Cached In-memory cache for files read from the disk (the page cache). Doesn't include SwapCached. Slab In-kernel data structures cache.So it looks like we have a raw I/O Cache (called Buffer Cache) and a File System I/O Cache (called Page Cache). So how does this work? What is a raw I/O? And is a Files System I/O cached once (Cached) or twice (Cached and Buffers)?
When we dig a bit deeper we can find that prior to Linux Kernels 2.4 the two Caches were distinct. So that was a waste of memory (RAM). It seems like today this is not the case any more [1], [2], [3]. And man pages are a bit out of date or at least not very precise?
Analysing the Linux Page CacheA very good source when it comes to Linux Performance Tuning and Measuring is Brendan Gregg's Website. To measure Linux Page Cache Hit Ratio he provides a tool called cachestat which is part of the perf-tools collection on GitHub.
With cachestat we get a per second statistics of the Buffer Cache and the Page Cache (without Slabs), Cache Hits, Cache Misses, Dirty Buffer Entries in the Cache and a Cache Hit Ratio:
# sudo cachestat Counting cache functions... Output every 1 seconds. HITS MISSES DIRTIES RATIO BUFFERS_MB CACHE_MB 1419 8 0 99.4% 338 9406 1368 0 0 100.0% 338 9406 1391 0 0 100.0% 338 9406 8558 0 29 100.0% 338 9406 31870 0 163 100.0% 338 9406 1374 0 24 100.0% 338 9406 1388 0 0 100.0% 338 9406 1370 0 0 100.0% 338 9406 1388 0 0 100.0% 338 9406Brendan Gregg also mentions a tool called pcstat (on GitHub) by Al Tobey which gets Page Cache Statistics for Files. Unfortunately I had some problems building it on my Ubuntu 16.04 with Go version 1.6. So I built it on an Ubuntu 18.04 (Go 1.10) and copied it over to to Ubuntu 16.04):
# export GOPATH=/tmp/ # cd $GOPATH # go get golang.org/x/sys/unix # go get github.com/tobert/pcstat/pcstat # bin/pcstat $GOPATH/bin/pcstatThen I tried pcstat out against a MariaDB 10.4 instance. In the output we can see how big the files are in bytes, how many pages of 4 kib this corresponds to, how many of these 4 kib pages are cached and the percentage of pages cached:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 14642 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 13631488 | 3328 | 0 | 000.000 | +------------------------------------------------------+----------------+------------+-----------+---------+When we run pcstat over time with the famous watch command we can even see how the Page Cache is heating up:
# watch -d -n 1 'pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* ; free -w' +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 14642 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 2416 | 012.418 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 3165 | 004.829 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 5890 | 008.987 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 1 | 100.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 13631488 | 3328 | 1164 | 034.976 | +------------------------------------------------------+----------------+------------+-----------+---------+ total used free shared buffers cache available Mem: 16106252 4329952 703356 199008 307872 10765072 11042748 Swap: 31250428 528684 30721744An other tool which was discussed on Brendans Website was vmtouch - the Virtual Memory Toucher (on GitHub, Documentation). With vmtouch we can see for example how much of the directory /home/mysql/database/mariadb-104/data (datadir) is currently in cache:
# vmtouch -f /home/mysql/database/mariadb-104/data Files: 503 Directories: 9 Resident Pages: 29356/231060 114M/902M 12.7% Elapsed: 0.009668 secondsOr more fine grained how much of InnoDB System Files are currently in memory:
# vmtouch -f -v /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/ib_buffer_pool [ ] 0/4 /home/mysql/database/mariadb-104/data/ibdata1 [oOooooo ooooo ] 2416/19456 /home/mysql/database/mariadb-104/data/ib_logfile0 [o oOO] 3165/65536 /home/mysql/database/mariadb-104/data/ib_logfile1 [OOOOOOOOOOOOOOOOOOOOOo ] 23192/65536 /home/mysql/database/mariadb-104/data/ibtmp1 [ ] 0/3072 Files: 5 Directories: 0 Resident Pages: 28773/153604 112M/600M 18.7% Elapsed: 0.005499 secondsA further question to answer is: Can I see all files cached in the Page Cache? So it seem like this is not easily possible:
There is no efficient search mechanism for doing the reverse - getting a file name belonging to a data block would require reading all inodes and indirect blocks on the file system. If you need to know about every single file's blocks stored in the page cache, you would need to supply a list of all files on your file system(s) to fincore. But that again is likely to spoil the measurement as a large amount of data would be read traversing the directories and getting all inodes and indirect blocks - putting them into the page cache and evicting the very page cache data you were trying to examine. [5]Also in this article we can read about the Linux File Tools (linux-ftools) by Google. It seems to be a bit more complicated to make them work. So I let it be.
How is the Page Cache related to MariaDBAfter all this technical O/S discussion, how is Linux Page Cache related to your MariaDB Database? Your MariaDB Database caches Data and Indexes as well. For the InnoDB Storage Engine this is the InnoDB Buffer Pool and for the Aria Storage Engine this is the Aria Page Cache Buffer. So if your MariaDB Database caches pages and if your Linux O/S caches pages the probability is high they cache the same data twice and thus waste valuable RAM! Fortunately InnoDB is configurable in a way it does NOT cache InnoDB files in the Page Cache. This is controlled with the InnoDB Server Variable innodb_flush_method.
When we look at InnoDB Files which were opened in a "normal" way (default: innodb_flush_method = fsync) we get the following information about how the files were opened (man 2 open and [6]):
# lsof +fg ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG RW,LG,0x80000 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG RW,LG,0x80000 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG RW,LG,0x80000 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG RW,LG,0x80000 8,1 12582912 9175280 ./ibtmp1The interesting column here is the FILE-FLAG column which indicates (man lsof):
# man lsof FILE-FLAG when g or G has been specified to +f, this field contains the contents of the f_flag[s] member of the kernel file structure and the kernel's per-process open file flags (if available); `G' causes them to be displayed in hexadecimal; `g', as short-hand names; two lists may be displayed with entries separated by commas, the lists separated by a semicolon (`;'); the first list may contain short-hand names for f_flag[s] values from the following table: DIR direct LG large file RW read and write accessThe output is not so clear or completely understandable yet thus we want to have the open file flags in hexadecimal notation:
# lsof +fG ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG 0x88002;0x0 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG 0x88002;0x0 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG 0x88002;0x0 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG 0x88002;0x0 8,1 12582912 9175280 ./ibtmp1The Linux Kernel open file flags can be found here: fcntl.h. I have extracted the most relevant open file flags for our examination:
#define O_RDWR 00000002 (oct, 0x00002) #define O_DIRECT 00040000 (oct, 0x04000) /* direct disk access hint */ #define O_LARGEFILE 00100000 (oct, 0x08000) #define O_CLOEXEC 02000000 (oct, 0x80000) /* set close_on_exec */So we can see that these 4 InnoDB files where opened with O_RDWR (RW), O_LARGE_FILE (LG) and O_CLOEXEC (not available (yet?) in lsof translation output).
Now let us start the MariaDB Database with the server variable set to: innodb_flush_method = O_DIRECT and check how the files where opened:
# lsof +fg ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG RW,DIR,LG,0x80000 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG RW,LG,0x80000 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG RW,LG,0x80000 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG RW,DIR,LG,0x80000 8,1 12582912 9175280 ./ibtmp1 # lsof +fG ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG 0x8c002;0x0 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG 0x88002;0x0 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG 0x88002;0x0 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG 0x8c002;0x0 8,1 12582912 9175280 ./ibtmp1We can see a new flag DIR or 0x04000 which means the files where opened with O_DIRECT. But only the InnoDB Temporary Table Tablespace and the InnoDB System Tablespace but not the two InnoDB Transaction Logs.
Translation of hex to oct: 0x8c002 = 02140002.
But what does O_DIRECT mean? Looking at the open(2) man pages we can find:
# man 2 open O_DIRECT (since Linux 2.4.10) Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary meta‐ data are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT.So O_DIRECT is exactly what we want in this case: Bypassing the File System Page Cache to not cache the Database blocks twice!
To verify the impact we run pcstat again:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 16020 | 4 | 4 | 100.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 140 | 000.720 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 36844 | 056.219 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 65536 | 100.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 1 | 100.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 67108864 | 16384 | 13400 | 081.787 | +------------------------------------------------------+----------------+------------+-----------+---------+But... part of the InnoDB Tablespace files is still cached! Also checking the total amount of Buffers and Cache shows the same:
# free total used free shared buff/cache available Mem: 16106252 4401788 368200 456716 11336264 10691792 Swap: 31250428 1348440 29901988So restarting the MariaDB database does not purge the Page Cache! Note: This is important to notice because bypassing the Page Cache helps to not wasting valuable RAM but it makes Database restart much more costly because Page Cache does not help/support InnoDB Buffer Pool heating any more!
Then let us clear the Linux Page Cache and check the result:
# echo 1 > /proc/sys/vm/drop_caches # free -w total used free shared buffers cache available Mem: 16106252 4395892 10539864 441708 696 1169800 10882984 Swap: 31250428 1348428 29902000Checking with pcstat shows now that all InnoDB pages are wiped out of the Page Cache:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 16020 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 67108864 | 16384 | 0 | 000.000 | +------------------------------------------------------+----------------+------------+-----------+---------+And after a while running traffic on the test table we can see that InnoDB Transaction Log Files are cached again in the Page Cache but NOT the InnoDB Tablespace files:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 16020 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 3012 | 004.596 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 71303168 | 17408 | 0 | 000.000 | +------------------------------------------------------+----------------+------------+-----------+---------+Also with vmtouch we can see the difference:
./vmtouch -f -v /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* /home/mysql/database/mariadb-104/data/ib_buffer_pool [ ] 0/4 /home/mysql/database/mariadb-104/data/ibdata1 [ ] 0/19456 /home/mysql/database/mariadb-104/data/ib_logfile0 [o oOOOo ] 4252/65536 /home/mysql/database/mariadb-104/data/ib_logfile1 [ ] 0/65536 /home/mysql/database/mariadb-104/data/ibtmp1 [ ] 0/3072 /home/mysql/database/mariadb-104/data/test/test.frm [ ] 0/1 /home/mysql/database/mariadb-104/data/test/test.ibd [ ] 0/17408 Files: 7 Directories: 0 Resident Pages: 4252/171013 16M/668M 2.49% Elapsed: 0.003264 secondsAnd also cachestat shows the effect of a flushed Buffer Cache and Page Cache:
# ./cachestat Counting cache functions... Output every 1 seconds. HITS MISSES DIRTIES RATIO BUFFERS_MB CACHE_MB 677882 19 740 100.0% 67 1087 679213 10 700 100.0% 67 1087 677236 0 732 100.0% 67 1087 685673 11 932 100.0% 67 1088 677933 5 703 100.0% 67 1088Caution: Depending on your underlying I/O system it makes nevertheless sense to run your MariaDB Database with innodb_flush_method = fsync in certain cases! See also PostgreSQL behaviour.
Note: This information could also be interesting for PostgreSQL DBAs because they do redundant buffering with their shared_buffers (why plural? It is just one!?!) and the O/S Page Cache as well!
What is SlabBeside Buffer Cache and the Page Cache itself we have a third thing in the /proc/meminfo statistics listed as Slabs. So what are Slabs? Slab seems to be a specific memory management (allocation) mechanism. It is used for frequently used objects in the Linux Kernel (buffer heads, inodes, dentries, etc.) [7-15]. So it contains something like other Linux Kernel Buffers and Kernel Caches.
What kind of other Linux Kernel Buffer and Kernel Caches exists can be found with the following command:
# sudo cat /proc/slabinfo slabinfo - version: 2.1 # name : tunables : slabdata nf_conntrack_1 14183 15275 320 25 2 : tunables 0 0 0 : slabdata 611 611 0 ext4_groupinfo_4k 8575 8596 144 28 1 : tunables 0 0 0 : slabdata 307 307 0 i915_gem_vma 523 950 320 25 2 : tunables 0 0 0 : slabdata 38 38 0 UDPv6 120 120 1088 30 8 : tunables 0 0 0 : slabdata 4 4 0 tw_sock_TCPv6 2668 2668 280 29 2 : tunables 0 0 0 : slabdata 92 92 0 request_sock_TCPv6 24 72 328 24 2 : tunables 0 0 0 : slabdata 3 3 0 TCPv6 68 105 2112 15 8 : tunables 0 0 0 : slabdata 7 7 0 cfq_queue 391 442 232 17 1 : tunables 0 0 0 : slabdata 26 26 0 mqueue_inode_cache 72 72 896 18 4 : tunables 0 0 0 : slabdata 4 4 0 fuse_request 20 40 400 20 2 : tunables 0 0 0 : slabdata 2 2 0 fuse_inode 1 21 768 21 4 : tunables 0 0 0 : slabdata 1 1 0 fat_cache 102 408 40 102 1 : tunables 0 0 0 : slabdata 4 4 0 hugetlbfs_inode_cache 28 84 584 28 4 : tunables 0 0 0 : slabdata 3 3 0 squashfs_inode_cache 25 50 640 25 4 : tunables 0 0 0 : slabdata 2 2 0 jbd2_journal_handle 340 340 48 85 1 : tunables 0 0 0 : slabdata 4 4 0 jbd2_journal_head 2040 2040 120 34 1 : tunables 0 0 0 : slabdata 60 60 0 jbd2_revoke_table_s 260 512 16 256 1 : tunables 0 0 0 : slabdata 2 2 0 jbd2_revoke_record_s1152 1408 32 128 1 : tunables 0 0 0 : slabdata 11 11 0 ext4_inode_cache 208751 210840 1072 30 8 : tunables 0 0 0 : slabdata 7028 7028 0 ext4_free_data 320 448 64 64 1 : tunables 0 0 0 : slabdata 7 7 0 ext4_allocation_cont 128 128 128 32 1 : tunables 0 0 0 : slabdata 4 4 0 ext4_io_end 392 560 72 56 1 : tunables 0 0 0 : slabdata 10 10 0 ext4_extent_status 64412 77928 40 102 1 : tunables 0 0 0 : slabdata 764 764 0 dquot 144 160 256 16 1 : tunables 0 0 0 : slabdata 10 10 0 mbcache 226 292 56 73 1 : tunables 0 0 0 : slabdata 4 4 0 dio 273 350 640 25 4 : tunables 0 0 0 : slabdata 14 14 0 pid_namespace 42 42 2224 14 8 : tunables 0 0 0 : slabdata 3 3 0 ip4-frags 32 64 248 16 1 : tunables 0 0 0 : slabdata 4 4 0 RAW 396 396 896 18 4 : tunables 0 0 0 : slabdata 22 22 0 UDP 68 68 960 17 4 : tunables 0 0 0 : slabdata 4 4 0 tw_sock_TCP 10750 11136 280 29 2 : tunables 0 0 0 : slabdata 384 384 0 request_sock_TCP 96 96 328 24 2 : tunables 0 0 0 : slabdata 4 4 0 TCP 119 136 1920 17 8 : tunables 0 0 0 : slabdata 8 8 0 blkdev_queue 27 48 1336 24 8 : tunables 0 0 0 : slabdata 2 2 0 blkdev_requests 394 506 368 22 2 : tunables 0 0 0 : slabdata 23 23 0 blkdev_ioc 516 546 104 39 1 : tunables 0 0 0 : slabdata 14 14 0 user_namespace 104 104 304 26 2 : tunables 0 0 0 : slabdata 4 4 0 dmaengine-unmap-256 15 15 2112 15 8 : tunables 0 0 0 : slabdata 1 1 0 sock_inode_cache 1707 1950 640 25 4 : tunables 0 0 0 : slabdata 78 78 0 file_lock_cache 665 665 208 19 1 : tunables 0 0 0 : slabdata 35 35 0 net_namespace 40 40 7296 4 8 : tunables 0 0 0 : slabdata 10 10 0 shmem_inode_cache 3315 3432 656 24 4 : tunables 0 0 0 : slabdata 143 143 0 taskstats 96 96 328 24 2 : tunables 0 0 0 : slabdata 4 4 0 proc_inode_cache 6895 7072 624 26 4 : tunables 0 0 0 : slabdata 272 272 0 sigqueue 100 100 160 25 1 : tunables 0 0 0 : slabdata 4 4 0 bdev_cache 29 76 832 19 4 : tunables 0 0 0 : slabdata 4 4 0 kernfs_node_cache 43625 44982 120 34 1 : tunables 0 0 0 : slabdata 1323 1323 0 mnt_cache 518 546 384 21 2 : tunables 0 0 0 : slabdata 26 26 0 inode_cache 17519 17668 568 28 4 : tunables 0 0 0 : slabdata 631 631 0 dentry 424185 439992 192 21 1 : tunables 0 0 0 : slabdata 20952 20952 0 buffer_head 1112865 1112865 104 39 1 : tunables 0 0 0 : slabdata 28535 28535 0 vm_area_struct 53945 55300 200 20 1 : tunables 0 0 0 : slabdata 2765 2765 0 files_cache 260 299 704 23 4 : tunables 0 0 0 : slabdata 13 13 0 signal_cache 509 630 1088 30 8 : tunables 0 0 0 : slabdata 21 21 0 sighand_cache 346 405 2112 15 8 : tunables 0 0 0 : slabdata 27 27 0 task_struct 1189 1269 3584 9 8 : tunables 0 0 0 : slabdata 141 141 0 Acpi-Operand 5703 5824 72 56 1 : tunables 0 0 0 : slabdata 104 104 0 Acpi-Parse 1314 1314 56 73 1 : tunables 0 0 0 : slabdata 18 18 0 Acpi-State 204 204 80 51 1 : tunables 0 0 0 : slabdata 4 4 0 Acpi-Namespace 4077 4182 40 102 1 : tunables 0 0 0 : slabdata 41 41 0 anon_vma 19831 21522 80 51 1 : tunables 0 0 0 : slabdata 422 422 0 numa_policy 170 170 24 170 1 : tunables 0 0 0 : slabdata 1 1 0 radix_tree_node 321937 327740 584 28 4 : tunables 0 0 0 : slabdata 11705 11705 0 trace_event_file 3985 4002 88 46 1 : tunables 0 0 0 : slabdata 87 87 0 ftrace_event_field 86541 88570 48 85 1 : tunables 0 0 0 : slabdata 1042 1042 0 idr_layer_cache 533 555 2096 15 8 : tunables 0 0 0 : slabdata 37 37 0 kmalloc-8192 1246 1246 8192 4 8 : tunables 0 0 0 : slabdata 502 502 0 kmalloc-4096 658 720 4096 8 8 : tunables 0 0 0 : slabdata 90 90 0 kmalloc-2048 1955 2144 2048 16 8 : tunables 0 0 0 : slabdata 134 134 0 kmalloc-1024 44217 44384 1024 16 4 : tunables 0 0 0 : slabdata 2774 2774 0 kmalloc-512 3037 3808 512 16 2 : tunables 0 0 0 : slabdata 238 238 0 kmalloc-256 17465 20384 256 16 1 : tunables 0 0 0 : slabdata 1274 1274 0 kmalloc-192 27708 28665 192 21 1 : tunables 0 0 0 : slabdata 1365 1365 0 kmalloc-128 140581 143744 128 32 1 : tunables 0 0 0 : slabdata 4492 4492 0 kmalloc-96 168044 168378 96 42 1 : tunables 0 0 0 : slabdata 4009 4009 0 kmalloc-64 117533 123264 64 64 1 : tunables 0 0 0 : slabdata 1926 1926 0 kmalloc-32 80425 90368 32 128 1 : tunables 0 0 0 : slabdata 706 706 0 kmalloc-16 9513 11264 16 256 1 : tunables 0 0 0 : slabdata 44 44 0 kmalloc-8 6616 7168 8 512 1 : tunables 0 0 0 : slabdata 14 14 0 kmem_cache_node 320 320 64 64 1 : tunables 0 0 0 : slabdata 5 5 0 kmem_cache 208 208 256 16 1 : tunables 0 0 0 : slabdata 13 13 0If you want to see the most frequently used (hottest) Slabs you can see them top-like with slabtop. If you press c you can sort the Slabs by CACHE_SIZE:
# sudo slabtop Active / Total Objects (% used) : 2249113 / 2280136 (98.6%) Active / Total Slabs (% used) : 70256 / 70256 (100.0%) Active / Total Caches (% used) : 86 / 121 (71.1%) Active / Total Size (% used) : 597547.86K / 605445.30K (98.7%) Minimum / Average / Maximum Object : 0.01K / 0.26K / 18.56K OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 294308 289889 98% 0.57K 10511 28 168176K radix_tree_node 105030 104435 99% 1.05K 3501 30 112032K ext4_inode_cache 745446 745446 100% 0.10K 19114 39 76456K buffer_head 59984 59909 99% 1.00K 3749 16 59984K ecryptfs_inode_cache 47520 47157 99% 1.00K 2970 16 47520K kmalloc-1024 215166 214987 99% 0.19K 10246 21 40984K dentry 139744 138452 99% 0.12K 4367 32 17468K kmalloc-128 179508 179011 99% 0.09K 4274 42 17096K kmalloc-96 47140 45768 97% 0.20K 2357 20 9428K vm_area_struct 14700 14700 100% 0.55K 525 28 8400K inode_cache ...Literature
- [1] What is the major difference between the buffer cache and the page cache? Why were they separate entities in older kernels? Why were they merged later on? (2012-10-21)
- [2] What is the difference between Buffers and Cached columns in /proc/meminfo output? (2012-10-19)
- [3] The future of the page cache (2017-01-25)
- [4] Linux Page Cache Basics
- [5] Is it possible to list the files that are cached?
- [6] RFC: Clarifying Direct I/O Semantics
- [7] Slab allocation
- [8] Memory management
- [9] Overview of Linux Memory Management Concepts: Slabs
- [10] Memory management - Buddy allocator, Slab allocator
- [11] The SLUB allocator
- [12] Slab Allocator
- [13] What are the different caches reported by slabtop?
- [14] How /proc/slabinfo is not quite telling you what it looks like
- [15] Short users guide for SLUB
Who else is using my memory - File System Cache analysis
See also our former articles:
- Do not underestimate performance impacts of swapping on NUMA database systems
- MariaDB and MySQL swap analysis
When we do analysis of MariaDB Database servers we also check the memory (RAM and Swap) available:
# free --kilo --wide total used free shared buffers cache available Mem: 16106252 4329952 703356 199008 307872 10765072 11042748 Swap: 31250428 528684 30721744The values for buffers and especially for cache can be sometimes quite big. In this case they use about 10 GiB. So let us have a look what these things called buffers and cache are, using our valuable RAM... When we check the man pages of free we will find:
# man free ... buffers Memory used by kernel buffers (Buffers in /proc/meminfo) cache Memory used by the page cache and slabs (Cached and Slab in /proc/meminfo) buff/cache Sum of buffers and cacheSo let us check a more fine grained information in /proc/meminfo which is an interface to the kernel data structures:
# cat /proc/meminfo | grep -e ^Cached -e Slab -e Buffers Buffers: 307872 kB Cached: 10155156 kB Slab: 609916 kBSame values! Then let us have a look at the man pages of proc what we can find about these values:
# man proc ... Buffers Relatively temporary storage for raw disk blocks that shouldn't get tremendously large (20MB or so). Cached In-memory cache for files read from the disk (the page cache). Doesn't include SwapCached. Slab In-kernel data structures cache.So it looks like we have a raw I/O Cache (called Buffer Cache) and a File System I/O Cache (called Page Cache). So how does this work? What is a raw I/O? And is a Files System I/O cached once (Cached) or twice (Cached and Buffers)?
When we dig a bit deeper we can find that prior to Linux Kernels 2.4 the two Caches were distinct. So that was a waste of memory (RAM). It seems like today this is not the case any more [1], [2], [3]. And man pages are a bit out of date or at least not very precise?
Analysing the Linux Page CacheA very good source when it comes to Linux Performance Tuning and Measuring is Brendan Gregg's Website. To measure Linux Page Cache Hit Ratio he provides a tool called cachestat which is part of the perf-tools collection on GitHub.
With cachestat we get a per second statistics of the Buffer Cache and the Page Cache (without Slabs), Cache Hits, Cache Misses, Dirty Buffer Entries in the Cache and a Cache Hit Ratio:
# sudo cachestat Counting cache functions... Output every 1 seconds. HITS MISSES DIRTIES RATIO BUFFERS_MB CACHE_MB 1419 8 0 99.4% 338 9406 1368 0 0 100.0% 338 9406 1391 0 0 100.0% 338 9406 8558 0 29 100.0% 338 9406 31870 0 163 100.0% 338 9406 1374 0 24 100.0% 338 9406 1388 0 0 100.0% 338 9406 1370 0 0 100.0% 338 9406 1388 0 0 100.0% 338 9406Brendan Gregg also mentions a tool called pcstat (on GitHub) by Al Tobey which gets Page Cache Statistics for Files. Unfortunately I had some problems building it on my Ubuntu 16.04 with Go version 1.6. So I built it on an Ubuntu 18.04 (Go 1.10) and copied it over to to Ubuntu 16.04):
# export GOPATH=/tmp/ # cd $GOPATH # go get golang.org/x/sys/unix # go get github.com/tobert/pcstat/pcstat # bin/pcstat $GOPATH/bin/pcstatThen I tried pcstat out against a MariaDB 10.4 instance. In the output we can see how big the files are in bytes, how many pages of 4 kib this corresponds to, how many of these 4 kib pages are cached and the percentage of pages cached:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 14642 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 13631488 | 3328 | 0 | 000.000 | +------------------------------------------------------+----------------+------------+-----------+---------+When we run pcstat over time with the famous watch command we can even see how the Page Cache is heating up:
# watch -d -n 1 'pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* ; free -w' +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 14642 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 2416 | 012.418 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 3165 | 004.829 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 5890 | 008.987 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 1 | 100.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 13631488 | 3328 | 1164 | 034.976 | +------------------------------------------------------+----------------+------------+-----------+---------+ total used free shared buffers cache available Mem: 16106252 4329952 703356 199008 307872 10765072 11042748 Swap: 31250428 528684 30721744An other tool which was discussed on Brendans Website was vmtouch - the Virtual Memory Toucher (on GitHub, Documentation). With vmtouch we can see for example how much of the directory /home/mysql/database/mariadb-104/data (datadir) is currently in cache:
# vmtouch -f /home/mysql/database/mariadb-104/data Files: 503 Directories: 9 Resident Pages: 29356/231060 114M/902M 12.7% Elapsed: 0.009668 secondsOr more fine grained how much of InnoDB System Files are currently in memory:
# vmtouch -f -v /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/ib_buffer_pool [ ] 0/4 /home/mysql/database/mariadb-104/data/ibdata1 [oOooooo ooooo ] 2416/19456 /home/mysql/database/mariadb-104/data/ib_logfile0 [o oOO] 3165/65536 /home/mysql/database/mariadb-104/data/ib_logfile1 [OOOOOOOOOOOOOOOOOOOOOo ] 23192/65536 /home/mysql/database/mariadb-104/data/ibtmp1 [ ] 0/3072 Files: 5 Directories: 0 Resident Pages: 28773/153604 112M/600M 18.7% Elapsed: 0.005499 secondsA further question to answer is: Can I see all files cached in the Page Cache? So it seem like this is not easily possible:
There is no efficient search mechanism for doing the reverse - getting a file name belonging to a data block would require reading all inodes and indirect blocks on the file system. If you need to know about every single file's blocks stored in the page cache, you would need to supply a list of all files on your file system(s) to fincore. But that again is likely to spoil the measurement as a large amount of data would be read traversing the directories and getting all inodes and indirect blocks - putting them into the page cache and evicting the very page cache data you were trying to examine. [5]Also in this article we can read about the Linux File Tools (linux-ftools) by Google. It seems to be a bit more complicated to make them work. So I let it be.
How is the Page Cache related to MariaDBAfter all this technical O/S discussion, how is Linux Page Cache related to your MariaDB Database? Your MariaDB Database caches Data and Indexes as well. For the InnoDB Storage Engine this is the InnoDB Buffer Pool and for the Aria Storage Engine this is the Aria Page Cache Buffer. So if your MariaDB Database caches pages and if your Linux O/S caches pages the probability is high they cache the same data twice and thus waste valuable RAM! Fortunately InnoDB is configurable in a way it does NOT cache InnoDB files in the Page Cache. This is controlled with the InnoDB Server Variable innodb_flush_method.
When we look at InnoDB Files which were opened in a "normal" way (default: innodb_flush_method = fsync) we get the following information about how the files were opened (man 2 open and [6]):
# lsof +fg ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG RW,LG,0x80000 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG RW,LG,0x80000 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG RW,LG,0x80000 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG RW,LG,0x80000 8,1 12582912 9175280 ./ibtmp1The interesting column here is the FILE-FLAG column which indicates (man lsof):
# man lsof FILE-FLAG when g or G has been specified to +f, this field contains the contents of the f_flag[s] member of the kernel file structure and the kernel's per-process open file flags (if available); `G' causes them to be displayed in hexadecimal; `g', as short-hand names; two lists may be displayed with entries separated by commas, the lists separated by a semicolon (`;'); the first list may contain short-hand names for f_flag[s] values from the following table: DIR direct LG large file RW read and write accessThe output is not so clear or completely understandable yet thus we want to have the open file flags in hexadecimal notation:
# lsof +fG ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG 0x88002;0x0 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG 0x88002;0x0 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG 0x88002;0x0 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG 0x88002;0x0 8,1 12582912 9175280 ./ibtmp1The Linux Kernel open file flags can be found here: fcntl.h. I have extracted the most relevant open file flags for our examination:
#define O_RDWR 00000002 (oct, 0x00002) #define O_DIRECT 00040000 (oct, 0x04000) /* direct disk access hint */ #define O_LARGEFILE 00100000 (oct, 0x08000) #define O_CLOEXEC 02000000 (oct, 0x80000) /* set close_on_exec */So we can see that these 4 InnoDB files where opened with O_RDWR (RW), O_LARGE_FILE (LG) and O_CLOEXEC (not available (yet?) in lsof translation output).
Now let us start the MariaDB Database with the server variable set to: innodb_flush_method = O_DIRECT and check how the files where opened:
# lsof +fg ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG RW,DIR,LG,0x80000 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG RW,LG,0x80000 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG RW,LG,0x80000 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG RW,DIR,LG,0x80000 8,1 12582912 9175280 ./ibtmp1 # lsof +fG ./ib* COMMAND PID USER FD TYPE FILE-FLAG DEVICE SIZE/OFF NODE NAME mysqld 2098 mysql 7uW REG 0x8c002;0x0 8,1 79691776 9175185 ./ibdata1 mysqld 2098 mysql 11uW REG 0x88002;0x0 8,1 268435456 9175186 ./ib_logfile0 mysqld 2098 mysql 12uW REG 0x88002;0x0 8,1 268435456 9175187 ./ib_logfile1 mysqld 2098 mysql 13uW REG 0x8c002;0x0 8,1 12582912 9175280 ./ibtmp1We can see a new flag DIR or 0x04000 which means the files where opened with O_DIRECT. But only the InnoDB Temporary Table Tablespace and the InnoDB System Tablespace but not the two InnoDB Transaction Logs.
Translation of hex to oct: 0x8c002 = 02140002.
But what does O_DIRECT mean? Looking at the open(2) man pages we can find:
# man 2 open O_DIRECT (since Linux 2.4.10) Try to minimize cache effects of the I/O to and from this file. In general this will degrade performance, but it is useful in special situations, such as when applications do their own caching. File I/O is done directly to/from user-space buffers. The O_DIRECT flag on its own makes an effort to transfer data synchronously, but does not give the guarantees of the O_SYNC flag that data and necessary meta‐ data are transferred. To guarantee synchronous I/O, O_SYNC must be used in addition to O_DIRECT.So O_DIRECT is exactly what we want in this case: Bypassing the File System Page Cache to not cache the Database blocks twice!
To verify the impact we run pcstat again:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 16020 | 4 | 4 | 100.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 140 | 000.720 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 36844 | 056.219 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 65536 | 100.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 1 | 100.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 67108864 | 16384 | 13400 | 081.787 | +------------------------------------------------------+----------------+------------+-----------+---------+But... part of the InnoDB Tablespace files is still cached! Also checking the total amount of Buffers and Cache shows the same:
# free total used free shared buff/cache available Mem: 16106252 4401788 368200 456716 11336264 10691792 Swap: 31250428 1348440 29901988So restarting the MariaDB database does not purge the Page Cache! Note: This is important to notice because bypassing the Page Cache helps to not wasting valuable RAM but it makes Database restart much more costly because Page Cache does not help/support InnoDB Buffer Pool heating any more!
Then let us clear the Linux Page Cache and check the result:
# echo 1 > /proc/sys/vm/drop_caches # free -w total used free shared buffers cache available Mem: 16106252 4395892 10539864 441708 696 1169800 10882984 Swap: 31250428 1348428 29902000Checking with pcstat shows now that all InnoDB pages are wiped out of the Page Cache:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 16020 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 67108864 | 16384 | 0 | 000.000 | +------------------------------------------------------+----------------+------------+-----------+---------+And after a while running traffic on the test table we can see that InnoDB Transaction Log Files are cached again in the Page Cache but NOT the InnoDB Tablespace files:
# pcstat /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* +------------------------------------------------------+----------------+------------+-----------+---------+ | Name | Size (bytes) | Pages | Cached | Percent | |------------------------------------------------------+----------------+------------+-----------+---------| | /home/mysql/database/mariadb-104/data/ib_buffer_pool | 16020 | 4 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibdata1 | 79691776 | 19456 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ib_logfile0 | 268435456 | 65536 | 3012 | 004.596 | | /home/mysql/database/mariadb-104/data/ib_logfile1 | 268435456 | 65536 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/ibtmp1 | 12582912 | 3072 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.frm | 1097 | 1 | 0 | 000.000 | | /home/mysql/database/mariadb-104/data/test/test.ibd | 71303168 | 17408 | 0 | 000.000 | +------------------------------------------------------+----------------+------------+-----------+---------+Also with vmtouch we can see the difference:
./vmtouch -f -v /home/mysql/database/mariadb-104/data/ib* /home/mysql/database/mariadb-104/data/test/test* /home/mysql/database/mariadb-104/data/ib_buffer_pool [ ] 0/4 /home/mysql/database/mariadb-104/data/ibdata1 [ ] 0/19456 /home/mysql/database/mariadb-104/data/ib_logfile0 [o oOOOo ] 4252/65536 /home/mysql/database/mariadb-104/data/ib_logfile1 [ ] 0/65536 /home/mysql/database/mariadb-104/data/ibtmp1 [ ] 0/3072 /home/mysql/database/mariadb-104/data/test/test.frm [ ] 0/1 /home/mysql/database/mariadb-104/data/test/test.ibd [ ] 0/17408 Files: 7 Directories: 0 Resident Pages: 4252/171013 16M/668M 2.49% Elapsed: 0.003264 secondsAnd also cachestat shows the effect of a flushed Buffer Cache and Page Cache:
# ./cachestat Counting cache functions... Output every 1 seconds. HITS MISSES DIRTIES RATIO BUFFERS_MB CACHE_MB 677882 19 740 100.0% 67 1087 679213 10 700 100.0% 67 1087 677236 0 732 100.0% 67 1087 685673 11 932 100.0% 67 1088 677933 5 703 100.0% 67 1088Caution: Depending on your underlying I/O system it makes nevertheless sense to run your MariaDB Database with innodb_flush_method = fsync in certain cases! See also PostgreSQL behaviour.
Note: This information could also be interesting for PostgreSQL DBAs because they do redundant buffering with their shared_buffers (why plural? It is just one!?!) and the O/S Page Cache as well!
What is SlabBeside Buffer Cache and the Page Cache itself we have a third thing in the /proc/meminfo statistics listed as Slabs. So what are Slabs? Slab seems to be a specific memory management (allocation) mechanism. It is used for frequently used objects in the Linux Kernel (buffer heads, inodes, dentries, etc.) [7-15]. So it contains something like other Linux Kernel Buffers and Kernel Caches.
What kind of other Linux Kernel Buffer and Kernel Caches exists can be found with the following command:
# sudo cat /proc/slabinfo slabinfo - version: 2.1 # name : tunables : slabdata nf_conntrack_1 14183 15275 320 25 2 : tunables 0 0 0 : slabdata 611 611 0 ext4_groupinfo_4k 8575 8596 144 28 1 : tunables 0 0 0 : slabdata 307 307 0 i915_gem_vma 523 950 320 25 2 : tunables 0 0 0 : slabdata 38 38 0 UDPv6 120 120 1088 30 8 : tunables 0 0 0 : slabdata 4 4 0 tw_sock_TCPv6 2668 2668 280 29 2 : tunables 0 0 0 : slabdata 92 92 0 request_sock_TCPv6 24 72 328 24 2 : tunables 0 0 0 : slabdata 3 3 0 TCPv6 68 105 2112 15 8 : tunables 0 0 0 : slabdata 7 7 0 cfq_queue 391 442 232 17 1 : tunables 0 0 0 : slabdata 26 26 0 mqueue_inode_cache 72 72 896 18 4 : tunables 0 0 0 : slabdata 4 4 0 fuse_request 20 40 400 20 2 : tunables 0 0 0 : slabdata 2 2 0 fuse_inode 1 21 768 21 4 : tunables 0 0 0 : slabdata 1 1 0 fat_cache 102 408 40 102 1 : tunables 0 0 0 : slabdata 4 4 0 hugetlbfs_inode_cache 28 84 584 28 4 : tunables 0 0 0 : slabdata 3 3 0 squashfs_inode_cache 25 50 640 25 4 : tunables 0 0 0 : slabdata 2 2 0 jbd2_journal_handle 340 340 48 85 1 : tunables 0 0 0 : slabdata 4 4 0 jbd2_journal_head 2040 2040 120 34 1 : tunables 0 0 0 : slabdata 60 60 0 jbd2_revoke_table_s 260 512 16 256 1 : tunables 0 0 0 : slabdata 2 2 0 jbd2_revoke_record_s1152 1408 32 128 1 : tunables 0 0 0 : slabdata 11 11 0 ext4_inode_cache 208751 210840 1072 30 8 : tunables 0 0 0 : slabdata 7028 7028 0 ext4_free_data 320 448 64 64 1 : tunables 0 0 0 : slabdata 7 7 0 ext4_allocation_cont 128 128 128 32 1 : tunables 0 0 0 : slabdata 4 4 0 ext4_io_end 392 560 72 56 1 : tunables 0 0 0 : slabdata 10 10 0 ext4_extent_status 64412 77928 40 102 1 : tunables 0 0 0 : slabdata 764 764 0 dquot 144 160 256 16 1 : tunables 0 0 0 : slabdata 10 10 0 mbcache 226 292 56 73 1 : tunables 0 0 0 : slabdata 4 4 0 dio 273 350 640 25 4 : tunables 0 0 0 : slabdata 14 14 0 pid_namespace 42 42 2224 14 8 : tunables 0 0 0 : slabdata 3 3 0 ip4-frags 32 64 248 16 1 : tunables 0 0 0 : slabdata 4 4 0 RAW 396 396 896 18 4 : tunables 0 0 0 : slabdata 22 22 0 UDP 68 68 960 17 4 : tunables 0 0 0 : slabdata 4 4 0 tw_sock_TCP 10750 11136 280 29 2 : tunables 0 0 0 : slabdata 384 384 0 request_sock_TCP 96 96 328 24 2 : tunables 0 0 0 : slabdata 4 4 0 TCP 119 136 1920 17 8 : tunables 0 0 0 : slabdata 8 8 0 blkdev_queue 27 48 1336 24 8 : tunables 0 0 0 : slabdata 2 2 0 blkdev_requests 394 506 368 22 2 : tunables 0 0 0 : slabdata 23 23 0 blkdev_ioc 516 546 104 39 1 : tunables 0 0 0 : slabdata 14 14 0 user_namespace 104 104 304 26 2 : tunables 0 0 0 : slabdata 4 4 0 dmaengine-unmap-256 15 15 2112 15 8 : tunables 0 0 0 : slabdata 1 1 0 sock_inode_cache 1707 1950 640 25 4 : tunables 0 0 0 : slabdata 78 78 0 file_lock_cache 665 665 208 19 1 : tunables 0 0 0 : slabdata 35 35 0 net_namespace 40 40 7296 4 8 : tunables 0 0 0 : slabdata 10 10 0 shmem_inode_cache 3315 3432 656 24 4 : tunables 0 0 0 : slabdata 143 143 0 taskstats 96 96 328 24 2 : tunables 0 0 0 : slabdata 4 4 0 proc_inode_cache 6895 7072 624 26 4 : tunables 0 0 0 : slabdata 272 272 0 sigqueue 100 100 160 25 1 : tunables 0 0 0 : slabdata 4 4 0 bdev_cache 29 76 832 19 4 : tunables 0 0 0 : slabdata 4 4 0 kernfs_node_cache 43625 44982 120 34 1 : tunables 0 0 0 : slabdata 1323 1323 0 mnt_cache 518 546 384 21 2 : tunables 0 0 0 : slabdata 26 26 0 inode_cache 17519 17668 568 28 4 : tunables 0 0 0 : slabdata 631 631 0 dentry 424185 439992 192 21 1 : tunables 0 0 0 : slabdata 20952 20952 0 buffer_head 1112865 1112865 104 39 1 : tunables 0 0 0 : slabdata 28535 28535 0 vm_area_struct 53945 55300 200 20 1 : tunables 0 0 0 : slabdata 2765 2765 0 files_cache 260 299 704 23 4 : tunables 0 0 0 : slabdata 13 13 0 signal_cache 509 630 1088 30 8 : tunables 0 0 0 : slabdata 21 21 0 sighand_cache 346 405 2112 15 8 : tunables 0 0 0 : slabdata 27 27 0 task_struct 1189 1269 3584 9 8 : tunables 0 0 0 : slabdata 141 141 0 Acpi-Operand 5703 5824 72 56 1 : tunables 0 0 0 : slabdata 104 104 0 Acpi-Parse 1314 1314 56 73 1 : tunables 0 0 0 : slabdata 18 18 0 Acpi-State 204 204 80 51 1 : tunables 0 0 0 : slabdata 4 4 0 Acpi-Namespace 4077 4182 40 102 1 : tunables 0 0 0 : slabdata 41 41 0 anon_vma 19831 21522 80 51 1 : tunables 0 0 0 : slabdata 422 422 0 numa_policy 170 170 24 170 1 : tunables 0 0 0 : slabdata 1 1 0 radix_tree_node 321937 327740 584 28 4 : tunables 0 0 0 : slabdata 11705 11705 0 trace_event_file 3985 4002 88 46 1 : tunables 0 0 0 : slabdata 87 87 0 ftrace_event_field 86541 88570 48 85 1 : tunables 0 0 0 : slabdata 1042 1042 0 idr_layer_cache 533 555 2096 15 8 : tunables 0 0 0 : slabdata 37 37 0 kmalloc-8192 1246 1246 8192 4 8 : tunables 0 0 0 : slabdata 502 502 0 kmalloc-4096 658 720 4096 8 8 : tunables 0 0 0 : slabdata 90 90 0 kmalloc-2048 1955 2144 2048 16 8 : tunables 0 0 0 : slabdata 134 134 0 kmalloc-1024 44217 44384 1024 16 4 : tunables 0 0 0 : slabdata 2774 2774 0 kmalloc-512 3037 3808 512 16 2 : tunables 0 0 0 : slabdata 238 238 0 kmalloc-256 17465 20384 256 16 1 : tunables 0 0 0 : slabdata 1274 1274 0 kmalloc-192 27708 28665 192 21 1 : tunables 0 0 0 : slabdata 1365 1365 0 kmalloc-128 140581 143744 128 32 1 : tunables 0 0 0 : slabdata 4492 4492 0 kmalloc-96 168044 168378 96 42 1 : tunables 0 0 0 : slabdata 4009 4009 0 kmalloc-64 117533 123264 64 64 1 : tunables 0 0 0 : slabdata 1926 1926 0 kmalloc-32 80425 90368 32 128 1 : tunables 0 0 0 : slabdata 706 706 0 kmalloc-16 9513 11264 16 256 1 : tunables 0 0 0 : slabdata 44 44 0 kmalloc-8 6616 7168 8 512 1 : tunables 0 0 0 : slabdata 14 14 0 kmem_cache_node 320 320 64 64 1 : tunables 0 0 0 : slabdata 5 5 0 kmem_cache 208 208 256 16 1 : tunables 0 0 0 : slabdata 13 13 0If you want to see the most frequently used (hottest) Slabs you can see them top-like with slabtop. If you press c you can sort the Slabs by CACHE_SIZE:
# sudo slabtop Active / Total Objects (% used) : 2249113 / 2280136 (98.6%) Active / Total Slabs (% used) : 70256 / 70256 (100.0%) Active / Total Caches (% used) : 86 / 121 (71.1%) Active / Total Size (% used) : 597547.86K / 605445.30K (98.7%) Minimum / Average / Maximum Object : 0.01K / 0.26K / 18.56K OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME 294308 289889 98% 0.57K 10511 28 168176K radix_tree_node 105030 104435 99% 1.05K 3501 30 112032K ext4_inode_cache 745446 745446 100% 0.10K 19114 39 76456K buffer_head 59984 59909 99% 1.00K 3749 16 59984K ecryptfs_inode_cache 47520 47157 99% 1.00K 2970 16 47520K kmalloc-1024 215166 214987 99% 0.19K 10246 21 40984K dentry 139744 138452 99% 0.12K 4367 32 17468K kmalloc-128 179508 179011 99% 0.09K 4274 42 17096K kmalloc-96 47140 45768 97% 0.20K 2357 20 9428K vm_area_struct 14700 14700 100% 0.55K 525 28 8400K inode_cache ...Literature
- [1] What is the major difference between the buffer cache and the page cache? Why were they separate entities in older kernels? Why were they merged later on? (2012-10-21)
- [2] What is the difference between Buffers and Cached columns in /proc/meminfo output? (2012-10-19)
- [3] The future of the page cache (2017-01-25)
- [4] Linux Page Cache Basics
- [5] Is it possible to list the files that are cached?
- [6] RFC: Clarifying Direct I/O Semantics
- [7] Slab allocation
- [8] Memory management
- [9] Overview of Linux Memory Management Concepts: Slabs
- [10] Memory management - Buddy allocator, Slab allocator
- [11] The SLUB allocator
- [12] Slab Allocator
- [13] What are the different caches reported by slabtop?
- [14] How /proc/slabinfo is not quite telling you what it looks like
- [15] Short users guide for SLUB
Enable General Query Log per Connection in MariaDB
The General Query Log in MariaDB is a Log (file or table) that logs every statement sent from the Application to the MariaDB Database Server from the Connect Command to the Quit Command. A simple example you can find here:
bin/mysqld, Version: 10.4.6-MariaDB-log (MariaDB Server). started with: Tcp port: 3306 Unix socket: /var/run/mysqld/mysql.sock Time Id Command Argument 190709 21:27:30 3872 Connect root@localhost as anonymous on test 3872 Query INSERT INTO test VALUES (NULL, CONCAT('Test data insert from laptop5 on ', @@hostname), CURRENT_TIMESTAMP()) 3872 QuitThe General Query Log can be enabled dynamically with the following command:
SQL> SET GLOBAL general_log = 1;and is written either to a file (defined with general_log_file) or a table (mysql.general_log, defined with log_output=TABLE).
But unfortunately the General Query Log logs all Commands from all Connections. So it can grow quickly and you loose overview over what happens...
With the MariaDB Server System Variables sql_log_off you can configure to NOT log a Connection to the General Query Log. This is the opposite of what we want to have. But with a clever combination of these variables we can manage how to log only specific Connections to the General Query Log.
Preparation for Connection Logging to the General Query LogFirst of all we have to enable the General Query Log but disable it for all Connections so nothing is written to the General Query Log:
SQL> SET GLOBAL sql_log_off = on; -- 1 = no logging SQL> SET SESSION sql_log_off = on; -- also for my current session! SQL> SET GLOBAL general_log = 1; -- 1 = enable general log fileThen we need a place where to specify and control which user from which host should be logged. So we can control General Query Log logging dynamically. For this purpose we create a table called general_log_settings in the sys Schema of the MariaDB sys Schema:
SQL> CREATE SCHEMA IF NOT EXISTS `sys`; SQL> use `sys`; SQL> DROP TABLE IF EXISTS `general_log_settings`; SQL> CREATE TABLE `general_log_settings` ( `HOST` char(60) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '%', `USER` char(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '%', `ENABLED` enum('YES','NO') NOT NULL DEFAULT 'YES' ) ENGINE=Aria DEFAULT CHARSET=utf8; SQL> INSERT INTO `general_log_settings` (`HOST`, `USER`, `ENABLED`) VALUES ('%', '%', 'NO');Now we need a mechanism to enable General Query Log logging for every user. For this we use a Stored Procedure:
SQL> use `sys`; SQL> DROP PROCEDURE IF EXISTS `enable_general_log`; SQL> DELIMITER // SQL> CREATE OR REPLACE DEFINER=`root`@`localhost` PROCEDURE `enable_general_log`() DETERMINISTIC READS SQL DATA SQL SECURITY DEFINER BEGIN DECLARE user VARCHAR(32); DECLARE host VARCHAR(60); DECLARE cnt SMALLINT; SELECT SUBSTRING_INDEX(SESSION_USER(), '@', 1) INTO user; SELECT SUBSTRING_INDEX(SESSION_USER(), '@', -1) INTO host; SELECT COUNT(*) INTO cnt FROM `general_log_settings` WHERE `ENABLED` = 'YES' AND (`HOST` = host OR `HOST` = '%') AND (`USER`= user OR `USER`= '%') ; IF cnt > 0 THEN SET SESSION sql_log_off = off; DO 'Connect'; END IF; END; // SQL> DELIMITER ;Then we have to grant to ALL! users without the SUPER privilege the privilege to execute this procedure. These users can be found with the following query:
SQL> SELECT user, host, super_priv FROM mysql.user WHERE super_priv != 'Y'; +------+-----------+------------+ | User | Host | Super_priv | +------+-----------+------------+ | oli | localhost | N | | app | % | N | +------+-----------+------------+To grant the privilege to execute the procedure you must run the queries as follows:
SQL> GRANT EXECUTE ON PROCEDURE `sys`.`enable_general_log` TO 'oli'@'localhost'; SQL> GRANT EXECUTE ON PROCEDURE `sys`.`enable_general_log` TO 'app'@'%';Caution: If you do NOT grant the execute privilege to all users their new connections will fail as soon as you have enabled the logon trigger!
Then you have to add the users you want to log to the General Query Log to the general_log_settings table:
SQL> INSERT INTO `sys`.`general_log_settings` VALUES ('%', 'app', 'YES'); SQL> SELECT * FROM `sys`.`general_log_settings`; +------+------+---------+ | HOST | USER | ENABLED | +------+------+---------+ | % | % | NO | | % | app | YES | +------+------+---------+You can test the logging to the General Query Log in your Connection with the following command now:
SQL> CALL `sys`.`enable_general_log`();If everything is done correctly you should see some entries in your General Query Log as follows:
190709 23:51:21 6695 Query DO 'Connect' 6695 Query select count(*) from test.test 6695 QuitAnd you should notice that the Connect Command is missing (see above). This is because enabling the General Query Log happens after the Connect. To give an equivalent entry tag in the General Query Log we have added the DO 'Connect' tag in the Stored Procedure.
Up to this point here all actions should not have any impact (neigther operational nor performance wise) to your MariaDB database system yet.
Activating General Query Log per ConnectionNow we want to activate General Query Log logging for every new Connection to your MariaDB database server. This step can have an impact on your MariaDB database system and you should know how an be prepared to revert this step:
SQL> SET GLOBAL init_connect = 'CALL `sys`.`enable_general_log`();';The init_connect Server System Variable specifies a command or a list of commands that will be executed for each new client connection. We also call this a Logon Trigger. These statements are only executed for users without the SUPER privilege. So you cannot track admin accounts with this method.
Further readings- Logging users to the MySQL error log
- Last login of MySQL database users
- MySQL logon and logoff trigger for auditing
Please let us know if you think we should add this tool to the MariaDB sys Schema or not.
Taxonomy upgrade extras: general query logconnectionlogon triggerSQL/PSMEnable General Query Log per Connection in MariaDB
The General Query Log in MariaDB is a Log (file or table) that logs every statement sent from the Application to the MariaDB Database Server from the Connect Command to the Quit Command. A simple example you can find here:
bin/mysqld, Version: 10.4.6-MariaDB-log (MariaDB Server). started with: Tcp port: 3306 Unix socket: /var/run/mysqld/mysql.sock Time Id Command Argument 190709 21:27:30 3872 Connect root@localhost as anonymous on test 3872 Query INSERT INTO test VALUES (NULL, CONCAT('Test data insert from laptop5 on ', @@hostname), CURRENT_TIMESTAMP()) 3872 QuitThe General Query Log can be enabled dynamically with the following command:
SQL> SET GLOBAL general_log = 1;and is written either to a file (defined with general_log_file) or a table (mysql.general_log, defined with log_output=TABLE).
But unfortunately the General Query Log logs all Commands from all Connections. So it can grow quickly and you loose overview over what happens...
With the MariaDB Server System Variables sql_log_off you can configure to NOT log a Connection to the General Query Log. This is the opposite of what we want to have. But with a clever combination of these variables we can manage how to log only specific Connections to the General Query Log.
Preparation for Connection Logging to the General Query LogFirst of all we have to enable the General Query Log but disable it for all Connections so nothing is written to the General Query Log:
SQL> SET GLOBAL sql_log_off = on; -- 1 = no logging SQL> SET SESSION sql_log_off = on; -- also for my current session! SQL> SET GLOBAL general_log = 1; -- 1 = enable general log fileThen we need a place where to specify and control which user from which host should be logged. So we can control General Query Log logging dynamically. For this purpose we create a table called general_log_settings in the sys Schema of the MariaDB sys Schema:
SQL> CREATE SCHEMA IF NOT EXISTS `sys`; SQL> use `sys`; SQL> DROP TABLE IF EXISTS `general_log_settings`; SQL> CREATE TABLE `general_log_settings` ( `HOST` char(60) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '%', `USER` char(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '%', `ENABLED` enum('YES','NO') NOT NULL DEFAULT 'YES' ) ENGINE=Aria DEFAULT CHARSET=utf8; SQL> INSERT INTO `general_log_settings` (`HOST`, `USER`, `ENABLED`) VALUES ('%', '%', 'NO');Now we need a mechanism to enable General Query Log logging for every user. For this we use a Stored Procedure:
SQL> use `sys`; SQL> DROP PROCEDURE IF EXISTS `enable_general_log`; SQL> DELIMITER // SQL> CREATE OR REPLACE DEFINER=`root`@`localhost` PROCEDURE `enable_general_log`() DETERMINISTIC READS SQL DATA SQL SECURITY DEFINER BEGIN DECLARE user VARCHAR(32); DECLARE host VARCHAR(60); DECLARE cnt SMALLINT; SELECT SUBSTRING_INDEX(SESSION_USER(), '@', 1) INTO user; SELECT SUBSTRING_INDEX(SESSION_USER(), '@', -1) INTO host; SELECT COUNT(*) INTO cnt FROM `general_log_settings` WHERE `ENABLED` = 'YES' AND (`HOST` = host OR `HOST` = '%') AND (`USER`= user OR `USER`= '%') ; IF cnt > 0 THEN SET SESSION sql_log_off = off; DO 'Connect'; END IF; END; // SQL> DELIMITER ;Then we have to grant to ALL! users without the SUPER privilege the privilege to execute this procedure. These users can be found with the following query:
SQL> SELECT user, host, super_priv FROM mysql.user WHERE super_priv != 'Y'; +------+-----------+------------+ | User | Host | Super_priv | +------+-----------+------------+ | oli | localhost | N | | app | % | N | +------+-----------+------------+To grant the privilege to execute the procedure you must run the queries as follows:
SQL> GRANT EXECUTE ON PROCEDURE `sys`.`enable_general_log` TO 'oli'@'localhost'; SQL> GRANT EXECUTE ON PROCEDURE `sys`.`enable_general_log` TO 'app'@'%';Caution: If you do NOT grant the execute privilege to all users their new connections will fail as soon as you have enabled the logon trigger!
Then you have to add the users you want to log to the General Query Log to the general_log_settings table:
SQL> INSERT INTO `sys`.`general_log_settings` VALUES ('%', 'app', 'YES'); SQL> SELECT * FROM `sys`.`general_log_settings`; +------+------+---------+ | HOST | USER | ENABLED | +------+------+---------+ | % | % | NO | | % | app | YES | +------+------+---------+You can test the logging to the General Query Log in your Connection with the following command now:
SQL> CALL `sys`.`enable_general_log`();If everything is done correctly you should see some entries in your General Query Log as follows:
190709 23:51:21 6695 Query DO 'Connect' 6695 Query select count(*) from test.test 6695 QuitAnd you should notice that the Connect Command is missing (see above). This is because enabling the General Query Log happens after the Connect. To give an equivalent entry tag in the General Query Log we have added the DO 'Connect' tag in the Stored Procedure.
Up to this point here all actions should not have any impact (neigther operational nor performance wise) to your MariaDB database system yet.
Activating General Query Log per ConnectionNow we want to activate General Query Log logging for every new Connection to your MariaDB database server. This step can have an impact on your MariaDB database system and you should know how an be prepared to revert this step:
SQL> SET GLOBAL init_connect = 'CALL `sys`.`enable_general_log`();';The init_connect Server System Variable specifies a command or a list of commands that will be executed for each new client connection. We also call this a Logon Trigger. These statements are only executed for users without the SUPER privilege. So you cannot track admin accounts with this method.
Further readings- Logging users to the MySQL error log
- Last login of MySQL database users
- MySQL logon and logoff trigger for auditing
Please let us know if you think we should add this tool to the MariaDB sys Schema or not.
Taxonomy upgrade extras: general query logconnectionlogon triggerSQL/PSMEnable General Query Log per Connection in MariaDB
The General Query Log in MariaDB is a Log (file or table) that logs every statement sent from the Application to the MariaDB Database Server from the Connect Command to the Quit Command. A simple example you can find here:
bin/mysqld, Version: 10.4.6-MariaDB-log (MariaDB Server). started with: Tcp port: 3306 Unix socket: /var/run/mysqld/mysql.sock Time Id Command Argument 190709 21:27:30 3872 Connect root@localhost as anonymous on test 3872 Query INSERT INTO test VALUES (NULL, CONCAT('Test data insert from laptop5 on ', @@hostname), CURRENT_TIMESTAMP()) 3872 QuitThe General Query Log can be enabled dynamically with the following command:
SQL> SET GLOBAL general_log = 1;and is written either to a file (defined with general_log_file) or a table (mysql.general_log, defined with log_output=TABLE).
But unfortunately the General Query Log logs all Commands from all Connections. So it can grow quickly and you loose overview over what happens...
With the MariaDB Server System Variables sql_log_off you can configure to NOT log a Connection to the General Query Log. This is the opposite of what we want to have. But with a clever combination of these variables we can manage how to log only specific Connections to the General Query Log.
Preparation for Connection Logging to the General Query LogFirst of all we have to enable the General Query Log but disable it for all Connections so nothing is written to the General Query Log:
SQL> SET GLOBAL sql_log_off = on; -- 1 = no logging SQL> SET SESSION sql_log_off = on; -- also for my current session! SQL> SET GLOBAL general_log = 1; -- 1 = enable general log fileThen we need a place where to specify and control which user from which host should be logged. So we can control General Query Log logging dynamically. For this purpose we create a table called general_log_settings in the sys Schema of the MariaDB sys Schema:
SQL> CREATE SCHEMA IF NOT EXISTS `sys`; SQL> use `sys`; SQL> DROP TABLE IF EXISTS `general_log_settings`; SQL> CREATE TABLE `general_log_settings` ( `HOST` char(60) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '%', `USER` char(32) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '%', `ENABLED` enum('YES','NO') NOT NULL DEFAULT 'YES' ) ENGINE=Aria DEFAULT CHARSET=utf8; SQL> INSERT INTO `general_log_settings` (`HOST`, `USER`, `ENABLED`) VALUES ('%', '%', 'NO');Now we need a mechanism to enable General Query Log logging for every user. For this we use a Stored Procedure:
SQL> use `sys`; SQL> DROP PROCEDURE IF EXISTS `enable_general_log`; SQL> DELIMITER // SQL> CREATE OR REPLACE DEFINER=`root`@`localhost` PROCEDURE `enable_general_log`() DETERMINISTIC READS SQL DATA SQL SECURITY DEFINER BEGIN DECLARE user VARCHAR(32); DECLARE host VARCHAR(60); DECLARE cnt SMALLINT; SELECT SUBSTRING_INDEX(SESSION_USER(), '@', 1) INTO user; SELECT SUBSTRING_INDEX(SESSION_USER(), '@', -1) INTO host; SELECT COUNT(*) INTO cnt FROM `general_log_settings` WHERE `ENABLED` = 'YES' AND (`HOST` = host OR `HOST` = '%') AND (`USER`= user OR `USER`= '%') ; IF cnt > 0 THEN SET SESSION sql_log_off = off; DO 'Connect'; END IF; END; // SQL> DELIMITER ;Then we have to grant to ALL! users without the SUPER privilege the privilege to execute this procedure. These users can be found with the following query:
SQL> SELECT user, host, super_priv FROM mysql.user WHERE super_priv != 'Y'; +------+-----------+------------+ | User | Host | Super_priv | +------+-----------+------------+ | oli | localhost | N | | app | % | N | +------+-----------+------------+To grant the privilege to execute the procedure you must run the queries as follows:
SQL> GRANT EXECUTE ON PROCEDURE `sys`.`enable_general_log` TO 'oli'@'localhost'; SQL> GRANT EXECUTE ON PROCEDURE `sys`.`enable_general_log` TO 'app'@'%';Caution: If you do NOT grant the execute privilege to all users their new connections will fail as soon as you have enabled the logon trigger!
Then you have to add the users you want to log to the General Query Log to the general_log_settings table:
SQL> INSERT INTO `sys`.`general_log_settings` VALUES ('%', 'app', 'YES'); SQL> SELECT * FROM `sys`.`general_log_settings`; +------+------+---------+ | HOST | USER | ENABLED | +------+------+---------+ | % | % | NO | | % | app | YES | +------+------+---------+You can test the logging to the General Query Log in your Connection with the following command now:
SQL> CALL `sys`.`enable_general_log`();If everything is done correctly you should see some entries in your General Query Log as follows:
190709 23:51:21 6695 Query DO 'Connect' 6695 Query select count(*) from test.test 6695 QuitAnd you should notice that the Connect Command is missing (see above). This is because enabling the General Query Log happens after the Connect. To give an equivalent entry tag in the General Query Log we have added the DO 'Connect' tag in the Stored Procedure.
Up to this point here all actions should not have any impact (neigther operational nor performance wise) to your MariaDB database system yet.
Activating General Query Log per ConnectionNow we want to activate General Query Log logging for every new Connection to your MariaDB database server. This step can have an impact on your MariaDB database system and you should know how an be prepared to revert this step:
SQL> SET GLOBAL init_connect = 'CALL `sys`.`enable_general_log`();';The init_connect Server System Variable specifies a command or a list of commands that will be executed for each new client connection. We also call this a Logon Trigger. These statements are only executed for users without the SUPER privilege. So you cannot track admin accounts with this method.
Further readings- Logging users to the MySQL error log
- Last login of MySQL database users
- MySQL logon and logoff trigger for auditing
Please let us know if you think we should add this tool to the MariaDB sys Schema or not.
Taxonomy upgrade extras: general query logconnectionlogon trigger 
      


