You are here

dbstat for MariaDB after one month of productive use

Table of contents


Review

After we introduced dbstat for MariaDB (and MySQL) a good 5 weeks ago, we naturally also rolled it out on our systems to test the behaviour in daily use (eat your own dog food).

This went quite well until we came up with the idea of activating dbstat on the passive dbstat node on our MariaDB active/passive master/master replication cluster (a similar situation would also occur with a Galera cluster). We realised that the design of dbstat still had potential. After this problem was fixed (v0.0.2 and v0.0.3) and the problem of how to activate events on master AND slave was solved (MDEV-33782: Event is always disabled on slave), everything seemed fine at first glance. Unfortunately, we did not realise that the data also had to be adjusted. As a result, our replication came to a complete stop over the Easter holidays, which then led to another problem when catching up (MDEV-33923: MariaDB parallel replication causes Foreign Key errors).

After this minor incident was also resolved, dbstat has been running flawlessly on our MariaDB master/master replication cluster ever since... The product dbstat is open source (GPLv2) and can be downloaded from GitHub.

One month later

Databases should NOT grow over time but only over the number of {customers, products, etc.} once the desired equilibrium (steady state) is reached. In our dbstat installation, we have set this equilibrium state to 30 days. So it is now time that the size of dbstat stabilises and the database stops growing...

It would also be interesting to understand what practical use dbstat has. That is why we have now set to work and are trying to analyse the results of dbstat.

Here is an overview of the 11 current running database events:

SQL> SELECT db, name, definer, CONCAT(interval_value, ' ', interval_field) AS 'interval'
     , last_executed, ends, status
  FROM mysql.event
 ORDER BY db, name ASC
;
+--------+-------------------------+------------------+----------+---------------------+------+---------+
| db     | name                    | definer          | interval | last_executed       | ends | status  |
+--------+-------------------------+------------------+----------+---------------------+------+---------+
| dbstat | gather_global_status    | dbstat@localhost | 1 MINUTE | 2024-04-24 07:44:14 | NULL | ENABLED |
| dbstat | gather_global_variables | dbstat@localhost | 1 MINUTE | 2024-04-24 07:44:32 | NULL | ENABLED |
| dbstat | gather_metadata_lock    | dbstat@localhost | 1 MINUTE | 2024-04-24 07:44:47 | NULL | ENABLED |
| dbstat | gather_processlist      | dbstat@localhost | 1 MINUTE | 2024-04-24 07:44:28 | NULL | ENABLED |
| dbstat | gather_table_size       | dbstat@localhost | 1 DAY    | 2024-04-24 00:04:00 | NULL | ENABLED |
| dbstat | gather_trx_and_lck      | dbstat@localhost | 1 MINUTE | 2024-04-24 07:44:35 | NULL | ENABLED |
| dbstat | purge_global_status     | dbstat@localhost | 1 MINUTE | 2024-04-24 07:44:08 | NULL | ENABLED |
| dbstat | purge_metadata_lock     | dbstat@localhost | 5 MINUTE | 2024-04-24 07:44:37 | NULL | ENABLED |
| dbstat | purge_processlist       | dbstat@localhost | 1 MINUTE | 2024-04-24 07:43:58 | NULL | ENABLED |
| dbstat | purge_table_size        | dbstat@localhost | 5 MINUTE | 2024-04-24 07:40:04 | NULL | ENABLED |
| dbstat | purge_trx_and_lck       | dbstat@localhost | 1 MINUTE | 2024-04-24 07:44:45 | NULL | ENABLED |
+--------+-------------------------+------------------+----------+---------------------+------+---------+

Size of the tables

Firstly, the growth of dbstat itself is interesting. But of course this evaluation can also be carried out for any other database, table or catalogue (coming in MariaDB 11.7?):

SQL> SET SESSION sql_mode='STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION,only_full_group_by';

SQL> SET @machine_name = @@hostname;

SQL> SELECT `table_schema`, SUBSTR(`ts`, 1, 10) AS date
     , ROUND(SUM(`data_length`)/1024/1024, 1) AS data_mb
     , ROUND(SUM(`index_length`)/1024/1024, 1) AS index_mb
     , ROUND(SUM(`data_free`)/1024/1024, 1) AS free_mb
     , ROUND((SUM(`data_length`) + SUM(`index_length`) + SUM(`data_free`))/1024/1024, 1) AS total_mb
     , ROUND(SUM(`table_rows`)/1000/1000, 1) AS rows_m
  FROM `table_size`
 WHERE `machine_name` = @machine_name
   AND `table_catalog` = 'def'
   AND `table_schema` = 'dbstat'
 GROUP BY `table_catalog`, `table_schema`, `date`
 ORDER BY `table_catalog`, `table_schema`, `date` ASC
;
+--------------+------------+---------+----------+---------+----------+--------+
| table_schema | date       | data_mb | index_mb | free_mb | total_mb | rows_m |
+--------------+------------+---------+----------+---------+----------+--------+
| dbstat       | 2024-03-26 |   762.8 |   1128.6 |    18.0 |   1909.4 |   10.9 |
| dbstat       | 2024-03-27 |   835.8 |   1241.6 |    17.0 |   2094.4 |   11.1 |
| dbstat       | 2024-03-28 |   837.8 |   1241.6 |    14.0 |   2093.4 |   11.8 |
| dbstat       | 2024-03-29 |   960.7 |   1443.6 |    18.0 |   2422.4 |   14.2 |
| dbstat       | 2024-03-30 |   960.7 |   1443.6 |    17.0 |   2421.4 |   15.0 |
| dbstat       | 2024-03-31 |  1057.7 |   1604.6 |    20.0 |   2682.4 |   16.9 |
| dbstat       | 2024-04-01 |  1057.7 |   1602.6 |    21.0 |   2681.4 |   17.6 |
| dbstat       | 2024-04-02 |  1172.7 |   1797.6 |    22.0 |   2992.3 |   17.8 |
| dbstat       | 2024-04-03 |  1442.8 |   2333.7 |    12.0 |   3788.5 |   22.8 |
| dbstat       | 2024-04-04 |  1649.8 |   2723.7 |    13.0 |   4386.5 |   24.4 |
| dbstat       | 2024-04-05 |  1649.8 |   2722.7 |    14.0 |   4386.5 |   26.0 |
| dbstat       | 2024-04-06 |  1821.8 |   3034.8 |    13.0 |   4869.6 |   24.6 |
| dbstat       | 2024-04-07 |  1821.8 |   3034.8 |    14.0 |   4870.6 |   26.2 |
| dbstat       | 2024-04-08 |  1989.9 |   3344.8 |    12.0 |   5346.6 |   29.9 |
| dbstat       | 2024-04-09 |  1990.9 |   3343.8 |    14.0 |   5348.6 |   31.5 |
| dbstat       | 2024-04-10 |  2193.9 |   3712.8 |    13.0 |   5919.7 |   31.6 |
| dbstat       | 2024-04-11 |  2193.9 |   3712.8 |    15.0 |   5921.7 |   31.1 |
| dbstat       | 2024-04-12 |  2405.8 |   4119.1 |    12.0 |   6537.0 |   34.9 |
| dbstat       | 2024-04-13 |  2405.8 |   4119.1 |    14.0 |   6538.9 |   35.7 |
| dbstat       | 2024-04-14 |  2480.8 |   4278.9 |    15.0 |   6774.8 |   36.2 |
| dbstat       | 2024-04-15 |  2560.8 |   4443.7 |    12.0 |   7016.5 |   37.5 |
| dbstat       | 2024-04-16 |  2560.8 |   4443.7 |    12.0 |   7016.5 |   38.2 |
| dbstat       | 2024-04-17 |  2640.8 |   4610.6 |    18.0 |   7269.4 |   38.5 |
| dbstat       | 2024-04-18 |  2640.9 |   4611.6 |    14.0 |   7266.5 |   39.7 |
| dbstat       | 2024-04-19 |  2743.9 |   4826.5 |    14.0 |   7584.3 |   36.9 |
| dbstat       | 2024-04-20 |  2826.9 |   4995.5 |    14.0 |   7836.4 |   38.3 |
| dbstat       | 2024-04-21 |  2830.9 |   4997.4 |    18.0 |   7846.3 |   39.2 |
| dbstat       | 2024-04-22 |  2919.9 |   5177.4 |    14.0 |   8111.3 |   43.2 |
| dbstat       | 2024-04-23 |  2923.0 |   5177.3 |    16.0 |   8116.3 |   44.1 |
| dbstat       | 2024-04-24 |  3020.0 |   5376.3 |    16.0 |   8412.3 |   41.0 |
| dbstat       | 2024-04-25 |  3024.0 |   5377.3 |    17.0 |   8418.3 |   40.9 |
+--------------+------------+---------+----------+---------+----------+--------+

If you take the disc space in the O/S for comparison:

# du -shc *.ibd
8.6G    global_status.ibd
308K    global_variables.ibd
692K    metadata_lock.ibd
97M     processlist.ibd
18M     table_size.ibd
212K    trx_and_lck.ibd
8.7G    total

you can see that the values from the database are approximately correct (5% error)...

Important: The database dbstat reaches a size of approx. 9 Gbyte after approx. one month on a not particularly large database system.

You can also see that the size of the database is only just stabilising:

If you want to know more precisely which tables are responsible for which part of the data volume, you can also zoom in or drill down into the data:

SQL> SELECT `table_name`, SUBSTR(`ts`, 1, 10) AS date
     , ROUND(`data_length`/1024/1024, 1) AS data_mb
     , ROUND(`index_length`/1024/1024, 1) AS index_mb
     , ROUND(`data_free`/1024/1024, 1) AS free_mb
     , ROUND((`data_length` + `index_length` + `data_free`)/1024/1024, 1) AS total_mb
     , ROUND((`data_length` + `index_length` + `data_free`)/1024/1024/8418.26*100, 1) AS pct
     , ROUND(`table_rows`/1000/1000, 1) AS rows_m
  FROM `table_size`
 WHERE `machine_name` = @machine_name
   AND `table_catalog` = 'def'
   AND `table_schema` = 'dbstat'
   AND SUBSTR(`ts`, 1, 10) = CURRENT_DATE()
 ORDER BY rows_m DESC
;
+------------------+------------+---------+----------+---------+----------+------+--------+
| table_name       | date       | data_mb | index_mb | free_mb | total_mb | pct  | rows_m |
+------------------+------------+---------+----------+---------+----------+------+--------+
| global_status    | 2024-04-25 |  2949.9 |   5356.9 |     5.0 |   8311.8 | 98.7 |   40.4 |
| processlist      | 2024-04-25 |    68.2 |     17.1 |     7.0 |     92.2 |  1.1 |    0.4 |
| global_variables | 2024-04-25 |     0.1 |      0.1 |     0.0 |      0.2 |  0.0 |    0.0 |
| metadata_lock    | 2024-04-25 |     0.4 |      0.2 |     0.0 |      0.6 |  0.0 |    0.0 |
| table_size       | 2024-04-25 |     5.4 |      3.1 |     5.0 |     13.5 |  0.2 |    0.0 |
| trx_and_lck      | 2024-04-25 |     0.0 |      0.0 |     0.0 |      0.0 |  0.0 |    0.0 |
+------------------+------------+---------+----------+---------+----------+------+--------+

Note: Please excuse the non-use of the window function!

The only real driver for the data volume of this database is the global_status table. This is to be expected (see: see quantity structure of dbstat).

SQL> SELECT SUBSTR(ts, 1, 10) AS date, table_rows/1000/1000 AS k_rows
     , ROUND(data_length/1024/1024, 1) AS data_mb, ROUND(index_length/1024/1024, 1) AS index_mb, ROUND(data_free/1024/1024, 1) AS free_mb
     , ROUND((data_length + index_length + data_free)/1024/1024, 1) AS total_mb
  FROM table_size
 WHERE `machine_name` = @machine_name
   AND `table_catalog` = 'def'
   AND `table_schema` = 'dbstat'
   AND table_name = 'global_status'
   AND ts > DATE_SUB(CURRENT_DATE, INTERVAL 10 DAY)
;
+------------+-------------+---------+----------+---------+----------+
| date       | k_rows      | data_mb | index_mb | free_mb | total_mb |
+------------+-------------+---------+----------+---------+----------+
| 2024-04-15 | 37.13876300 |  2512.9 |   4433.0 |     4.0 |   6949.9 |
| 2024-04-16 | 37.94217200 |  2512.9 |   4433.0 |     4.0 |   6949.9 | + 0M
| 2024-04-17 | 38.19867500 |  2592.9 |   4600.0 |     7.0 |   7199.9 | + 250M
| 2024-04-18 | 39.39108500 |  2592.9 |   4600.0 |     5.0 |   7197.9 | - 2M
| 2024-04-19 | 36.52539600 |  2691.9 |   4813.0 |     5.0 |   7509.8 | + 312M
| 2024-04-20 | 37.99073500 |  2770.9 |   4980.9 |     6.0 |   7757.8 | + 248M
| 2024-04-21 | 38.79420200 |  2770.9 |   4980.9 |     7.0 |   7758.8 | + 1M
| 2024-04-22 | 42.82606200 |  2855.9 |   5158.9 |     6.0 |   8020.8 | + 263M
| 2024-04-23 | 43.62953000 |  2855.9 |   5158.9 |     7.0 |   8021.8 | + 1M
| 2024-04-24 | 40.54342200 |  2949.9 |   5356.9 |     7.0 |   8313.8 | + 292M
| 2024-04-25 | 40.43067700 |  2949.9 |   5356.9 |     5.0 |   8311.8 | - 2M
+------------+-------------+---------+----------+---------+----------+

Note: Sorry, I should really familiarise myself with the window functions...

If we analyse the data a bit more closely, we see that the number of rows has slowly stabilised over the last 4 days (note: table_rows is calculated (from the number of blocks and the average row length?) and is not an exact value), but the "amount of data" has continued to increase until yesterday, which is probably due to the fragmentation of the tables and indexes...

The primary key of the global_status table was chosen to optimise the localisation of the data:

PRIMARY KEY (`machine_name`,`variable_name`,`ts`),

The situation should calm down in the next few days. In 2 to 4 weeks we will have to check the situation again.

Summary: I would say that this feature fulfils the requirements and helps to understand the data growth.

List of processes

Since we do not have any serious load issues in our databases, this feature is not that interesting in our case. For example, we can see what a (persistent) connection has done:

SQL> SELECT connection_id, ts, command, time, state, SUBSTR(REGEXP_REPLACE(REPLACE(query, "\n", ' '), '\ +', ' '), 1, 64)
  FROM processlist
 WHERE machine_name = @machine_name
   AND command != 'Sleep'
   AND connection_id = @connection_id
   AND state NOT IN (
       'Waiting for next activation'
     , 'Master has sent all binlog to slave; waiting for more updates'
     , 'Waiting for master to send event'
     , 'Slave has read all relay log; waiting for more updates'
       )
 ORDER BY ts ASC
;
+---------------+---------------------+---------+-------+----------------+----------------------------------------------------------------------+
| connection_id | ts                  | command | time  | state          | SUBSTR(REGEXP_REPLACE(REPLACE(query, "\n", ' '), '\ +', ' '), 1, 64) |
+---------------+---------------------+---------+-------+----------------+----------------------------------------------------------------------+
|            18 | 2024-04-17 12:30:28 | Query   | 0.029 | Sending data   | select pp.item_preprocid,pp.itemid,pp.type,pp.params,pp.step,h.h     |
|            18 | 2024-04-17 14:58:28 | Query   | 0.009 | Writing to net | select itemtagid,itemid,tag,value from item_tag                      |
|            18 | 2024-04-18 06:24:28 | Query   | 0.003 | Sending data   | select pp.item_preprocid,pp.itemid,pp.type,pp.params,pp.step,h.h     |
|            18 | 2024-04-18 11:34:28 | Query   | 0.030 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-18 16:39:28 | Query   | 0.006 | Sending data   | select itemid,functionid,name,parameter,triggerid from functions     |
|            18 | 2024-04-18 19:12:28 | Query   | 0.014 | Sending data   | select triggerid,description,expression,error,priority,type,valu     |
|            18 | 2024-04-18 21:49:28 | Query   | 0.004 | Writing to net | select i.itemid,i.hostid,i.templateid from items i inner join ho     |
|            18 | 2024-04-19 00:21:28 | Query   | 0.032 | Sending data   | select pp.item_preprocid,pp.itemid,pp.type,pp.params,pp.step,h.h     |
|            18 | 2024-04-19 02:59:28 | Query   | 0.017 | Writing to net | select triggerid,description,expression,error,priority,type,valu     |
|            18 | 2024-04-19 05:39:28 | Query   | 0.052 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-19 08:19:28 | Query   | 0.000 | Statistics     | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-19 13:26:28 | Query   | 0.075 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-19 15:57:28 | Query   | 0.027 | Writing to net | select itemtagid,itemid,tag,value from item_tag                      |
|            18 | 2024-04-19 18:33:28 | Query   | 0.010 | Sending data   | select itemtagid,itemid,tag,value from item_tag                      |
|            18 | 2024-04-19 21:10:28 | Query   | 0.008 | Sending data   | select pp.item_preprocid,pp.itemid,pp.type,pp.params,pp.step,h.h     |
|            18 | 2024-04-19 23:50:28 | Query   | 0.067 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-20 02:28:28 | Query   | 0.008 | Sending data   | select triggerid,description,expression,error,priority,type,valu     |
|            18 | 2024-04-20 05:08:28 | Query   | 0.052 | Writing to net | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-20 07:44:28 | Query   | 0.123 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-20 10:21:28 | Query   | 0.144 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-20 12:55:28 | Query   | 0.004 | Sending data   | select i.itemid,i.hostid,i.templateid from items i where i.flags     |
|            18 | 2024-04-20 15:35:28 | Query   | 0.092 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-20 18:12:28 | Query   | 0.041 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-20 20:47:28 | Query   | 0.113 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-20 23:25:28 | Query   | 0.101 | Writing to net | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-21 02:03:28 | Query   | 0.120 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-21 04:42:28 | Query   | 0.099 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-21 07:18:28 | Query   | 0.015 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-21 12:32:28 | Query   | 0.018 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-21 15:06:28 | Query   | 0.091 | Writing to net | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-21 20:16:28 | Query   | 0.012 | Sending data   | select itemtagid,itemid,tag,value from item_tag                      |
|            18 | 2024-04-22 06:44:28 | Query   | 0.161 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-22 09:21:28 | Query   | 0.000 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-22 11:54:28 | Query   | 0.020 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-22 14:23:28 | Query   | 0.067 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-22 16:59:28 | Query   | 0.128 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-22 22:05:28 | Query   | 0.078 | Writing to net | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-23 00:38:28 | Query   | 0.084 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-23 03:15:28 | Query   | 0.098 | Writing to net | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-23 05:52:28 | Query   | 0.000 | starting       | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-23 08:27:28 | Query   | 0.011 | Sending data   | select pp.item_preprocid,pp.itemid,pp.type,pp.params,pp.step,h.h     |
|            18 | 2024-04-23 10:58:28 | Query   | 0.000 | Sending data   | select i.itemid,i.hostid,i.templateid from items i inner join ho     |
|            18 | 2024-04-23 13:31:28 | Query   | 0.110 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-23 16:01:28 | Query   | 0.023 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-23 18:35:28 | Query   | 0.095 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-23 21:10:28 | Query   | 0.017 | Writing to net | select itemtagid,itemid,tag,value from item_tag                      |
|            18 | 2024-04-23 23:44:28 | Query   | 0.014 | Sending data   | select triggerid,description,expression,error,priority,type,valu     |
|            18 | 2024-04-24 02:21:28 | Query   | 0.024 | Sending data   | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
|            18 | 2024-04-24 07:33:28 | Query   | 0.046 | Writing to net | select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.s     |
+---------------+---------------------+---------+-------+----------------+----------------------------------------------------------------------+

It is also important that we only see the entries in this report when the thread has done SOMETHING (we have hidden state Sleep). It is also interesting that we do not see this (persistent) connection before 17 April, but at the moment I have NO explanation for this from an operational point of view (restart etc.). Probably the application (Zabbix) has to explain this.

Global variables

The information in the global_variables table is also interesting:

SQL> SELECT variable_name, ts, variable_value
  FROM global_variables
 WHERE machine_name = @machine_name
   AND variable_name IN (
   SELECT variable_name
     FROM global_variables
    WHERE machine_name = @machine_name
    GROUP BY variable_name
   HAVING COUNT(*) > 1
 )
 ORDER BY ts, variable_name
;
+---------------------------+---------------------+----------------+
| variable_name             | ts                  | variable_value |
+---------------------------+---------------------+----------------+
| auto_increment_increment  | 2024-03-09 22:10:42 | 1              |
| auto_increment_offset     | 2024-03-09 22:10:42 | 1              |
| read_only                 | 2024-03-09 22:10:42 | OFF            |
| slave_parallel_max_queued | 2024-03-09 22:10:42 | 131072         |
| slave_parallel_threads    | 2024-03-09 22:10:42 | 0              |
| slave_parallel_workers    | 2024-03-09 22:10:42 | 0              |
| slave_skip_errors         | 2024-03-09 22:10:42 | OFF            |
| system_time_zone          | 2024-03-09 22:10:42 | CET            |

| read_only                 | 2024-03-27 09:42:50 | ON             |
| slave_skip_errors         | 2024-03-27 12:33:13 | 1032           |
| slave_skip_errors         | 2024-03-27 12:35:13 | OFF            |
| slave_skip_errors         | 2024-03-27 12:42:13 | 1032           |
| slave_skip_errors         | 2024-03-27 12:50:13 | OFF            |

| slave_parallel_threads    | 2024-04-02 10:17:32 | 8              |
| slave_parallel_workers    | 2024-04-02 10:17:32 | 8              |
| slave_parallel_max_queued | 2024-04-02 10:22:32 | 1048576        |
| slave_parallel_max_queued | 2024-04-02 10:23:32 | 4194304        |
| slave_parallel_max_queued | 2024-04-02 10:25:32 | 16777216       |
| slave_parallel_threads    | 2024-04-02 10:25:32 | 16             |
| slave_parallel_workers    | 2024-04-02 10:25:32 | 16             |
| slave_parallel_threads    | 2024-04-02 10:28:32 | 32             |
| slave_parallel_workers    | 2024-04-02 10:28:32 | 32             |
| auto_increment_increment  | 2024-04-02 10:39:32 | 2              |
| auto_increment_offset     | 2024-04-02 10:39:32 | 2              |
| slave_parallel_max_queued | 2024-04-02 10:57:32 | 131072         |
| slave_parallel_threads    | 2024-04-02 10:57:32 | 0              |
| slave_parallel_workers    | 2024-04-02 10:57:32 | 0              |
| system_time_zone          | 2024-04-02 10:57:32 | CEST           |

| slave_parallel_max_queued | 2024-04-16 14:06:32 | 16777216       |
| slave_parallel_threads    | 2024-04-16 14:06:32 | 8              |
| slave_parallel_workers    | 2024-04-16 14:06:32 | 8              |
| slave_parallel_max_queued | 2024-04-16 14:26:32 | 131072         |
| slave_parallel_threads    | 2024-04-16 14:26:32 | 0              |
| slave_parallel_workers    | 2024-04-16 14:26:32 | 0              |

| slave_parallel_max_queued | 2024-04-17 09:03:32 | 16777216       |
| slave_parallel_threads    | 2024-04-17 09:03:32 | 16             |
| slave_parallel_workers    | 2024-04-17 09:03:32 | 16             |

| slave_parallel_max_queued | 2024-04-24 08:26:32 | 131072         |
| slave_parallel_threads    | 2024-04-24 08:26:32 | 0              |
| slave_parallel_workers    | 2024-04-24 08:26:32 | 0              |
| read_only                 | 2024-04-24 08:42:32 | OFF            |
+---------------------------+---------------------+----------------+

Here you can see very precisely when and what was done to the database:

  • On 9 March, dbstat was installed for the first time.
  • Then on 27 March (before Easter) there seem to have been problems with the replication (here the new version of dbstat was installed, which allows simultaneous collection on master and slave. This led to replication errors, which were partially rectified).
  • On 2 April (after Easter) we then tried to catch up with parallel replication. You can also see that AUTO_INCREMENT_OFFSET and AUTO_INCREMENT_INCREMENT have been changed. Here we have corrected an error in the database configuration...
  • You can also see that the time zone has changed from CET to CEST (summer time!) Why only on 2 April is not entirely clear to me. (Maybe because it came via replication?)
  • Then on 16 and 17 April we tried to reproduce a "bug" in the parallel replication. Apparently we did not reset the value. Because only after the restart on 24 April (usual fortnightly maintenance window) was the value reset again.
  • On 24 April, you can also see that the database has now assumed the role of the active master (read_only = off). A gracefull switchover has therefore taken place...

Conclusion: A very useful feature to see when something has been changed. Although I have followed all these operations closely, I am still amazed at the informative value of this feature. I would like to see it installed in all databases...

Metadata Lock and InnoDB Transaction Lock

Unfortunately, due to the low traffic on our databases, we do not see too much exciting stuff here.

Here are the metadata locks that we have "caught" on the master in the last 24 hours:

+---------------+---------------------+--------+-----------------+--------------+---------------+-----------------+----------------------------------------------------------------------+
| connection_id | ts                  | user   | host            | table_schema | table_name    | state           | SUBSTR(REGEXP_REPLACE(REPLACE(query, "\n", ' '), '\ +', ' '), 1, 64) |
+---------------+---------------------+--------+-----------------+--------------+---------------+-----------------+----------------------------------------------------------------------+
|            18 | 2024-04-23 14:16:47 | zabbix | localhost:51252 | zabbix       | triggers      | Writing to net  | select triggerid,description,expression,error,priority,type,valu     |
|       1325025 | 2024-04-23 16:01:47 | zabbix | localhost:50150 |              |               | init for update | delete from history_text where itemid=85477 and clock<1678167661     |
|       1325025 | 2024-04-23 16:01:47 | zabbix | localhost:50150 | zabbix       | history_text  | init for update | delete from history_text where itemid=85477 and clock<1678167661     |
|       1365229 | 2024-04-24 02:13:47 | root   | localhost:38096 | dbstat       | global_status | Writing to net  | SELECT /*!40001 SQL_NO_CACHE */ `machine_name`, `variable_name`,     |
|            18 | 2024-04-24 03:10:47 | zabbix | localhost:51252 | zabbix       | item_tag      | Writing to net  | select itemtagid,itemid,tag,value from item_tag                      |
|       1368524 | 2024-04-24 04:41:47 | zabbix | localhost:38112 |              |               |                 | NULL                                                                 |
|       1368524 | 2024-04-24 04:41:47 | zabbix | localhost:38112 | zabbix       | history_uint  |                 | NULL                                                                 |
|            18 | 2024-04-24 05:46:47 | zabbix | localhost:51252 | zabbix       | item_tag      | Sending data    | select itemtagid,itemid,tag,value from item_tag                      |
+---------------+---------------------+--------+-----------------+--------------+---------------+-----------------+----------------------------------------------------------------------+

We have not found any InnoDB locks in the last 24 hours for 7 days in a row.

It would be interesting to see a system where more is happening...

Global status

If a normal database monitoring such as the FromDual Performance Monitor for MariaDB and MySQL (fpmmm) with Zabbix is used, this feature is not absolutely necessary. However, most of our customers do not have any useful monitoring in use. Therefore, this feature would be very useful for post-mortem analyses...

For example InnoDB Row Lock Waits, minute-granular over the last 30 days (analogue to sar from sysstat):

Here you can see that the database was restarted on 10 April between 08:37 and 08:41. You could also find this out in another way, but unfortunately this is often not possible for various reasons (error log rotated away, etc.).

The trend break around 2 April is also interesting. At this time we were experimenting with parallel replication. It should not have been a failover (see GLOBAL VARIABLES, above).

Although parallel replication was later deactivated again, there were more locks. A similar situation around the 16th/17th of April, here too we played around with parallel replication, which seems to have had an effect on the locking behaviour.

Even with this feature, there are many ways to analyse the database. Unfortunately, our database is relatively boring: Mainly monotonous traffic (which is plentiful due to the monitoring) and very little exceptional traffic.

Remark: This text has been translated with the assistance of DeepL.