You are here


Beware of large MySQL max_sort_length parameter

Shinguz - Wed, 2016-08-24 23:40

Today we had a very interesting phenomena at a customer. He complained that MySQL always get some errors of the following type:

[ERROR] mysqld: Sort aborted: Error writing file '/tmp/MYGbBrpA' (Errcode: 28 - No space left on device)

After a first investigation we found that df -h /tmp shows from time to time a full disk but we could not see any file with ls -la /tmp/MY*.

After some more investigation we found even the query from the Slow Query Log which was producing the same problem. It looked similar to this query:

SELECT * FROM test ORDER BY field5, field4, field3, field2, field1;

Now we were capable to simulate the problem at will with the following table:

CREATE TABLE `test` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `data` varchar(64) DEFAULT NULL, `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, `field1` varchar(16) DEFAULT NULL, `field2` varchar(16) DEFAULT NULL, `field3` varchar(255) DEFAULT NULL, `field4` varchar(255) DEFAULT NULL, `field5` varchar(32) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=8912746 DEFAULT CHARSET=utf8 ;

An we have seen the query in SHOW PROCESSLIST:

| Query | 26 | Creating sort index | select * from test order by field5, field4, field3, field2, field1 |

But we were still not capable to see who or better how the hell mysqld is filling our disk!

I remembered further that I have seen some strange settings in the my.cnf before when we did the review of the database configuration. But I ignored them somehow.

[mysqld] max_sort_length = 8M sort_buffer_size = 20M

Now I remembered again these settings. We changed max_sort_length back to default 1k and suddenly our space problems disappeared!

We played a bit around with different values of max_sort_length and got the following execution times for our query:

max_sort_lengthexecution time [s]comment 64 8.8 s128 8.2 s256 9.3 s512 11.8 s 1k 14.9 s 2k 20.0 s 8k129.0 s 8M 75.0 sdisk full (50 G)

We set the values of max_sort_length back to the defaults. Our problems disappeared and we got working and much faster SELECT queries.

Do not needlessly change default values of MySQL without proving the impact. It can become worse than before!!!

The default value of max_sort_length is a good compromise between performance and an appropriate sort length.


What I really did not like on this solution was, that I did not understand the way the problem occurred. So I did some more investigation in this. We were discussing forth and back if this could be because of XFS, because of sparse files or some kind of memory mapped files (see also man mmap).

At the end I had the idea to look at the lsof command during my running query:

mysql> SELECT * FROM test ORDER BY field5, field4, field3, field2, field1; ERROR 3 (HY000): Error writing file '/tmp/MYBuWcXP' (Errcode: 28 - No space left on device) shell> lsof -p 14733 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME mysqld 14733 mysql 32u REG 8,18 9705619456 30147474 /tmp/MYck8vf4 (deleted) mysqld 14733 mysql 49u REG 8,18 749797376 30147596 /tmp/MYBuWcXP (deleted)

So it looks like that there were some deleted files which were growing!

Further information from the IRC channel led me to the libc temporary files (see also man 3 tmpfile).

And some hints from MadMerlin|work pointed me to:

shell> ls /proc//fd

Where you can also see those temporary files.

Thanks to MadMerlin|work for the hints!

Taxonomy upgrade extras: sortfileorder by

FromDual Performance Monitor for MySQL and MariaDB 0.10.6 has been released

Shinguz - Wed, 2016-08-03 19:40

FromDual has the pleasure to announce the release of the new version 0.10.6 of its popular Database Performance Monitor for MySQL, MariaDB, Galera Cluster and Percona Server fpmmm.

You can download fpmmm from here.

In the inconceivable case that you find a bug in fpmmm please report it to our Bug-tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to

This release contains various bug fixes and improvements. The previous release had some major bugs so we recommend to upgrade...

Changes in fpmmm v0.10.6 fpmmm agent
  • Do not connect to server bug fixed.
  • Special case when lock file was removed when it was read is fixed.
  • Added ORDER BY to all GROUP BY to be compliant for the future.
  • Zabbix 3.0 templates added.
  • MaaS: Function curl_file_create implemented for php < 5.5
  • MaaS: Debug message fixed.
  • Maas: Curl upload fixed.
  • MaaS: InnoDB: Deadlock and Foreign Key errors are only escaped with xxx when used in MaaS. Otherwise they are sent normally. Foreign Key errors with MaaS is now also escaped with xxx.
Process module
  • Wrong substitution in process vm calculation fixed.
Galera module
  • Template: Galera items changed from normal to delta.
InnoDB module
  • Template: Fixed InnoDB template to work with Zabbix v3.0.
  • Template: InnoDB locking graph improved.

For subscriptions of commercial use of fpmmm please get in contact with us.

Taxonomy upgrade extras: mysqlperformancemonitormonitoringfpmmmmaasperformance monitormpmrelease

MySQL Environment MyEnv 1.3.1 has been released

Shinguz - Wed, 2016-08-03 08:27

FromDual has the pleasure to announce the release of the new version 1.3.1 of its popular MySQL, Galera Cluster, MariaDB and Percona Server multi-instance environment MyEnv.

The new MyEnv can be downloaded here.

In the inconceivable case that you find a bug in the MyEnv please report it to our bug tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to

Upgrade from 1.1.x or higher to 1.3.1 # cd ${HOME}/product # tar xf /download/myenv-1.3.1.tar.gz # rm -f myenv # ln -s myenv-1.3.1 myenv

If you are using plug-ins for showMyEnvStatus create all the links in the new directory structure:

cd ${HOME}/product/myenv ln -s ../../utl/oem_agent.php plg/showMyEnvStatus/
Changes in MyEnv 1.3.1 MyEnv
  • Bash function bootstrap added.
  • Galera options --bootstrap --new-cluster and start method bootstrap was implemented. Typo fixed.
  • New 5.7 variables added and 5.6 variables to avoid nasty warnings in the error log added to the my.cnf template. Further new file system structure was prepared.
  • MySQL 5.7 variables for error log behaviour added.
  • Comment for log_bin added to my.cnf template.
  • ulimit problem fixed rudely in MyEnv init script.
  • wsrep_provider for CentOS added in my.cnf template.
  • Cgroup template improved.
  • Cgroup how-to improved and configuration example added.
MyEnv Installer
  • default as instance name set to blacklist.
  • Typo fixed in help of installMyEnv.
MyEnv Utilities
  • Test table prepared for explicit_defaults_for_timestamp configuration.
  • now has optional parameters for user, host etc.

For subscriptions of commercial use of MyEnv please get in contact with us.

Taxonomy upgrade extras: myenvoperationMySQL Operationsmulti instanceconsolidationtestingupgradereleasecloudcgroupscontainermysqld_multi

Multiple MySQL Instances on a Single Machine

Jörg Brühe - Thu, 2016-07-28 13:11

Typically, on a single machine (be it a physical or a virtual one) only a single MySQL instance (process) is running. This is perfectly ok for all those situations where a single instance is sufficient, like for storing small amounts of data (RedHat using MySQL for postfix, KDE using it for akonadi, ...), as well as those where a dedicated machine per MySQL instance is appropriate (high CPU load, memory fully loaded, availability requirements).

But there are also those users who want to (or would like to) have multiple instances which would still fit into a single machine. Even among them, a single instance per machine is typical. For this, there are good reasons:

  • MySQL comes with defaults for files (config file, error log, ...) and directories (data directory, binlogs, ...) which would cause conflicts between multiple instances (unless they are changed).
  • The scripts coming with MySQL, especially the automated start/stop with machine reboot/shutdown, are written for a single instance only.
  • Last but not least: The instructions. those in the manual as well as the many "How to setup ..." in the Web, cover a single instance only.
Because of this, users often either restrict themselves to a single instance, or they set up several virtual machines (or containers) holding a single instance each.

But that overhead (both in software and in labour) isn't necessary: There is a way out, supporting easy handling of multiple MySQL instances on a single machine directly, without containers or VMs. This is our "MyEnv" package, available for download here, licensed under the GPL.

What Does MyEnv Do?

MyEnv cares about two aspects which in combination provide easy use of multiple instances:

  • It helps to configure multiple MySQL instances without overlap, so they won't collide with each other.
  • It maintains separate environments, each to manage and access one specific instance.

Each environment contains the path to the binaries (so the instances can use different versions), the config file, the socket and port number, data directory, error log etc. The environment is specified by a name (choose a meaningful one!), and it is switched by using its name as a shell command. (MyEnv creates an alias for that.)

Administrative commands like "start" and "stop" will manage the instance of the current environment. MySQL client programs like "mysql" or "mysqldump" will access that instance.

MyEnv supports the autostart of instances at machine boot, configurable per instance - something which is impossible using only the tools of a MySQL distribution.

Of course, an instance started via MyEnv (either manually or via autostart) can be accessed by any other client program on the machine, or from any other machine in the network - all that is needed is the specification of the proper socket or network port.

Handling Multiple Binaries

In the previous section, I wrote the instances can use different versions. This is done by installing those different versions into different locations, controlled by MyEnv, and the directory with the binaries will become a component of the user's PATH variable, switched when the environment is switched. Obviously, this works only if the destination path of the installation can be controlled, which implies the tar.gz format - RPM or DEB packages have fixed destinations, so different versions would overwrite each other on installation.
But that is no severe limitation, as all MySQL versions are available in tar.gz format, and these are sufficiently generic to run on any reasonably current Linux distribution.
(Yes, that is something I forgot to mention: MyEnv is developed and tested on Linux only. You are welcome to try it on any other Unix platform, and we will gladly listen to your experiences and accept your contributions, but we do not actively pursue non-Linux platforms.)

This support for multiple versions makes MyEnv the perfect tool for application development: Using a single machine, you can let your application access the MySQL servers of different versions and can verify it works the way you want it to.

Similar, you can install binaries of MySQL (Oracle), Percona Server, or MariaDB, and verify your application is portable across them.

And the adventurous among us can use different binaries, from the same or different vendor(s), to test whether replication works across versions and/or vendors, all without the effort of installing a separate VM or container setup.

MyEnv and Galera Cluster Till now, I mentioned MySQL (and its variants), and many readers may associate that term with a traditional single instance. So I better state explicitly: Of course, such an instance can take part in replication, in any role: master, slave, or intermediate in multi-level replication.

But besides single instances and replication, there exists a different MySQL setup: Nodes combined to form a Galera Cluster. And again, let me state explicitly: Again of course, an instance controlled by MyEnv can be a node participating in a Galera Cluster.

Those readers who have experiance with Galera Cluster (or who have just read the documentation or blogs about it) know that to start the first node of a cluster a special command is needed, called "bootstrap" - a simple "start" will not do. So this command was also added to MyEnv, it can manage a Galera Cluster completely by its builtin commands.

RPM and DEB packages Above, I wrote that to install different versions you cannot use RPM or DEB packages. I did not write that MyEnv cannot use RPM or DEB - in fact it can, the absolute path names in these formats just limit this to a single version.

So you can install the RPM or DEB of your choice, disable its autostart, and then call MyEnv to create multiple instances. You will give them different names, specify different sockets and ports and use different data directories, but for all of them you will specify the same path "/usr". As a result, MyEnv will simply manage multiple instances of the same version.

You can configure them differently to test the consequences, or you can set them up to replicate among them - master and slave can run on the same machine. Of course, this will not give you the "high availability" or the "scale-out" benefits which are the typical reasons to use replication, but I trust this wasn't your purpose for this test.

Using binaries that include Galera, and configuring them properly, you can even run all nodes of a Galera Cluster as separate instances on a single machine. That may be considered to stretch the concept, because a single machine is a very different setup than separate machines, but it gives an idea of the possibilities opened by MyEnv.

Typical Use of MyEnv

Admitted: The claim to know what MyEnv is used for by others would be arrogant, and I do not uphold it. Nonetheless, we do know some use cases of people who downloaded MyEnv, and they are close to our internal use of the tool.

MyEnv allows to have multiple MySQL instances on the same machine, to manage them separately, and to access them using MySQL client programs or other applications. So it is the perfect setup for all those who need to access different versions: developers and software testers.

When we encounter some unexpected behaviour, we often want to know whether it is specific to some version or series, or is widespread. To check that, MyEnv is the perfect infrastructure: You write a test case to provoke the effect and run it on several versions, then you note the result and can tell whether it exists "since ages" or is new, whether it still occurs in current versions or will change with an upgrade - exactly the information you need to decide about an upgrade or write a bug report.

Database administrators and application developers use it to avoid nasty surprises with new versions, so their production instances will not suffer from unexpected functional changes. Setting up a test environment, especially for multiple versions, becomes cheap, much less ressources are needed. You don't need to copy your test code onto different machines, and you are sure you are running identical tests, so that you won't compare apples and oranges.


If all that made you curious, I invite you to look into the instructions, to download MyEnv and to try it. And of course, your feedback and reports are very welcome.

Take care!

Appendix: Where to Meet Us

All FromDual colleagues will deliver talks at the FrOSCon in St. Augustin near Cologne, Germany, on August 20 and 21, so that is a good opportunity for personal contact. As several talks will be delivered in English, the conference also meets the needs of attendants who cannot follow a German talk - check the programme. Froscon is a famous event, very interesting talks are promised, and I look forward to enjoy the community atmosphere there.

I will deliver a talk at the "Open Source Backup Conference" in Cologne, Germany, on September 26 and 27; this conference is held in English.

I do not have feedback yet about Percona Live in Amsterdam, I may attend that also.

And finally, FromDual will again have a booth and deliver talks at the DOAG conference on November 15 - 18 in Nuremberg, Germany. This is "the" event for Oracle users (at least in Germany, maybe in all Europe), and it has a separate track dealing with MySQL only.

We will be delighted to meet you face to face!

Temporary tables and MySQL STATUS information

Shinguz - Fri, 2016-07-08 18:42

When analysing MySQL configuration and status information at customers it is always interesting to see how the applications behave. This can partially be seen by the output of the SHOW GLOBAL STATUS command. See also Reading MySQL fingerprints.

Today we wanted to know where the high Com_create_table and the twice as high Com_drop_table is coming from. One suspect was TEMPORARY TABLES. But are real temporary tables counted as Com_create_table and Com_drop_table at all? This is what we want to find out today. The tested MySQL version is 5.7.11.

Caution: Different MySQL or MariaDB versions might behave differently!

Session 1 Global Session 2 CREATE TABLE t1 (id INT);
Query OK, 0 rows affected     Com_create_table +1
Opened_table_definitions +1 Com_create_table +1
Opened_table_definitions +1    CREATE TABLE t1 (id INT);
ERROR 1050 (42S01): Table 't1' already exists     Com_create_table +1
Open_table_definitions +1
Open_tables +1
Opened_table_definitions +1
Opened_tables +1 Com_create_table + 1
Open_table_definitions +1
Open_tables +1
Opened_table_definitions +1
Opened_tables +1    CREATE TABLE t1 (id INT);
ERROR 1050 (42S01): Table 't1' already exists     Com_create_table + 1 Com_create_table + 1    DROP TABLE t1;
Query OK, 0 rows affected     Com_drop_table +1
Open_table_definitions -1
Open_tables -1 Com_drop_table +1
Open_table_definitions -1
Open_tables -1    DROP TABLE t1;
ERROR 1051 (42S02): Unknown table 'test.t1'     Com_drop_table -1 Com_drop_table -1    CREATE TEMPORARY TABLE ttemp (id INT);
Query OK, 0 rows affected     Com_create_table +1
Opened_table_definitions +2
Opened_tables +1 Com_create_table +1
Opened_table_definitions +2
Opened_tables +1    CREATE TEMPORARY TABLE ttemp (id INT);
ERROR 1050 (42S01): Table 'ttemp' already exists     Com_create_table +1 Com_create_table +1    DROP TABLE ttemp;
Query OK, 0 rows affected     Com_drop_table +1 Com_drop_table +1    CREATE TEMPORARY TABLE ttemp (id int);
Query OK, 0 rows affected   CREATE TEMPORARY TABLE ttemp (id int);
Query OK, 0 rows affected Com_create_table +1
Opened_table_definitions +2
Opened_tables +1 Com_create_table +2
Opened_table_definitions +4
Opened_tables +2 Com_create_table +1
Opened_table_definitions +2
Opened_tables +1  DROP TABLE ttemp;
Query OK, 0 rows affected   DROP TABLE ttemp;
Query OK, 0 rows affected Com_drop_table +1 Com_drop_table +2 Com_drop_table +1
  • A successful CREATE TABLE command opens and closes a table definition.
  • A non successful CREATE TABLE command opens the table definition and the file handle of the previous table. So a faulty application can be quite expensive.
  • A further non successful CREATE TABLE command has no other impact.
  • A DROP TABLE command closes a table definition and the file handle.
  • A CREATE TEMPORARY TABLE opens 2 table definitions and the file handle. Thus behaves different than CREATE TABLE
  • But a faulty CREATE TEMPORARY TABLE seems to be much less intrusive.
  • Open_table_definitions and Open_tables is always global, also in session context.
Taxonomy upgrade extras: statustemporary table

Why is varchar(255) not varchar(255)?

Cédric Bruderer - Fri, 2016-06-24 16:18

Recently I was working on a clients question and stumbled over an issue with replication and mixed character sets. The client asked, wether it is possible to replicate data to a table on a MySQL slave, where one column had a different character set, than the column in the same table on the master.


I set up two servers with identical table definitions and changed the character set on one column on the slave from latin1 to utf8.








So far no problem, I was able to start the replication and set off some INSERT statements with special characters (like ä, ö, ü, ...). But when I went to look for them in the slave's table, I could not find them.


"SHOW SLAVE STATUS", showed me this error:

Column 1 of table 'test.test' cannot be converted from type 'varchar(255)' to type 'varchar(255)'


You might ask yourself now: But the columns have the same type, what is the problem? What is not shown in the error is the fact, that there are two different character sets.


The log file is of no help either. It only shows the same error and tells you to fix it.

2016-05-26 15:51:06 9269 [ERROR] Slave SQL: Column 1 of table 'test.test' cannot be converted from type 'varchar(255)' to type 'varchar(255)', Error_code: 1677 2016-05-26 15:51:06 9269 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'valkyrie_mysqld35701_binlog.000050' position 120 2016-05-26 15:53:39 9269 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)


Skipping the statement will not work, as the server will just fail again, when the next statement shows up.


For all those who are now running to change the character set: STOP!

Changing characters set of columns or tables containing data can be fatal when done incorrectly. MySQL offers a statement to convert tables and columns to the character set you wish to have.


To convert the entire table, you can write:



To convert a single column, you can write:

ALTER TABLE tbl_name MODIFY latin1_column TEXT CHARACTER SET utf8;


More details can be found in the ALTER TABLE documentation of MySQL. (Converting character sets is at the end of the article.)


Just to be clear, this is no bug! MySQL replication was never intended to work with mixed character sets and it makes a lot of sense, that the replication is halted when differences are discovered. This test was only an experiment.

MySQL spatial functionality - points of interest around me

Shinguz - Wed, 2016-06-01 10:13

This week I was preparing the exercises for our MySQL/MariaDB for Beginners training. One of the exercises of the training is about MySQL spatial (GIS) features. I always tell customers: "With these features you can answer questions like: Give me all points of interest around me!"

Now I wanted to try out how it really works and if it is that easy at all...

To get myself an idea of what I want to do I did a little sketch first:

   My position   Shops   Restaurants   Cafes

To do this I needed a table and some data:

CREATE TABLE poi ( id INT UNSIGNED NOT NULL AUTO_INCREMENT , name VARCHAR(40) , type VARCHAR(20) , sub_type VARCHAR(20) , pt POINT NOT NULL , PRIMARY KEY (id) , SPATIAL INDEX(pt) ) ENGINE=InnoDB; INSERT INTO poi (name, type, sub_type, pt) VALUES ('Shop 1', 'Shop', 'Cloth', Point(2,2)) , ('Cafe 1', 'Cafe', '', Point(11,2)) , ('Shop 2', 'Shop', 'Cloth', Point(5,4)) , ('Restaurant 1', 'Restaurant', 'Portugies', Point(8,7)) , ('Cafe 2', 'Cafe', '', Point(3,9)) , ('Shop 3', 'Shop', 'Hardware', Point(11,9)) ;

This looks as follows:

SELECT id, CONCAT(ST_X(pt), '/', ST_Y(pt)) AS "X/Y", name, type, sub_type FROM poi; +----+-----------+--------------+------------+-----------+ | id | X/Y | name | type | sub_type | +----+-----------+--------------+------------+-----------+ | 1 | 2/2 | Shop 1 | Shop | Cloth | | 2 | 11/2 | Cafe 1 | Cafe | | | 3 | 5/4 | Shop 2 | Shop | Cloth | | 4 | 8/7 | Restaurant 1 | Restaurant | Portugies | | 5 | 3/9 | Cafe 2 | Cafe | | | 6 | 11/9 | Shop 3 | Shop | Hardware | +----+-----------+--------------+------------+-----------+

Now the question: "Give me all shops in a distance of 4.5 units around me":

SET @hereami = POINT(9,4); SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance FROM poi WHERE ST_Distance(@hereami, pt) < 4.5 AND type = 'Shop' ORDER BY distance ASC ; +----+------------+--------+----------+ | id | point | name | distance | +----+------------+--------+----------+ | 3 | POINT(5 4) | Shop 2 | 4.00 | +----+------------+--------+----------+ 1 row in set (0.37 sec)

The query execution plan looks like this:

id: 1 select_type: SIMPLE table: poi partitions: NULL type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 650361 filtered: 10.00 Extra: Using where; Using filesort

So no use of the spatial index yet. :-(

Reading the MySQL documentation Using Spatial Indexes provides some more information:

The optimizer investigates whether available spatial indexes can be involved in the search for queries that use a function such as MBRContains() or MBRWithin() in the WHERE clause.

So it looks like the optimizer CAN evaluate function covered fields in this specific case. But not with the function ST_Distance I have chosen.

So my WHERE clause must look like: "Give me all points within a polygon spanned 4.5 units around my position..."

I did not find any such function in the short run. So I created a hexagon which is not too far from a circle...

With this hexagon I tried again:

SET @hereami = POINT(9,4); SET @hexagon = 'POLYGON((9 8.5, 12.897 6.25, 12.897 1.75, 9 -0.5, 5.103 1.75, 5.103 6.25, 9 8.5))'; SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance FROM poi WHERE MBRContains(ST_GeomFromText(@hexagon), pt) AND ST_Distance(@hereami, pt) < 4.5 AND type = 'Shop' ORDER BY distance ASC ; Empty set (0.03 sec)

And tadaaah: Damned fast, but the result is not the same! :-( When you look at the graph above it is obvious why: The missing shop is 0.103 units outside of our hexagon search range but within our circle range. So an octagon would have been the better approach...

At least the index is considered now! :-)

id: 1 select_type: SIMPLE table: poi partitions: NULL type: range possible_keys: pt key: pt key_len: 34 ref: NULL rows: 31356 filtered: 10.00 Extra: Using where; Using filesort

For specifying a an "outer" hexagon I was too lazy. So I was specifying a square:

SET @hereami = POINT(9,4); SET @square = 'POLYGON((4.5 8.5, 13.5 8.5, 13.5 -0.5, 4.5 -0.5, 4.5 8.5))'; SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance FROM poi WHERE MBRContains(ST_GeomFromText(@square), pt) AND ST_Distance(@hereami, pt) < 4.5 AND type = 'Shop' ORDER BY distance ASC ; +----+------------+--------+----------+ | id | point | name | distance | +----+------------+--------+----------+ | 3 | POINT(5 4) | Shop 2 | 4.00 | +----+------------+--------+----------+ 1 row in set (0.02 sec)

So, my shop is in the result again now. And even a bit faster!

Now I wanted to find out if this results are any faster than the conventional method with an index on (x) and (y) or (x, y):

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance FROM poi WHERE x >= 4.5 AND x <= 13.5 AND y >= -0.5 AND y <= 8.5 AND ST_Distance(@hereami, pt) < 4.5 AND type = 'Shop' ORDER BY distance ASC ; 1 row in set (0.15 sec)

Here the optimizer chooses the index on x. But I think he could do better. So I forced to optimizer to use the index on (x, y):

SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance FROM poi FORCE INDEX (xy) WHERE x >= 4.5 AND x <= 13.5 AND y >= -0.5 AND y <= 8.5 AND ST_Distance(@hereami, pt) < 4.5 AND type = 'Shop' ORDER BY distance ASC ; 1 row in set (0.03 sec) id: 1 select_type: SIMPLE table: poi partitions: NULL type: range possible_keys: xy key: xy key_len: 10 ref: NULL rows: 115592 filtered: 1.11 Extra: Using index condition; Using where; Using filesort

Same performance than with the spatial index. So it looks like for this simple task with my data distribution conventional methods do well enough.

No I wanted to try a polygon which comes as close as possible to a circle. This I solved with a MySQL stored function which returns a polygon:/p>

DROP FUNCTION polygon_circle; delimiter // CREATE FUNCTION polygon_circle(pX DOUBLE, pY DOUBLE, pDiameter DOUBLE, pPoints SMALLINT UNSIGNED) -- RETURNS VARCHAR(4096) DETERMINISTIC RETURNS POLYGON DETERMINISTIC BEGIN DECLARE i SMALLINT UNSIGNED DEFAULT 0; DECLARE vSteps SMALLINT UNSIGNED; DECLARE vPolygon VARCHAR(4096) DEFAULT ''; -- Input validation IF pPoints < 3 THEN RETURN NULL; END IF; IF pPoints > 360 THEN RETURN NULL; END IF; IF pPoints > 90 THEN RETURN NULL; END IF; if (360 % pPoints) != 0 THEN RETURN NULL; END IF; -- Start SET vSteps = 360 / pPoints; WHILE i < 360 DO SET vPolygon = CONCAT(vPolygon, (pX + (SIN(i * 2 * PI() / 360) * pDiameter)), ' ', (pY + (COS(i * 2 * PI() / 360) * pDiameter)), ', '); SET i = i + vSteps; END WHILE; -- Add first point again SET vPolygon = CONCAT("POLYGON((", vPolygon, (pX + (SIN(0 * 2 * PI() / 360) * pDiameter)), " ", (pY + (COS(0 * 2 * PI() / 360) * pDiameter)), "))"); -- RETURN vPolygon; RETURN ST_GeomFromText(vPolygon); END; // delimiter ; SELECT ST_AsText(polygon_circle(9, 4, 4.5, 6)); -- SELECT polygon_circle(9, 4, 4.5, 8);

Then calling the query in the same way:

SET @hereami = POINT(9,4); SELECT id, ST_AsText(pt) AS point, name, ROUND(ST_Distance(@hereami, pt), 2) AS distance FROM poi WHERE MBRContains(polygon_circle(9, 4, 4.5, 90), pt) AND ST_Distance(@hereami, pt) < 4.5 AND type = 'Shop' ORDER BY distance ASC ; +----+------------+--------+----------+ | id | point | name | distance | +----+------------+--------+----------+ | 3 | POINT(5 4) | Shop 2 | 4.00 | +----+------------+--------+----------+ 1 row in set (0.03 sec)

This seems not to have any significant negative impact on performance.

Results Test#rowsoperationlatencyTotal655360FTS1300 msSpatial exact Circle4128FTS520 msSpatial inner Hexagon3916range (pt)20 msSpatial outer Square4128range (pt)30 msConventional outer Square on (x)4128range (x) or (y)150 msConventional outer Square on (xy)4128range (x,y)30 msSpatial good Polygon4128range (pt)30 msTaxonomy upgrade extras: spatialgis

Why you should take care of MySQL data types

Shinguz - Wed, 2016-05-25 11:42

A customer reported last month that MySQL does a full table scan (FTS) if a query was filtered by a INT value on a VARCHAR column. First I told him that this is not true any more because MySQL has fixed this behaviour long time ago. He showed me that I was wrong:

CREATE TABLE `test` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `data` varchar(64) DEFAULT NULL, `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`), KEY `data` (`data`) ) ENGINE=InnoDB; EXPLAIN SELECT * FROM test WHERE data = 42\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: test partitions: NULL type: ALL possible_keys: data key: NULL key_len: NULL ref: NULL rows: 522500 filtered: 10.00 Extra: Using where EXPLAIN SELECT * FROM test WHERE data = '42'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: test partitions: NULL type: ref possible_keys: data key: data key_len: 67 ref: const rows: 1 filtered: 100.00 Extra: NULL

When I executed the query I got some more interesting information:

SELECT * FROM test WHERE data = '42'; Empty set (0.00 sec) SELECT * FROM test WHERE data = 42; +--------+----------------------------------+---------------------+ | id | data | ts | +--------+----------------------------------+---------------------+ | 1096 | 42a5cb4a3e76857a3efe7af44ba9f4dd | 2016-05-25 10:26:59 | ... | 718989 | 42a1921fb2df42126d85f9586532eda4 | 2016-05-25 10:27:12 | +--------+----------------------------------+---------------------+ 767 rows in set, 65535 warnings (0.26 sec)

Looking at the warnings we also find the reason: MySQL does the cast on the column and not on the value which is a bit odd IMHO:

show warnings; | Warning | 1292 | Truncated incorrect DOUBLE value: '80f52706c2f9de40472ec29a7f70c992' |

A bit suspicious I looked at the warnings of the query execution plan again:

show warnings; +---------+------+---------------------------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------------------------------------------------------+ | Warning | 1739 | Cannot use ref access on index 'data' due to type or collation conversion on field 'data' | | Warning | 1739 | Cannot use range access on index 'data' due to type or collation conversion on field 'data' | +---------+------+---------------------------------------------------------------------------------------------+

I thought this was fixed, but it seems not. The following releases behave like this: MySQL 5.0.96, 5.1.73, 5.5.38, 5.6.25, 5.7.12 and MariaDB 5.5.41, 10.0.21 and 10.1.9

The other way around it seems to work in both cases:

SELECT * FROM test WHERE id = 42; +----+----------------------------------+---------------------+ | id | data | ts | +----+----------------------------------+---------------------+ | 42 | 81d74057d7be8f20563da404bb1b3ab0 | 2016-05-25 10:26:56 | +----+----------------------------------+---------------------+ SELECT * FROM test WHERE id = '42'; +----+----------------------------------+---------------------+ | id | data | ts | +----+----------------------------------+---------------------+ | 42 | 81d74057d7be8f20563da404bb1b3ab0 | 2016-05-25 10:26:56 | +----+----------------------------------+---------------------+ EXPLAIN SELECT * FROM test WHERE id = 42\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: test partitions: NULL type: const possible_keys: PRIMARY key: PRIMARY key_len: 4 ref: const rows: 1 filtered: 100.00 Extra: NULL
Taxonomy upgrade extras: query tuningexplaindata typesql

How to become a certified DBA

Cédric Bruderer - Tue, 2016-05-10 10:16

I recently managed to get my certification as MySQL 5.6 DBA, and was asked to write a blog about it, because I had trouble getting the informations I needed.

You may have figured out too, that Oracle does not really supply you with information about the certification. At least, there is the MySQL documentation. It contains all the information you need.

Further, I recommend to use a virtual linux machine in combination with our tool MyEnv. This way you can simulate multiple scenarios, including replication set-ups, and if one or two servers die during your exercises, nobody gets mad at you.

When learning, make sure to have a look at the following topics:

  • Query tuning
  • Parameters tuning
  • MySQL client tools (mysqldump, mysqladmin, ...)
  • MySQL Audit Plugin
  • How to secure MySQL (Especially, the correct assignment of privileges.)
  • How to use the Performance and Information Schema
  • Partitions
  • Replication
  • Backup and Recovery (Both, physical and logical variant.)

The certification takes 150 minutes and contains 100 questions. 60% of your answers have to be correct, in order to pass. If you keep a pace of one answer per minute, you will also have enough time to go over those answers you were not entirely sure at the first time.

MariaDB 10.2 Window Function Examples

Shinguz - Mon, 2016-04-18 22:39

MariaDB 10.2 has introduced some Window Functions for analytical queries.

See also: Window Functions, Window Functions, Window function and Rows and Range, Preceding and Following

Function ROW_NUMBER()

Simulate a row number (sequence) top 3

SELECT ROW_NUMBER() OVER (PARTITION BY NULL ORDER BY category_id) AS num , category.category_id FROM category LIMIT 3 ;


SELECT ROW_NUMBER() OVER (ORDER BY category_id) AS num , category.category_id FROM category LIMIT 3 ; +-----+-------------+ | num | category_id | +-----+-------------+ | 1 | ACTUAL | | 2 | ADJUSTMENT | | 3 | BUDGET | +-----+-------------+
ROW_NUMBER() per PARTITION SELECT ROW_NUMBER() OVER (PARTITION BY store_type ORDER BY SUM(sf.store_sales) DESC) AS Nbr , s.store_type AS "Store Type", s.store_city AS City, SUM(sf.store_sales) AS Sales FROM store AS s JOIN sales_fact AS sf ON sf.store_id = s.store_id GROUP BY s.store_type, s.store_city ORDER BY s.store_type, Rank ; +-----+---------------------+---------------+------------+ | Nbr | Store Type | City | Sales | +-----+---------------------+---------------+------------+ | 1 | Deluxe Supermarket | Salem | 1091274.68 | | 2 | Deluxe Supermarket | Tacoma | 993823.44 | | 3 | Deluxe Supermarket | Hidalgo | 557076.84 | | 4 | Deluxe Supermarket | Merida | 548297.64 | | 5 | Deluxe Supermarket | Vancouver | 534180.96 | | 6 | Deluxe Supermarket | San Andres | 518044.80 | | 1 | Gourmet Supermarket | Beverly Hills | 619013.24 | | 2 | Gourmet Supermarket | Camacho | 357772.88 | | 1 | Mid-Size Grocery | Yakima | 304590.92 | | 2 | Mid-Size Grocery | Mexico City | 166503.48 | | 3 | Mid-Size Grocery | Victoria | 144827.48 | | 4 | Mid-Size Grocery | Hidalgo | 144272.84 | +-----+---------------------+---------------+------------+
Function RANK()

Ranking of top 10 salaries

SELECT full_name AS Name, salary AS Salary , RANK() OVER(ORDER BY salary DESC) AS Rank FROM employee ORDER BY salary DESC LIMIT 10 ; +-----------------+----------+------+ | Name | Salary | Rank | +-----------------+----------+------+ | Sheri Nowmer | 80000.00 | 1 | | Darren Stanz | 50000.00 | 2 | | Donna Arnold | 45000.00 | 3 | | Derrick Whelply | 40000.00 | 4 | | Michael Spence | 40000.00 | 4 | | Maya Gutierrez | 35000.00 | 6 | | Pedro Castillo | 35000.00 | 6 | | Laurie Borges | 35000.00 | 6 | | Beverly Baker | 30000.00 | 9 | | Roberta Damstra | 25000.00 | 10 | +-----------------+----------+------+
Function DENSE_RANK() SELECT full_name AS Name, salary AS Salary , DENSE_RANK() OVER(ORDER BY salary DESC) AS Rank FROM employee ORDER BY salary DESC LIMIT 10 ; +-----------------+----------+------+ | Name | Salary | Rank | +-----------------+----------+------+ | Sheri Nowmer | 80000.00 | 1 | | Darren Stanz | 50000.00 | 2 | | Donna Arnold | 45000.00 | 3 | | Derrick Whelply | 40000.00 | 4 | | Michael Spence | 40000.00 | 4 | | Maya Gutierrez | 35000.00 | 5 | | Pedro Castillo | 35000.00 | 5 | | Laurie Borges | 35000.00 | 5 | | Beverly Baker | 30000.00 | 6 | | Roberta Damstra | 25000.00 | 7 | +-----------------+----------+------+
Aggregation Windows SELECT full_name AS Name, salary AS Salary , SUM(salary) OVER(ORDER BY salary DESC) AS "Sum sal" FROM employee ORDER BY salary DESC LIMIT 10 ; +-----------------+----------+-----------+ | Name | Salary | Sum sal | +-----------------+----------+-----------+ | Sheri Nowmer | 80000.00 | 80000.00 | | Darren Stanz | 50000.00 | 130000.00 | | Donna Arnold | 45000.00 | 175000.00 | | Derrick Whelply | 40000.00 | 255000.00 | | Michael Spence | 40000.00 | 255000.00 | | Laurie Borges | 35000.00 | 360000.00 | | Maya Gutierrez | 35000.00 | 360000.00 | | Pedro Castillo | 35000.00 | 360000.00 | | Beverly Baker | 30000.00 | 390000.00 | | Roberta Damstra | 25000.00 | 415000.00 | +-----------------+----------+-----------+
Function CUME_DIST() and PERCENT_RANK() SELECT s.store_state AS State, s.store_city AS City, SUM(e.salary) AS Salary , CUME_DIST() OVER (PARTITION BY State ORDER BY Salary) AS CumeDist , PERCENT_RANK() OVER (PARTITION BY State ORDER BY Salary) AS PctRank FROM employee AS e JOIN store AS s on s.store_id = e.store_id WHERE s.store_country = 'USA' GROUP BY s.store_name ORDER BY s.store_state, Salary DESC ; +-------+---------------+-----------+--------------+--------------+ | State | City | Salary | CumeDist | PctRank | +-------+---------------+-----------+--------------+--------------+ | CA | Alameda | 537000.00 | 1.0000000000 | 1.0000000000 | | CA | Los Angeles | 221200.00 | 0.8000000000 | 0.7500000000 | | CA | San Diego | 220200.00 | 0.6000000000 | 0.5000000000 | | CA | Beverly Hills | 191800.00 | 0.4000000000 | 0.2500000000 | | CA | San Francisco | 30520.00 | 0.2000000000 | 0.0000000000 | | OR | Salem | 260220.00 | 1.0000000000 | 1.0000000000 | | OR | Portland | 221200.00 | 0.5000000000 | 0.0000000000 | | WA | Tacoma | 260220.00 | 1.0000000000 | 1.0000000000 | | WA | Spokane | 223200.00 | 0.8571428571 | 0.8333333333 | | WA | Bremerton | 221200.00 | 0.7142857143 | 0.6666666667 | | WA | Seattle | 220200.00 | 0.5714285714 | 0.5000000000 | | WA | Yakima | 74060.00 | 0.4285714286 | 0.3333333333 | | WA | Bellingham | 23220.00 | 0.2857142857 | 0.1666666667 | | WA | Walla Walla | 21320.00 | 0.1428571429 | 0.0000000000 | +-------+---------------+-----------+--------------+--------------+
Function NTILE() SELECT promotion_name, media_type , TO_DAYS(end_date)-TO_DAYS(start_date) AS Duration , NTILE(4) OVER (PARTITION BY promotion_name ORDER BY DURATION) AS quartile , NTILE(5) OVER (PARTITION BY promotion_name ORDER BY DURATION) AS quintile , NTILE(100) OVER (PARTITION BY promotion_name ORDER BY DURATION) AS precentile FROM promotion WHERE promotion_name = 'Weekend Markdown' LIMIT 10 ; +------------------+-------------------------+----------+----------+----------+------------+ | promotion_name | media_type | Duration | quartile | quintile | precentile | +------------------+-------------------------+----------+----------+----------+------------+ | Weekend Markdown | In-Store Coupon | 2 | 1 | 1 | 9 | | Weekend Markdown | Daily Paper | 3 | 3 | 4 | 29 | | Weekend Markdown | Radio | 3 | 4 | 4 | 36 | | Weekend Markdown | Daily Paper, Radio | 2 | 2 | 2 | 13 | | Weekend Markdown | Daily Paper, Radio, TV | 2 | 2 | 3 | 20 | | Weekend Markdown | TV | 2 | 3 | 3 | 26 | | Weekend Markdown | Sunday Paper | 3 | 3 | 4 | 28 | | Weekend Markdown | Daily Paper, Radio, TV | 3 | 3 | 4 | 34 | | Weekend Markdown | Daily Paper | 2 | 1 | 2 | 10 | | Weekend Markdown | Street Handout | 2 | 2 | 2 | 18 | | Weekend Markdown | Bulk Mail | 3 | 4 | 5 | 37 | | Weekend Markdown | Cash Register Handout | 2 | 2 | 2 | 14 | | Weekend Markdown | Daily Paper, Radio, TV | 3 | 3 | 4 | 31 | | Weekend Markdown | Sunday Paper | 2 | 3 | 3 | 27 | | Weekend Markdown | Sunday Paper, Radio, TV | 1 | 1 | 1 | 4 | +------------------+-------------------------+----------+----------+----------+------------+
Taxonomy upgrade extras: mariadbdwhreportingAnalyticsWindow FunctionOLAPData Mart

Define preferred SST donor for Galera Cluster

Cédric Bruderer - Fri, 2016-04-15 18:00

One of our customers recently ran into a problem, where he wanted to have a preferred donor for SST, whenever a node came up. The problem was, that the node did not come up, when the preferred donor was not running.

In the documentation, you can find the parameter wsrep_sst_donor, which prefers the specified node as SST donor. This is great, as long as the donor is actually running.

The problem can be fixed by adding a comma to the end of the value of wsrep_sst_donor, what would look like this:


Note the comma at the end of the value. This trailing comma basically tells this node, that galera2 is the preferred donor, if galera2 is not available, any other available node will be used as donor.

You could also specify a secondary node, which is needed to be available for the node to come up:


In this case, galera1 wil be used as secondary donor if galera2 is not available. If both are not available, the node will refuse to come up.

Taxonomy upgrade extras: Galera Cluster

Past and Future Conferences, and Talks Around MySQL

Jörg Brühe - Mon, 2016-04-11 15:25

Time flies, and my blogging frequency is quite low. More frequent would be better, but knowing myself I'll rather not promise anything ;-)

Still, it is appropriate to write some notes about CeBIT, the "Chemnitzer Linuxtage 2016", and future events.


CeBIT was running from March 14 to 18 (Monday till Friday) in Hannover, Germany, and I will leave the general assessment to the various marketing departments as well as to the regular visitors (to which I do not belong).

In order to meet our current customers as well as potential future ones, FromDual had a booth in the "Open Source Forum". We displayed a Galera Cluster, running on three tiny headless single-board Linux machines, and showed how it reacts to node failures and then recovers all by itself, without any administrator intervention. Many of our visitors were fascinated, because a HA solution would be a good fit in their solution architecture. We had got several stuffed dolphins "Sakila", the traditional MySQL symbol, and all of them found new homes (typically with the words "for my grandchild"). :-)

IMHO, the "Open Source Forum" had deserved a better visitor attraction than it really got - placing it into one hall with document management systems was no good fit, research and development might have been more appropriate.
The forum had an area for talks which were running all five days, I consider John "Maddog" Hall (who had provided an Alpha machine to Linus Torvalds decades ago) and Prof. Klaus Knopper (who is maintaining the "Knoppix" live distribution) the most prominent speakers. FromDual's Oli Sennhauser talked about the new features of MySQL 5.7, you can get the slides via the FromDual download page.

Chemnitzer Linux-Tage

The weekend following CeBIT, March 19 and 20, had been selected for the Chemnitzer Linux-Tage. Like in the previous years, the conference attracted many visitors from all over Germany as well as from some neighbouring countries, and both John Hall and Klaus Knopper had come there directly from Hannover - like me and several others.

As usual, the conference programme covered all aspects of Linux, the headline was "It is your project!". Databases are definitely not one of the major topics there, it is more about overall trends, distributions, communication, and many other aspects.

I delivered a talk about "RPM conventions - a Modern Tower of Bable", and it was well received. I am using various MySQL RPMs (from MySQL AB, Oracle, or RedHat) as examples to show different opinions about packaging, dependencies, installation actions, and compatibility issues, which partly originate from the diverging positions of software developer vs distributor. MySQL was used as a well-known example (but will interest my readers here), most of the items are also applicable to almost any software. Again, the slides (in German) are available on the FromDual web site. Your comments are welcome!

Open Source Data Center Conference

We all know that Open Source has become a major force in computing, so it is no surprise to have it as the subject for various conferences.

One of them is the "Open Source Data Center Conference" "OSDC", to be held in Berlin on April 26 - 28. Open Source database systems are one of the topics, and the programme committee accepted my talk "MySQL-Server in Teamwork - Replication and Galera Cluster". After the conference, I will upload it on the FromDual site and make it available for download.

Now, having told you all this, i will turn to customer issues again ...


Galera Cache sizing

Shinguz - Mon, 2016-04-04 22:03

To synchronize the data between the Galera Cluster and a new or re-entering Galera node Galera Cluster uses 2 different mechanisms:

  • For full synchronization of data: Snapshot State Transfer (SST).
  • For delta synchronization of data: Incremental State Transfer (IST).

The Incremental State Transfer (IST) is relevant when a node is already known to the Galera Cluster and just left the cluster short time ago. This typically happens in a maintenance window during a rolling cluster restart.

The Galera Cache is a round-robin file based cache that keeps all the write-sets (= transactions + meta data) for a certain amount of time. This time, which should be bigger than your planned maintenance window, depends on the size of the Galera Cache (default 128 Mbyte) and the traffic which will happen during your maintenance window.

If your traffic is bigger than the Galera Cache can keep Galera Cluster will fall-back from IST to SST which is a very expensive operation for big databases.

The size of the Galera Cache can be calculated of the delta of the sum of the following 2 Galera status informations before and after the maintenance window:

Galera Cache size = delta(wsrep_replicated_bytes + wsrep_received_bytes)

Ideally you determine these values before your change happens in a time window where you have roughly the same traffic as during your maintenance window.

If you do not have a Galera Cluster in place yet or if you do not have those values available you can also use the numbers of the traffic written to the binary log or the number of the traffic written to InnoDB transaction log (Innodb_os_log_written).

As a rough estimate we have evaluated the following formulas for you:

Binary Log Traffic x 1.3 = Wsrep traffic (+/- 10%)


InnoDB Log File traffic x 0.6 = Wsrep traffic (+/- 10%)
Taxonomy upgrade extras: Galera Clustercachesizing

FromDual Nagios and Icinga plug-ins v1.0.1 for MySQL/MariaDB released

FromDual.en - Tue, 2016-02-23 18:27

FromDual has the pleasure to announce the release of the new version 1.0.1 of the FromDual Nagios and Icinga plug-ins for MySQL, Galera Cluster and MariaDB.


The new FromDual Nagios plug-ins can be downloaded here.

In the inconceivable case that you find a bug in the FromDual Nagios plug-ins please report it to our bug tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to

Changes in Nagios plug-ins 1.0.1
  • Help adapted to new 5.7 conventions for creating user in
  • Output can be formatted for Centreon, Icinga and Nagios now in
  • Support for Galera Cluster implemented in
Taxonomy upgrade extras: nagiosicingaplug-inmysqlmariadbpercona serverGalera ClustermonitoringOperationsrelease

On Files, the Space They Need, and the Space They Take

Jörg Brühe - Tue, 2016-02-09 14:55


xfs Users, Take Care!

Recently, we had a customer ask: Why do many files holding my data take up vastly more space than their size is? That question may sound weird to you, but it is for real, and the customer's observation was correct. For a start, let's make sure we are using the same terms.

  • The size of a file is the number of bytes it will deliver if it is read sequentially from start to end.
  • The space it takes up is the sum of all disk pages which are used to hold the file's data, or to locate those data pages ("indirect" blocks in Unix/Linux terminology).

Every Unix/Linux admin knows (or at least should know) that a file may take up less disk space than its size is. This happens when not all bytes of the file were really written, but the write pointer was advanced via "seek()", leaving a gap. Disk pages which are completely contained in such a gap will not be written, and reading these positions will produce bytes containing zero. This is called a "sparse file". You will find some remarks about them in our blog at, or search the net for that term.

The Customer's Message

Now that we have brought those basics into active memory again, let's return to the original question: Can there be files which take up vastly more space than their size is? We will not consider potential administrative overhead (pointers to pages), because to the customer a file of slightly more than 4 GB was reported to take up 8.1 GB disk space - see this quote from his mail (file name changed):

# ls -l some_table#P#p01.ibd
-rw-rw---- 1 mysql mysql 4307550208 Jan 4 01:06 some_table#P#p01.ibd
# du -hs some_table#P#p01.ibd
8,1G some_table#P#p01.ibd

Luckily, the customer's mail mentioned the file system: It was not one of the "ext" family (ext2, ext3, or etx4), but rather they are using xfs. This gave me a hint to search for information, and Google provided several pointers, IMO the most helpful ones where these:

Both these texts describe that the Linux kernel includes a tuning function for xfs file systems, which is to pre-allocate pages at the end of a growing file. Originally, the amount was small (64 kB), but then the size was made a function of the file size - the larger the file, the more pages were pre-allocated. Hence, this is now called "dynamic speculative EOF preallocation". It is based on the assumption that the file will continue to grow, and these pre-allocated pages are adjacent, so the performance of later file use (especially reads) will be improved. To not waste disk space permanently, such pre-allocated pages will be cut from the file when it is closed.

Measuring File Size and Space Taken

To see this behavior in practice, I wrote a little shell script that lets a file grow in increments of 160 kB (= ten InnoDB pages of default size) without closing it. (You can find it attached.) In parallel, I checked the size ("ls -l --block-size=K") and the space allocated ("du -k"). With this script, I could easily observe the effect:

Test './try-xfs-prealloc' is running on TTY 'pts/10'.

Linux trift-6core 3.13.0-74-generic #118-Ubuntu SMP Thu Dec 17 22:52:10 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
/dev/mapper/vg1000-XFS_try on /XFS_try type xfs (rw)
Dateisystem 1K-Blöcke Benutzt Verfügbar Verw% Eingehängt auf
/dev/mapper/vg1000-XFS_try 52403200 33504 52369696 1% /XFS_try
-rw-rw-r-- 1 joerg joerg 480K Feb 4 16:56 /XFS_try/somedir/bigfile
960 /XFS_try/somedir/bigfile
-rw-rw-r-- 1 joerg joerg 960K Feb 4 16:57 /XFS_try/somedir/bigfile
960 /XFS_try/somedir/bigfile
-rw-rw-r-- 1 joerg joerg 1440K Feb 4 16:57 /XFS_try/somedir/bigfile
1984 /XFS_try/somedir/bigfile
-rw-rw-r-- 1 joerg joerg 1920K Feb 4 16:57 /XFS_try/somedir/bigfile
1984 /XFS_try/somedir/bigfile
-rw-rw-r-- 1 joerg joerg 2240K Feb 4 16:57 /XFS_try/somedir/bigfile
4032 /XFS_try/somedir/bigfile

(( several lines not quoted ))

-rw-rw-r-- 1 joerg joerg 11200K Feb 4 17:02 /XFS_try/somedir/bigfile
16320 /XFS_try/somedir/bigfile
-rw-rw-r-- 1 joerg joerg 11680K Feb 4 17:02 /XFS_try/somedir/bigfile
16320 /XFS_try/somedir/bigfile
No further writes, status:
-rw-rw-r-- 1 joerg joerg 12288000 Feb 4 17:02 /XFS_try/somedir/bigfile
16320 /XFS_try/somedir/bigfile

Writer process killed, status:
-rw-rw-r-- 1 joerg joerg 12288000 Feb 4 17:02 /XFS_try/somedir/bigfile
12000 /XFS_try/somedir/bigfile

While dynamic preallocation is a good idea for most files, it fails badly on MySQL data files: The MySQL server will not close them, in general, even when they won't grow any further (like a table partitioned by date). So this is what the customer detected: A table partition which had grown somewhat beyond 4 GB had got pages for another 4 GB pre-allocated, they were not released, and this happened for many files. Those of you who have ample disk space may say "who cares?", but there are others who have to care. For them, this feature has risky consequences, so they should try to prevent them.

Avoiding The Unlimited Growth

Basically, the only way out is to use the "allocsize" mount option, as described in the FAQ. InnoDB reads 64 pages of 16 kB at most, so "allocsize=1M" might be best.

Like the customer, many DBAs or SysAdmins may not be aware of that behaviour and might detect it only on the running system. Of course, the first question will be: "Can I fix that without downtime?" Immediately, a "mount -o remount" comes to mind, so I tried that: While my test script was running, I issued
sudo mount -o remount,allocsize=1M /XFS_try

Sadly, I must tell you it had no effect: The size of the pre-allocated space continued to grow, like in the original run. Even worse, this command also did not have any effect on a run I started after issuing it.

This proves that the value of "allocsize" cannot be changed for a mounted XFS file system, rather its value at mount time remains effective until the unmount. Only when I unmounted it and then mounted it anew, giving "allocsize=1M", did I see the fixed size as pre-allocation amount. From the DBA point of view, it means that a shutdown of the MySQL instance cannot be avoided for this change. (Of course, if we talk about a Galera cluster, the system remains available, because the nodes can be handled one at a time.)

Can You Get It Without Shutdown?

Now what if you really need to avoid a shutdown, but also need to get back the pre-allocated space urgently? As written above, this will happen only when the file is closed. So the question is: How can the DBA let the MySQL server close a table data file without interrupting the service? There seems to be a chance: the "flush tables" statement. The manual says:

Closes all open tables, forces all tables in use to be closed, and flushes the query cache. ...

FLUSH TABLES tbl_name [, tbl_name] ...
With a list of one or more comma-separated table names, this statement is like FLUSH TABLES with no names except that the server flushes only the named tables. ...

The text is identical for versions from 5.1 to 5.7.

But then, see the user comment by Simon Mudd on that page: No effect for InnoDB. To check this, I wrote a script that inserts rows into an InnoDB table, then let it run: The effect of preallocation is clearly visible. However, sometimes the space used may suddenly go down to the file size, then go up again. My impression is that XFS will react different to a plain file and an InnoDB table, because a file will grow sequentially at the end only while an InnoDB table also has writes to other blocks during its growth. At the end of the insert run, "ls -l" and "du" might show a big preallocation, but not in all runs. To really be sure, I repeated the experiment with the daemon "mysqld" running under "strace" control: No, I did not get a "close()" logged from the "flush table" command.

I had the opportunity to discuss it with a MySQL developer: Yes, that is correct, and it is intentional. InnoDB relies heavily on background threads, and they do not want to add the complexity of syncing these tasks with a "flush table" command. So there is no command that would guarantee the release of preallocated space.

Conclusion While xfs is a good file system for databases, the "dynamic speculative EOF preallocation" is a feature to be aware of, and you may want to limit its amount so that you don't have too much wasted space on your disk(s). Use the "allocsize=" mount option, and remember that it needs to be set before the mount.

Take care!

AttachmentSize Shell script to show XFS preallocation1.92 KB

FOSDEM 2016 - MySQL slides about PERFORMANCE_SCHEMA available

FromDual.en - Wed, 2016-02-03 21:56

The FOSDEM 2016 in Brussels (Belgium) January 29/30 is over and was very interesting and IMHO a big success.

For all those who could not participate at FOSDEM 2016 our presentation slides about PERFORMANCE_SCHEMA and sys schema are available here:

PERFORMANCE_SCHEMA and sys schema - What can we do with it? (PDF, 406 kbyte)

Taxonomy upgrade extras: sysperformance_schemaslides

FromDual Ops Center for MySQL and MariaDB 0.3 has been released

FromDual.en - Tue, 2016-02-02 16:24

FromDual has the pleasure to announce the release of the new version 0.3 of the FromDual Ops Center for MySQL and MariaDB.

The FromDual Ops Center for MySQL and MariaDB (focmm) is an application for DBA's and system administrators to manage MySQL and MariaDB database farms.

The main task of Ops Center is to support you in your daily MySQL and MariaDB operation tasks.

More information about FromDual Ops Center you can find here.


The new FromDual Ops Center can be downloaded here.

In the inconceivable case that you find a bug in the FromDual Ops Center please report it to our bug tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to

Installation of Ops Center

A complete guide on how to install FromDual Ops Center you can find in the Ops Center User Guide.

Upgrade from 0.2 to 0.3

An upgrade path from Ops-Center v0.2 to v0.3 is currently not supported. Please also check Upgrading.

Changes in Ops Center 0.3
  • Ops Center repository database was changed from Sqlite to MySQL/MariaDB.
  • Structure and schema comparison introduced.
  • Variables comparison introduced.
  • Processlist tool introduced.
  • VIP is checked over crontab now.
  • Debugging facility improved.
  • Some configuration rules/checks implemented.
  • User management implemented for multiple-instances.
  • Clone user implemented.
  • Database backup implemented.
  • Jab handling implemented.
  • Blocking backup implemented for MyISAM tables.
  • Skip Slave with GTID works now.
Taxonomy upgrade extras: operationOperationsreleaseBackupfailover

MySQL Environment MyEnv 1.3.0 has been released

FromDual.en - Sun, 2016-01-31 12:34

FromDual has the pleasure to announce the release of the new version 1.3.0 of its popular MySQL, Galera Cluster, MariaDB and Percona Server multi-instance environment MyEnv.

The new MyEnv can be downloaded here.

In the inconceivable case that you find a bug in the MyEnv please report it to our bug tracker.

Any feedback, statements and testimonials are welcome as well! Please send them to

Upgrade from 1.1.x or higher to 1.3.0 # cd ${HOME}/product # tar xf /download/myenv-1.3.0.tar.gz # rm -f myenv # ln -s myenv-1.3.0 myenv

If you are using plug-ins for showMyEnvStatus create all the links in the new directory structure:

cd ${HOME}/product/myenv ln -s ../../utl/oem_agent.php plg/showMyEnvStatus/
Changes in MyEnv 1.3.0 MyEnv
  • NDB Cluster stuff was removed and cleaned-up.
  • my.cnf template was slightly adapted to new requirements (query_cache_size, innodb_buffer_pool_instances, binlog_stmt_cache_size, wsrep_sst_method and wsrep_sst_auth).
  • Cgroups stuff was fixed and made more verbose in case of configuration errors.
  • Cgroups template (tpl/cgroups.conf) was extended.
  • Some warnings made more clear.
  • Too verbose error message in case of missing my.cnf was redirected to debug logging.
MyEnv Installer
  • Preparation work for Mac OSX was done.
  • Installation deletion can be automatized now.
  • Installer made ready for 5.7
  • Tags angel and default are not attached to each section any more.
  • Schema sys was added to hideschema variable
MyEnv Utilities
  • Script host tag added.
  • ORDER BY added to GROUP BY (in query_bench.php) to make it working the same in the future.
  • rc bug fixed in

For subscriptions of commercial use of MyEnv please get in contact with us.

Taxonomy upgrade extras: myenvoperationMySQL Operationsmulti instanceconsolidationtestingupgradereleasecloudcgroupscontainer

Replication in a star

Cédric Bruderer - Thu, 2016-01-21 21:24

Most of you know, that it is possible to synchronize MySQL and MariaDB servers using replication. But with the latest releases, it is also possible to use more than just two servers as a multi-master setup.

Most of you know that both MySQL and MariaDB support replication in a hierarchical master-slave-setup, to propagate changes across all connected servers.

But with the latest releases, a slave can have more than one master.

The keyword: Multi-Source replication

It is supported from MySQL 5.7 and MariaDB 10.0 on, and this article describes how to set it up.

What does Multi-Source mean?

Multi-Source means, that you can take two or more masters and replicate them to one slave, where their changes will be merged. This works just like the regular MySQL/MariaDB replication.

Well, we are going to exploit that a little. It is still possible to configure a Master-Master set up, what basically allows the following configuration.

Multiple servers are conjoined in a cluster, where every server is a master, and the replication happens over one central node.

Why should we use Multi-Source-Replication, if it's just Master-Master?

With a Master-Master-Setup, you are limited to a ring topology or two servers. With Multi-Source, you can now use more than two server without having to use the ring topology, which might break and cause the replication to halt.

Further it is possible to back everything up at one place, without the risk of interrupting access to the databases.


The logical topography is a star. The following image shows a possible set up, with servers located in different countries. Thanks to the asynchronous replication, which does not require a broadband connection to work properly, this is possible without a problem.

What has to be considered?
  • The problems for any other Master-Master-Setup apply here as well.
  • If the replication in the cluster is stalled, the problem is usually on more than one server, maybe even the entire cluster.
  • Although the synchronization is asynchronous and does not cause a lot of network traffic itself, the replication of large or heavily accessed databases will cause some traffic at the central node.

How do I set it up?

The way you set it up, is like any other Master-Master replication. Except, that you will have more masters in the cluster.

1) Set up a standard installation of MySQL 5.7 or MariaDB 10.0 or above.


2) Prepare the configuration on all servers:

- On all the outer nodes (In Layout: All except server 1)

log_slave_updates = 0;

- On the central node (In Layout: server 1)

log_slave_updates = 1;



  • The central server must forward everything it receives, so that the changes starting on some outer node will also reach all the other outer nodes.
  • The outer nodes must not forward such changes, because they would loop through the cluster forever.


- Give each server a unique ID!


- Create the user which will be used for the replication:

CREATE USER 'replicator'@'localhost' IDENTIFIED BY 'password'; GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'' IDENTIFIED BY 'password';

- On MySQL 5.7 you have to enable the use of GTIDs (see step 3).


3) Set up the replication with all the outer nodes:

I will start with MySQL 5.7:


  • MySQL Multi-Source-Replication
  • Online enable GTIDs

    Although there is a paragraph, which states that it is possible with file and position, I have experienced something different, what forced me to enable GTIDs.

    You have to change the repositories for "master info" and "relay log" to the format "TABLE":

mysql> SET GLOBAL master_info_repository = 'TABLE'; mysql> SET GLOBAL relay_log_info_repository = 'TABLE';

Further, it is necessary to enable GTID. From MySQL 5.7.6 and higher, it is possible to do this without restarting the server.


The value of "ENFORCE_GTID_CONSISTENCY" has to be "ON", otherwise the slave will return an error and refuse to work. And don't forget to make those changes in your "my.cnf" persistent.

Create a new link to a master:


Start slave:

mysql> START SLAVE [thread_type] [FOR CHANNEL=[]];

Stop slave:

mysql> STOP SLAVE [thread_type] [FOR CHANNEL=[]];

How to modify a link:



Now MariaDB 10.0:

Source: MariaDB Multi-Source-Replication

Create a new link to a master (ATTENTION: MariaDB has a different syntax, compared to MySQL):


Start all slaves on this server at the same time:


Check the status of all slaves on this server:


If you want to manage a slave on its own, you can use the regular commands. But you have to make the connection the default one.

Here is an example what you have to do, if you want to modify the connection 'maria-master':

mariadb> STOP SLAVE'maria-master'; mariadb> CHANGE MASTER 'maria-master' TO MASTER_HOST='', MASTER_PORT=, MASTER_USER='', MASTER_PASSWORD=''; mariadb> START SLAVE 'maria-master'; mariadb> SHOW SLAVE 'maria-master' STATUS\G

After you have done this, you have a normal Master-Slave replication between two servers (as shown in the following picture). To complete the star, repeat those steps on each server, you want to connect.

Is it possible expand an existing set up?

It is no problem to expand an existing master-master or master-slave set up. However I would recommend to create new connections. This way the replication is consistent in the configuration with the other connections.

How do I break it?

Just like any other Master-Master-Setup. So be careful, with writing on more than one server. In general, if a bad command is committed, it will be replicated to the other nodes. This will cause the cluster to stall.

Another problem is auto_increment. Duplicate IDs will cause a "Duplicate Key Error" and stall each server it happens on. This can be prevented by editing the values of "auto_increment_increment" and "auto_increment_offset". In this scenario, the value "auto_increment_increment" should be 7 (the amount of servers in the cluster) and the value of "auto_increment_offset" would be something from 0 to 6 (or 1 to 7, depending on your preferences).

How do I fix it?

Sadly, that is not so easy. In addition, the statements are still replicated over the cluster, what usually causes more than just one server to stall, most time it is the entire cluster.

You have to execute every action on each connection. This gets tedious if you have a large amount of nodes.

Let's assume you have a duplicate key error, because two inserts happened at the same time.


MySQL 5.7:

If you experience a stalled replication on MySQL, you have to skip the GTID of the transaction which caused the stall.

First, stop the slave:

mysql> STOP SLAVE FOR CHANNEL 'failed-transactions-channel-name';

Retrieve the next GTID:


The line "Retrieved_Gtid_Set" contains the next GTID which would be executed. Copy the value and tell the server to execute that GTID:

mysql> SET GTID_NEXT="dd57a411-b477-11e5-b518-005056244454";

Execute a blank transaction:


Tell the server to take control of the GTIDs:


And restart the slave:

mysql> START SLAVE FOR CHANNEL 'failed-transactions-channel-name';


MariaDB 10.0:

Stop the slave for the connection:

mariadb> STOP SLAVE 'maria-master';

Define which connection, you would like to edit:

mariadb> SET @@default_master_connection='maria-master'; # No, it's not a joke.

Skip the failed statement:


Restart the slave:

mariadb> START SLAVE 'maria-master';

Check if the slave is running:

mariadb> SHOW SLAVE 'maria-master' STATUS\G

Please note the "@@default_master_connection". It is really necessary to set this variable, or you will not be able to change the counter you want. I'm not sure, if I should be surprised or shocked, that this is the recommended solution by MariaDB.

Hot expand

What is needed to add a new node? Is it required to stop the entire cluster? Or can I just add a new server?

If you want to add a new node to the cluster, it would be best, if you take a dump from another node, including the master data. You then import that dump into the node, make sure the link to the master is configured correctly, and start it. If everything is set up the right way, you should have no problems at all.

It is not necessary to stop the entire cluster, since you can add the links between the nodes while the servers are up.

Backup and Restore Method 1

The thought behind Multi-Source replication was to make the administrators life easier when it comes to backups. Instead of backing up all data at their respective location, you can gather it on one server and do the backup on this machine, without interrupting the "productive" servers work.

To obtain a backup of all the replicated databases from the cluster, it would be best to do it on the central server. The reason for this is, that everything has to go over it. This method is suitable to protect yourself from to losing all data on the cluster. The downside is, when one node dies, you have to obtain the backup from that location and transfer it to the failed server.

Method 2

If you would like to keep your data save on the location of the server, you can set up a slave at each location and replicate the master to it. For a restore, you could use the existing slave as the new node of the star, while the old server is rebuilding. This set up is capable of keeping the downtime of the service as little as possible. Further, you are not required to transfer the backup from one location to another, since it is already stored close to the failed server.



Versions used:

MariaDB: 10.1.10

MySQL: 5.7.10

How to Get a Galera Cluster Into Split Brain

Jörg Brühe - Fri, 2015-10-23 17:02

"Split Brain" is the term commonly used for a cluster whose nodes have different contents, rather than identical as they should have. Typically, a "split brain" situation is the DBA's nightmare, and the Galera software is designed to avoid it. Galera is very successful in that avoidance, and it needs some special steps by the DBA to achieve "split brain". Here is how to do it - or, for most DBAs, what to avoid doing to not get a split-brain cluster.

Galera's Design

First, let's remember how Galera is operating:

  • The Galera software ensures that all nodes participating in a cluster will start from identical contents, by doing a "snapshot state transfer" (SST) of all current data to a newly joining node.
  • When the cluster is running, Galera transfers all changes (transactions) to all cluster nodes and applies them (or rolls back and ignores, in the case of a conflict).
  • If some connections get lost, all nodes check whether they "have quorum" (belong to a majority), and stop serving requests if they don't.
  • When a disconnected node re-joins the cluster, it gets all meantime changes transferred ("incremental state transfer" IST) and so makes its contents current.
  • Should that be impossible, because some of those changes have become unavailable (log purge), a full transfer (SST) is done.
By this design, the Galera software successfully avoids getting into a "split brain" situation.

Of course, the quorum is a well-known concept. The old term for it is "majority consensus", and the approach is built on a simple principle:
In any set (of cluster nodes), there cannot be two (or more) non-overlapping subsets which both contain a majority of the elements.
So if some loss of connectivity splits the cluster into subsets, at most one of them can "have quorum", all others will stop serving requests, and there cannot be two (or more) different directions in which the contents (data) changes.

What Galera introduced (compared to previous designs of distributed DBMSs) is the efficient transfer of changes and conflict detection / resolution ("certification" in Galera terms) at "commit" time that makes the system fast, while previous designs used "distributed locking" or other principles which added latency to many commands and so made their systems slow.

The Story

Let's get back to the "split brain" issue, of which I said that Galera avoids it, and also said it can be reached. Sounds contradictory? Well, there are more active components than just Galera. Here is a real-world case, as happened to (I won't say "achieved by") a customer:

Originally, they had set up a Galera cluster of three nodes; let's call them A, B, and C. This is started by bringing up node A as a stand-alone node, running MySQL with the "wsrep" plugin. Then, one after the other, nodes B and C are configured to join node A in forming a cluster, and started. As a result, there are three nodes communicating with each other that form the cluster. In addition, HAproxy is running somewhere, it will direct the clients to an active cluster node.

So far, so good: The cluster is running, clients connect and issue transactions, everything is ok, and the DBA/s turn/s to other tasks ...

Some time later, node A must be stopped to do some hardware maintenance. No problem, nodes B and C are running fine, they have quorum (2 of 3 is a majority), so the system is still available and operations continue. HAproxy detects that node A does not respond, so it directs all clients to B or C. The cluster architecture is serving its purpose of continuous availability even during a maintenance period.

Maintenance is done, node A is rebooted, its MySQL+Galera server process restarts. It comes up, HAproxy detects it as running and directs clients to it. All seems fine ...

Three hours later, someone has become suspicious, detects trouble, and node A is stopped. Why? What has happened?



What has happened?

Remember what I wrote about the cluster setup: It was

... started by bringing up node A as a stand-alone node, ... ... nodes B and C are configured to join node A ... ... three nodes communicating with each other that form the cluster.

These steps were sufficient to get the nodes up and running. What was missing, however, was to re-configure A from "stand-alone" to "member of cluster with B and C".

As a consequence, when A was restarted after the maintenance, it again came up stand-alone. It did not try to join the cluster (which would have triggered an IST of the meantime changes) or check for quorum (a stand-alone node is self-sufficient).

Based on A's configuration (as read from disk), all was fine.
Based on the concept of the A+B+C cluster, it was a plain, simple split-brain:
Some changes were done on B+C (which still had quorum), while others were done on A only (which was mis-configured).

Lesson to learn (or rather, to bring back into active brain memory):
If some configuration is changed at run-time, this change must also be done in the configuration files so that it is used on restart.

The typical example for such changes is a "set global" command modifying some dynamic variable, like "max_connections".
But in a Galera cluster, a node joining the others is also a dynamic configuration change, and it should ASAP be reflected in all configuration files. If this isn't done, the consequences might be as described above.


Now, most stories have a happy ending, and this one shouldn't be an exception:

Luckily, the application uses self-generated keys, similar to UUIDs, so the entries created on A did not conflict with those of B+C. Also, there were no changes of existing data, just inserts. So the situation could be corrected by extracting all new data from A, inserting them on B+C, and then resetting A's state so that it asked B+C for an SST. Uff!

Operational Advice

There are some tools available that will do such a transfer. However, it can be done completely with standard parts coming with MySQL:

  • "mysqldump" will extract the data from A.
  • Suitable options will make sure this exctract does not contain "drop" or "create" commands, and generates "insert ignore".
    Check the documentation for the options "--no-create-db", "--no-create-info", "--skip-add-drop-table", and "--insert-ignore".
  • If the old, common data are deleted first, both dumping and loading becomes faster, and duplicates are reduced / avoided.
  • "mysql" can be used to load these data into B or C.
Note, however, that conflicts will be ignored and not reported. Other tools or approaches might do that.

Had they used auto-increment keys, or had they modified existing data, it would have been much more complicated, and it might even have been impossible to combine all changes without losing some. I leave it to your imagination to think of such scenarios.

To repeat the lesson in DBMS / DBA terms:

  • The most important property of a database is consistency, it must be kept up at all times.
  • For database operations, the configuration on disk (in files) must be consistent to that in RAM (of the running processes), so any runtime changes must be reflected in the configuration files on disk to maintain consistency.

Percona's "pt-config-diff" can be used to compare a node's current variables to its configuration file.

Take care!


Subscribe to MySQL, Galera Cluster and MariaDB support and services aggregator - FromDual all (en)