You are here

To NULL, or not to NULL, that is the question!

As we already stated in earlier articles in this blog [1 and 2] it is a good idea to use NULL values properly in MariaDB and MySQL.

One of my Mantras in MariaDB performance tuning is: Smaller tables lead to faster queries! One consequence out of this is to store NULL values instead of some dummy values into the columns if the value is not known (NULL: undefined/unknown).

To show how this helps related to space used by a table we created a little example:

CREATE TABLE big_null1 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NOT NULL
, c02 VARCHAR(32) NOT NULL
, c03 VARCHAR(32) NOT NULL
, c04 VARCHAR(32) NOT NULL
, c05 VARCHAR(32) NOT NULL
, c06 VARCHAR(32) NOT NULL
, c07 VARCHAR(32) NOT NULL
, c08 VARCHAR(32) NOT NULL
, c09 VARCHAR(32) NOT NULL
, c10 VARCHAR(32) NOT NULL
, c11 VARCHAR(32) NOT NULL
, c12 VARCHAR(32) NOT NULL
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

CREATE TABLE big_null2 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NOT NULL
, c02 VARCHAR(32) NOT NULL
, c03 VARCHAR(32) NOT NULL
, c04 VARCHAR(32) NOT NULL
, c05 VARCHAR(32) NOT NULL
, c06 VARCHAR(32) NOT NULL
, c07 VARCHAR(32) NOT NULL
, c08 VARCHAR(32) NOT NULL
, c09 VARCHAR(32) NOT NULL
, c10 VARCHAR(32) NOT NULL
, c11 VARCHAR(32) NOT NULL
, c12 VARCHAR(32) NOT NULL
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

Now we fill the table with default values (empty string or dummy values) because we do not know yet the contents:

INSERT INTO big_null1 VALUES (NULL, '', '', '', '', '', '', '', '', '', '', '', '');
INSERT INTO big_null1 SELECT NULL, '', '', '', '', '', '', '', '', '', '', '', '' FROM big_null1;
... up to 1 Mio rows

INSERT INTO big_null2
VALUES (NULL, 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.'
  , 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.');
INSERT INTO big_null2
SELECT NULL, 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.'
  , 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.', 'Some dummy value.'
  FROM big_null2;
... up to 1 Mio rows

ANALYZE TABLE big_null1;
ANALYZE TABLE big_null2;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2')
 ORDER BY table_name;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
+------------+------------+----------------+-------------+--------------+-----------+

The opposite example is a table which allows NULL values for unknown fields:

CREATE TABLE big_null3 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NULL
, c02 VARCHAR(32) NULL
, c03 VARCHAR(32) NULL
, c04 VARCHAR(32) NULL
, c05 VARCHAR(32) NULL
, c06 VARCHAR(32) NULL
, c07 VARCHAR(32) NULL
, c08 VARCHAR(32) NULL
, c09 VARCHAR(32) NULL
, c10 VARCHAR(32) NULL
, c11 VARCHAR(32) NULL
, c12 VARCHAR(32) NULL
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

Also this table is filled with unknown values but this time with value NULL instead of an empty string:

INSERT INTO big_null3 (id) VALUES (NULL);
INSERT INTO big_null3 (id) SELECT NULL FROM big_null3;
... up to 1 Mio rows

ANALYZE TABLE big_null3;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3')
 ORDER BY table_name
;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
+------------+------------+----------------+-------------+--------------+-----------+

We see, that this table already uses much less space when we make correct use of NULL values...

So let us do some simple query run time tests:

big_null1big_null2big_null3
SELECT * FROM big_nullx1.1 s1.3 s0.9 s
SELECT * FROM big_nullx AS t1
  JOIN big_nullx AS t2 ON t2.id = t1.id
  JOIN big_nullx AS t3 ON t1.id = t3.id
5.0 s5.7 s4.2 s

One of my advices is, to fill the columns with NULL values if possible. So let us try this advice as well:

CREATE TABLE big_null4 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, c01 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c02 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c03 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c04 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c05 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c06 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c07 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c08 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c09 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c10 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c11 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, c12 VARCHAR(32) NULL DEFAULT 'Some dummy value here...!'
, INDEX (c03)
, INDEX (c06)
, INDEX (c09)
);

INSERT INTO big_null4 (id) VALUES (NULL);
INSERT INTO big_null4 (id) SELECT NULL FROM big_null4;
... up to 1 Mio rows

ANALYZE TABLE big_null4;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3', 'big_null4')
 ORDER BY table_name
;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
| big_null4  |     998533 |            383 |   382599168 |    118358016 |   6291456 |
+------------+------------+----------------+-------------+--------------+-----------+

So following my advice we fill with NULL values:

UPDATE big_null4
   SET c01 = NULL, c02 = NULL, c03 = NULL, c04 = NULL, c05 = NULL, c06 = NULL
     , c07 = NULL, c08 = NULL, c09 = NULL, c10 = NULL, c11 = NULL, c12 = NULL;

ANALYZE TABLE big_null4;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3', 'big_null4')
 ORDER BY table_name;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
| big_null4  |    1047285 |            364 |   381779968 |    126222336 |  33554432 |
+------------+------------+----------------+-------------+--------------+-----------+

It seems like we do not see the effect yet. So lets optimize the table to reclaim the space:

OPTIMIZE TABLE big_null4;

SELECT table_name, table_rows, avg_row_length, data_length, index_length, data_free
  FROM information_schema.tables
 WHERE table_name IN ('big_null1', 'big_null2', 'big_null3', 'big_null4')
 ORDER BY table_name
;
+------------+------------+----------------+-------------+--------------+-----------+
| table_name | table_rows | avg_row_length | data_length | index_length | data_free |
+------------+------------+----------------+-------------+--------------+-----------+
| big_null1  |    1046760 |             37 |    39387136 |     36225024 |   4194304 |
| big_null2  |    1031990 |            264 |   273416192 |     89899008 |   6291456 |
| big_null3  |    1047800 |             26 |    27852800 |     36225024 |   7340032 |
| big_null4  |    1047180 |             30 |    32030720 |     39370752 |   4194304 |
+------------+------------+----------------+-------------+--------------+-----------+

And you see there we get much of the space back... NULL is a good thing!