You are here

Impact of indices on MySQL

It is generally well known that indexes help a lot to speed up database queries (especially SELECT but also UPDATE and DELETE). It is less known, that indexes also have some disadvantages.

One of these disadvantages is, that indexes require space in memory and on disk. An other disadvantage of indexes is, that they slow down DML statements like INSERT and DELETE.

We often see at our customers that they do not realized this behavior. Now I found the time to show the impact of (too) many indexes graphically.

In the following test scenario we created a simple table:

CREATE TABLE `test` (
  `id` INT(10) UNSIGNED NOT NULL,
  `f1` INT(10) UNSIGNED NOT NULL,
  `f2` INT(10) UNSIGNED NOT NULL,
  `f3` INT(10) UNSIGNED NOT NULL,
  `f4` INT(10) UNSIGNED NOT NULL,
  `f5` INT(10) UNSIGNED NOT NULL,
  `f6` INT(10) UNSIGNED NOT NULL,
  `f7` INT(10) UNSIGNED NOT NULL,
  `f8` INT(10) UNSIGNED NOT NULL
) ENGINE = MyISAM;

and added some indexes:

ALTER TABLE test ADD PRIMARY KEY (id);
ALTER TABLE test ADD INDEX (f1);
...

Then we did 10 times 500k simple INSERT statements to this table.

What we can clearly see is for every index we add to the table the INSERTs will take longer and longer.

impact_of_indices.png

impact_of_indices2.png

The database parameters were as follows:

key_buffer_size         = 384M
sort_buffer_size        = 256M
read_buffer             =   8M
myisam_sort_buffer_size = 128M

The INSERTs were running single threaded.

In our example we were just indexing INT values (4 bytes in size). But I predict that it becomes even slower, when we index VARCHAR fields.

This situation will become even worse when you get pressure to your RAM or when the MyISAM key buffer will become too small and/or if you have pressure to your I/O system.

When you go to the limits of your hardware you should find a good balance between as many indexes as necessary but as little indexes as possible.

When you want to get rid of indexes the quick wins are indexes which are completely redundant. This happens when the people who create the indexes are not looking if such an index already exists.

An other quick win are indexes which are partially redundant (from left to right). With such indexes it does often not make sense to keep both of them (except you make use of covering indexes and explicitly specified them for this purpose).

It is a bit more tricky to find indexes which are never used. How this works out we will see in a later post.

Taxonomy upgrade extras: 

Comments

Using MyISAM is just asking for poor performance. Try using a storage engine that is actually good at indexing. For example, InnoDB offers a substantial improvement (perhaps a factor of 4) over MyISAM for maintaining indexes. InnoDB uses an change buffer to achieve that performance. For really high DML operations, try TokuDB (tokutek.com), which is about ten times faster than InnoDB. TokuDB uses fractal tree indexes to achieve its performance. Both InnoDB and TokuDB are transactional and can recover from crashes. MyISAM will corrupt your data on a crash.
Bradley C. Kuszmaulcomment

Moin Bradley, the post is about the increasing cost of maintaining (more) indexes. Secondary indexes with InnoDB are even more expensive. Think about a UUID-PK and every secondary saved separately to the PK. So I doubt there is no increasing cost using InnoDB. (And only this would counter the essence of the blog.) I don't know TokuDB, but as TokuDB is *not* GPL who cares?
erkulescomment

You are correct that indexes cost something in InnoDB. However, if you measure InnoDB you will likely find it is much faster than MyISAM for insertions when maintaining an index on a large table. My measurements (and others, such as Mark Callaghan at Facebook) say that InnoDB is faster (maybe 4x) for this kind of load. Also you are correct that some people will not consider using a non-GPL storage engine, and that's fine. That's part of why I am pointing out that InnoDB is far better suited than MyISAM for the load you measured. Some people are willing to use a non-GPL storage engine if it offers some other advantage, however. In the case of TokuDB, one advantage is more than a 10x performance improvement for this kind of load. (There are other advantages such as compression.) I read your home page, which says you offer "vendor neutral" consulting, and that you are in the business of recommending the best solution to meet your clients needs. I hope that there are some situations where TokuDB would be the best solution to meet your clients needs, and if you find such a client, I hope we can work together. -Bradley
Bradley C. Kuszmaulcomment

Let me summarize something. You don't disagree the cost of maintaining indexes as this blog describes. You are not disagreeing there are "some" costs for innodb either. So there is a flame MyIsam vs. InnoDb left. Which is imho not intended (at least from the author). Btw: Im not working for fromdual, so I don't give a ***** what they do. If you want to read *my* homepage just click on my name;)
erkulescomment

I didn't mean to be flaming, I'm sorry if it seemed that way. And I apologize for not noticing that the response to my comment wasn't from the original author. I guess I got a different message from the original posting. The original posting observed that indexes cost something. But what's the point of that observation? The point is to do something to your database to fix the slowness. The original posting implied that dropping some indexes would be one fix to the problem. The chart says fewer indexes means higher performance. So far, so good. However, there are other things you could do. For example, switch to InnoDB or TokuDB. (There may be other approaches such as using some NoSQL solution, but let's put those aside for now.) I guess I misread the original posting. Upon re-reading, it seems like the data in the original posting doesn't look so bad. After all, inserting 500K rows in 3 minutes instead of 90 seconds to get 5 indexes doesn't seem so bad. Sure, indexes cost *something*, but it's not much on this measurement. (In fact, for such a small database, perhaps MyISAM is faster than InnoDB. But the problems I've seen don't become difficult until the databases get larger.) Here's a much more interesting experiment (at least more interesting to me): What happens if a) the tables are bigger than main memory, and b) the indexes are random (or nearly random) values (things like UUID-PKs). Check out Mark Callaghan's postings on his MySQL at Facebook page to see the advantage that InnoDB offers. (And check out the graphs on Tokutek.com to see the additional advantages that TokuDB offers. Or don't if you really don't want to know about a non-GPL solution.) I guess my point is that if you are finding that the cost of maintaining indexes is slowing down your DML, dropping the indexes isn't your only option. You may like the indexes because they speed up queries. Given a fixed hardware budget and a fixed insertion rate, you can keep many more indexes than with MyISAM. I want people to be able to use indexes without worrying so much about the cost of maintaining them.
Bradley C. Kuszmaulcomment

I forgot to mention in the main post that I did these tests with 5.1.50. Settings for MyISAM this time:
key_buffer_size         = 8M
sort_buffer_size        = 2M
read_buffer_size        = 128k
myisam_sort_buffer_size = 8M
Settings for InnoDB:
# plugin not built-in
innodb_file_per_table          = 1
innodb_log_file_size           = 128m
innodb_buffer_pool_size        = 384m
innodb-flush_log_at_trx_commit = 0
innodb_log_files_in_group      = 3
innodb_support_xa              = 0
InnoDB table WITHOUT explicit PK and WITH explicit PK:
INDEX: name GEN_CLUST_INDEX, id 0 353, fields 0/12, uniq 1, type 1
  FIELDS:  DB_ROW_ID DB_TRX_ID DB_ROLL_PTR id f1 f2 f3 f4 f5 f6 f7 f8

INDEX: name PRIMARY, id 0 356, fields 1/11, uniq 1, type 3
 FIELDS:  id DB_TRX_ID DB_ROLL_PTR f1 f2 f3 f4 f5 f6 f7 f8
Impact of Indices
Impact of Indices
It seems indeed, that InnoDB has a less linear overhead than MyISAM (at least in my test) as long as the data can be kept in Memory. An thus it seems to perform slightly better in the described scenario when we have more than 1 to 2 indexes. Do not get me wrong, I do NOT promote MyISAM here I just want to find the truth. Some forces (good or bad ones?) hindered me to download Tokutek. I take it as a sign to stay away and stay with clean open source.
Shinguzcomment

Thanks, I appreciate the effort you are taking to find the truth. -Bradley
Bradley C. Kuszmaulcomment