首页 > Mysql数据库 > load MySQL极慢,为什么?

load MySQL极慢,为什么?

2015年5月21日

load MySQL极慢,为什么?
http://blog.chinaunix.net/uid-20778583-id-3832897.html
之前对MySQL的2种主要storage engine 认识不够,前不久一次数据库迁移让我深入了解了下。

背景,尽可能减少服务中断时间,把DB从机器1迁移到另外一个更高specification的机器上2上。
1.首先我mysqldump发现时间非常久,拷贝过去load大概20分钟发现太慢,遂根据dev的建议直接select out 然后再load data infile,这种速度要快一些。
mysql> load data infile ‘/export/dump/unip_com_placment.csv’ into table history_placement_daily FIELDS TERMINATED BY ‘,’ OPTIONALLY ENCLOSED BY ‘”‘ LINES TERMINATED BY ‘\n’;

Query OK, 66574783 rows affected (1 hour 2 min 44.66 sec)
Records: 66574783 Deleted: 0 Skipped: 0 Warnings: 0
结果发现耗时62分钟多,记录有6657万+
以下是show innodb status看到的结果:
3 queries inside InnoDB, 0 queries in queue
3 read views open inside InnoDB
Main thread process no. 1485, id 1197824320, state: sleeping
Number of rows inserted 1818403515, updated 2345769085, deleted 18847375, read 713722797864
12708.19 inserts/s, 2.11 updates/s, 3.47 deletes/s, 38588.42 reads/s

由于该表是Innodb引擎,当时这方面知识匮乏,只是记得可以disable索引,后来才知道这个之适用于myisam的,以下是google结果顺便学习以下。
To ignore foreign key constraints during the load operation, issue a SET foreign_key_checks = 0 statement before executing LOAD DATA.
If you use LOAD DATA INFILE on an empty MyISAM table, all nonunique indexes are created in a separate batch (as for REPAIR TABLE). Normally, this makes LOAD DATA INFILE much faster when you have many indexes. In some extreme cases, you can create the indexes even faster by turning them off with ALTER TABLE … DISABLE KEYS before loading the file into the table and using ALTER TABLE … ENABLE KEYS to re-create the indexes after loading the file. See Section 8.3.2.1, “Speed of INSERT Statements”.
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html

以下是我事后测试:
alert table 把引擎改成myisam的话,Load是非常快的,因为它只是把数据写到内存,然后依赖操作系统调度把数据flush到磁盘上,
结果如下:
mysql> load data infile ‘/export/dump/mysql/unip_com_placment.csv’ into table history_placement_daily FIELDS TERMINATED BY ‘,’ OPTIONALLY ENCLOSED BY ‘”‘ LINES TERMINATED BY ‘\n’;
Query OK, 66574783 rows affected (2 min 14.46 sec)
Records: 66574783 Deleted: 0 Skipped: 0 Warnings: 0
但转成innodb还是很耗时,
mysql> alter table history_placement_daily engine=innodb;
Query OK, 66574783 rows affected (1 hour 2 min 13.82 sec)
因为转成innodb的时候,由于innodb是事务型的,它要写大量的redo log等到磁盘以便恢复等机制导致还是很慢,具体参考下面:
http://dba.stackexchange.com/questions/16395/mysql-insert-performance-innodb-vs-myisam
其实数据量到了这种地步,设置innodb_flush_log_at_trx_commit =0/1/2 以及Innodb_buffer_pool_size没啥改变。因为这个表不过6G的样子,而我们的innodb_buffer_pool_size是12G。

2)我把index给删了
mysql> show create table history_placement_daily\G
history_placement_daily | CREATE TABLE `history_placement_daily` (
`event_date` date NOT NULL,
`network_id` mediumint(9) NOT NULL,
`site_id` int(11) NOT NULL,
`placement_id` mediumint(9) NOT NULL,
`ad_tree_node_id` mediumint(9) NOT NULL,
`event_count` bigint(20) DEFAULT ‘0’ COMMENT ‘impressions’,
`gross_bid` decimal(16,5) DEFAULT ‘0.00000’ COMMENT ‘revenue’
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED

mysql> load data infile ‘/export/dump/mysql/unip_com_placment.csv’ into table history_placement_daily FIELDS TERMINATED BY ‘,’ OPTIONALLY ENCLOSED BY ‘”‘ LINES TERMINATED BY ‘\n’;
Query OK, 66574783 rows affected (18 min 57.85 sec)
这次很快,但是没有索引对于这种6600+万的查询会做full scan , 遍历整个表有多慢,你可想而知,所以必须逐个加索引,
mysql> ALTER TABLE `history_placement_daily` ADD INDEX idx_site_ad_tree_node (`site_id`,`ad_tree_node_id`);
Query OK, 0 rows affected (9 min 7.53 sec)

mysql> ALTER TABLE `history_placement_daily` ADD INDEX idx_network_placement (`network_id`,`placement_id`);
Query OK, 66574783 rows affected (19 min 53.55 sec)
Records: 66574783 Deleted: 0 Skipped: 0 Warnings: 0
……
差不多花了40分钟,总的来看时间都一样。他们基本原理都差不多,load的时候边insert,边建index,和先insert后create index一回事。对于业务允许的才有意义的,可以在空闲的时候慢慢的create index。

The way MySQL creates indexes is by the table and then sorting and adding the indexes. This means that it needs to re-write all the data, and then sort all of the data (not cheap by any means). It depends on your server’s I/O performance and how much ram you can give it.

奇怪的是,ADD index的时候有的没有遍历整个table,但 ALTER TABLE `history_placement_daily` ADD INDEX idx_network_placement (`network_id`,`placement_id`);
是做了full scan, 区别在什么地方呢?
所以对于这种情况,要么用myisam(replication-ignore-db,需要复制的话master一般都是innodb,主要为了数据安全起见。slave是 myisam可能有外键问题,因为它不支持外键),或者干脆infobright, 要么业务允许事后建index。问题关键是磁盘I/O,假如新的server做raid0+1应该快很多。

p.s , load为什么比mysqldump然后load快?
mysql的data access layer会做
(1)语法合法性检查,语句是否符合SQL语法规则
(2)语义检查,是否有对应的字段/表等
(3) 获得对象的锁,mysql会对查询对象加锁保障数据一致性。
(4)权限检查,accout是否有权限access该db/table等
dump出来的是执行一次分析一次,而load批量插入只分析一次。
create index做了什么?
index是MySQL这个RDMS自己内部维护的一个数据结构,主要是B+/B-树,建立索引的过程就是6657w+个记录创初始化一个树的过程:
http://www.cnblogs.com/biyeymyhjob/archive/2012/07/25/2608412.html
http://blog.codinglabs.org/articles/theory-of-mysql-index.html

分类: Mysql数据库 标签:
本文的评论功能被关闭了.