I’ve been working with a large-ish MySQL table on an EC2 instance. This table has approximately 147MM rows, and I’ve been disappointed with the performance I’ve seen. I took a look at how much performance I could gain by moving to a more powerful EC2 instance, and the answer is “some, but not all upgrades are equal”.
Problem/Test Case
This is the query I’m using as a test case:
Select b.respondent_name, a.tv
From (Select respondent_id,
Sum(tr_tot_transaction_chrg) As tv
From eqr_transaction
Where report_yr = 2007
And report_prd = 3
Group By respondent_id) a
Join eqr_s0_respondent_id b
On b.respondent_id = a.respondent_id
Order By a.tv Desc
Limit 20
Some relevant details:
- The Where clause uses an index to select ~8.3MM rows from the 147MM in the table
- The Group By clause generates ~3000 result rows
- The database is MySQL 5
- The DB files are stored on an EBS device
- All queries are run from the mysql command line client
Results
I ran the query 3 times on each instance. Here is a summary of the results of those timing experiments.
VM information | ET results | ||||
Instance Type | Cost ($/hr) | Run 1 | Run 2 | Run 3 | Average |
m1.small | 0.10 | 180s | 175s | 177s | 177s |
m1.large | 0.40 | 110s | 77s | 77s | 88s |
m1.xlarge | 0.80 | 105s | 76s | 76s | 86s |
c1.medium | 0.20 | 93s | 70s | 70s | 78s |
c1.xlarge | 0.80 | 78s | 47s | 47s | 57s |
Conclusions
The c1.medium Instance Type is a decent upgrade: It offers roughly double the performance for double the price. For this problem, it appears strictly preferable to the m1.large and m1.xlarge Instance Types, and preferable on a “bang-for-the-buck” basis to the c1.xlarge Instance Type.
It is not clear why c1.medium outperformed m1.xlarge, since the latter completely outclasses the former on paper. I suspect either that my testing program was flawed (i.e. EC2 performance is extremely volatile) or that the 64-bit implementation of some piece of software is not all that it might be.