Sunday, February 28, 2010

Testing MySQL Storage engines with ADO.NET Entity framework

All that I really really care about in the current project is that the user be able to import an Excel File full of statistics ultra fast into the database.


Note in all cases selects were WICKED fast. But inserts .... well.


So I tried loading 27397 rows (a random excel file I had lying around) into the database .. one insert at the time.

Here are the results:

InnoDB:
Starting Insert:2/28/2010 4:58:50 PM
Insert Completed:2/28/2010 5:02:36 PM
Insert Duration:00:03:46.9059177



MyIsam:
Starting Insert:2/28/2010 4:53:07 PM
Insert Completed:2/28/2010 4:55:13 PM
Insert Duration:00:02:05.8441979



This confirms to official notes
Faster Selects: InnoDB
Faster Inserts: MyIsam

A basic simple reason is that for v5.1 of MySQL MyIsam does a table lock for writing :)

So MyIsam it is .... but I STILL need something faster..... 2 minutes to import 200 rows of an excel file (with 141+ columns ... data is to be imported with column names as row entries :)  ) is simply UNACCEBTABLE! .

I i need further optimizations ... perhaps a DB redesign!

But the reason for the current strategy is that the user can use any excel file and if certain columns are present the values in other columns are stored against them. Maybe there is a better way.

Anyways, here are some optimizations I tried:

Lower load collation (moved to ascii) : No Effect
Lesser / More Indecies : No drastic effect
PackKeys to zero : No drastic effect
Row Format :Fixed .... slight improvement
Delay Key Write : checked gives slight better performance ... but I kept it disabled.


Finally a breakthrough (well not really) :
Starting Insert:2/28/2010 6:54:19 PM
Insert Completed:2/28/2010 6:55:26 PM
Insert Duration:00:01:06.8178218

A fundamental observation. My code for insertItems was further calling in a loop code that insert/updated the item and called save changes. I modified it to call save changes only once. Also since I need to check if entry already exists I open the connection before the loop and close it afterwards. Thus:

Starting Insert:2/28/2010 7:08:25 PM
Insert Completed:2/28/2010 7:09:11 PM
Insert Duration:00:00:45.4445993

Next Move columns with strings to new column linking back as a key: 
Starting Insert:2/28/2010 8:33:51 PM
Insert Completed:2/28/2010 8:34:25 PM
Insert Duration:00:00:34.2109567


further things can be tried:
Bulk Import
Dynamic Table Based on columns



In All cases no ORM for this table :)

Either ADO.NET it is ... or I put up with this!

Update: I redesigned to recreate a table as per the excel file. 7000 row , 141 columns ..... 10 seconds ... enough said :)