Code StackCats Edoc: MySQL Design tips - how big should varchar be

http://www.sqlskills.com/BLOGS/KIMBERLY/post/Disk-space-is-cheap.aspx

So - to start, I loaded all three databases with roughly 6.7 million rows... and I made sure everything was clean and contiguous so that I'd have the same starting point for all of the tables. I actually strategically started things in one filegroup and then moved things over to another filegroup with 2 files so that I could get some benefits from having multiple files as well (see Paul's excellent post on why a RW filegroup should generally have 2-4 files here: Benchmarking: do multiple data files make a difference?). So, at the initial start I have three databases:
SalesDBInts (initial size with Sales at 6.7 million rows = 334MB):

Customers - has an ever-increasing identity (int) PK (4 bytes)
Employees - has an ever-increasing identity (int) PK (4 bytes)
Products - has an ever-increasing identity (int) PK (4 bytes)
Sales - has an ever-increasing identity (int) PK and FKs to Customers, Employees and Products (row size = 27 bytes)

SalesDBGUIDs (initial size with Sales at 6.7 million rows = 1000MB):

Customers - has a randomly generated (using the NEWID() function) GUID PK (16 bytes)
Employees - has a randomly generated (using the NEWID() function) GUID PK (16 bytes)
Products - has a randomly generated (using the NEWID() function) GUID PK (16 bytes)
Sales - has a randomly generated (using the NEWID() function) GUID PK (16 bytes) and FKs to Customers, Employees and Products (row size 75 bytes)

SalesDBSeqGUIDs (initial size with Sales at 6.7 million rows = 961MB):

Customers - has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes)
Employees - has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes)
Products - has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes)
Sales - has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes) and FKs to Customers, Employees and Products (row size 75 bytes)
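
The row sizes quoted above can be checked with a little arithmetic. The following is a rough Python sketch, not part of the original post: it assumes each Sales row carries exactly four key columns (its own PK plus the three FKs), and the names `key_bytes` and `raw_row_megabytes` are made up for illustration. The key columns come to 16 bytes with ints and 64 bytes with GUIDs, which leaves the same 11 bytes of non-key data in both cases, so the whole 27 vs 75 byte gap comes from the key type.

```python
# Key widths in bytes, as stated in the excerpt.
INT_KEY_BYTES = 4
GUID_KEY_BYTES = 16

# Assumption: a Sales row holds its own PK plus three FKs
# (Customers, Employees, Products), i.e. four key columns.
KEYS_PER_SALES_ROW = 4

# Sales row sizes reported in the excerpt.
REPORTED_ROW_BYTES = {
    "SalesDBInts": 27,
    "SalesDBGUIDs": 75,
    "SalesDBSeqGUIDs": 75,
}

# "Roughly 6.7 million rows" in the excerpt, so this is approximate.
ROW_COUNT = 6_700_000


def key_bytes(key_width: int) -> int:
    """Bytes per Sales row spent on key columns (hypothetical helper)."""
    return key_width * KEYS_PER_SALES_ROW


def raw_row_megabytes(row_bytes: int, rows: int = ROW_COUNT) -> float:
    """Raw row data in MB (10**6 bytes), ignoring page and index overhead."""
    return row_bytes * rows / 1_000_000


if __name__ == "__main__":
    int_keys = key_bytes(INT_KEY_BYTES)
    guid_keys = key_bytes(GUID_KEY_BYTES)
    print("key bytes per row (int): ", int_keys)
    print("key bytes per row (GUID):", guid_keys)
    print("non-key bytes (int): ", REPORTED_ROW_BYTES["SalesDBInts"] - int_keys)
    print("non-key bytes (GUID):", REPORTED_ROW_BYTES["SalesDBGUIDs"] - guid_keys)
    for name, row_bytes in REPORTED_ROW_BYTES.items():
        print(name, "raw Sales row data ~", raw_row_megabytes(row_bytes), "MB")
```

The raw figures this prints are lower than the database sizes reported in the excerpt, which presumably also include the other tables, indexes and storage overhead; the sketch only covers the Sales row data itself.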

OK, so here's where the session really starts... I run 10K inserts into the Sales table in each database and then I check and see what happens:

10K rows in SalesDBInts takes 00:17 seconds
10K rows in SalesDBGUIDs takes 05:07 minutes
10K rows in SalesDBSeqGUIDs takes 01:13 minutes
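
To put those timings side by side, here is a small Python sketch that converts them to seconds and works out throughput and slowdown relative to the int-keyed database. It assumes the figures are in mm:ss form (so 05:07 is 5 minutes 7 seconds), which the excerpt does not state outright; `to_seconds` and `summarize` are hypothetical helper names.

```python
# Timings from the excerpt, assumed to be in mm:ss form.
TIMINGS = {
    "SalesDBInts": "00:17",
    "SalesDBGUIDs": "05:07",
    "SalesDBSeqGUIDs": "01:13",
}

ROWS_INSERTED = 10_000  # "10K inserts" per database


def to_seconds(mm_ss: str) -> int:
    """Convert an 'mm:ss' string to whole seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(timings, rows=ROWS_INSERTED, baseline="SalesDBInts"):
    """Return seconds, rows/second and slowdown vs. the baseline database."""
    base_seconds = to_seconds(timings[baseline])
    summary = {}
    for name, stamp in timings.items():
        seconds = to_seconds(stamp)
        summary[name] = {
            "seconds": seconds,
            "rows_per_second": rows / seconds,
            "slowdown": seconds / base_seconds,
        }
    return summary


if __name__ == "__main__":
    for name, stats in summarize(TIMINGS).items():
        print(
            f"{name}: {stats['seconds']} s, "
            f"{stats['rows_per_second']:.0f} rows/s, "
            f"{stats['slowdown']:.1f}x the int baseline"
        )
```

Under that reading, the random-GUID inserts take about 18 times as long as the int inserts, and the sequential-GUID inserts about 4 times as long.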

This is already SCARY and should go down into the "Are you kidding me?" category.

http://www.bigresource.com/MS_SQL-Difference-advantage-of-varchar-vs-nvarchar-PQfJeW9V.html
http://dba.stackexchange.com/questions/1767/how-do-too-long-fields-varchar-nvarchar-impact-performance-and-disk-usage-ms
http://stackoverflow.com/questions/262238/are-there-disadvantages-to-using-a-generic-varchar255-for-all-text-based-field
what's the disadvantage of defining big varchar fields? - Google Search

Sunday, September 9, 2012
