Sunday, September 9, 2012

MySQL Design tips - how big should varchar be




http://www.sqlskills.com/BLOGS/KIMBERLY/post/Disk-space-is-cheap.aspx


So, to start, I loaded all three databases with roughly 6.7 million rows, and I made sure everything was clean and contiguous so that I'd have the same starting point for all of the tables. I strategically started things in one filegroup and then moved them over to another filegroup with 2 files so that I could get some benefit from having multiple files as well (see Paul's excellent post on why a RW filegroup should generally have 2-4 files here: Benchmarking: do multiple data files make a difference?). So, at the start I have three databases:
SalesDBInts (initial size with Sales at 6.7 million rows = 334MB):
  • Customers - has an ever-increasing identity (int) PK (4 bytes)
  • Employees - has an ever-increasing identity (int) PK (4 bytes)
  • Products - has an ever-increasing identity (int) PK  (4 bytes)
  • Sales - has an ever-increasing identity (int) PK and FKs to Customers, Employees and Products (row size = 27 bytes)
SalesDBGUIDs (initial size with Sales at 6.7 million rows = 1000MB):
  • Customers - has a randomly generated (using the NEWID() function) GUID PK (16 bytes)
  • Employees - has a randomly generated (using the NEWID() function) GUID PK (16 bytes)
  • Products - has a randomly generated (using the NEWID() function) GUID PK (16 bytes)
  • Sales - has a randomly generated (using the NEWID() function) GUID PK (16 bytes) and FKs to Customers, Employees and Products (row size 75 bytes)
SalesDBSeqGUIDs (initial size with Sales at 6.7 million rows = 961MB):
  • Customers - has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes)
  • Employees - has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes)
  • Products - has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes)
  • Sales - has a sequentially generated (using the NEWSEQUENTIALID() function) GUID PK (16 bytes) and FKs to Customers, Employees and Products (row size 75 bytes)
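To put the row sizes next to the database sizes, here is a rough back-of-the-envelope sketch in Python. It only multiplies the quoted Sales row size by the approximate row count, so it ignores page overhead, indexes and the other three tables; the helper name raw_sales_mb and the use of 10^6 bytes per MB are my own assumptions, not anything from the original post.

```python
# Figures quoted in the post; the row count is approximate ("roughly 6.7 million").
ROWS = 6_700_000

# Sales row size in bytes and reported initial database size in MB.
databases = {
    "SalesDBInts": {"row_bytes": 27, "reported_mb": 334},
    "SalesDBGUIDs": {"row_bytes": 75, "reported_mb": 1000},
    "SalesDBSeqGUIDs": {"row_bytes": 75, "reported_mb": 961},
}


def raw_sales_mb(row_bytes, rows=ROWS):
    """Raw Sales row data in MB (10**6 bytes), ignoring all storage overhead."""
    return row_bytes * rows / 1_000_000


for name, d in databases.items():
    print(
        f"{name}: ~{raw_sales_mb(d['row_bytes']):.0f} MB of raw Sales rows, "
        f"{d['reported_mb']} MB reported"
    )
```

The raw row data is well below the reported database size in every case, which is expected since this estimate leaves out everything except the Sales row payload. The useful part is the ratio: the wider keys make each Sales row almost three times larger before any index is counted.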
OK, so here's where the session really starts... I run 10K inserts into the Sales table in each database and then I check and see what happens:
  • 10K rows in SalesDBInts takes 00:17 seconds
  • 10K rows in SalesDBGUIDs takes 05:07 minutes
  • 10K rows in SalesDBSeqGUIDs takes 01:13 minutes
This is already SCARY and should go down into the "Are you kidding me?" category.
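To see how big the gap is, the three timings can be turned into slowdown factors. This is a small sketch that assumes the quoted values are in minutes:seconds form (so 00:17 is 17 seconds and 05:07 is 5 minutes 7 seconds); the helper to_seconds is hypothetical and only here for the arithmetic.

```python
def to_seconds(mm_ss):
    """Convert an 'mm:ss' string to seconds (assumed format of the quoted timings)."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


# Time for 10K inserts into Sales, as quoted above.
timings = {
    "SalesDBInts": "00:17",
    "SalesDBGUIDs": "05:07",
    "SalesDBSeqGUIDs": "01:13",
}

baseline = to_seconds(timings["SalesDBInts"])

# Slowdown of each database relative to the int-keyed one.
slowdown = {name: to_seconds(t) / baseline for name, t in timings.items()}

for name, factor in slowdown.items():
    print(f"{name}: {to_seconds(timings[name])} s, {factor:.1f}x the int baseline")
```

Under that reading, the random GUID keys come out roughly 18 times slower than the int keys for the same 10K inserts, and the sequential GUIDs a bit over 4 times slower.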




http://www.bigresource.com/MS_SQL-Difference-advantage-of-varchar-vs-nvarchar-PQfJeW9V.html
http://dba.stackexchange.com/questions/1767/how-do-too-long-fields-varchar-nvarchar-impact-performance-and-disk-usage-ms
http://stackoverflow.com/questions/262238/are-there-disadvantages-to-using-a-generic-varchar255-for-all-text-based-field
what's the disadvantage of defining big varchar fields? - Google Search






