Reviewing the Use of Types :: SQL Server
You are hired as a consultant to help address performance issues in an existing system. The system was developed originally by using SQL Server 2005 and has recently been upgraded to SQL Server 2012. Write rates in the system are fairly low, and their performance is more than adequate. Also, write performance is not a priority. However, read performance is a priority, and currently it is not satisfactory. One of the main goals of the consulting engagement is to provide recommendations that will help improve read performance. You have a meeting with representatives of the customer, and they ask for your recommendations in different potential areas for improvement. One of the areas they inquire about is the use of data types. Your task is to respond to the following customer queries:
- We have many attributes that represent a date, like order date, invoice date, and so on, and currently we use the DATETIME data type for those. Do you recommend sticking to the existing type or replacing it with another? Any other recommendations along similar lines?
- We have our own custom table partitioning solution because we're using the Standard edition of SQL Server. We use a surrogate key of a UNIQUEIDENTIFIER type with the NEWID function invoked by a default constraint expression as the primary key for the tables. We chose this approach because we do not want keys to conflict across the different tables. This primary key is also our clustered index key. Do you have any recommendations concerning our choice of a key?
Answers
- The DATETIME data type uses 8 bytes of storage. SQL Server 2012 supports the DATE
data type, which uses 3 bytes of storage. In all those attributes that represent a date
only, it is recommended to switch to using DATE. The lower the storage requirement,
the better the reads can perform.
As for other recommendations, the general rule "smaller is better, provided that you
cover the needs of the attribute in the long run" is suitable for read performance. For
example, if you have descriptions of varying lengths stored in a CHAR or NCHAR type,
consider switching to VARCHAR or NVARCHAR, respectively. Also, if you’re currently
using Unicode types but need to store strings of only one language -- say, US English --
consider using regular characters instead.
- For one, the UNIQUEIDENTIFIER type is large -- 16 bytes. And because it's also the
clustered index key, it is copied to all nonclustered indexes. Also, due to the random
order in which the NEWID function generates values, there's probably a high level
of fragmentation in the index. A different approach to consider (and test!) is switching
to an integer type and using the sequence object to generate keys that do not
conflict across tables. Due to the reduced size of the type, with the multiplied effect
on non-clustered indexes, performance of reads will likely improve. The values will be
increasing, and as a result, there will be less fragmentation, which will also likely have a
positive effect on reads.