Improving a Process That Updates Data (SQL Server 2012)
The same company that hired you to consult about its inefficient nightly process from the first
scenario hires you again. They ask for your advice regarding the following update processes:
- The database has a table holding about 100 million rows, and about a third of the
  existing rows need to be updated. Can you provide recommendations on how to handle
  the update without causing unnecessary performance problems in the system?
- There’s an UPDATE statement that modifies rows in one table based on information
from related rows in another table. The UPDATE statement currently uses a separate
subquery for each column that needs to be modified, obtaining the value of the
respective column from the related row in the source table. The statement also uses a
subquery to filter only rows that have matches in the source table. The process is very
slow. Can you suggest ways to improve it?
Answers
- The customer should consider developing a process that handles the large update in
  chunks, as shown in the first sketch after this list. If done in one big transaction,
  the process will very likely result in significant growth of the transaction log, and
  it will also likely cause lock escalation, leading to blocking problems.
- The customer should consider using an UPDATE statement based on a join instead of
  the existing subqueries, as shown in the second sketch after this list. The amount
  of code is significantly reduced, and performance will likely improve. Each subquery
  requires a separate visit to the related row, so using multiple subqueries to obtain
  values from multiple columns results in multiple visits to the data. With a join, a
  single visit to the matching row supplies as many column values as you need.
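
Here is a minimal sketch of the chunked approach, assuming a hypothetical table
dbo.BigTable in which the rows still awaiting the update are marked by a Status
column; the table, column names, and batch size are illustrative, not from the
scenario:

    WHILE 1 = 1
    BEGIN
        -- Update the next batch; TOP caps the rows touched per transaction,
        -- keeping each transaction short and its lock count low.
        UPDATE TOP (10000) dbo.BigTable
            SET Amount = Amount * 1.05, -- hypothetical column change
                Status = 'processed'
        WHERE Status = 'pending';

        -- A partial batch means no qualifying rows remain.
        IF @@ROWCOUNT < 10000 BREAK;
    END;

Because each iteration commits separately, the transaction log can be reused
between batches (after a CHECKPOINT under the simple recovery model, or after a
log backup under the full model), and keeping each statement well below roughly
5,000 locks avoids escalation to a table lock.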
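
And a minimal sketch of the join rewrite, assuming hypothetical tables dbo.Target
and dbo.Source related through a key column keycol that is unique in the source:

    -- Existing pattern: one correlated subquery per column, plus a
    -- filtering subquery; each subquery is a separate visit to the
    -- matching source row.
    UPDATE dbo.Target
        SET col1 = (SELECT S.col1 FROM dbo.Source AS S
                    WHERE S.keycol = dbo.Target.keycol),
            col2 = (SELECT S.col2 FROM dbo.Source AS S
                    WHERE S.keycol = dbo.Target.keycol)
    WHERE EXISTS
        (SELECT * FROM dbo.Source AS S
         WHERE S.keycol = dbo.Target.keycol);

    -- Join-based rewrite: one visit to the matching row supplies all
    -- columns, and the inner join replaces the EXISTS filter.
    UPDATE T
        SET T.col1 = S.col1,
            T.col2 = S.col2
    FROM dbo.Target AS T
        INNER JOIN dbo.Source AS S
            ON S.keycol = T.keycol;

Keep in mind that UPDATE with a FROM clause and a join is T-SQL-specific; the
standard alternative is MERGE. Also, if a target row could match multiple source
rows, the join form silently applies just one of the matches, so the join key
should be unique in the source.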