SQL Update Statements
Data Repository Talk

Update Statements - Pay Attention, Be Prepared or Pay the Piper

Grouping by a large number of columns to pull aggregates when a lot of table joins are involved can be slow. Update Statements can be effective, but beware... read on to learn more.


Update statements can be effective and sometimes more efficient than straight SQL.   Grouping by a large number of columns to pull aggregates when a lot of table joins are involved can lead to slow results rendering-- even if you are following the primary keys.   Sometimes it is more efficient to separate out the aggregation to its own query, save to a temp table and then update the final data appropriately using that temp table.  

There are dangers associated with update statements, both obvious and subtle.  Let’s look at one with an obvious danger.  

Updating a table is not an overly difficult task.  Updating it correctly requires you pay attention to the details.  Updating a dictionary table that many users access means you need to update exactly what you want to update and nothing more.  A good start is to check the current contents of the table.  Which column(s) are you updating?  With what data will it be updated?

Suppose you have a “Configuration” table used for SSIS packages.  The Configuration table has columns for the name of the SSIS package, a variable name and a variable value. The variable name and value are used within the SSIS package to dynamically tell the SSIS package to which folder/filename the results should be sent.

As an example, the SSIS package name is ExportCurrentInpatients, the Variable_Name is varAWCurrentInpatientsFile and the Variable_Value is \\Acmeware\SSISOutputs\CurrentInpatientsFile.csv.  When the SSIS package is run the results are output to \\Acmeware\SSISOutputs\CurrentInpatientsFile.csv because the varAWCurrentInpatientsFile is used for the destination.   Now that the SSIS package is ready for “live” production use you want the output file to be put in \\Acmeware\SSISOutputsProd\CurrentInpatientsFile.csv.  We need only change the Variable_Value where Variable_Name =  varAWCurrentInpatientsFile  or Variable_Value = \\Acmeware\SSISOutputs\CurrentInpatientsFile.csv.

Update Configuration
SET Variable_Value = ‘\\Acmeware\SSISOutputsProd\CurrentInpatientsFile.csv
WHERE Variable_Value =’ \\Acmeware\SSISOutputs\CurrentInpatientsFile.csv

This is pretty straight forward.  How do you execute that code?  Running a “SELECT * from Configuration WHERE Variable_Value =\\Acmeware\SSISOutputs\CurrentInpatientsFile.csv’” to verify what you are updating is a great idea.   One safe way to execute the code is to have just that update code in a Query Analyzer window and execute the entire window.  Do you highlight the 3 lines and execute those selectively?  That would be safe so long as you highlight all 3 lines.  But if you are not paying enough attention or you are moving quickly (maybe your highlight hiccups) you might highlight just the first 2 lines and wind up updating all the Variable_Values to that SSISOutputsProd value.  That is not so good.   Then you need to undo that change.  If you made a copy of the Configuration table beforehand (you are prepared) then you could undo the change fairly easily.  Otherwise you need to restore the table from a database backup.   That Piper can be costly.

Now for a subtle update table problem.  The base table is ClaimAttempts.  Its primary keys are SourceID, BillingID, ClaimNumber.  We want to update the ClaimPdAmount, ClaimAdjAmount and PymtRcvdDate for each primary key row entry in ClaimAttempts with the data from the corresponding #BLT table entries.  Where those data are the SUMs of the receipt Amounts, of the adjustment Amounts and the most recent PymtRcvdDate.

UPDATE ClaimAttempts
     SET
     ClaimPdAmount = #BLT.PaidAmount,
     ClaimAdjAmount = #BLT.AdjustedAmount,
     PymtRcvdDate = #BLT.PymtRcvdDate

FROM #BLT

WHERE

     ClaimAttempts.SourceID = #BLT.SourceID
     AND ClaimAttempts.BillingID = #BLT.BillingID
     AND ClaimAttempts.ClaimNumber = #BLT.ClaimNumber>

This subtlety comes from the #BLT table.

SELECT

     BLT.SourceID AS SourceID,
     BLT.BillingID AS BillingID,
     CA.ClaimNumber,  

     SUM(CASE WHEN BLT.Type = 'R' THEN ISNULL (BLT.Amount, 0) ENDAS PaidAmount,
     SUM(CASE WHEN BLT.Type = 'A' THEN ISNULL (BLT.Amount, 0) END) 
                                                               AS AdjustedAmount,
     SUM(CASE WHEN BLT.Type = 'R' THEN 1 ELSE 0 END) AS PaidQuantity,
     SUM(CASE WHEN BLT.Type = 'A' THEN 1 ELSE 0 END) AS AdjustmentQuantity,
     BLT.ServiceDateTime AS PymtRcvdDate
       

INTO #BLT

FROM  livedb.dbo.BarCollectionTransactions BLT WITH (NOLOCK)

INNER JOIN ClaimAttempts CA
     ON CA.SourceID = BLT.SourceID
     AND CA.BillingID = BLT.BillingID
     AND CA.ClaimNumber = BLT.ClaimNumber

GROUP BY
     BLT.SourceID,
     BLT.BillingID,
     CA.ClaimNumber,
     BLT.ServiceDateTime

     

SELECT * from #BLT  WHERE SourceID = ‘AWR’ and BillingID = ‘2283’ and ClaimNumber = ‘GORR8461’ gives us (without the 2 Quantity colums) 4 rows

SourceID     BillingID    ClaimNumber         PaidAmount       AdjustedAmount             PymtRcvdDate

AWR            2283          GORR8461            -1397.20               0.00                            2015-01-20

AWR            2283          GORR8461             1397.20               0.00                            2015-02-03

AWR            2283          GORR8461                   0.00                0.00                            2015-02-10

AWR            2283          GORR8461             1397.20               0.00                             2015-03-15

Can you spot the subtle problem in the #BLT table from the results displayed above?

When the update is done against SourceID, BillingID and ClaimNumber which of the 4 #BLT entries will be the final update?  Just because these results came out ordered by date for this execution of SELECT does not mean that order is guaranteed every time. Given that randomness of result order, the order of the update is random. The row order used each time the update is applied can be different each time.   PaidAmount will be correct 2 times in 4, PymtRcvdDate will be correct 1 time in 4.   So the final updated table is correct only 1/4 the time.  If you were summing for a Total PaidAmount from ClaimAttempts your numbers could change each time you summed that column after the compile.

The subtle mistake is that the PymtRcvdDate can be different for each entry and the update wants the most recent.    To fix it we use the maximum BLT.ServiceDateTime for our grouping.  This generates a single row for SourceID, BillingID, ClaimNumber with a PaidAmount of 1397.20 and a PymtRcvdDate of 2015-03-15.

SELECT
     BLT.SourceID AS SourceID,
     BLT.BillingID AS BillingID,
     CA.ClaimNumber,  
     SUM(CASE WHEN BLT.Type = 'R' THEN ISNULL(BLT.Amount, 0) END)AS PaidAmount,
     SUM(CASE WHEN BLT.Type = 'A' THEN ISNULL (BLT.Amount, 0) END)AS AdjustedAmount,
     SUM(CASE WHEN BLT.Type = 'R' THEN 1 ELSE 0 END) AS PaidQuantity,
     SUM(CASE WHEN BLT.Type = 'A' THEN 1 ELSE 0 END) AS AdjustmentQuantity,
     MAX(BLT.ServiceDateTime) AS PymtRcvdDate      

INTO #BLT

FROM  livedb.dbo.BarCollectionTransactions BLT WITH (NOLOCK)

INNER JOIN ClaimAttempts CA
     ON CA.SourceID = BLT.SourceID
     AND CA.BillingID = BLT.BillingID
     AND CA.ClaimNumber = BLT.ClaimNumber

GROUP BY
     BLT.SourceID,
     BLT.BillingID,
     CA.ClaimNumber

Not enough attention was paid to the different PymtRcvdDate data when selecting into #BLT.

As a final reminder, when running update statements take enough time to pay attention to what you are updating, verify what you will be updating and update only that entry or entries you intend to update.  Having a safety net (make a copy of the table pre-update) isn’t a bad idea.  Confirming that the new data you are updating to is the correct data keeps those subtle problems from showing up.


By Geoff Grouten at 10 Jun 2015, 15:28 PM
Share this post via

Comments

 

Post a comment

Please correct the following:
Tags
Latest Comments
By Bill Presley at 4 May 2015, 17:09 PM

Authors
Glen D'Abate
Ian Proffer
Bill Presley
Jamie Gerardo
Geoff Grouten
Ken MacPherson
Alexis Donnaruma
Taylor Solari
Theresa Jasset
Howard Minor
Categories
Tutorial
Project
Company Information
Reports
News
Event
Subscribe






rss feedRSS