Friday, October 21, 2011

Fixing damaged pages using page restore or manual inserts

Here's an interesting scenario that cropped up today. You have a database on a RAID array that failed and has zero'd out a page. How can you get the data back?




There are two ways to do it, depending on the database recovery model and version of SQL Server - single-page restore or manual insert/select - both of which rely on you having a backup of the database. You can use single-page restore if you're on SQL Server 2005 and the database is in Full or Bulk-Logged recovery mode, otherwise you need to use the manual method, and that will only work as long as you know the data being salvaged hasn't changed since the last backup.



Let's try them both. Here's a script to create a test database and make a backup of it:



-- Create the database.



USE master;



GO



CREATE DATABASE dbccpagetest;



GO



ALTER DATABASE dbccpagetest SET RECOVERY FULL;



GO



-- Create a table to play with.



USE dbccpagetest;



GO



CREATE TABLE sales (



salesID INT IDENTITY,



customerID INT DEFAULT CONVERT (INT, 100000 * RAND ()),



salesDate DATETIME DEFAULT GETDATE (),



salesAmount MONEY);



CREATE CLUSTERED INDEX salesCI ON sales (salesID);



GO



-- Populate the table



SET NOCOUNT ON;



GO



DECLARE @count INT



SELECT @count = 0



WHILE (@count < 5000)



BEGIN



INSERT INTO sales (salesAmount) VALUES (100 * RAND ());



SELECT @count = @count + 1



END;



GO



-- Take a full backup.



BACKUP DATABASE dbccpagetest TO DISK = 'C:\dbccpagetest.bak' WITH INIT;



GO



I'm going to simulate our scenario by shutting down the database and using a hex editor to zero out page 158 of the database. (This translates to byte offset 1294336 of the file being zero'd for 8192 bytes).



Now if I run checkdb, I get the following:



Msg 8909, Level 16, State 1, Line 1



Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:158) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).



CHECKDB found 0 allocation errors and 1 consistency errors not associated with any single object.



Msg 8928, Level 16, State 1, Line 1



Object ID 2073058421, index ID 1, partition ID 72057594038386688, alloc unit ID 72057594042384384 (type In-row data): Page (1:158) could not be processed. See other errors for details.



CHECKDB found 0 allocation errors and 1 consistency errors in table 'sales' (object ID 2073058421).



CHECKDB found 0 allocation errors and 2 consistency errors in database 'dbccpagetest'.



repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (dbccpagetest).



What does the page look like?



DBCC TRACEON (3604);



GO



DBCC PAGE (dbccpagetest, 1, 158, 3);



GO



DBCC execution completed. If DBCC printed error messages, contact your system administrator.



PAGE: (0:0)



BUFFER:



BUF @0x02C0632C



bpage = 0x04C12000 bhash = 0x00000000 bpageno = (1:158)



bdbid = 9 breferences = 0 bUse1 = 37241



bstat = 0xc00009 blog = 0x89898989 bnext = 0x00000000



PAGE HEADER:



Page @0x04C12000



m_pageId = (0:0) m_headerVersion = 0 m_type = 0



m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x200



m_objId (AllocUnitId.idObj) = 0 m_indexId (AllocUnitId.idInd) = 0 Metadata: AllocUnitId = 0



Metadata: PartitionId = 0 Metadata: IndexId = -1 Metadata: ObjectId = 0



m_prevPage = (0:0) m_nextPage = (0:0) pminlen = 0



m_slotCnt = 0 m_freeCnt = 0 m_freeData = 0



m_reservedCnt = 0 m_lsn = (0:0:0) m_xactReserved = 0



m_xdesId = (0:0) m_ghostRecCnt = 0 m_tornBits = 16777216



Allocation Status



GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED



PFS (1:1) = 0x60 MIXED_EXT ALLOCATED 0_PCT_FULL DIFF (1:6) = NOT CHANGED



ML (1:7) = NOT MIN_LOGGED



Msg 2514, Level 16, State 5, Line 2



DBCC PAGE error: Invalid page type - dump style 3 not possible.



Note the error at the end of the output - DBCC PAGE can't do an in-depth dump because it doesn't know what page type the page is. Let's try a full page hex dump using dump style 2 instead:



DBCC PAGE (dbccpagetest, 1, 158, 2);



GO



PAGE: (0:0)







DATA:



Memory Dump @0x44F3C000



44F3C000: 00000000 00020000 00000000 00000000 †................



44F3C010: 00000000 00000000 00000000 00000000 †................







44F3DFE0: 00000000 00000000 00000000 00000000 †................



44F3DFF0: 00000000 00000000 00000000 00000000 †................



DBCC execution completed. If DBCC printed error messages, contact your system administrator.



It really is all zero. First we'll fix it using page restore.



USE master;



GO



RESTORE DATABASE dbccpagetest PAGE = '1:158' FROM DISK = 'C:\dbccpagetest.bak';



GO



Processed 1 pages for database 'dbccpagetest', file 'dbccpagetest' on file 1.



The roll forward start point is now at log sequence number (LSN) 32000000047000001. Additional roll forward past LSN 33000000001700001 is required to complete the restore sequence.



RESTORE DATABASE ... FILE= successfully processed 1 pages in 0.176 seconds (0.046 MB/sec).



Isn't that cool? You can restore up to 1000 single pages from a backup at a time. For VLDBs, this cuts the recovery time WAY down. Now we need to roll forward the log. We don't have any more log backups so we can finish the roll forward by backing up and restoring the tail of the log.



-- Need to complete roll forward. Backup the log tail...



BACKUP LOG dbccpagetest TO DISK = 'C:\dbccpagetest_log.bak' WITH INIT;



GO



-- ... and restore it again.



RESTORE LOG dbccpagetest FROM DISK = 'C:\dbccpagetest_log.bak';



GO



Processed 5 pages for database 'dbccpagetest', file 'dbccpagetest_log' on file 1.



BACKUP LOG successfully processed 5 pages in 0.146 seconds (0.248 MB/sec).



Processed 0 pages for database 'dbccpagetest', file 'dbccpagetest' on file 1.



RESTORE LOG successfully processed 0 pages in 0.004 seconds (0.000 MB/sec).



And now we should have a clean database:

DBCC CHECKDB (dbccpagetest) WITH NO_INFOMSGS;



GO



Command(s) completed successfully.



Easy. But what if we can't do a page restore? Assuming I've corrupted the database in exactly the same way again, the first thing is to do is make sure we can restore the backup and then see what data range is on that page:



RESTORE DATABASE dbccpagetest_copy FROM DISK = 'C:\dbccpagetest.bak' WITH



MOVE N'dbccpagetest' TO N'C:\dbccpagetest_copy.mdf',



MOVE N'dbccpagetest_log' TO N'C:\dbccpagetest_log.ldf',



REPLACE;



GO



DBCC PAGE (dbccpagetest_copy, 1, 158, 3);



GO



Processed 184 pages for database 'dbccpagetest_copy', file 'dbccpagetest' on file 1.



Processed 2 pages for database 'dbccpagetest_copy', file 'dbccpagetest_log' on file 1.



RESTORE DATABASE successfully processed 186 pages in 0.361 seconds (4.205 MB/sec).



PAGE: (1:158)



BUFFER:



BUF @0x02BE8D38



bpage = 0x03FB4000 bhash = 0x00000000 bpageno = (1:158)



bdbid = 10 breferences = 1 bUse1 = 38283



bstat = 0xc00009 blog = 0x159a2159 bnext = 0x00000000



PAGE HEADER:



Page @0x03FB4000



m_pageId = (1:158) m_headerVersion = 1 m_type = 1



m_typeFlagBits = 0x4 m_level = 0 m_flagBits = 0x8200



m_objId (AllocUnitId.idObj) = 68 m_indexId (AllocUnitId.idInd) = 256



Metadata: AllocUnitId = 72057594042384384



Metadata: PartitionId = 72057594038386688 Metadata: IndexId = 1



Metadata: ObjectId = 2073058421 m_prevPage = (1:157) m_nextPage = (1:159)



pminlen = 28 m_slotCnt = 245 m_freeCnt = 11



m_freeData = 7691 m_reservedCnt = 0 m_lsn = (24:453:8)



m_xactReserved = 0 m_xdesId = (0:0) m_ghostRecCnt = 0



m_tornBits = -1020457745



Allocation Status



GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED



PFS (1:1) = 0x60 MIXED_EXT ALLOCATED 0_PCT_FULL DIFF (1:6) = NOT CHANGED



ML (1:7) = NOT MIN_LOGGED



Slot 0 Offset 0x60 Length 31



Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP



Memory Dump @0x4542C060



00000000: 10001c00 d5030000 5bd30000 3f771101 †........[...?w..



00000010: b9980000 baa10a00 00000000 0500e0††††...............



UNIQUIFIER = [NULL]



Slot 0 Column 1 Offset 0x4 Length 4



salesID = 981



Slot 0 Column 2 Offset 0x8 Length 4



customerID = 54107



Slot 0 Column 3 Offset 0xc Length 8



salesDate = Jan 17 2007 4:35PM



Slot 0 Column 4 Offset 0x14 Length 8



salesAmount = 69.68







Slot 244 Offset 0x1dec Length 31



Record Type = PRIMARY_RECORD Record Attributes = NULL_BITMAP



Memory Dump @0x4542DDEC



00000000: 10001c00 c9040000 bfa10000 57771101 †............Ww..



00000010: b9980000 c6b80500 00000000 0500e0††††...............



UNIQUIFIER = [NULL]



Slot 244 Column 1 Offset 0x4 Length 4



salesID = 1225



Slot 244 Column 2 Offset 0x8 Length 4



customerID = 41407



Slot 244 Column 3 Offset 0xc Length 8



salesDate = Jan 17 2007 4:35PM



Slot 244 Column 4 Offset 0x14 Length 8



salesAmount = 37.50



DBCC execution completed. If DBCC printed error messages, contact your system administrator.



So we're looking at salesID range 981 to 1225 inclusive. Before we can copy the rows back to the damaged database, we need to get rid of the corrupt page. Repair should delete the page for us. First I'm going to take another backup though - just in case something goes wrong!

BACKUP DATABASE dbccpagetest TO DISK = 'C:\dbccpagetest_corrupt.bak' WITH INIT;



GO



ALTER DATABASE dbccpagetest SET SINGLE_USER;



GO



DBCC CHECKDB (dbccpagetest, REPAIR_ALLOW_DATA_LOSS) WITH NO_INFOMSGS;



GO



ALTER DATABASE dbccpagetest SET MULTI_USER;



GO



Processed 184 pages for database 'dbccpagetest', file 'dbccpagetest' on file 1.



Processed 4 pages for database 'dbccpagetest', file 'dbccpagetest_log' on file 1.



BACKUP DATABASE successfully processed 188 pages in 0.380 seconds (4.052 MB/sec).



Msg 8909, Level 16, State 1, Line 1



Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:158) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).



The error has been repaired.



CHECKDB found 0 allocation errors and 1 consistency errors not associated with any single object.



CHECKDB fixed 0 allocation errors and 1 consistency errors not associated with any single object.



Repair: The Clustered index successfully rebuilt for the object "dbo.sales" in database "dbccpagetest".



Repair: The page (1:158) has been deallocated from object ID 2073058421, index ID 1, partition ID 72057594038386688, alloc unit ID 72057594042384384 (type In-row data).



Msg 8945, Level 16, State 1, Line 1



Table error: Object ID 2073058421, index ID 1 will be rebuilt.



The error has been repaired.



Msg 8928, Level 16, State 1, Line 1



Object ID 2073058421, index ID 1, partition ID 72057594038386688, alloc unit ID 72057594042384384 (type In-row data): Page (1:158) could not be processed. See other errors for details.



The error has been repaired.



Msg 8976, Level 16, State 1, Line 1



Table error: Object ID 2073058421, index ID 1, partition ID 72057594038386688, alloc unit ID 72057594042384384 (type In-row data). Page (1:158) was not seen in the scan although its parent (1:154) and previous (1:157) refer to it. Check any previous errors.



The error has been repaired.



Msg 8978, Level 16, State 1, Line 1



Table error: Object ID 2073058421, index ID 1, partition ID 72057594038386688, alloc unit ID 72057594042384384 (type In-row data). Page (1:159) is missing a reference from previous page (1:158). Possible chain linkage problem.



The error has been repaired.



CHECKDB found 0 allocation errors and 3 consistency errors in table 'sales' (object ID 2073058421).



CHECKDB fixed 0 allocation errors and 3 consistency errors in table 'sales' (object ID 2073058421).



CHECKDB found 0 allocation errors and 4 consistency errors in database 'dbccpagetest'.



CHECKDB fixed 0 allocation errors and 4 consistency errors in database 'dbccpagetest'.



We should check the row count to see that the count has dropped from the initial 5000 rows we inserted:



USE dbccpagetest;



GO

SELECT COUNT (*) FROM SALES;



GO



SELECT COUNT (*) FROM sales WHERE salesID > 980 AND salesID < 1226;



GO



And we're down to 4755 rows, as expected with zero rows in that range. All we need to do now is to copy the missing rows over from the restored copy. Remember, this will only work if you know that the data being salvaged hasn't changed since the backup was taken - otherwise you'll have old and new data mixed in the table which will play havoc with your business. Before we copy the rows, we know we're got an identity column we'd like to preserve so we set IDENTITY_INSERT on which tells the server not to generate new identity values for the inserted rows.

-- Make sure identity values survive.



SET IDENTITY_INSERT sales ON;



GO



-- Insert the missing rows.



SET NOCOUNT OFF;



GO



INSERT INTO sales (salesID, customerID, salesDate, salesAmount)



SELECT * FROM dbccpagetest_copy.dbo.sales AS R



WHERE R.salesID > 980 AND R.salesID < 1226;



GO



-- Restore identity behavior.



SET IDENTITY_INSERT sales OFF;



GO



(245 row(s) affected)



We copy over 245 rows and checking the row count again says we're back to 5000 rows.