Recently I’ve had to write a data migration for SQL Server to split a large table (28 million rows) into separate tables. Some notes here on my thoughts…
Firstly, SQL Server has INSERT…SELECT syntax which allows you to copy from one table to another. It seems like any solution will be based around using this.
Secondly, my assumption is that for a large migration, we’ll need to run in batches, with a transaction for each batch, as it will take too long to run in a single transaction.
One first idea was to write something like this, and run it inside a loop, breaking out when no more rows were being copied:
BEGIN TRANSACTION X SET IDENTITY_INSERT TARGET_TABLE ON INSERT INTO TARGET_TABLE (ID, VERSION, TSTAMP, ...other fields here ) select top 100000 source.ID, source.VERSION, source.TSTAMP, ...other fields here from source_db.dbo.source_table source left outer join TARGET_TABLE x on source.id = x.id where source.item_type = 'REQUIRED_TYPE' and x.id is null SELECT @rows_processed = @@rowcount SET IDENTITY_INSERT TARGET_TABLE OFF COMMIT TRANSACTION X
However, testing this with millions of rows suggests it is taking too long to perform the left join, as the time to do the join increases with every batch, as we add rows to the target table. As the target table is a new table, and hence has no rows to begin with, and we have an integer primary key, I ended up changing the where condition on the INSERT..SELECT to the following:
WHERE source.ITEM_TYPE = 'REQUIRED_TYPE' AND source.ID > (SELECT COALESCE(MAX(ID),0) FROM TARGET_TABLE) ORDER BY source.ID
This means there is no join, just identifying the max id. Because we are ordering by the id and that is the primary key, there is no sorting required. In my testing, this took around 1 min 20 seconds to copy 3 million rows, compared to around 20 minutes for the join based approach.
I also had to migrate audit table data. This is interesting for two reasons. Firstly the audit tables don’t have a single primary key, but rather a composite. Secondly, the target tables already have some data, so this is more of a merge than just a copy. For this reason I ended up still using a left join for the audit data migration. I experimented with the batch size. A size of 10,000 took 28 minutes to copy 3 million rows, whereas a batch size of 100,000 took 18 minutes. This makes sense if the join is taking a long time, as we’ll be reducing the number if times we do the join by a factor of 10. I suspect the performance of the audit migration could be improved by changing the implementation so that the join is only performed once. e.g. perform initial join to check what rows need to be copied over, and store that either in a physical table or table variable, ordered by the same composite primary key. Then use a cursor or offset fetch to get the next batch of ids that need to be copied and use that to join with the source table to get the rows to copy. However my current implementation is fast enough, so this is one to investigate in a future migration perhaps.
Finally it is interesting to note that in my first implementation, I used some count(*) queries so that I could report progress in the logs. However with millions of rows, the time taken to perform these queries becomes non-trivial if you are doing a count with every batch. In my testing it could take 5-6 seconds to count with around 3 million rows, so doing that for 100 batches would mean 10 minutes just performing a count.
Insert select: https://www.w3schools.com/sql/sql_insert_into_select.asp