The organisation I'm working for has SQL Server 2005 merge replication up and running on 11 servers in 3 different countries.
Ocasionally we experience problems caused by rows in a logical record relationship being replicated between servers in the wrong order, i.e. an attempt is made to insert child rows before the parent. I am not able to reproduce this behaviour on my test servers.
Has anyone else experienced this problem?
It's entirely possible for a child row to be inserted before parent row, but it should be greatly reduced in SQL 2005. What kind of errors are you having?|||i'm sorry, i misspoke. the whole point of logical records is that a logical record relationship is committed in one transaction, the whole logical record should be committed as one transaction.
Can you tell me what the errors are that you're seeing, and can you verify that the article that's failing is marked for logical records?
|||We have a master database filtered and published to 2 separate databases on the same cluster (overlapping partitions).
Each of these 2 databases is republished to another 4 servers, where it is republished again.
Users update a local SQL Express database which is programmatically synchronised (anonymous pull subscription, via RMO) with one of the 4 servers when a transaction completes.
Very occasionally, if a user updates an existing parent row and then adds a new parent row with all its children, and both programmatic synchronisations occur within the 60 second polling interval on the server, then the data is passed up to the cluster as a parent update, child inserts and a parent insert, in that order. As a result the child rows in the server and user databases are deleted. I have not been able to reproduce the problem in the test environment but I have enough logging in place to know exactly what's happening.
I know that the logical row relationships exist because I can see them in sysmergesubsetfilters but sp_helpmergefilter won't show them to me unless they're filter_type 3 (filter and logical record relationship).
|||Sorry but it's not clear to me what the problem is. If you update parent row with ID1, and insert new parent row with ID2 and child row with FK=ID2, why would the child rows in the server and user dbs get deleted? And for which ID are they getting deleted, ID1? Are you saying the updates are moving the row out of partition?|||The child rows for ID2 get deleted.
Inserting the child rows into the 3rd level database is not a problem because we have NOT FOR REPLICATION on the foreign keys.
When the 2nd level merge agent runs it receives the child rows before the parent and therefore appears not to be able to determine which partition they belong to, so deletes them.
Incidentally, we have the NOT FOR REPLICATION clause to prevent the converse problem, where child rows get deleted before the parent.
Currently all article filters are type 3 (filter and logical record). Changing filter type on the 2nd and 3rd levels to 2 (logical record only) appears to make no difference in my test environment because I can't reproduce the problem to start with.
|||That should be:
Incidentally, we have the NOT FOR REPLICATION clause to prevent the converse problem, where child rows get deleted *after* the parent.
|||Books Online says NFR (Not For Replication) cannot be placed on FK constraint involved with logical records. Are you sure the articles are involved in logical record relationship, that's what it seems you're implying above. Logical records should be commiting the whole thing as one txn, so even if the child goes before the parent, it shouldn't matter in a single transaction. Can you verify that the articles involved are indeed part of logical records?
|||OK, I read sp_addmergearticle in BOL and understand what option 3 is, I guess this would allow you to add NFR to your FK constraint. May I ask why you have it set to 3 instead of 2?|||It wasn't intentional.
I scripted the sp_addmergearticle definitions from the top-level database and used them. It seemed to work, at least to start with....
|||Can I have the URL for that BOL reference please?
I believe the articles are involved in a logical record relationship because I can see them in sysmergesubsetfilters and because sp_helpmergefilter says so.
However, I take your point that the relationship might not be working at all as opposed to working intermittently although I'm surprised at how rarely it causes a problem.
|||See topic "Grouping Changes to Related Rows with Logical Records", http://msdn2.microsoft.com/en-us/library/ms152507.aspx.
Easiest way to fix the problem is to set @.filter_type = 2, this will ensure your relational changes get committed all at once. Other than that, the fact that a child row goes before a parent row is by design (as long as @.filter_type <> 2), it's possible for this to happen. Do know that this doesn't pose any pure data loss, it just seems that way. This usually happens with some weird timing issue, or if your connection gets broken mid-sync. What happens is the child row insert gets to the publisher, the publisher will see the child row originated from the subscriber, but knows there's no parent row to match the filter criteria, thus send a delete back down to the subscriber. But as soon as the parent row makes it to the publisher, it will recognize the child row doesn't exist at the subscriber and send the child insert back down to the subscriber. This works only if NFR is enabled, which I believe is what you have, without it you'd probably get conflicts.
|||Thanks.
We have this change scheduled for 29/5 (due to the need to re-initialise subscriptions) and I'll let you know what happens.
|||Problem not solved.
We completely rebuilt the replication infrastructure from scratch using filter_type 2 and with no NFR clause on the foreign keys supporting the logical record relationships.
What we're now seeing is very clearly a bug in SQL Server.
If the republisher begins synchronising with the publisher and, before it finishes, a client synchronises its anonynous pull subscription with the republisher, then some of the client changes are lost (present on the republisher but never uploaded to the publisher)
Here are the messages captured during programmatic synchronisation on the client:
06/06/2007 11:15:32 Initializing
06/06/2007 11:15:32 Connecting to Publisher 'TT-SQL1\SQL2005'
06/06/2007 11:15:32 Retrieving publication information
06/06/2007 11:15:32 Retrieving subscription information.
06/06/2007 11:15:32 Uploading data changes to the Publisher
06/06/2007 11:15:33 Enumerating deletes in all articles
06/06/2007 11:15:33 Enumerating inserts and updates in article 'Team_Session'
06/06/2007 11:15:33 Uploaded 1 change(s) in 'Outing' (1 update): 1 total
06/06/2007 11:15:33 Uploaded 1 change(s) in 'Stint' (1 update): 1 total
06/06/2007 11:15:33 Uploaded 1 change(s) in 'Lap' (1 insert): 1 total
06/06/2007 11:15:33 Downloading data changes to the Subscriber
06/06/2007 11:15:34 Merge completed after processing 3 data change(s) (1 insert(s), 2 update(s), 0 delete(s), 0 conflict(s)).
06/06/2007 11:15:34 Synchronisation complete
06/06/2007 11:15:59 Initializing
06/06/2007 11:15:59 Connecting to Publisher 'TT-SQL1\SQL2005'
06/06/2007 11:15:59 Retrieving publication information
06/06/2007 11:15:59 Retrieving subscription information.
06/06/2007 11:16:00 Uploading data changes to the Publisher
06/06/2007 11:16:01 Enumerating deletes in all articles
06/06/2007 11:16:01 Enumerating inserts and updates in article 'Team_Event'
06/06/2007 11:16:01 Enumerating inserts and updates in article 'Team_Session'
06/06/2007 11:16:01 Uploaded 1 change(s) in 'EventTyreset' (1 update): 1 total
06/06/2007 11:16:01 Uploaded 1 change(s) in 'Outing' (1 insert): 1 total
06/06/2007 11:16:01 Uploaded 27 change(s) in 'Outing_AeroComponent' (27 inserts): 27 total
06/06/2007 11:16:01 Uploaded 8 change(s) in 'Outing_CofGComponent' (8 inserts): 8 total
06/06/2007 11:16:01 Uploaded 1 change(s) in 'Stint' (1 insert): 1 total
06/06/2007 11:16:01 Uploaded 16 change(s) in 'Outing_BallastComponent' (16 inserts): 16 total
06/06/2007 11:16:01 Downloading data changes to the Subscriber
06/06/2007 11:16:02 Merge completed after processing 54 data change(s) (53 insert(s), 1 update(s), 0 delete(s), 0 conflict(s)).
06/06/2007 11:16:02 Synchronisation complete
And here's what happened on the republisher (in reverse chronological order):
No data needed to be merged.
06/06/2007 11:19:54
No data needed to be merged.
06/06/2007 11:19:41
No data needed to be merged.
06/06/2007 11:18:30
No data needed to be merged.
06/06/2007 11:18:18
Uploaded 1 change(s) in 'EventTyreset' (1 update): 11 total
06/06/2007 11:18:17
Uploading data changes to the Publisher
06/06/2007 11:16:48
No data needed to be merged.
06/06/2007 11:16:32
Uploaded 1 change(s) in 'Lap' (1 insert): 4 total
06/06/2007 11:16:27
Uploaded 1 change(s) in 'Stint' (1 update): 13 total
06/06/2007 11:16:25
Uploaded 1 change(s) in 'Outing' (1 update): 13 total
06/06/2007 11:16:19
Uploading data changes to the Publisher
06/06/2007 11:14:50
No data needed to be merged.
I am desperate to get a solution to this problem. Are there any post-SP2 hot fixes affecting replication?
Thanks in advance.
|||We are seeing something very similar. Updates from a CE database (SDF file) are coming up to the republisher database, but not getting to the Master database. Results are inconsistent at best. The replication seems to be working properly if a direct insert is done at either the republisher level or the Master level. We have not seen a failure in that scenario. We would also like an answer to the problem.
No comments:
Post a Comment