cancel
Showing results for 
Search instead for 
Did you mean: 

Critical issue with MSA

suznCB
Participant
0 Kudos

Dears,

Kindly help in this critica issue:

We have a MSA repication that is managed by 2 rep servers, we def8ned replication definition at RS1, and define subscription at RS2, there is a route between them every thing was fine, we needed to resync the replucation , so when we run drop subscription at RS2, it took long time for some reason, so we deĺete the subscription let's say sub1 manually from rs_subscriptions table at both RSSD1 and RSSD2

NOW WHEN WE TRY TO DEFINE A NEW SUBSCRIPTION, WE GOT THE FOLLOWING ERROR :

ERROR #28186 dSUB( ) - eneric\sub\sub_gen.c(2205)

Error applying command at the primary Replication Server for subscription sub1 for database replication definition def1 with replicate at dr1 site
E. ERROR #1028 dSUB( ) - eneric\sub\sub_gen.c(2205)

Message from server: 'unique key violation
It seems the is another system table at RS1 where should we should delete from it when we deleted from rs_subscriptions table, is this right ?Would you help please how can we overcome this problem?
sladebe
Active Participant

Re: we run drop subscription at RS2, it took long time for some reason

If there was a queue built up for the database associated with the subscription, the drop subscription command would have to wait in the queue to get processed. Queuing could occur in the route queue, the distributor in the downstream repserver, and the destination DSI connection. You can run rs_ticket in the primary db, then look at rs_ticket_history (where cnt=max(cnt)) to see how delayed replication is.

Also SAP note 1987766 might be relevant: "How to manually drop subscription in SAP Replication Server"

Accepted Solutions (1)

Accepted Solutions (1)

Mark_A_Parsons
Contributor
0 Kudos

Without the complete text of the error messages I'm guessing the table in question is rs_repdbs.

From the Reference Manual: rs_repdbs:

Contains information about all of the databases known by a primary Replication Server.
This information is stored when a subscription is entered for a database at a replicate site.
suznCB
Participant
0 Kudos

Dear Mark,

you are right!

In the table rs_repdbs, there was a record for the database that related to the subscription , we deleted it manually,

so the defining subscription executed on the primary rep server, and things fine

after that, , Actually a big issue has happened, when we start synchronizing the primary database by doing:

at RS2:

suspend connection to DRsite.db2

resume connection to DRsite.db2 skip to resync marker

at this point, the transactions start to arrive to the RS2 stable queues cause we didn't suspend users activities on db1 (and we cannot suspend them) and we think it is normal to arrive transactions to the RS2, and the things will be fine as there is a marker,

to continue:

at the primary database db1:

 rs_marker 'resync database'

then dump database db1

when dump complete, at RS2 DSI to DRsite.db2 suspended automatically (it is normal)

at DRsite.db2

load database db2

at RS2:

resume connection to DRsite.db2

the replication failed because of duplication issue!! what is the missing thing we did?

is this related to dump marker in defining subscription?

define subscription sub to db2

FOR DATABASE REPLICATION DEFINITION myrepdef

with primary at db1 replicated to db2 use dump marker

should we suspend the users activities till load database at DRsite.db2 ? if yes, is there another way cause it is at far destination and the database size about 400 GB

so, it will be long time to put the primary database offline?

Mark_A_Parsons
Contributor

You are mixing the steps from two different resync methods; you cannot mix-n-match the steps from these methods.

There are two primary methods of resyncing a replicate database:

  1. drop the subscription, create the subscription (with 'use dump marker'), dump from PDB (causes the DSI to suspend) and load into RDB, resume DSI; at no point do you issue a 'skip to resync marker' or 'rs_marker "resync database"'
  2. suspend/resume DSI with 'skip to resync marker', 'rs_marker "resync database"', dump from PDB (causes the DSI to suspend) and load into RDB, resume DSI; I covered all of this back in August (2022); at no point do you drop and/or recreate the subscription; at no point do you issue any command that includes the clause '(use) dump marker'

Pick one method and stick with it; do not mix-n-match steps from these two different methods. Having said that ...

------------------

If you've managed to screw up the subscription (ie, `check subscription' does not show VALID/VALID at the PRS and RRS) then you can't use method #2; method #2 requires the subscription to be VALID/VALID at the PRS and RRS.

If you've managed to mangle the connection into the RDB then neither method will work as they both require a valid connection into the RDB.

So, to get back to a working state ...

  • drop the (mangled?) subscription
  • make sure you have a valid connection into the RDB; if you have to (re)create the connection then do not provide the 'use dump marker' clause
  • method #1: continue with steps to create the subscription followed by the dump and load
  • method #2: you will need to create the subscription with the 'use dump marker' clause; (the 'user dump marker' clause will be needed for the initial creation of the subscription so that transactions coming from the PDB are discarded); at this point the subscription needs to show as VALID/VALID at both the PRS and RRS, and the DSI should have a status of waiting for dump marker; now proceed with the steps for method #2

------------------

NONE of the above requires suspension of user activity in the PDB.

What you need to insure is that you have enough disk space (aka stable queue space, aka partitions) in the RS instance(s) to store all replicated transactions while the dump-n-load of the 400 GB database is being performed.

suznCB
Participant
0 Kudos

Dear Mark

kindly note that we did the steps mentioned above with no success, we did them in the following order:

at RS2:

1. sqm_purg_queue

2. suspend connection to RDB

3. resume connection to RDB skip to resync marker

we checked admin disk_space and noticed that RS2 deletes any trans arrived to it before recieving a marker

4. at PDB rs_marker 'resync database'

and we checked the log at RS2 and sure that resync marker had arrived to RS2 which is waiting for dump marker

5. at PDB, we dumped the database and when dump is complete, we checked the log of RS2 and found that dump marker had arrived to it and DSI is suspended, at this point we checked admin disk_space and noticed the trans start to fill in the queue,

at RDB, load the dump, and at RS2, resume connection to RDB,

we got the same previous error about duplication issue for some tables, it is so strange, and unresonable!

while we try to resync MSA, we stopped any job for full dump and incremental dump of PDB, and there is no other issue on PDB, and as you mentioned there is no need to suspend user's activity (we resync MSA around the peak time, is this an issue?)

and referring to previous answer: Syncronizing primary and multiple standby database (WS + MSA) , we can resync MSA alone with no need to resync WS (if this is synchronized) which its Logical connection is the primary of MSA replication definition, is this right?

any hint will be useful to overcome this strange behavior

Regards

Mark_A_Parsons
Contributor
0 Kudos

Why are you running sysadmin sqm_purge_queue?

Do you by any chance:

  • have thresholds defined in the PDB that could have fired and issued a dump command?
  • have triggers enabled on the DSI?
  • have transactions or referential integerity constraints (primary key <--> foreign key) that span multiple databases in the RDS?
  • have any active dsi_command_convert settings on the DSI?

Have you verified the table(s) generating the duplicate key errors are in the RDB that you just loaded, or are they in another database in the RDS?

General ideas for 'next step' ...

  • troubleshoot the individual duplicate key errors, find the associated DML commands in the DSI/outbound queue, determine if there could be an explanation (eg, insert on tableA causing duplicate key error on tableB => possible issue with trigger?, etc)
  • if the associated DML command is an INSERT (against the table mentioned in the duplicate key error) then as a temporary workaround consider enabling dsi_command_convert/i2di on the table(s) in question; verify this addresses the duplicate key error; after some time remove the dsi_command_convert setting (there is a small performance degradation while enabled)
  • verify there is *NO* direct write activity against the RDB by any connections other than the DSI connection

If still stumped and/or duplicate key errors continue then open a case with SAP tech support (if you haven't already) and/or consider bringing in an external contractor (but make sure the person actually has extensive experience with the RS product). This newsgroup/forum really isn't the right medium for detailed troubleshooting of this nature.

Answers (0)