Posting this here in case anyone else runs into it. It applies not only to Linux but also to UNIX (HP-UX in my case).
We are using the Hitachi VSP disk array with our UNIX and Linux clients. For large Oracle databases we do a Shadow Image copy from a P-VOL (primary volume) to an S-VOL (secondary volume). We grouped pairs together in our /etc/horcm0.conf (primary volumes) and /etc/horcm1.conf (secondary volumes).
Each night via cron jobs/scripts we run a process wherein we:
1) Unmount the filesystems mounted from the secondary volumes on the server from which the backup to tape is run.
Disks should already be in PSUS (split) mode at start and finish of this step as viewed with the command: pairdisplay -g <group> -CLI -l
2) Run "pairresync -g <group>" to start synchronization from the primary disks to the secondary disks, where group is the name we've given to this set of disks (e.g. ERP for our main Enterprise Resource Planning database disks, PAYR for our main payroll database disks).
Disks should go to COPY mode at start of pairresync and then to PAIR when the synchronization is complete for each pair of disks in the group. For smaller databases it might go to PAIR very quickly so you might not see it in COPY before that.
This is typically done a few hours before we plan to split the disks in order to allow time for the synchronization to complete.
3) A few hours later we put the database into hot backup mode or (once a week) shut it down completely for a cold backup.
4) After DB is down or in hot backup mode we check the status of the group and if all disks are in PAIR status we run "pairsplit -g <group>" to split the disks. Disks should show mode as PSUS after this is complete.
5) Once the disks are split we take the DB out of hot backup mode (or restart it if it was shut down for the cold backup). We also have a routine that automatically does this if for any reason the disks aren't split after 90 minutes. This prevents keeping the DB down due to Shadow Image issues and ensures our business can resume operations.
6) Assuming the pairsplit above was successful we mount the secondary volumes (now split) onto the server from which we do the backups. (Note "mount" here means we mount the filesystems - we do not start or "mount" the DB in the Oracle context of the word mount.)
7) We kick off a backup job of these secondary disks.
The above process is repeated each night.
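For anyone scripting something similar, the status gate in step 4 and the 90-minute cutoff in step 5 could be sketched roughly as below. This is not our actual script. The function names are made up, and it assumes the pairdisplay output has one header line followed by one row per volume, with the state (PAIR, PSUS, COPY, SMPL) appearing as its own whitespace-separated field somewhere in each row. Check that against your own output before trusting it.

```shell
# Hypothetical helper: reads pairdisplay-style text on stdin and succeeds
# only if every data row contains the wanted state as a whole field.
# Assumption: line 1 is a header row and each later row is one volume.
all_pairs_in_state() {
    awk -v want="$1" '
        NR == 1 { next }                 # skip the header row
        NF {
            rows++
            ok = 0
            for (i = 1; i <= NF; i++) if ($i == want) ok = 1
            if (!ok) bad++
        }
        END { exit !(rows > 0 && bad == 0) }   # fail on no rows or any mismatch
    '
}

# Hypothetical helper: runs the given command every $2 seconds until it
# succeeds, giving up (non-zero return) once $1 seconds have elapsed.
wait_until() {
    timeout_s=$1
    interval_s=$2
    shift 2
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep "$interval_s"
        elapsed=$((elapsed + interval_s))
    done
    return 0
}

# Example wiring (not run here), using the commands from the steps above:
#   erp_is_split() { pairdisplay -g ERP -CLI -l | all_pairs_in_state PSUS; }
#   pairdisplay -g ERP -CLI -l | all_pairs_in_state PAIR && pairsplit -g ERP
#   wait_until 5400 60 erp_is_split || echo "ERP not split after 90 minutes"
```

Matching the state as a whole field anywhere in the row avoids depending on exact column positions, at the cost of a false match if a volume name happens to be exactly PAIR or PSUS.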
A few weeks back we had an issue with one of our database filesystems on the backup server hanging each night. As this seemed indicative of corruption of the filesystem I reasoned that instead of doing a pairresync a paircreate should be done. Where pairresync copies only the deltas accumulated since the last split, paircreate does an entirely new copy of the disks as if they'd never been synchronized before. (If you're familiar with the way EMC BCV stuff works, pairresync is the equivalent of an incremental establish and paircreate is the equivalent of a full establish.)
Prior to doing a paircreate the disks must be in simplex (SMPL) mode. Accordingly I had run:
pairsplit -S -g PAYR = Put the disks in SMPL mode.
I then ran pairdisplay to verify the disks were in SMPL, followed by this script:
Code:
for pair in $(grep '^PAYR' /etc/horcm0.conf | grep -vF '10.0.5.20' | awk '{print $2}')
do
    # create each PAYR pair from scratch (full copy)
    paircreate -m grp 1 -vl -g PAYR -d "$pair"
done
The above finds all entries labeled PAYR in horcm0.conf and excludes the line that contains the IP of the server, since that line also starts with PAYR but isn't one of the pairs of disks. It then runs the paircreate on each pair found.
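An alternative to filtering out the IP line by hand is to only read the device sections of the config file. The sketch below is not what we ran. The function name list_pairs is made up, and it assumes the file uses the usual HORCM_* section headers, with pairs listed under HORCM_DEV or HORCM_LDEV as "group pair_name ..." and comment lines starting with #.

```shell
# Hypothetical helper: prints the pair names (field 2) for group $1 found in
# the HORCM_DEV / HORCM_LDEV sections of config file $2.
# Lines in other sections (e.g. the HORCM_INST line carrying the IP) are skipped.
list_pairs() {
    awk -v grp="$1" '
        /^HORCM_/   { in_dev = ($1 == "HORCM_DEV" || $1 == "HORCM_LDEV"); next }
        /^[ \t]*#/  { next }              # skip comment lines
        in_dev && $1 == grp { print $2 }
    ' "$2"
}

# Example wiring (not run here):
#   for pair in $(list_pairs PAYR /etc/horcm0.conf); do
#       echo "would run paircreate for $pair"
#   done
```

This way the loop does not need to know the server's IP, and a group name that appears in another section cannot leak into the list of pairs.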
I had used the above syntax as it was in the script written by a co-worker who had done the setups and had recently left the company.
We had a set of scripts doing the outlined process above for PAYR and a separate set of scripts doing it for ERP. Step 3 for PAYR would run at 6 AM and for ERP at 4:50 AM. After doing the above paircreate though it seemed Step 3 for ERP was also causing the disks for PAYR to go to split (PSUS) mode even though nothing in the 4:50 AM scripts dealt with the PAYR pairs (and in fact the server that was doing that particular split didn't even have PAYR in its horcm*.conf files).
On troubleshooting an issue with ERP that occurred this past weekend I realized what the issue was. The paircreate command I'd used had specified "-m grp 1" which sets mode to grp and the 1 is the Consistency Group ID (which in the help for paircreate is labeled as GID but in other documents is labeled as CTGID).
You can determine the CTGID of a group by running:
pairvolchk -g <group>
On checking that I found as I suspected that the ERP group had CTGID value of 1 as did the PAYR group. We have another one that has a CTGID of 2. Accordingly I put the PAYR disks in SMPL mode again then ran the loop above but this time used "-m grp 3" instead of "-m grp 1". This set the consistency group to 3.
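If you have several groups, a quick check for this kind of collision before running paircreate could look something like the sketch below. The function names are made up, and it assumes the pairvolchk output contains text of the form "CTGID = N" somewhere in it, so verify that against what your version prints.

```shell
# Hypothetical helper: pulls the number after "CTGID =" out of text on stdin.
# Assumption: the text contains "CTGID = N"; prints nothing if it does not.
extract_ctgid() {
    sed -n 's/.*CTGID *= *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: reads "group ctgid" lines on stdin and prints every
# CTGID that is claimed by more than one group, with the groups sharing it.
find_ctgid_collisions() {
    awk '
        { groups[$2] = groups[$2] sep[$2] $1; sep[$2] = ","; n[$2]++ }
        END { for (id in n) if (n[id] > 1) print id ": " groups[id] }
    '
}

# Example wiring (not run here):
#   for g in ERP PAYR; do
#       echo "$g $(pairvolchk -g "$g" | extract_ctgid)"
#   done | find_ctgid_collisions
```

Any line this prints means two groups share a consistency group ID, which is the situation that caused the ERP split to take PAYR along with it.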
On checking this morning I found that the PAYR group did NOT get split at 4:50 AM but instead got split at 6:00 AM when it is supposed to and the ERP group did get split at 4:50 AM as it is supposed to do.
My main reason in posting this is that I didn't really find a good definition of what the number after "-m grp" was specifying in much of my searching. Calling it GID in the help was NOT very helpful, and calling it CTGID (or omitting it completely) in other online references without defining CTGID wasn't much more helpful, though I did eventually find some information that led me to the assumption that made me redo the paircreate.