Posts Tagged ‘CommVault’

Happy Friday, Students of the Oracle!!
I had the opportunity to take over a failed clone job on CommVault. The immediate solution was to accomplish the clone using RMAN Duplicate. I believe the steps were to run a full backup on RMAN and then duplicate this to the target database. This succeeded and was definitely a solution that satisfied the customer.My job was to figure out why it failed. This is the error stack I received. The last three lines were the comments from my coworker as he left the building to go out of town on holiday.
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of Duplicate Db command at 06/01/2016 13:22:54
RMAN-05501: aborting duplication of target database
RMAN-05556: not all datafiles have backups that can be recovered to SCN 7447956627439
RMAN-03015: error occurred in stored script Memory Script
RMAN-06026: some targets not found – aborting restore
RMAN-06023: no backup or copy of datafile 18 found to restore
RMAN-06023: no backup or copy of datafile 16 found to restore
RMAN-06023: no backup or copy of datafile 14 found to restore
RMAN-06023: no backup or copy of datafile 12 found to restore
RMAN-06023: no backup or copy of datafile 9 found to restore
RMAN-06023: no backup or copy of datafile 7 found to restore
RMAN-06023: no backup or copy of datafile 6 found to restore
RMAN-06023: no backup or copy of datafile 1 found to restore
RMAN>
Recovery Manager complete.
]
That may point out a CV issue.
Why would the backups be reporting success yet be unrestorable?
Past reasons dealt with interface card setup in the CV infrastructure.
 This sounds really bad but it’s actually quite simple. The worst case scenario is we have a corrupted database. The best case scenario is the backup we chose to duplicate with is incomplete.
  1. My first step is to look at the complete RMAN log for the backup. I’ll spare you the complete script but it was attempting a Full (level 1) backup. In other words, an incremental full. When it came across missing datafiles, it defaulted to a Full (level 0) backup and that succeeded. This answered my coworker’s first two questions. It reported success because it was a successful backup despite failing to be an incremental backup. Also, this log gave me 90% certainty it is not a CommVault(CV) issue. **Hint** If we are going to use a past backup to clone with, make sure there are no warnings in the RMAN log and ensure it is a full backup. It was actually a stroke of good fortune that the clone failed.
  2. From CommVault, I ran a full (level 0) backup on both the source and target databases. In our setup, CommVault will not recognize backups run outside of the CommVault tool. Upon further investigation, there had not been a full (level 0) backup run on either database for some time. Our backup strategy and schedule is designed to run a full(level 0) backup once a week. This needs to be looked at. Both backups succeeded and the log showed no RMAN-06023 errors. This full backup can now be used to clone in the future.
  3. In CommVault, I navigated to my schedule policy for these databases. The schedule showed a full backup scheduled once a week. However, in the Backup Options tab, the Incremental box was selected. I quickly changed this to Full and saved my changes.
  4. I verified the logs for last night’s incremental backups on both databases. No errors. Problem solved!!

The moral of the story is don’t trust “Recovery Manager complete” as proof the backup succeeded without warnings. Also, I advise against using a Full (Level 1) backup to clone with.

Thanks for reading!

Jason

Advertisements

Happy Monday, Oracle Backup Engineers!

Ask a DBA what is the single most important skill that a DBA possess and the majority will tell you Performance Tuning. I must take issue with that opinion. My minority opinion is Backup and Recovery is, without a doubt, the most important skill to master. Without that skill, you may (will) wake up one morning without a database to tune.

Today, I thought it would be fun to take our cloning to the next level. Let’s take our RAC instance and clone it to a different server. This is a common practice and its applications are many. You can use it to restore, migrate or create a database instance. I am still doing test restores so that is the slant in which this and previous articles are pointed.

You will notice a 95% similarity between this article and the one I published two weeks ago. No need to rewrite the entire article but for your benefit, I have recopied all of the steps so you don’t have to bounce back and forth between articles. I merely added a step for tnsnames.ora and modified a couple other steps.

***A little terminology. When I refer to the source, that is the database I am cloning. When I refer to target, that is where I am cloning to.***

  1. Copy the primary database (source) to the clone I’ll make.
    1. Save the oratab with my clone database added.
      1. As Oracle, $ vi /etc/oratab
      2. Add a line with your restore instance name.
      3. $ cat /etc/oratab –to see if your new instance is there.
    2. Change environment to new instance $ . oraenv ?new_instance$ cd $ORACLE_HOME/dbs
    3. Make a pfile from the working instance (source database) you are going to clone.
      1. SQL> create pfile=’/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/initnewinstance.ora’ from spfile;
    4. Edit initnewinstance.ora to make your cloning pfile.
      1. :set ignorecase –makes it easier for searching
      2. Then erase double underscore lines in the pfile.
      3. set cluster_database=false
      4. Set control_files so only the disk groups are set
        1. for example: *.control_files=’+DATA_XDT1′,’+RECO_XDT1′
      5. Set db_recovery_file_dest_size, pga_aggregate, sga_target to appropriate size
      6. Set *.remote_listener to appropriate location.
      7. Change use_large_pages=’ONLY’ to = ‘TRUE’ so it uses them if available but doesn’t fail if none are available.
      8. Change all string=’clone_db’ to become ‘recover_db’
        1. At the vi colon prompt like so   :%s#clonedb#recoverdb#g
      9. Save the new pfile as initrecoverdb.ora.
      10. If you have these backup settings for improved tape backup, keep them.
        1. *._backup_disk_bufcnt=64
        2. *._backup_disk_bufsz=1048576
        3. *._backup_file_bufcnt=64
        4. *._backup_file_bufsz=1048576
        5. *.backup_tape_io_slaves=TRUE.
    5. Edit (vi) tnsnames.ora and add new instance.

:set ignorecase

search for clonedb  [/clonedb]

Copy the single instance “clone_db” to make recover_db then paste it back to create it.

Once edits are done save it then do a tnsping for recover_db.

That now shows network TNS protocol is ready

6. Make a “temporary” copy of your tnsnames.ora file.

a. In this file change the server info to match what exists in all nodes in your target database tnsnames file. You can optionally just edit the tnsnames file but I did this just to stay away from our production tnsnames file since what we are doing is temporary. 

$ cp tnsname.ora $ORACLE_HOME/network/admin/temp

7. Now check existence of $ORACLE_BASE/admin/recover_db/adump and $ORACLE_HOME/admin/recover_db/wallet; further, that wallet has both wallet and single sign on wallet.

  • Neither exist so create both and populate the wallet from our target database (to use RMAN terminology)

mkdir -p $ORACLE_BASE/admin/recover_db/adump

The “-p” tells mkdir to make the absolute path you supply to it

mkdir -p /u01/app/oracle/product/11.2.0.4/dbhome_1/admin/recover_db/wallet

$ cp cwallet.sso /u01/app/oracle/product/11.2.0.4/dbhome_1/admin/recover_db/wallet

$ cp ewallet.p12 /u01/app/oracle/product/11.2.0.4/dbhome_1/admin/recover_db/wallet

    1. As GRID, Add entry to listener.ora

(SID_DESC =

(GLOBAL_NAME=recoverdb)       (ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1)       (SID_NAME=recoverdb)     )

lsnrctl status

lsnrctl reload

lsnrctl status

cd $ORACLE_HOME/network/admin

and edit the file as shown above.

After testing the directories for existence plus doing tnsping for the connection descriptor successfully, we’re ready to start the auxiliary database in nomount mode.

  1. Start SQL*Plus with environment set to recover_db

    Now start the database in nomount mode using the pfile we created. SQL> startup nomount pfile=’/u01/app/oracle/product/11.2.0.4/dbhome_1/dbs/initrecoverydb.ora’

Exit to the Unix prompt as we’re done with Unix preparations.

Now comes the CommVault steps

Here we go…

    1. Discover the recovery instance:
      1. Navigate to Single Instance Server (xdt7–>Oracle–>All Tasks–>Discover instance
        1. Discovered Instance!! Refresh the screen to see newly discovered instance.
        2. In the discovered instance in CV, navigate to the Client Name–> Instance(Oracle SID)–> Properties–> Details tab and edit the TNS_ADMIN folder: 

          TNS_ADMIN folder: /u01/app/oracle/product/11.2.0.4/dbhome_1/network/admin/temp 

  • We are duplicating source database “clone_db” to auxiliary database “recover_db”.
  1. Let’s display the target and list its backups from its backup history view

“clone_db->View->Backup History

  1. Use the advanced option to show All operations, Include Aged Data and Include Backup Copy as NAS Snapshot Catalog Jobs.
  2. Choose okay at each subsequent screen till the Backup Job History is displayed
  3. Double click the latest successful backup and copy and record its:
    1. Job id: 554999 (**Example)
    2. Start Time:
    3. End Time:
    4. SCN Number(S): I use the lower SCN number if there are two displayed.
  4. You’re now ready to begin setting up a clone restore of a backup.
  5. Right click on the backup job # i.e. (553999)
    1. Choose “Browse and Restore”
    2. Next click “View Content” push button control at bottom of the Browse and Restore Options screen.
    3. It displays the tablespaces available in the right pane for selected database in left hand pane.
    4. Click the left hand panes checkbox control to select entire content.
    5. Then “Recover All Selected…”

The “Oracle Restore Options Server_name-clone_db SID: clone_db” screen appears.

Note that its shows Destination Client as being Server_name-clone_db, which will change after you return from the “Advance” options on this screen which we’ll do now:

  1. Select “Duplicate DB”. Now we get 12 tab controls on the Oracle Restore Options screen.
  2. Click on the Advanced push button control. Tab controls appear on the Oracle Advanced Restore Options screen.
  3. Choose the “Duplicate” tab control.
  4. Click its “Duplicate To” checkbox control, and its option become enabled.
  5. In the “Database Name” textbox control, entry the auxiliary database name (i.e. clone you’re making): recover_db.
  6. Now move to the “Recover” tab control, then active its SCN radio button and specify the SCN number we save in our preparation work:
  7. Change to the Duplicate DB Options tab control, check its “No Filename Check” checkbox control.
  8. Finally activate the “Redirect” tab control, uncheck its “Redirect” checkbox control. Remaining tab controls aren’t used for this operation, but you can look at them.

To summarize, the key tab controls were these: Duplicate, Recover, Duplicate DB Options and Redirect.

Now choose “OK” to accept

Notice that our “Destination Client” was automatically filled in after completion of the advanced push button control dialogs.

  1. Now choose “Number of streams to use for restore”, which I usually set at 4 — next accept all entries by clicking “OK” at the bottom of the “Oracle Restore Options” screen.

It immediately gives a warning that, “Data from clone_db on server_name-clone_db will be restored on to recovery_db on server_name.  All data in recovery_db will be over written.  Are you sure you want to continue?”

  1. Take heed of the waning and carefully ensure you want to do this.
  2. Let the restore begin.
  3. Once it completes, you should be able to open the database from the SQL Plus prompt and test if the restore is complete.

At this point, I took the time to delete this recovery instance. Some of the steps I took were to remove listener.ora entry (as Grid), remove its TNS descriptor from tnsnames.ora, and oratab entries. Then shutdown the database, carefully dispose of the datafiles in ASM and remove its statis listener entries. Also, delete the temporary tnsnames.ora file.

Thanks for reading! Again, I greatly appreciate Tim Dexter for his expertise in CommVault and RMAN Duplicate.

Take care,

Jason

Since my last post on Making a Test Restore of a RAC DB Using CommVault and Putty SSH SessionI have performed this operation about five or six times. In the process I have come across more than a few pitfalls. I used to consider myself somewhat proficient with RMAN. Then I added CommVault into the mix. While it looks pretty to the average backup professional, it adds another layer of complexity such as the subject we are about to touch on.

I opened up my own blog post on the subject to perform this restore. As a blogger, I believe it is important to follow your own directions to ensure you won’t lead others astray.

  • I got down to the CommVault portion of the clone. The Backup History details for the full backup I chose to use had four SCN’s to choose from. I first chose the highest SCN. I have been told by Mr. Dexter that using the highest SCN has a high failure rate but I am a slow learner. I got the above error but with the higher SCN.
  • Next, I chose the lowest SCN of the four that the backup history provided. That’s when I got the error that is also the title of my blog post. I pride myself on my attention to detail. Yet, I didn’t read the whole error statement. Let’s look at the whole error statement…

RMAN-06457: UNTIL SCN (98553698) is ahead of last SCN in archived logs (98553695)

  • Can anyone see the SCN I should have used? My lowest SCN was 98553698 but RMAN was trying to tell me the last SCN in the archived logs was 98553695. But it is outside of the range for the backup history! I thought about that. I also remember someone from a previous contract mentioning you can’t always trust SCN’s provided by CommVault. Had I went to the RMAN Console I could have looked at the Backup Script and I am pretty sure the SCN’s would be a little different.
  • I created a third clone job based on all the numbers in the backup job but I supplied 98553695 this time.

This time it worked flawlessly! Ten minutes later I had a new instance! I logged in to SQL Plus, checked its status on v$database and looked at its data files.

Please let me know if you have tried my steps. Especially let me know if you have had any issues. I am very curious about the pitfalls of CommVault as I have used this product for a total of ten months. Hopefully we can work together and get this down to a science.

Thanks for reading!

Jason Brown

Good Afternoon, Oracle fanatics!

Today’s post deals with one of the nuances of CommVault and the management of our beloved recovery manager, formally known as RMAN.

For those of you who don’t know, CommVault is a graphical user interface (GUI) that works with recovery manager tools for data protection, backup and recovery. It does much much more but I only listed the piece I am currently using. Our team manages a multitude of Oracle single instance, 2 instance RAC and 4 instance RAC databases. CommVault not only shows a status of running jobs, it interfaces seamlessly with RMAN for configuration, scheduling and running full, differential, cumulative and archive log backups and recovery. It is a very useful tool for enterprise organizations.

Recently, our storage team upgraded CommVault to version 11. There was no need for an RMAN outage as the upgrade would take place under a short maintenance period and no backups were scheduled during the CommVault outage. As you may know, RMAN is resilient enough to withstand a paused job anyways. So the thought that backups might fail as a result of the upgrade was not fully investigated. The upgrade went smoothly. After the maintenance completed, CommVault came back up. We were able to connect after tweaking our Java settings. Each morning, we get a job summary report from CommVault showing all of the jobs that ran the previous day. There were over 200 jobs that ran. Of those, 12 backups failed, 9 were still running, 8 were delayed, 2 were killed, and 2 completed with errors. Uh oh. We have a major problem.

The script that CommVault created for backup used up to 4 channels spread across the various nodes of the instance. Each channel connects to RMAN and then connects to the database using…

allocate channel ch1 type ‘sbt_tape’ connect “sys/***************@rac_inst_name”
PARMS=”SBT_LIBRARY=/opt/simpana/simpana/Base/libobk.so,BLKSIZE=1048576,ENV=(CV_mmsApiVsn=2,CV_channelPar=ch1,ThreadCommandLine= -chg 1:2 -rac 364 -cn xdt#dbadm01 -vm Instance001)”
TRACE 0;

All of the jobs that failed overnight failed to connect to the instance with…

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-01017: invalid username/password; logon denied
RMAN>
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of configure command at 04/08/2016 07:23:58
RMAN-06171: not connected to target database

ORA-01017 is the root cause.

I simulated the error by connecting to RMAN and then running the connect string to the database. I was able to connect. These failed jobs seemed completely random. 13 databases including single instance, RAC, Solaris Servers and UNIX Servers. There was no connection so we turned it back over to our storage team. After a call with the vendor, we realized there was actually a pattern. All of the jobs that failed had SYS passwords that included special characters. So the connect string had to include double quotes. The CommVault connect string did not include double quotes so it could not interpret any special characters.

It turned out to be a simple solution but took days to get the storage team to look at CommVault as the culprit instead of RMAN or the database.

For the future, it makes sense to come up with a better test plan than simply see if CommVault comes back up and appears to be connected to the databases.

Thanks for reading!

Jason