Disaster Recovery

2007-12-25 11:37:00

Thanks to everyone for all the response. There are a bunch of interesting

options but I'm not sure which one I'll try yet. I couldn't even begin to

summarize the replies in a couple of lines, so here they all are.

============================================================================

Original Question

=============

Coming from the mainframe world, I had the capability to create a

stand-alone backup tape that contains a bootable program followed by a

backup of my system. I could boot from the tape, and it would restore my

system. We could use this when we went on disaster tests.

We're running Solstice Backup on our Solaris 7 boxes and I don't see an

equivalent function.

The Sun procedure seems to be (1) install Solaris on the box at the disaster

site (2) install the Solstice Backup software on the box at the disaster

site (3) restore the client. I can't believe this is the easiest/best way.

Also, this requires I restore my backup server also.

Question...Is there any way to create a backup tape that I can boot from and

restore my system at a Disaster Site ?

============================================================================

Reply - Nelson T Caparrosso

----------------------------------

I would think Legato Networker would have a similar thing although I am not

sure.

============================================================================

Reply - Alex Shepard

--------------------

There's a chapter in "Unix Backup and Recovery" by W Curtis Preston

(published by O'Reilly) on Solaris Bare Metal Recovery that probably will

meet your needs. It doesn't include a bootable tape, but it's simpler (and

more effective) than installing the OS and then restoring on top of a live,

active filesystem.

============================================================================

Reply - Joe Fletcher

-------------------------

Theoretically ufsdump/ufsrestore is the tool you are looking for.

SUN are a bit behind in this respect cf DECpaq's btcreate on Tru64, HP's

make_recovery and IBM's whatever it is.

============================================================================

Reply - Reggie Stuart

---------------------------

I heard a myth about bootable Solaris tapes about five years ago. PLEASE

post a summary if you find out otherwise, as your procedures for disaster

recovery for a Sun are my standard procedures.

If your installation is that important, you need to look at

clustering/redundancy. Because backups are mostly done over the network,

most servers don't have a local tape drive to boot from anyway. In the

past, I have set up backup servers with much extra disk space so as to be

able to restore a server's worth of data, and then reconfigure nfs/nis+/etc

to point at the backup server until proper maintenance can be scheduled.

============================================================================

Reply - Thomas Wardman

----------------------

I've never seen any Sun capable of booting off of a tape. You could do a

couple things instead. You could make a Solaris bootable CD, that contained

the backup client, and you could restore the system from there.

Or, you could create a Jumpstart server at the disaster site, and if

something bad happened, simply boot the system using an automated install.

The automated install could include adding the backup client. Then you

could restore from there.

============================================================================

Reply - Darren Dunham

---------------------

>Coming from the mainframe world, I had the capability to create a

>stand-alone backup tape that contains a bootable program followed by a

>backup of my system. I could boot from the tape, and it would restore my

>system. We could use this when we went on disaster tests.

>We're running Solstice Backup on our Solaris 7 boxes and I don't see

>an equivalent function.

There isn't one.

>The Sun procedure seems to be (1) install Solaris on the box at the

>disaster site (2) install the Solstice Backup software on the box at the

>disaster site (3) restore the client. I can't believe this is the

>easiest/best way. Also, this requires I restore my backup server also.

You are correct.

>Question...Is there any way to create a backup tape that I can boot from

>and restore my system at a Disaster Site ?

Not at this time. While it should be possible, no one has engineered a

bootable tape for Sun Solaris.

You may be able to construct something 'useful' on your own, but it is

not integrated into any commercial product that I'm aware of.

There are several sysadmin scripts that try to do this:

1) save a copy of the scripts to tape

2) save the disk configuration

3) save a ufsdump.

From that, you could boot the machine from a cdrom. Grab the scripts

from the tape, then have the scripts read the configuration and redo the

disks and then restore from the ufsdump.

At that point, you should have a bootable system that has Solstice

Backup or another product on it ready to restore other filesystems.
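
The three steps above could be sketched roughly as follows. This is only an illustration, not one of the scripts mentioned: the function name print_save_plan, the directory /var/tmp/dr and the disk and tape names are all assumptions, and the function prints the commands instead of running them so the plan can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the commands for the three steps
# above. prtvtoc, tar and ufsdump are standard Solaris tools; the directory,
# disk and tape names are assumptions for illustration only.
# Usage: print_save_plan CONFDIR TAPE "DISK ..." FS ...
print_save_plan() {
    confdir=$1; tape=$2; disks=$3; shift 3
    # Step 2: record each disk's partition table next to the scripts.
    for disk in $disks; do
        echo "prtvtoc /dev/rdsk/${disk}s2 > ${confdir}/${disk}.vtoc"
    done
    # Step 1: scripts and saved layouts become the first file on the tape.
    echo "tar cf ${tape} ${confdir}"
    # Step 3: one level 0 ufsdump per filesystem, appended after the tar file.
    for fs in "$@"; do
        echo "ufsdump 0f ${tape} ${fs}"
    done
}

print_save_plan /var/tmp/dr /dev/rmt/0cn "c0t0d0 c0t1d0" / /usr /var
```

A no-rewind tape device is assumed so that each command appends a new tape file. On the recovery side, the saved prtvtoc output can usually be fed to fmthard -s rather than retyped in format; check the man page for your release.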

============================================================================

Reply - blymn

-------------

>Question...Is there any way to create a backup tape that I can boot from

>and restore my system at a Disaster Site ?

Yes - don't use Solstice backup. Create a recovery tape just using

ufsdump which has all your partitions backed up on it. The procedure

then devolves to booting from CD, mounting the formatted/newfs'ed hard disk,

restoring from tape (no OS install required), installing boot blocks, and

rebooting. I dislike a lot of backup software solutions because they

make things hard just when you don't need them to be - partially

rebuilding a server just to get access to your data is something you

should not need to deal with in a disaster situation. This leaves

aside the issues of trying to get a license to allow you to do the

restore (though, some products will allow you to restore without a

license)

============================================================================

Reply - Bret Hester

-------------------

This is quite interesting; please summarise soon.

I am using Veritas Netbackup and the procedure is just as you have

described for Solstice Backup. The procedure we are in the middle

of implementing here for our Suns is to use Jumpstart instead of tape booting.

We have a Jumpstart server (on the lost client we just do "boot net") which

sets up the correct partition slicing and installs the Networked backup

client software. Then we still have to do a client restore.

============================================================================

Reply - Ric Anderson

--------------------

Nope, that's it - just as DUMB as ADSM's idea of a bare metal recovery

in the IBM world.

What I do is make a ufsdump backup of the system device (/ in my case,

but some people put /var, /opt, /, /usr and who knows what else in

separate partitions, so you may need to back up more than just /) on

a single tape, e.g.

        for fs in / /usr /var /opt; do

        ufsdump 0f /dev/rmt/0cn $fs

        done

This creates a multifile ufsdump tape, from which you can restore / via

        mt -f /dev/rmt/0cn rew

        ufsrestore -rf /dev/rmt/0cn

or /var via

        mt -f /dev/rmt/0cn rew

        ufsrestore -rfs /dev/rmt/0cn 3

etc. Also make a prtvtoc of the system device (and all other disks) so

you have partitioning info for use at the disaster site (or to recover

after replacing a smoked disk).
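
Since the number given to the "s" option depends on the order of the ufsdump loop above, a small lookup can save counting by hand at the disaster site. A minimal sketch, assuming the filesystems are passed in the same order they were dumped; the function name tape_file_number is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: given the filesystems in the order they were dumped,
# print the 1-based tape file number to pass to ufsrestore's "s" option.
# Usage: tape_file_number TARGET FS ...
tape_file_number() {
    target=$1; shift
    n=1
    for fs in "$@"; do
        if [ "$fs" = "$target" ]; then
            echo "$n"
            return 0
        fi
        n=$((n + 1))
    done
    echo "tape_file_number: $target not in dump list" >&2
    return 1
}

# Same order as the ufsdump loop above; prints 3 for /var.
tape_file_number /var / /usr /var /opt
```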

With this tape and a Solaris install CD (no install required though) in

hand for your version of the OS, you

        1. stick the CD in

        2. type boot cdrom -sw at the ok prompt to get a single user shell

        3. run format, select the system disk, and the partition

           submenu. Use the prtvtoc output to help you partition

           the new disk like the old one, then do "label", followed by

           "yes" to write out the new partition table, and quit format.

        4. newfs the new partition(s).

        5. For discussion, presume c0t0d0 is the system disk, s0 is /.

           * mount /dev/dsk/c0t0d0s0 /mnt

           * cd /mnt

           * stick tape in drive, and do

           * ufsrestore -rf /dev/rmt/0cn

           * rm restoresymtable # ufsrestore scratch file

           * cd /

           * umount /mnt

           * installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \

               /dev/rdsk/c0t0d0s0

Now you should have a working "/". If /var (or /opt, or /usr) are

on their own partitions, then you need to repeat portions of the

above (skip boot block, but do rewind before each restore, and

remember to use the "s" option).
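
For the extra partitions, the repeated portion might look like the sketch below. It only prints the commands so they can be checked before anything is typed; the function name print_restore_steps, the slice c0t0d0s3 and the default tape device are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the commands to restore one extra filesystem
# from a multi-file ufsdump tape. Nothing is executed.
# Usage: print_restore_steps SLICE TAPE_FILE_NUMBER [TAPE_DEVICE]
print_restore_steps() {
    slice=$1; filenum=$2
    tape=${3:-/dev/rmt/0cn}
    # No installboot here: the boot block is only needed on the root slice.
    cat <<EOF
newfs /dev/rdsk/${slice}
mount /dev/dsk/${slice} /mnt
cd /mnt
mt -f ${tape} rew
ufsrestore -rfs ${tape} ${filenum}
rm restoresymtable
cd /
umount /mnt
EOF
}

# /var was the third filesystem dumped; c0t0d0s3 is an assumed slice.
print_restore_steps c0t0d0s3 3
```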

Once you've done a couple dozen of these, it's pure autopilot :-)

============================================================================

Reply - Geoff Lane

------------------

As I understand it only fixed block length tape drives can be used to boot

from under SunOS. There are few of these available these days.

We are looking at other schemes. The following are some (site specific)

notes that I wrote some time ago. We haven't yet decided what to do.

----------------------------------------------------------------------

Using ufsdump/ufsrestore

------------------------

(Although this describes Solaris based procedures, I would expect that very

similar steps and programs could be used on any Unix-like operating system.)

It's possible to use ufsdump to create a disaster recovery dump by

ufsdump'ing to a remote machine (in the case of vxfs you can use vxdump.) On

most Solaris systems this will end up requiring about 300Mbytes of remote

filestore for the full dump of /, /etc, /usr, /var (assuming var is not full

of email or logs.)

In theory you need to do this in single user mode, but ufsdump doesn't

enforce that restriction.

Assuming a total loss of a root disk, to recover the system you need to...

        1. boot from CDROM into maintenance/single user mode

        2. recreate the disk partitions if necessary (which you had better

           have records of; in the case of Solaris the explorer output contains

           the information)

        3. newfs the disk partitions just created.

        4. ufsrestore from the dumps.

        5. reboot the machine from disk.

So, for the 20 odd Solaris machines we would need about

        20 x 300M + 2G = 8G

of remote disk storage for the level 0 dumps (the 2G is for Irwell that has

a lot of application software installed on the root disk :-() You will need

about 20% more for the incrementals if needed (but if the total dump is only

about 300Mbytes then there's no real point in taking incrementals, it just

lengthens the time required to perform the restore.)

Ideally, one should have one "initial" dump taken when the system was first

installed and a "current" dump taken once a week or so. This allows for the

possibility that you may wish to determine exactly what has changed since a

known good system was running (if for example you suspect you have been

hacked.) This would double the total storage space required to 16G.
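
The storage estimate is simple enough to recompute when the machine count or dump size changes. A small sketch using the same round figures as above (sizes in Mbytes, with 2G taken as 2000M); the function name dump_space_mb is made up for this example.

```shell
#!/bin/sh
# Sizes in Mbytes, using the same round figures as the text (2G = 2000M).
# Usage: dump_space_mb HOSTS PER_HOST_MB EXTRA_MB COPIES
dump_space_mb() {
    echo $(( ($1 * $2 + $3) * $4 ))
}

dump_space_mb 20 300 2000 1    # one level 0 set: prints 8000
dump_space_mb 20 300 2000 2    # "initial" plus "current" sets: prints 16000
```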

Step 3 is complicated if you have VXFS as you need to get the support s/w

online somehow - no idea how. However, one rarely does any more than

"encapsulate" the root drives in which case simple ufsdumps are fine (though

the final steps to total recovery are more complicated; /etc/vfstab will

need editing and the root filesystems will need re-encapsulating.)

Step 4 is the interesting one; traditionally you do this with a locally

attached tape drive. We would like to use a remote fileserver which would

require network services to be running after booting from CDROM.

In the case of Solaris this is done by...

        1. Boot the system from CDROM into single user mode

                ok> boot cdrom -s

        2. Initialise networking. You need some information for this

           that is best collected while the system is running.

                The name of the network interface hardware (hme0 etc)

                The IP number for the machine

                The Netmask and broadcast address

                The default route

           The values used here are examples, they will differ per machine.

                # ifconfig hme0

           should return information on hme0 if it is configured in; if not

           then try

                # ifconfig hme0 plumb

           which should get the interface operational.

           Now, configure the network interface

                # ifconfig hme0 192.11.111.1 netmask 255.255.255.0 \

                        broadcast 192.11.111.255 up

           Check the interface state

                # ifconfig hme0

hme0: flags=863<UP,BROADCAST,NOTRAILERS,RUNNING,MULTICAST> mtu 1500

        inet 192.11.111.1 netmask ffffff00 broadcast 192.11.111.255

        3. Set a default route

                # route -n add default 192.11.111.250

You should now have a working network interface. You could go on to set up

DNS etc but it's probably not worth the bother. The OS at this point can use

telnet, ftp etc with explicit IP number addresses and you can pull the dumps

from a network attached fileserver.
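
Because the addresses have to be collected while the system is still healthy, it may help to keep them in a small script that prints the exact commands for each machine. A rough sketch, assuming plain IPv4 dotted quads; the function names broadcast_addr and print_net_setup are hypothetical, and the values are the example ones used above.

```shell
#!/bin/sh
# Hypothetical helper: derive the broadcast address from an IPv4 address
# and netmask, octet by octet (broadcast = address OR inverted mask).
broadcast_addr() {
    oldifs=$IFS
    IFS=.
    set -- $1 $2          # splits both dotted quads into $1..$8
    IFS=$oldifs
    echo "$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
}

# Print (do not run) the commands from steps 2 and 3 for one machine.
# Usage: print_net_setup INTERFACE IP NETMASK DEFAULT_ROUTER
print_net_setup() {
    ifname=$1; ip=$2; mask=$3; gw=$4
    bcast=$(broadcast_addr "$ip" "$mask")
    echo "ifconfig $ifname plumb"
    echo "ifconfig $ifname $ip netmask $mask broadcast $bcast up"
    echo "route -n add default $gw"
}

print_net_setup hme0 192.11.111.1 255.255.255.0 192.11.111.250
```

Keeping a printed copy of this output with the recovery CD avoids having to look up the netmask or router during the test.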

Alternatively we can write the dumps to CDROM. The problem with doing this is

that on active systems the dumps will quickly become out of date. It

does eliminate the need to initialise networking.

Another possibility is to get a cheap SCSI disk array and just plug it into

the system to be recovered -- there's no problem doing this as the system is

already down. That way the dump data is local to the machine and no fancy

network footwork is needed. Normally, the disk array would be plugged into

the dump server and kept up to date. The downside of this method is that

you would be disturbing the SCSI cables, never a wise move. Of course, if

money were no object a Fibre channel SAN would do the same job without any

manual plugging needed.

Step 5 should be simple but there are some possible problems. The selected

new boot drive may not be bootable. This can be fixed by the ? command.

Because the recovery process has not used the standard OS install procedures

the EEPROM will not have been updated to take account of any change in the

boot drive location (if it differs at all.) This requires that the boot

disk alias in the EEPROM be updated.

Most of this is theoretical. We'll have to try various procedures out and

see what works.

============================================================================

Reply - Stuart Whitby

---------------------

HP and DEC have similar functions (maybe just installs the core OS

image from tape - I haven't had to use it and I'm not sure), and

you can use a Jumpstart server to get things back up to date

quickly, minus your data.

The only way I know of to do this in Solaris is to use the bare-

metal recovery stuff that we make. I haven't had to use that

either, and on my first (and only) look at the functionality, it

appeared to be pretty limited in its hardware scope. I don't know

if there's any equivalent software from any other companies, or if

the functionality of our own software has improved since I looked -

around the beginning of the year.

============================================================================

Reply - Gary Litwin

-------------------

I always just made a ufsdump tape for each OS filesystem (including wherever

your solstice backup lives).

In the event of a disaster, boot from the cdrom, partition and replace the

bad disk, ufsrestore the filesystems it originally contained, now you are up

and configured as of the date you took the last ufsdump.

Now you can just restore the same filesystems via solstice, replacing all

the changed files, and you are back in business.

This process has saved me several times.

It is a really good idea to have a backup tape of /nsr on your backup

server as well, in case you lose that disk in an emergency. (I used to

ufsdump it to a hot spare disk once a week...)

============================================================================

Reply - Seth Rothenburg

-----------------------

The official procedure is just like the "fix root password" procedure...boot

from CD into single user mode.

Then, format/newfs/mount needed file systems (eg, on /a) and then restore

them. However, booting from CD rom is slow.

In our recent Disaster tests, we have been fortunate that we arrive at the

disaster site and find the system up and running off some disk c0t0d0s0, and

there are 6 disk drives in the system, so we can start in on the restore

without CD. We actually wrote a script and soon we hope to change our backup

to put files needed to start the restore on their own partition. Here are 2

examples:

rothen@testdg[/home/mangala]:> more restore_prod restore_dg | cat

::::::::::::::

restore_prod

::::::::::::::

#!/bin/sh

#set up paths for commands

TEE=/bin/tee

UFSRESTORE=/usr/lib/fs/ufs/ufsrestore

DATE=/usr/bin/date

REWIND=/dev/rmt/0

NOREWIND=/dev/rmt/0n

MT=/usr/bin/mt

RESTORELOG=/tmp/restore_log

$DATE > $RESTORELOG

echo Please monitor console for all error messages...

# Rewind tape

$MT -f /dev/rmt/0 rewind 2>&1

# Start restore.

######### RESTORE root #############################################

newfs /dev/rdsk/c0t2d0s0

mount /dev/dsk/c0t2d0s0 /a

cd /a

$UFSRESTORE rvf $NOREWIND 2>&1|$TEE -a $RESTORELOG

installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t2d0s0

########## END RESTORE root ########################################

######### RESTORE /usr #############################################

newfs /dev/rdsk/c0t3d0s5

mount /dev/dsk/c0t3d0s5 /a/usr

cd /a/usr

$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG

########## END RESTORE /usr ########################################

######### RESTORE /var #############################################

newfs /dev/rdsk/c0t3d0s0

mount /dev/dsk/c0t3d0s0 /a/var

cd /a/var

$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG

########## END RESTORE /var ########################################

########## RESTORE /opt ########################################

newfs /dev/rdsk/c0t2d0s6

mount /dev/dsk/c0t2d0s6 /a/opt

cd /a/opt

$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG

########## END RESTORE /opt ########################################

########## RESTORE /opt/gnu ########################################

newfs /dev/rdsk/c0t2d0s7

mount /dev/dsk/c0t2d0s7 /a/opt/gnu

cd /a/opt/gnu

$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG

########## END RESTORE /opt/gnu ########################################

########## RESTORE /home2 ########################################

newfs /dev/rdsk/c0t2d0s5

mount /dev/dsk/c0t2d0s5 /a/home2

cd /a/home2

$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG

########## END RESTORE /home2 ########################################

########## RESTORE /loglu ########################################

newfs /dev/rdsk/c0t3d0s1

mount /dev/dsk/c0t3d0s1 /a/loglu

cd /a/loglu

$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG

########## END RESTORE /loglu ########################################

        echo " " |$TEE -a $RESTORELOG

        echo "restore of System Disks Completed." |$TEE -a $RESTORELOG

        echo " " |$TEE -a $RESTORELOG

$MT -f /dev/rmt/0 rewind 2>&1|$TEE -a $RESTORELOG

$DATE >>$RESTORELOG

::::::::::::::

restore_dg - for restoring the data partition

::::::::::::::

#!/bin/sh

#set up paths for commands

TEE=/bin/tee

UFSRESTORE=/usr/lib/fs/ufs/ufsrestore

DATE=/usr/bin/date

REWIND=/dev/rmt/0

NOREWIND=/dev/rmt/0n

MT=/usr/bin/mt

RESTORELOG=/tmp/restore_log

$DATE > $RESTORELOG

echo Please monitor console for all error messages...

# Rewind tape

$MT -f /dev/rmt/0 rewind 2>&1

newfs -m 1 /dev/md/ssa1/rdsk/d56

newfs -m 1 /dev/md/ssa1/rdsk/d45

newfs -m 1 /dev/md/ssa1/rdsk/d89

# Start restore.

######### RESTORE /dg #############################################

mount /dev/md/ssa1/dsk/d56 /dg

cd /dg

$UFSRESTORE rvf $NOREWIND 2>&1|$TEE -a $RESTORELOG

########## END RESTORE /dg ########################################

######### RESTORE /dg/dghome/log #############################################

mkdir /dg/dghome/log

mount /dev/md/ssa1/dsk/d45 /dg/dghome/log

cd /dg/dghome/log

$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG

########## END RESTORE /dg/dghome/log ########################################

######### RESTORE /dg/dghome/queue #############################################

mkdir /dg/dghome/queue

mount /dev/md/ssa1/dsk/d89 /dg/dghome/queue

cd /dg/dghome/queue

$UFSRESTORE rfv $NOREWIND 2>&1|$TEE -a $RESTORELOG

########## END RESTORE /dg/dghome/queue ########################################

        echo " " |$TEE -a $RESTORELOG

        echo "restore of System Disks Completed." |$TEE -a $RESTORELOG

        echo " " |$TEE -a $RESTORELOG

$MT -f /dev/rmt/0 rewind 2>&1|$TEE -a $RESTORELOG

$DATE >>$RESTORELOG

rothen@testdg[/home/mangala]:>
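
The two scripts repeat the same newfs/mount/ufsrestore stanza for every filesystem, so the same sequence could be driven from a table. The following is only a sketch of that idea, not the script used in the tests above: the function name print_restore_plan is an assumption, it handles plain disk slices only (not the /dev/md metadevices), and it prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: print the restore sequence from a table read on stdin.
# Each input line is "slice mount-point", listed in the order the dumps
# sit on the tape. Nothing is executed.
# Usage: print_restore_plan NOREWIND_TAPE < table
print_restore_plan() {
    tape=$1
    echo "mt -f $tape rewind"
    while read -r slice mnt; do
        [ -n "$slice" ] || continue      # skip blank lines
        echo "newfs /dev/rdsk/$slice"
        echo "mount /dev/dsk/$slice $mnt"
        echo "(cd $mnt && ufsrestore rvf $tape)"
    done
}

# First three entries from restore_prod, in tape order.
print_restore_plan /dev/rmt/0n <<EOF
c0t2d0s0 /a
c0t3d0s5 /a/usr
c0t3d0s0 /a/var
EOF
```

The installboot step for the root slice is left out of the sketch and would still need to be run once after the root filesystem is restored.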

============================================================================

