When disaster strikes, you must have a plan, and you must have prepared in advance; otherwise, the work of recovering your system and your files will be considerably greater. For example, if you have not previously saved the partitioning information for your hard disk, how can you properly rebuild it if the disk must be replaced?
Unfortunately, many of the steps one must take before and immediately after a disaster are very operating system dependent. As a consequence, this chapter will discuss in detail disaster recovery (also called Bare Metal Recovery) for Linux and Solaris. For Solaris, the procedures are still quite manual. For FreeBSD the same procedures may be used but they are not yet developed. For Win32, a number of Bacula users have reported success using BartPE.
Here are a few important considerations concerning disaster recovery that you should take into account before a disaster strikes.
As an alternative to creating a Rescue CD, please see the section below entitled Bare Metal Recovery using a LiveCD.
Bacula previously had a Rescue CD. Unfortunately, this CD did not work on every Linux Distro, and in addition, Linux is evolving with different boot methods, more and more complex hardware configurations (LVM, RAID, WiFi, USB, ...). As a consequence, the Bacula Rescue CD as it was originally envisioned no longer exists.
However, there are many other good rescue disks available. A so-called "Bare Metal" recovery is one where you start with an empty hard disk and you restore your machine. There are also cases where you may lose a file or a directory and want it restored. Please see the previous chapter for more details on those cases.
Bare Metal Recovery assumes that you have the following items for your system:
Now, let's assume that your hard disk has just died and that you have replaced it with a new identical drive. In addition, we assume that you have:
This is a relatively simple case, and later in this chapter, as time permits, we will discuss how you might recover from a situation where the machine that crashes is your main Bacula server (i.e. has the Director, the Catalog, and the Storage daemon).
You will take the following steps to get your system back up and running:
Now for the details ...
Each rescue disk boots somewhat differently. Please see the instructions that go with your CDROM.
You can test it by pinging another machine, or pinging your broken machine from another machine. Do not proceed until your network is up.
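The network check can be scripted so that you do not proceed by accident. The following is only a minimal sketch and is not part of any rescue disk: the function name wait_for_host and the variables PROBE_CMD and PROBE_DELAY are hypothetical, and it assumes a Linux-style ping that accepts -c.

```shell
# Hypothetical helper: retry a reachability probe until it succeeds.
# PROBE_CMD defaults to a single ping; it can be overridden for testing.
wait_for_host() {
    host=$1
    tries=${2:-5}
    i=1
    while [ "$i" -le "$tries" ]; do
        # PROBE_CMD is deliberately unquoted so "ping -c 1" splits into words.
        if ${PROBE_CMD:-ping -c 1} "$host" >/dev/null 2>&1; then
            echo "network up: $host answered on attempt $i"
            return 0
        fi
        i=$((i + 1))
        sleep "${PROBE_DELAY:-2}"
    done
    echo "network down: $host did not answer after $tries attempts" >&2
    return 1
}
```

For example, wait_for_host 192.0.2.1 10 would probe the (example) address up to ten times and return a non-zero status if it never answers.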
chroot /mnt/disk /tmp/bacula-fd -c /tmp/bacula-fd.conf
The above command starts the Bacula File daemon with the proper root disk location (i.e. /mnt/disk/tmp). If Bacula does not start, correct the problem and start it. You can check if it is running by entering:
ps fax
You can kill Bacula by entering:
kill -TERM <pid>
where pid is the first number printed in front of the first occurrence of bacula-fd in the ps fax command.
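Picking the pid out of the ps listing can also be done with a small filter. This is a sketch under the assumption that the pid is the first field of each line, as in ps fax output; the function name first_pid_of and the file /tmp/ps.out are hypothetical.

```shell
# Hypothetical helper: print the PID (first field) of the first line of
# ps-style input on stdin that mentions the given process name.
first_pid_of() {
    awk -v name="$1" 'index($0, name) { print $1; exit }'
}

# Possible use on the rescue system (shown as comments, not run here).
# The ps output is captured to a file first so that the filter itself
# cannot appear in the listing:
#   ps fax > /tmp/ps.out
#   pid=$(first_pid_of bacula-fd < /tmp/ps.out)
#   [ -n "$pid" ] && kill -TERM "$pid"
```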
Now, you should be able to use another computer with Bacula installed to check the status by entering:
status client=xxxx
into the Console program, where xxxx is the name of the client you are restoring.
One common problem is that your bacula-dir.conf may contain machine addresses that are not properly resolved on the stripped down system to be restored because it is not running DNS. This is particularly true for the address in the Storage resource of the Director, which may be very well resolved on the Director's machine, but not on the machine being restored and running the File daemon. In that case, be prepared to edit bacula-dir.conf to replace the Storage daemon's domain name with its IP address.
On the computer that is running the Director, you now run a restore command and select the files to be restored (normally everything). Before starting the restore, there is one final change you must make: use the mod option just before running the job, select Where, and set it to:
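If you prefer not to make that edit by hand, a filter along the following lines can do it. This is a sketch only: the function name address_to_ip is hypothetical, the host name and IP address in the usage note are placeholders, and it assumes the directive is written exactly as "Address = host" with no quotes around the host name.

```shell
# Hypothetical helper: on lines of the form "Address = <host>", replace
# <host> with an IP address. Reads a configuration on stdin, writes the
# edited text to stdout, and leaves all other lines untouched.
address_to_ip() {
    # Escape dots so the host name is matched literally by sed.
    host_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    ip=$2
    sed "s/^\([[:space:]]*Address[[:space:]]*=[[:space:]]*\)${host_re}[[:space:]]*\$/\1${ip}/"
}
```

A cautious way to use it is to write to a new file, for example address_to_ip sd.example.com 192.0.2.10 < bacula-dir.conf > bacula-dir.conf.new, then compare the two files before replacing the original.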
/
then run the restore.
You might be tempted to avoid using chroot and running Bacula directly and then using a Where to specify a destination of /mnt/disk. This is possible; however, the current version of Bacula always restores files to the new location, and thus any soft links that have been specified with absolute paths will end up with /mnt/disk prefixed to them. In general this is not fatal to getting your system running, but be aware that you will have to fix these links if you do not use chroot.
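If you do restore without chroot, the affected links can at least be found mechanically. The sketch below only lists them and changes nothing; the function name list_prefixed_links is hypothetical, and it assumes the readlink utility is available on the rescue system.

```shell
# Hypothetical helper: list symbolic links under a directory whose
# targets start with a given prefix, printing "link<TAB>corrected target".
# Nothing on disk is modified.
list_prefixed_links() {
    root=$1
    prefix=$2
    find "$root" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        case $target in
            "$prefix"/*)
                # Strip the unwanted prefix to show what the target should be.
                printf '%s\t%s\n' "$link" "${target#"$prefix"}"
                ;;
        esac
    done
}
```

Running list_prefixed_links /mnt/disk /mnt/disk gives a list you can review before recreating each link with ln -s.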
/sbin/grub-install --root-directory=/mnt/disk /dev/hda
Note, in this case, you omit the chroot command, and you must replace /dev/hda with your boot device. If you don't know what your boot device is, run the ./run_grub script once and it will tell you.
Finally, I've even run into a case where grub-install was unable to rewrite the boot block. In my case, it produced the following error message:
/dev/hdx does not have any corresponding BIOS drive.
The solution is to ensure that all your disks are properly mounted on /mnt/disk, then do the following:
chroot /mnt/disk
mount /dev/pts
Then edit the file /boot/grub/grub.conf and uncomment the line that reads:
#boot=/dev/hda
So that it reads:
boot=/dev/hda
Note, the /dev/hda may be /dev/sda or possibly some other drive depending on your configuration, but in any case, it is the same as the one that you previously tried with grub-install.
Then, enter the following commands:
grub --batch --device-map=/boot/grub/device.map \
     --config-file=/boot/grub/grub.conf --no-floppy
root (hd0,0)
setup (hd0)
quit
If the grub call worked, you will get a prompt of grub before the root, setup, and quit commands, and after entering the setup command, it should indicate that it successfully wrote the MBR (master boot record).
First unmount all your hard disks, otherwise they will not be cleanly shut down. Then reboot your machine by entering exit until you get to the main prompt, then enter Ctrl-d. Once back at the main CDROM prompt, you will need to turn the power to your machine off, then back on, to get it to reboot.
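When several partitions are mounted below /mnt/disk, they must be unmounted deepest first. The following sketch computes such an order from a mount table; the function name umount_order is hypothetical, and the usage note assumes that /proc is mounted on the rescue system.

```shell
# Hypothetical helper: read a mount table in /proc/mounts format on stdin
# and print the mount points at or below the given root, deepest first,
# which is a safe order for unmounting.
umount_order() {
    awk -v root="$1" '$2 == root || index($2, root "/") == 1 { print $2 }' |
        LC_ALL=C sort -r
}
```

One possible use is umount_order /mnt/disk < /proc/mounts | while read -r mp; do umount "$mp"; done, after you have left the chroot environment.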
If everything went well, you should now be back up and running. If not, re-insert the emergency boot CDROM, boot, and figure out what is wrong.
Above, we considered how to recover a client machine where a valid Bacula server was running on another machine. However, what happens if your server goes down and you no longer have a running Director, Catalog, or Storage daemon? There are several solutions:
The first option is very difficult because it requires you to have created a static version of the Director and the Storage daemon as well as the Catalog. If the Catalog uses MySQL or PostgreSQL, this may or may not be possible. In addition to loading all these programs on a bare system (quite possible), you will need to make sure you have a valid driver for your tape drive.
The second suggestion is probably a much simpler solution, and one I have done myself. To do so, you might want to consider the following steps:
For additional details of restoring your database, please see the Restoring When Things Go Wrong section of the Console Restore Command chapter of this manual.
Since every flavor and every release of Linux is different, there are likely to be some small difficulties with the scripts, so please be prepared to edit them in a minimal environment. A rudimentary knowledge of vi is very useful. Also, these scripts do not do everything. You will need to reformat Windows partitions by hand, for example.
Getting the boot loader back can be a problem if you are using grub because it is so complicated. If all else fails, reboot your system from your floppy but using the restored disk image, then proceed to a reinstallation of grub (looking at the run-grub script can help). By contrast, lilo is a piece of cake.
As an alternative to the old, now defunct Bacula Rescue CDROM, you can use any system rescue or LiveCD to recover your system. The big problem with most rescue or LiveCDs is that they are not designed to capture the current state of your system, so when you boot them on a damaged system, you might be somewhat lost. For example, how many of you remember your exact hard disk partitioning?
This lack can be easily corrected by running the part of the Bacula Rescue code that creates a directory containing a static-bacula-fd, a snapshot of your current system disk configuration, and scripts that help restore it.
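To give an idea of what such a snapshot might contain, here is a minimal sketch. It is not the Bacula Rescue code: the function name save_disk_info and the output file names are hypothetical, and the partition dump assumes the Linux sfdisk utility and root privileges.

```shell
# Hypothetical helper: save a few facts about the current disk layout
# into a directory. This is NOT the Bacula Rescue code, only a sketch
# of the kind of information worth keeping.
save_disk_info() {
    outdir=$1
    disk=$2            # optional, for example /dev/sda
    mkdir -p "$outdir" || return 1

    # Mounted filesystems and their sizes, in portable POSIX format.
    df -P > "$outdir/df.out" 2>&1 || echo "warning: df failed" >&2

    # The filesystem table, if the system has one.
    if [ -r /etc/fstab ]; then
        cp /etc/fstab "$outdir/fstab"
    fi

    # Partition table dump; needs root and the sfdisk utility.
    if [ -n "$disk" ] && command -v sfdisk >/dev/null 2>&1; then
        sfdisk -d "$disk" > "$outdir/sfdisk.out" 2>&1 ||
            echo "warning: sfdisk could not read $disk" >&2
    fi
}
```

Whatever tool produces it, the resulting directory is only useful if it is stored somewhere other than the disk it describes.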
Before a disaster strikes:
Then when disaster strikes, do the following:
In order to create the Bacula recovery directory, you need a copy of the Bacula Rescue code as described above, and you must first configure that directory.
Once the configuration is done, you can do the following to create the Bacula recovery directory:
cd <bacula-rescue-source>/linux/cdrom
su (become root)
make bacula
The directory you want to save will be created in the current directory with the name bacula. You need only save that directory either as a directory or possibly as a compressed tar file. If you run this procedure on multiple machines, you will probably want to rename this directory to something like bacula-hostname.
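The renaming and packing step can be sketched as follows. The function name pack_rescue_dir is hypothetical, and the host name is passed in explicitly so that the same sketch works on any machine.

```shell
# Hypothetical helper: rename a rescue directory to bacula-<host> and
# pack it as a compressed tar file next to it. Prints the archive path.
pack_rescue_dir() {
    dir=$1
    host=$2
    parent=$(dirname "$dir")
    mv "$dir" "$parent/bacula-$host" &&
    tar -czf "$parent/bacula-$host.tar.gz" -C "$parent" "bacula-$host" &&
    echo "$parent/bacula-$host.tar.gz"
}
```

For example, pack_rescue_dir ./bacula "$(uname -n)" names the archive after the machine it was created on.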
The same basic techniques described above also apply to FreeBSD. Although we don't yet have a fully automated procedure, Alex Torres Molina has provided us with the following instructions with a few additions from Jesse Guardiani and Dan Langille:
The same basic techniques described above apply to Solaris:
However, during the recovery phase, the boot and disk preparation procedures are different:
Once the disk is partitioned, formatted and mounted, you can continue with bringing up the network and reloading Bacula.
As mentioned above, before a disaster strikes, you should prepare the information needed in the case of problems. To do so, in the rescue/solaris subdirectory enter:
su
./getdiskinfo
./make_rescue_disk
The getdiskinfo script will, as in the case of Linux described above, create a subdirectory diskinfo containing the output from several system utilities. In addition, it will contain the output from the SysAudit program as described in Curtis Preston's book. This file diskinfo/sysaudit.bsi will contain the disk partitioning information that will allow you to manually follow the procedures in the "Unix Backup & Recovery" book to repartition and format your hard disk. In addition, the getdiskinfo script will create a start_network script.
Once you have your disks repartitioned and formatted, do the following:
When a pre-1.30 version of Bacula restores a directory, it first must create the directory, then it populates the directory with its files and subdirectories. The act of creating the files and subdirectories updates both the modification and access times associated with the directory itself. As a consequence, all modification and access times of all directories will be updated to the time of the restore.
This has been corrected in Bacula version 1.30 and later. The directory modification and access times are reset to the value saved in the backup after all the files and subdirectories have been restored. This has been tested and verified on normal restore operations, but not verified during a bare metal recovery.
If any of you look closely at the bootstrap file that is produced and used for the restore (I sure do), you will probably notice that the FileIndex item does not include all the files saved to the tape. This is because in some instances there are duplicates (especially in the case of an Incremental save), and in such circumstances, Bacula restores only the last of multiple copies of a file or directory.
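The idea of keeping only the last copy can be illustrated outside of Bacula. The sketch below is not how Bacula builds a bootstrap file; it simply takes lines of the form "jobid path" in backup order and keeps the last JobId seen for each path. The function name last_copy_only and the input format are hypothetical.

```shell
# Illustration only: read "jobid path" lines in backup order on stdin
# and print, for each path, the last JobId in which it appeared.
last_copy_only() {
    awk '{ jobid = $1; sub(/^[^ ]+ /, ""); last[$0] = jobid }
         END { for (p in last) print last[p], p }' | LC_ALL=C sort -k2
}
```

A file saved by a Full and by two later Incrementals appears three times in the input but only once, with the newest JobId, in the output.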
Due to open system files, and registry problems, Bacula cannot save and restore a complete Win2K/XP/NT environment.
A suggestion by Damian Coutts using Microsoft's NTBackup utility in conjunction with Bacula should permit a Full bare metal restore of Win2K/XP (and possibly NT systems). His suggestion is to do an NTBackup of the critical system state prior to running a Bacula backup with the following command:
ntbackup backup systemstate /F c:\systemstate.bkf
Here, backup is the command, systemstate says to back up only the system state and not all the user files, and /F c:\systemstate.bkf specifies where to write the state file. This file must then be saved and restored by Bacula. This command can be put in a Client Run Before Job directive so that it is automatically run during each backup, and thus saved to a Bacula Volume.
To restore the system state, you first reload a base operating system, then use Bacula to restore all the users' files and to recover the c:\systemstate.bkf file, and finally run NTBackup, catalogue the system state file, and select it for restore. The documentation says you can't run a command line restore of the systemstate.
This procedure has been confirmed to work by Ludovic Strappazon - many thanks!
A new tool is provided in the form of a Bacula plugin for the BartPE rescue CD. BartPE is a self-contained Windows XP boot CD which you can make using the PeBuilder tools available at http://www.nu2.nu/pebuilder/ and a valid Windows XP SP1 CDROM. The plugin is provided as a zip archive. Unzip the file and copy the bacula directory into the plugin directory of your BartPE installation. Edit the configuration files to suit your installation and build your CD according to the instructions at Bart's site. This will permit you to boot from the CD, configure and start networking, start the Bacula File client, and access your Director with the Console program. The programs menu on the booted CD contains entries to install the File client service, start the File client service, and start the WX-Console. You can also open a command line window, cd to Programs\Bacula, and run the command line console bconsole.
Bacula versions after 1.31 should properly restore ownership and permissions on all WinNT/XP/2K systems. If you do experience problems, generally in restores to alternate directories because higher level directories were not backed up by Bacula, you can correct any problems with SetACL, available under the GPL license at: http://sourceforge.net/projects/setacl/.
Ludovic Strappazon has suggested an interesting way to back up and restore complete Win32 partitions. Simply boot your Win32 system with a Linux Rescue disk as described above for Linux, install a statically linked Bacula, and back up any of the raw partitions you want. Then to restore the system, you simply restore the raw partition or partitions. Here is the email that Ludovic recently sent on that subject:
I've just finished testing my brand new cd LFS/Bacula
with a raw Bacula backup and restore of my portable.
I can't resist sending you the results: look at the rates !!!
hunt-dir: Start Backup JobId 100, Job=HuntBackup.2003-04-17_12.58.26
hunt-dir: Bacula 1.30 (14Apr03): 17-Apr-2003 13:14
JobId:                  100
Job:                    HuntBackup.2003-04-17_12.58.26
FileSet:                RawPartition
Backup Level:           Full
Client:                 sauvegarde-fd
Start time:             17-Apr-2003 12:58
End time:               17-Apr-2003 13:14
Files Written:          1
Bytes Written:          10,058,586,272
Rate:                   10734.9 KB/s
Software Compression:   None
Volume names(s):        000103
Volume Session Id:      2
Volume Session Time:    1050576790
Last Volume Bytes:      10,080,883,520
FD termination status:  OK
SD termination status:  OK
Termination:            Backup OK
hunt-dir: Begin pruning Jobs.
hunt-dir: No Jobs found to prune.
hunt-dir: Begin pruning Files.
hunt-dir: No Files found to prune.
hunt-dir: End auto prune.
hunt-dir: Start Restore Job RestoreFilesHunt.2003-04-17_13.21.44
hunt-sd: Forward spacing to file 1.
hunt-dir: Bacula 1.30 (14Apr03): 17-Apr-2003 13:54
JobId:                  101
Job:                    RestoreFilesHunt.2003-04-17_13.21.44
Client:                 sauvegarde-fd
Start time:             17-Apr-2003 13:21
End time:               17-Apr-2003 13:54
Files Restored:         1
Bytes Restored:         10,056,130,560
Rate:                   5073.7 KB/s
FD termination status:  OK
Termination:            Restore OK
hunt-dir: Begin pruning Jobs.
hunt-dir: No Jobs found to prune.
hunt-dir: Begin pruning Files.
hunt-dir: No Files found to prune.
hunt-dir: End auto prune.
If for some reason you want to do a Full restore to a system that has a working kernel (not recommended), you will need to take care not to overwrite the following files:
/etc/grub.conf
/etc/X11/Conf
/etc/fstab
/etc/mtab
/lib/modules
/usr/modules
/usr/X11R6
/etc/modules.conf
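One way to apply this list is to filter a list of candidate files before marking them for restore. The sketch below is an illustration only and does not talk to Bacula; the function name drop_protected is hypothetical, and it treats anything beneath a listed directory as protected as well.

```shell
# Illustration only: read paths on stdin and print those that are NOT
# on the do-not-overwrite list, nor beneath one of its directories.
drop_protected() {
    while IFS= read -r path; do
        keep=yes
        for p in /etc/grub.conf /etc/X11/Conf /etc/fstab /etc/mtab \
                 /lib/modules /usr/modules /usr/X11R6 /etc/modules.conf; do
            case $path in
                "$p"|"$p"/*) keep=no ;;
            esac
        done
        if [ "$keep" = yes ]; then
            printf '%s\n' "$path"
        fi
    done
    return 0
}
```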
Many thanks to Charles Curley who wrote the Linux Complete Backup and Recovery HOWTO (http://www.tldp.org/HOWTO/Linux-Complete-Backup-and-Recovery-HOWTO/index.html) for The Linux Documentation Project (http://www.tldp.org/). This is an excellent document on how to do Bare Metal Recovery on Linux systems, and it was this document that made me realize that Bacula could do the same thing.
You can find quite a few additional resources, both commercial and free, at Storage Mountain (http://www.backupcentral.com), formerly known as Backup Central.
And finally, the O'Reilly book, "Unix Backup & Recovery" by W. Curtis Preston covers virtually every backup and recovery topic including bare metal recovery for a large range of Unix systems.