

Testing Your Tape Drive With Bacula

This chapter explains how to test and configure your tape drive with the btape program, to make sure that it will work properly with Bacula.

Get Your Tape Drive Working

In general, you should follow the steps below to get your tape drive working with Bacula. Start with a tape mounted in your drive; if you have an autochanger, load a tape into the drive. We use /dev/nst0 as the tape drive name; you will need to adapt it to your system.

Do not proceed to the next item until you have succeeded with the previous one.

  1. Make sure that Bacula (the Storage daemon) is not running or that you have unmounted the drive you will use for testing.

  2. Use tar to write to, then read from your drive:

       mt -f /dev/nst0 rewind
       tar cvf /dev/nst0 .
       mt -f /dev/nst0 rewind
       tar tvf /dev/nst0
    

  3. Make sure you have a valid and correct Device resource corresponding to your drive. For Linux users, the default one generally works. For FreeBSD users, there are two possible Device configurations (see below). For other drives and/or OSes, you will first need to ensure that your system tape modes are properly set up (see below), then possibly modify your Device resource depending on the output from the btape program (next item). When doing this, you should consult the Storage Daemon Configuration chapter of this manual.

  4. If you are using Fibre Channel to connect your tape drive to Bacula, please be sure to disable any caching in the NSR (network storage router, which is a Fibre Channel to SCSI converter).

  5. Run the btape test command (start btape, then enter test at its prompt):

       ./btape -c bacula-sd.conf /dev/nst0
       test
    

    It isn't necessary to run the autochanger part of the test at this time, but do not go past this point until the basic test succeeds. If you do have an autochanger, please be sure to read the Autochanger chapter of this manual.

  6. Run the btape fill command, preferably with two volumes. This can take a long time. If you have an autochanger and it is configured, Bacula will automatically use it. If you do not have it configured, you can manually issue the appropriate mtx command, or press the autochanger buttons to change the tape when requested to do so.

  7. FreeBSD users: if you have a pre-5.0 system, run the tapetest program and make sure your system is patched if necessary. The tapetest program can be found in the platforms/freebsd directory. The instructions for its use are at the top of the file.

  8. Run Bacula and back up a reasonably small directory, say 60 megabytes. Do three successive backups of this directory.

  9. Stop Bacula, then restart it. Do another full backup of the same directory. Then stop and restart Bacula.

  10. Do a restore of the directory backed up, by entering the following restore command, being careful to restore it to an alternate location:

       restore select all done
       yes
    

    Do a diff on the restored directory to ensure it is identical to the original directory. If you are going to back up multiple different systems (Linux, Windows, Mac, Solaris, FreeBSD, ...), be sure you test the restore on each system type.

  11. If you have an autochanger, you should now go back to the btape program and run the autochanger test:

         ./btape -c bacula-sd.conf /dev/nst0
         auto
    

    Adjust your autochanger as necessary to ensure that it works correctly. See the Autochanger chapter of this manual for a complete discussion of testing your autochanger.

  12. We strongly recommend that you use a dedicated SCSI controller for your tape drives. Scanners are known to induce serious problems with the SCSI bus, causing it to reset. If the SCSI bus is reset while Bacula has the tape drive open, it will most likely be fatal to your tape since the drive will rewind. These kinds of problems show up in the system log. For example, the following was most likely caused by a scanner:

    Feb 14 17:29:55 epohost kernel: (scsi0:A:2:0): No or incomplete CDB sent to device.
    Feb 14 17:29:55 epohost kernel: scsi0: Issued Channel A Bus Reset. 1 SCBs aborted
    

If you have reached this point, you stand a good chance of having everything work. If you get into trouble at any point, carefully read the documentation given below. If you cannot get past some point, ask the bacula-users email list, but specify which of the steps you have successfully completed. In particular, you may want to look at the Tips for Resolving Problems section below.
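Steps 2 and 10 above can be scripted so they are easy to repeat. The following is a minimal sketch with hypothetical helper names: tape_tar_roundtrip does the tar write/read check of step 2 and, going a little further than tar tvf, extracts the archive into a scratch directory and compares it with the original; verify_restore does the diff of step 10. It assumes the device is given as an absolute path to a non-rewind device, that nothing else (including the Storage daemon) has the drive open, and that the restore job recreated the original absolute path underneath its Where prefix.

```shell
#!/bin/sh
# Hypothetical helpers for steps 2 and 10. DEVICE must be an absolute path
# to a non-rewind tape device (e.g. /dev/nst0) that no other program is using.

# tape_tar_roundtrip DEVICE DIR
# Rewinds DEVICE, writes DIR to it with tar, rewinds again, extracts the
# archive into a scratch directory and compares the result with DIR.
# Returns 0 only if every step succeeds and the two trees are identical.
tape_tar_roundtrip() {
    tt_dev=$1 tt_dir=$2
    tt_scratch=$(mktemp -d) || return 1
    mt -f "$tt_dev" rewind &&
        (cd "$tt_dir" && tar cf "$tt_dev" .) &&
        mt -f "$tt_dev" rewind &&
        (cd "$tt_scratch" && tar xf "$tt_dev") &&
        diff -r "$tt_dir" "$tt_scratch"
    tt_status=$?
    rm -rf "$tt_scratch"
    return "$tt_status"
}

# verify_restore ORIGINAL_DIR WHERE
# Compares ORIGINAL_DIR with its restored copy, assuming the restore placed
# the original absolute path under the WHERE prefix, for example
# /tmp/bacula-restores/home/user/data for /home/user/data.
# diff prints any differences and returns non-zero if there are some.
verify_restore() {
    diff -r "$1" "$2$1"
}

# Example (as root, with Bacula stopped):
#   tape_tar_roundtrip /dev/nst0 /etc && echo "tar round trip OK"
```

Extracting and diffing catches data that is written but read back wrongly, which a plain listing with tar tvf would not show.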

Problems When no Tape in Drive

When Bacula was first written, the Linux 2.4 kernel permitted opening the drive whether or not there was a tape in it. Thus the Bacula code is based on the concept that if the drive cannot be opened, there is a serious problem, and the job is failed.

With version 2.6 of the Linux kernel, if there is no tape in the drive, the OS will wait two minutes (by default) and then return a failure; consequently, Bacula version 1.36 and below will fail the job. This is important to keep in mind if you use an option such as Offline on Unmount = yes: there will be a point when there is no tape in the drive, and if another job starts or Bacula asks the operator to mount a tape, Bacula's attempt to open the drive (about a 20 minute delay) will fail, and Bacula will fail the job.

In version 1.38.x, the Bacula code partially gets around this problem -- at least in the initial open of the drive. However, functions like polling the drive do not work correctly if there is no tape in the drive. Provided you do not use Offline on Unmount = yes, you should not experience job failures as mentioned above. If you do experience such failures, you can also increase the Maximum Open Wait time interval, which will give you more time to mount the next tape before the job is failed.

Specifying the Configuration File

Starting with version 1.27, each of the tape utility programs, including btape, requires a valid Storage daemon configuration file (in fact, the only part of the configuration file that btape needs is the Device resource definitions). This permits btape to find the configuration parameters for your archive device (generally a tape drive). Without those parameters, the testing and utility programs do not know how to properly read and write your drive. By default, they use bacula-sd.conf in the current directory, but you may specify a different configuration file using the -c option.

Specifying a Device Name For a Tape

Each of the tape utility programs, including btape, requires a device-name where the Volume can be found. In the case of a tape, this is the physical device name that you specify on the Archive Device directive, such as /dev/nst0 or /dev/rmt/0ubn depending on your system. For the program to work, it must find the identical name in the Device resource of the configuration file. If the name is not found in the list of physical names, the utility program will compare the name you entered to the Device names (rather than the Archive Device names).

When specifying a tape device, it is preferable to give the "non-rewind" variant of the device file name. In addition, on systems such as Sun Solaris, which have multiple tape access methods, you must be sure that the device uses Berkeley I/O conventions. The b in the Solaris archive specification /dev/rmt/0mbn is what selects them in this case. Bacula does not support SysV tape drive behavior.

See below for specifying Volume names.

Specifying a Device Name For a File

If you are attempting to read or write an archive file rather than a tape, the device-name should be the full path to the archive location including the filename. The filename (last part of the specification) will be stripped and used as the Volume name, and the path (first part before the filename) must have the same entry in the configuration file. So, the path is equivalent to the archive device name, and the filename is equivalent to the volume name.
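The split between directory and Volume name described above can be illustrated with a minimal sketch. The helper name split_file_device and the path /var/bacula/backups/Vol0001 are hypothetical; the function only shows which part of the path must match an Archive Device entry and which part is taken as the Volume name.

```shell
# split_file_device PATH  (hypothetical helper)
# Prints the directory part of PATH, which must match an Archive Device
# entry in bacula-sd.conf, and the last component, used as the Volume name.
split_file_device() {
    sfd_path=$1
    printf 'Archive Device: %s\n' "${sfd_path%/*}"   # everything before the last /
    printf 'Volume name: %s\n' "${sfd_path##*/}"     # everything after the last /
}

# Example:
#   split_file_device /var/bacula/backups/Vol0001
```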


btape

This program permits a number of elementary tape operations via a tty command interface. The test command, described below, can be very useful for testing tape drive compatibility problems. Aside from initial testing of tape drive compatibility with Bacula, btape will be mostly used by developers writing new tape drivers.

btape can be dangerous to use with existing Bacula tapes because it will relabel a tape or write on the tape if so requested regardless of whether or not the tape contains valuable data, so please be careful and use it only on blank tapes.

To work properly, btape needs to read the Storage daemon's configuration file. As a default, it will look for bacula-sd.conf in the current directory. If your configuration file is elsewhere, please use the -c option to specify where.

The physical device name or the Device resource name must be specified on the command line, and this same device name must be present in the Storage daemon's configuration file read by btape.

Usage: btape [options] device_name
       -b <file>   specify bootstrap file
       -c <file>   set configuration file to file
       -d <nn>     set debug level to nn
       -p          proceed inspite of I/O errors
       -s          turn off signals
       -v          be verbose
       -?          print this message.
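Because btape only works when the name on its command line is present in the configuration file, it can help to check that first. The following is a rough sketch with a hypothetical helper name, device_in_config; it is not part of Bacula. It assumes each directive and each "Device {" opener sits on its own line, as in the examples in this chapter, and it treats directive names case-insensitively with optional spacing in "Archive Device".

```shell
# device_in_config CONF DEVICE  (hypothetical helper)
# Succeeds if DEVICE appears as an Archive Device value or as the Name of a
# Device resource in CONF. A rough check, not a full bacula-sd.conf parser.
device_in_config() {
    # Strip comments, collapse spaces around the first '=', drop quotes
    # and a trailing ';' so values can be compared literally.
    sed -e 's/#.*//' \
        -e 's/[[:space:]]*=[[:space:]]*/=/' \
        -e 's/"//g' \
        -e 's/;[[:space:]]*$//' "$1" |
    awk -v dev="$2" '
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }      # trim the line
        tolower($0) ~ /^device[ \t]*[{]/ { in_dev = 1; next }
        /^[}]/ { in_dev = 0; next }
        in_dev && (tolower($0) ~ /^archive ?device=/ || tolower($0) ~ /^name=/) {
            if (substr($0, index($0, "=") + 1) == dev) found = 1
        }
        END { exit !found }'
}

# Example:
#   device_in_config bacula-sd.conf /dev/nst0 && ./btape -c bacula-sd.conf /dev/nst0
```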

Using btape to Verify your Tape Drive

An important reason for this program is to ensure that a Storage daemon configuration file is defined so that Bacula will correctly read and write tapes.

It is highly recommended that you run the test command before running your first Bacula job to ensure that the parameters you have defined for your storage device (tape drive) will permit Bacula to function properly. You only need to mount a blank tape and enter the command; the output should be reasonably self-explanatory. For example:

(ensure that Bacula is not running)
./btape -c /usr/bin/bacula/bacula-sd.conf /dev/nst0

The output will be:

Tape block granularity is 1024 bytes.
btape: btape.c:376 Using device: /dev/nst0
*

Enter the test command:

test

The output produced should be something similar to the following. I've cut the listing short because new tests are frequently added to it.

=== Append files test ===
This test is essential to Bacula.
I'm going to write one record  in file 0,
                   two records in file 1,
             and three records in file 2
btape: btape.c:387 Rewound /dev/nst0
btape: btape.c:855 Wrote one record of 64412 bytes.
btape: btape.c:857 Wrote block to device.
btape: btape.c:410 Wrote EOF to /dev/nst0
btape: btape.c:855 Wrote one record of 64412 bytes.
btape: btape.c:857 Wrote block to device.
btape: btape.c:855 Wrote one record of 64412 bytes.
btape: btape.c:857 Wrote block to device.
btape: btape.c:410 Wrote EOF to /dev/nst0
btape: btape.c:855 Wrote one record of 64412 bytes.
btape: btape.c:857 Wrote block to device.
btape: btape.c:855 Wrote one record of 64412 bytes.
btape: btape.c:857 Wrote block to device.
btape: btape.c:855 Wrote one record of 64412 bytes.
btape: btape.c:857 Wrote block to device.
btape: btape.c:410 Wrote EOF to /dev/nst0
btape: btape.c:387 Rewound /dev/nst0
btape: btape.c:693 Now moving to end of media.
btape: btape.c:427 Moved to end of media
We should be in file 3. I am at file 3. This is correct!
Now the important part, I am going to attempt to append to the tape.
...
=== End Append files test ===

If you do not successfully complete the above test, please resolve the problem(s) before attempting to use Bacula. Depending on your tape drive, the test may recommend that you add certain records to your configuration. We strongly recommend that you do so and then re-run the above test to ensure that it passes.

Some of the suggestions it provides for resolving the problems may or may not be useful. If at all possible, avoid using fixed blocking. If the test suddenly starts to print a long series of:

Got EOF on tape.
Got EOF on tape.
...

then you are almost certainly running your drive in fixed block mode rather than variable block mode. See below for more help on resolving fixed versus variable block problems.

It is also possible that you have your drive set in SysV tape drive mode. The drive must use BSD tape conventions. See the section above on setting your Archive device correctly.

For FreeBSD users, please see the notes below for doing further testing of your tape drive.

Linux SCSI Tricks

You can find out what SCSI devices you have by doing:

lsscsi

Typical output is:

[0:0:0:0]    disk    ATA      ST3160812AS      3.AD  /dev/sda
[2:0:4:0]    tape    HP       Ultrium 2-SCSI   F6CH  /dev/st0
[2:0:5:0]    tape    HP       Ultrium 2-SCSI   F6CH  /dev/st1
[2:0:6:0]    mediumx OVERLAND LXB              0107  -
[2:0:9:0]    tape    HP       Ultrium 1-SCSI   E50H  /dev/st2
[2:0:10:0]   mediumx OVERLAND LXB              0107  -

There are two drives in one autochanger, /dev/st0 and /dev/st1, and a third tape drive at /dev/st2. To use them with Bacula, one would normally reference them as /dev/nst0 ... /dev/nst2. Note also that there are two different autochangers, each identified as "mediumx OVERLAND LXB". They can be addressed via their /dev/sgN designation, which can be obtained by numbering the lines of the lsscsi output from 0. In the above case, the two changers are located on /dev/sg3 and /dev/sg5. The one at /dev/sg3 controls drives /dev/nst0 and /dev/nst1, and the one at /dev/sg5 controls drive /dev/nst2.
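The line counting described above is easy to get wrong by hand, so here is a small sketch that reads lsscsi output on standard input. The helper names lsscsi_tapes and lsscsi_changers are hypothetical. The changer function relies on the same assumption as the text: that sg numbers were assigned in the order lsscsi lists the devices. Where your lsscsi supports it, lsscsi -g prints the generic (sg) device for each entry directly, which avoids the guess.

```shell
# Both hypothetical helpers read lsscsi output (as shown above) on stdin.

# lsscsi_tapes: print the non-rewind name (/dev/nstN) of each tape drive.
lsscsi_tapes() {
    awk '$2 == "tape" { dev = $NF; sub(/^\/dev\/st/, "/dev/nst", dev); print dev }'
}

# lsscsi_changers: print a guessed /dev/sgN for each medium changer by
# numbering the lsscsi lines from 0. Confirm the result before relying on it.
lsscsi_changers() {
    awk '$2 == "mediumx" { printf "/dev/sg%d\n", NR - 1 }'
}

# Example:
#   lsscsi | lsscsi_tapes
#   lsscsi | lsscsi_changers
```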

If you do not have the lsscsi command, you can obtain the same information as follows:

cat /proc/scsi/scsi

For the above example with the three drives and two autochangers, I get:

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: ST3160812AS      Rev: 3.AD
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 04 Lun: 00
  Vendor: HP       Model: Ultrium 2-SCSI   Rev: F6CH
  Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 05 Lun: 00
  Vendor: HP       Model: Ultrium 2-SCSI   Rev: F6CH
  Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 06 Lun: 00
  Vendor: OVERLAND Model: LXB              Rev: 0107
  Type:   Medium Changer                   ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 09 Lun: 00
  Vendor: HP       Model: Ultrium 1-SCSI   Rev: E50H
  Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 10 Lun: 00
  Vendor: OVERLAND Model: LXB              Rev: 0107
  Type:   Medium Changer                   ANSI SCSI revision: 02

As an additional example, I get the following (on a different machine from the above example):

Attached devices:
Host: scsi2 Channel: 00 Id: 01 Lun: 00
  Vendor: HP       Model: C5713A           Rev: H107
  Type:   Sequential-Access                ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 04 Lun: 00
  Vendor: SONY     Model: SDT-10000        Rev: 0110
  Type:   Sequential-Access                ANSI SCSI revision: 02

The above represents first an autochanger and second a simple tape drive. The HP changer (the first entry) uses the same SCSI channel for data and for control, so in Bacula, you would use:

Archive Device = /dev/nst0
Changer Device = /dev/sg0

If you want to remove the SDT-10000 device, you can do so as root with:

echo "scsi remove-single-device 2 0 4 0">/proc/scsi/scsi

and you can add it back with:

echo "scsi add-single-device 2 0 4 0">/proc/scsi/scsi

where the 2 0 4 0 are the Host, Channel, Id, and Lun as seen on the output from cat /proc/scsi/scsi. Note, the Channel must be specified as numeric.
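The lookup and the echo above can be combined in a short sketch. The helper names proc_scsi_address and scsi_single_device are hypothetical, and the optional PROCFILE argument exists only so the write can be tried against an ordinary file first. proc_scsi_address prints the Host, Channel, Id and Lun as plain numbers, which also takes care of the requirement that the Channel be numeric.

```shell
# proc_scsi_address MODEL  (hypothetical helper)
# Reads /proc/scsi/scsi-style text on stdin and prints "HOST CHANNEL ID LUN"
# as plain numbers for each device whose Model field starts with MODEL.
proc_scsi_address() {
    awk -v model="$1" '
        /^Host:/ {
            host = $2; sub(/^scsi/, "", host)        # "scsi2" -> "2"
            chan = $4 + 0; id = $6 + 0; lun = $8 + 0 # "00" -> 0, "04" -> 4
        }
        /Model:/ {
            m = $0
            sub(/.*Model: /, "", m)
            if (index(m, model) == 1) print host, chan, id, lun
        }'
}

# scsi_single_device add|remove HOST CHANNEL ID LUN [PROCFILE]  (hypothetical)
# Writes the add/remove request described above. Must be run as root when
# PROCFILE is left at its default of /proc/scsi/scsi.
scsi_single_device() {
    if [ $# -lt 5 ] || [ $# -gt 6 ]; then
        echo "usage: scsi_single_device add|remove HOST CHANNEL ID LUN [PROCFILE]" >&2
        return 2
    fi
    ssd_action=$1 ssd_file=${6:-/proc/scsi/scsi}
    case $ssd_action in
        add|remove) ;;
        *) echo "action must be add or remove" >&2; return 2 ;;
    esac
    echo "scsi $ssd_action-single-device $2 $3 $4 $5" > "$ssd_file"
}

# Example (as root, after checking that exactly one device matches):
#   addr=$(proc_scsi_address SDT-10000 < /proc/scsi/scsi)
#   scsi_single_device remove $addr
```

The argument count check means that if the model matches more than one device, the request is refused rather than written with the wrong numbers.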

Below is a slightly more complicated output, which is a single autochanger with two drives, and which operates the changer on a different channel from the drives:

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: WDC WD1600JD-75H Rev: 08.0
  Type:   Direct-Access                    ANSI SCSI revision: 05
Host: scsi2 Channel: 00 Id: 04 Lun: 00
  Vendor: HP       Model: Ultrium 2-SCSI   Rev: F6CH
  Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 05 Lun: 00
  Vendor: HP       Model: Ultrium 2-SCSI   Rev: F6CH
  Type:   Sequential-Access                ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 06 Lun: 00
  Vendor: OVERLAND Model: LXB              Rev: 0106
  Type:   Medium Changer                   ANSI SCSI revision: 02

The above tape drives are accessed on /dev/nst0 and /dev/nst1, while the control channel for those two drives is /dev/sg3.

Tips for Resolving Problems

Bacula Saves But Cannot Restore Files

If you are getting error messages such as:

Volume data error at 0:1! Wanted block-id: "BB02", got "". Buffer discarded

then it is very likely that Bacula has tried to do block positioning and ended up at an invalid block. This can happen if your tape drive is in fixed block mode while Bacula's default is variable blocks. Note that in such cases, Bacula is perfectly able to write to your Volumes (tapes), but cannot position to read them.

There are two possible solutions.

  1. The first and best is to always ensure that your drive is in variable block mode. Note, it can switch back to fixed block mode on a reboot or if another program uses the drive. So on such systems you need to modify the Bacula startup files to explicitly set:

    mt -f /dev/nst0 defblksize 0
    

    or whatever is appropriate on your system. Note, if you are running a Linux system, and the above command does not work, it is most likely because you have not loaded the appropriate mt package, which is often called mt_st, but may differ according to your distribution.

  2. The second possibility, especially if Bacula wrote while the drive was in fixed block mode, is to turn off block positioning in Bacula. This is done by adding:

    Block Positioning = no
    

    to the Device resource. This is not the recommended procedure because it can enormously slow down recovery of files, but it may help where all else fails. This directive is available in version 1.35.5 or later (and not yet tested).
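For the first solution, the mt command has to be repeated whenever the drive may have been reset. A minimal sketch of a function to call from a startup script before Bacula starts is shown below; the name set_variable_block_mode is hypothetical, and it assumes the Linux mt-st version of mt, which provides defblksize.

```shell
# set_variable_block_mode DEVICE...  (hypothetical helper)
# Runs "mt -f DEVICE defblksize 0" for each device so the drive is back in
# variable block mode. Stops and reports at the first failure.
set_variable_block_mode() {
    for svb_dev in "$@"; do
        if ! mt -f "$svb_dev" defblksize 0; then
            echo "could not reset block size on $svb_dev (is mt-st installed?)" >&2
            return 1
        fi
    done
}

# Example, from a startup script run before the Storage daemon:
#   set_variable_block_mode /dev/nst0 /dev/nst1
```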

If you are getting error messages such as:

Volume data error at 0:0!
Block checksum mismatch in block=0 len=32625 calc=345678 blk=123456

then you are getting tape read errors, and this is most likely due to one of the following things:

  1. An old or bad tape.
  2. A dirty drive that needs cleaning (particularly for DDS drives).
  3. A loose SCSI cable.
  4. Old firmware in your drive. Make sure you have the latest firmware loaded.
  5. Computer memory errors.
  6. Over-clocking your CPU.
  7. A bad SCSI card.

Bacula Cannot Open the Device

If you get an error message such as:

dev open failed: dev.c:265 stored: unable to open
device /dev/nst0:> ERR=No such device or address

the first time you run a job, it is most likely due to the fact that you specified the incorrect device name on your Archive Device.

If Bacula works fine with your drive and then, all of a sudden, you get error messages similar to the one shown above, it is quite possible that your driver module is being removed because the kernel deems it idle. This is done via crontab with the use of rmmod -a. To fix the problem, you can remove this entry from crontab, or you can manually modprobe your driver module (or add it to the local startup script). Thanks to Alan Brown for this tip.

Incorrect File Number

When Bacula moves to the end of the medium, it normally uses the ioctl(MTEOM) function. Then Bacula uses the ioctl(MTIOCGET) function to retrieve the current file position from the mt_fileno field. Some SCSI tape drivers will use a fast means of seeking to the end of the medium and in doing so, they will not know the current file position and hence return a -1. As a consequence, if you get "This is NOT correct!" in the positioning tests, this may be the cause. You must correct this condition in order for Bacula to work.
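Outside of btape, you can get a rough idea of whether your driver loses track of the file number by moving to end of data and asking mt for the position. This is a sketch under stated assumptions: the helper name eom_file_number is hypothetical, and it assumes the Linux mt-st program, whose eod command spaces to end of data and whose status output contains a line of the form "File number=3, block number=0, partition=0.". A reported value of -1 matches the symptom described above. Run it only on a scratch tape, since it moves the tape.

```shell
# eom_file_number DEVICE  (hypothetical helper, assumes Linux mt-st)
# Spaces DEVICE to end of data, then prints the file number that the
# driver reports in "mt status". -1 means the position is unknown.
eom_file_number() {
    mt -f "$1" eod || return 1
    mt -f "$1" status | sed -n 's/^File number=\(-*[0-9][0-9]*\),.*/\1/p'
}

# Example:
#   eom_file_number /dev/nst0
```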

There are two possible solutions to the above problem of incorrect file number:

Incorrect Number of Blocks or Positioning Errors

Bacula's preferred method of working with tape drives (sequential devices) is to run in variable block mode, and this is what is set by default. You should first ensure that your tape drive is set for variable block mode (see below).

If your tape drive is in fixed block mode and you have told Bacula to use different fixed block sizes or variable block sizes (default), you will get errors when Bacula attempts to forward space to the correct block (the kernel driver's idea of tape blocks will not correspond to Bacula's).

All modern tape drives support variable tape blocks, but some older drives (in particular the QIC drives) as well as the ATAPI ide-scsi driver run only in fixed block mode. The Travan tape drives also apparently must run in fixed block mode (to be confirmed).

Even in variable block mode, with the exception of the first record on the second or subsequent volume of a multi-volume backup, Bacula will write blocks of a fixed size. However, in reading a tape, Bacula will assume that for each read request, exactly one block from the tape will be transferred. This is the most common way that tape drives work, and it is well supported by Bacula.

Drives that run in fixed block mode can cause serious problems for Bacula if the drive's block size does not correspond exactly to Bacula's block size. In fixed block size mode, drivers may transmit a partial block or multiple blocks for a single read request. From Bacula's point of view, this destroys the concept of tape blocks. It is much better to run in variable block mode, and almost all modern drives (the OnStream is an exception) run in variable block mode. In order for Bacula to run in fixed block mode, you must include the following records in the Storage daemon's Device resource definition:

Minimum Block Size = nnn
Maximum Block Size = nnn

where nnn must be the same for both records and must be identical to the driver's fixed block size.

We recommend that you avoid this configuration if at all possible by using variable block sizes.

If you must run with fixed size blocks, make sure they are not 512 bytes. This is too small and the overhead that Bacula has with each record will become excessive. If at all possible set any fixed block size to something like 64,512 bytes or possibly 32,768 if 64,512 is too large for your drive. See below for the details on checking and setting the default drive block size.
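Since both records must carry the same value, a tiny generator avoids typos when you do need fixed blocks. This is a hypothetical helper, not a Bacula tool: it prints the two Device resource records for a given size and warns on stderr for sizes of 1024 bytes or less, which this chapter describes as very inefficient. The size you pass must still be identical to the driver's fixed block size.

```shell
# fixed_block_directives SIZE  (hypothetical helper)
# Prints Minimum/Maximum Block Size records with the same SIZE for both.
fixed_block_directives() {
    case $1 in
        ''|*[!0-9]*)
            echo "usage: fixed_block_directives SIZE (a positive integer)" >&2
            return 2 ;;
    esac
    if [ "$1" -eq 0 ]; then
        echo "block size must be greater than zero" >&2
        return 2
    fi
    if [ "$1" -le 1024 ]; then
        echo "warning: $1-byte fixed blocks are likely to be very inefficient" >&2
    fi
    printf 'Minimum Block Size = %s\nMaximum Block Size = %s\n' "$1" "$1"
}

# Example:
#   fixed_block_directives 64512
```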

To recover files from tapes written in fixed block mode, see below.

Ensuring that the Tape Modes Are Properly Set -- Linux Only

If you have a modern SCSI tape drive and you are having problems with the test command as noted above, it may be that some program has set one or more of your SCSI driver's options to non-default values. For example, if your driver is set to work in SysV manner, Bacula will not work correctly because it expects BSD behavior. To reset your tape drive to the default values, you can try the following, but ONLY if you have a SCSI tape drive on a Linux system:

become super user
mt -f /dev/nst0 rewind
mt -f /dev/nst0 stoptions buffer-writes async-writes read-ahead

The above commands will clear all options and then set those specified. None of the specified options are required by Bacula, but a number of other options such as SysV behavior must not be set. Bacula does not support SysV tape behavior. On systems other than Linux, you will need to consult your mt man pages or documentation to figure out how to do the same thing. This should not really be necessary though -- for example, on both Linux and Solaris systems, the default tape driver options are compatible with Bacula. On Solaris systems, you must take care to specify the correct device name on the Archive device directive. See above for more details.

You may also want to ensure that no prior program has set the default block size, as happened to one user, by explicitly turning it off with:

mt -f /dev/nst0 defblksize 0

If you are running a Linux system, and the above command does not work, it is most likely because you have not loaded the appropriate mt package, which is often called mt_st, but may differ according to your distribution.

If you would like to know what options you have set before making any of the changes noted above, you can now view them on Linux systems, thanks to a tip provided by Willem Riede. Do the following:

become super user
mt -f /dev/nst0 stsetoptions 0
grep st0 /var/log/messages

and you will get output that looks something like the following:

kernel: st0: Mode 0 options: buffer writes: 1, async writes: 1, read ahead: 1
kernel: st0:    can bsr: 0, two FMs: 0, fast mteom: 0, auto lock: 0,
kernel: st0:    defs for wr: 0, no block limits: 0, partitions: 0, s2 log: 0
kernel: st0:    sysv: 0 nowait: 0

Note, I have chopped off the beginning of the line with the date and machine name for presentation purposes.
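To check a single option, such as confirming that sysv is 0, you can filter those log lines instead of reading them by eye. The sketch below uses a hypothetical helper name, st_option, and assumes the "label: value" layout shown above. The label is placed inside a sed pattern, so it must not contain regex metacharacters. Because the log may contain entries from earlier runs, the last matching line is used, which is usually the most recent.

```shell
# st_option LABEL  (hypothetical helper)
# Reads st driver option lines on stdin and prints the value of the option
# labelled LABEL, e.g. "sysv" or "read ahead". The last match wins.
st_option() {
    sed -n "s/.*[ ,]$1: \([0-9][0-9]*\).*/\1/p" | tail -n 1
}

# Example (after "mt -f /dev/nst0 stsetoptions 0" as shown above):
#   grep st0 /var/log/messages | st_option sysv      # should print 0
```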

Some people find that the above settings last only until the next reboot, so please check this; otherwise you may have unexpected problems.

Beginning with Bacula version 1.35.8, if Bacula detects that you are running in variable block mode, it will attempt to set your drive appropriately. All OSes permit setting variable block mode, but some OSes do not permit setting the other modes that Bacula needs to function properly.

Tape Hardware Compression and Blocking Size

As far as I can tell, there is no way with the mt program to check if your tape hardware compression is turned on or off. You can, however, turn it on by using (on Linux):

become super user
mt -f /dev/nst0 defcompression 1

and of course, if you use a zero instead of the one at the end, you will turn it off.

If you have built the mtx program in the depkgs package, you can use tapeinfo to get quite a bit of information about your tape drive even if it is not an autochanger. This program is called using the SCSI control device. On Linux for tape drive /dev/nst0, this is usually /dev/sg0, while on FreeBSD for /dev/nsa0, the control device is often /dev/pass2. For example on my DDS-4 drive (/dev/nst0), I get the following:

tapeinfo -f /dev/sg0
Product Type: Tape Drive
Vendor ID: 'HP      '
Product ID: 'C5713A          '
Revision: 'H107'
Attached Changer: No
MinBlock:1
MaxBlock:16777215
SCSI ID: 5
SCSI LUN: 0
Ready: yes
BufferedMode: yes
Medium Type: Not Loaded
Density Code: 0x26
BlockSize: 0

If the output includes a DataCompEnabled: yes line, tape hardware compression is turned on. You can turn it on and off by using the mt commands given above. This output also tells you whether BlockSize is non-zero and hence set for a particular block size. Bacula is not likely to work in such a situation because it will normally attempt to write blocks of 64,512 bytes, except the last block of the job, which will generally be shorter. The first thing to try is setting the default block size to zero using the mt -f /dev/nst0 defblksize 0 command as shown above. On FreeBSD, this would be something like: mt -f /dev/nsa0 blocksize 0.
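To pick a single field such as BlockSize out of the tapeinfo output, for example in a startup check, a small filter is enough. This is a sketch with a hypothetical helper name, tapeinfo_field; it accepts both the "Name: value" and "Name:value" spacing that appear in the listing above and prints nothing if the field is absent.

```shell
# tapeinfo_field NAME  (hypothetical helper)
# Reads tapeinfo output on stdin and prints the value of field NAME.
tapeinfo_field() {
    awk -v name="$1" '
        index($0, name ":") == 1 {
            val = substr($0, length(name) + 2)   # text after "NAME:"
            sub(/^[ \t]+/, "", val)
            print val
        }'
}

# Example:
#   bs=$(tapeinfo -f /dev/sg0 | tapeinfo_field BlockSize)
#   [ "$bs" = 0 ] || echo "default block size is $bs; try mt -f /dev/nst0 defblksize 0"
```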

On some operating systems with some tape drives, the amount of data that can be written to the tape, and whether or not compression is enabled, is determined by the density, usually set with the mt -f /dev/nst0 setdensity xxx command. Often mt -f /dev/nst0 status will print out the current density code used with the drive. Most systems, but unfortunately not all, set the density to the maximum by default. On some systems, you can also get a list of all available density codes with mt -f /dev/nst0 densities or a similar mt command. Note that for DLT and SDLT devices, no-compression versus compression is very often controlled by the density code. On FreeBSD systems, the compression mode is set using mt -f /dev/nsa0 comp xxx, where xxx is the mode you want. In general, see man mt for the options available on your system.

Note that some of the above mt commands may not be persistent, depending on your system configuration. That is, they may be reset if a program other than Bacula uses the drive or, as is frequently the case, when your system is rebooted.

If your tape drive requires fixed block sizes (very unusual), you can use the following records:

Minimum Block Size = nnn
Maximum Block Size = nnn

in your Storage daemon's Device resource to force Bacula to write fixed size blocks (where you set nnn to be the same for both of the above records). This should be done only if your drive does not support variable block sizes, or you have some other strong reasons for using fixed block sizes. As mentioned above, a small fixed block size of 512 or 1024 bytes will be very inefficient. Try to set any fixed block size to something like 64,512 bytes or larger if your drive will support it.

Also, note that the Medium Type field of the output of tapeinfo reports Not Loaded, which is not correct. As a consequence, you should ignore that field as well as the Attached Changer field.

To recover files from tapes written in fixed block mode, see below.

Tape Modes on FreeBSD

On most FreeBSD systems such as 4.9 and most tape drives, Bacula should run with:

mt  -f  /dev/nsa0  seteotmodel  2
mt  -f  /dev/nsa0  blocksize   0
mt  -f  /dev/nsa0  comp  enable

You might want to put those commands in a startup script to make sure your tape driver is properly initialized before running Bacula, because depending on your system configuration, these modes may be reset if a program other than Bacula uses the drive or when your system is rebooted.
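A minimal sketch of such a startup function is shown below. The name init_freebsd_tape is hypothetical; it simply runs the three mt commands listed above, with the device defaulting to /dev/nsa0 and the EOT model defaulting to 2, so the value 1 used in the alternative configuration later in this section can be passed instead. Call it from whatever local startup script your system runs at boot, before starting the Storage daemon.

```shell
# init_freebsd_tape [DEVICE] [EOTMODEL]  (hypothetical helper)
# Applies the FreeBSD mt settings shown above: EOT model, variable block
# size and hardware compression. Stops at the first mt command that fails.
init_freebsd_tape() {
    ift_dev=${1:-/dev/nsa0}
    ift_eot=${2:-2}
    mt -f "$ift_dev" seteotmodel "$ift_eot" &&
        mt -f "$ift_dev" blocksize 0 &&
        mt -f "$ift_dev" comp enable
}

# Example:
#   init_freebsd_tape /dev/nsa0 2
```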

Then according to what the btape test command returns, you will probably need to set the following (see below for an alternative):

  Hardware End of Medium = no
  BSF at EOM = yes
  Backward Space Record = no
  Backward Space File = no
  Fast Forward Space File = no
  TWO EOF = yes

Then be sure to run some append tests with Bacula where you start and stop Bacula between appending to the tape, or use btape version 1.35.1 or greater, which includes simulation of stopping/restarting Bacula.

Please see the file platforms/freebsd/pthreads-fix.txt in the main Bacula directory for important information concerning the compatibility of Bacula and your system. A more efficient Device configuration is shown below, but it does not work with all tape drives. Please test carefully before putting either into production.

Note, for FreeBSD 4.10-RELEASE, using a Sony TSL11000 L100 DDS4 with an autochanger set to variable block size and DCLZ compression, Brian McDonald reports that to get Bacula to append correctly between Bacula executions, the correct values to use are:

mt  -f  /dev/nsa0  seteotmodel  1
mt  -f  /dev/nsa0  blocksize  0
mt  -f /dev/nsa0  comp  enable

and

  Hardware End of Medium = no
  BSF at EOM = no
  Backward Space Record = no
  Backward Space File = no
  Fast Forward Space File = yes
  TWO EOF = no

This has been confirmed by several other people using different hardware. This configuration is the preferred one because it uses one EOF and no backspacing at the end of the tape, which works much more efficiently and reliably with modern tape drives.

Finally, here is a Device configuration that Danny Butroyd reports to work correctly with the Overland Powerloader tape library using LTO-2 and FreeBSD 5.4-Stable:

# Overland Powerloader LT02 - 17 slots single drive
Device {
  Name = Powerloader
  Media Type = LT0-2
  Archive Device = /dev/nsa0
  AutomaticMount = yes;              
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  Changer Command = "/usr/local/sbin/mtx-changer %c %o %S %a %d"
  Changer Device = /dev/pass2
  AutoChanger = yes
  Alert Command = "sh -c 'tapeinfo -f %c |grep TapeAlert|cat'"

  # FreeBSD Specific Settings
  Offline On Unmount = no
  Hardware End of Medium = no
  BSF at EOM = yes
  Backward Space Record = no
  Fast Forward Space File = no
  TWO EOF = yes
}

The following Device resource works with Dell PowerVault 110T and 120T devices on both FreeBSD 5.3 and NetBSD 3.0. It also works with Sony AIT-2 drives on FreeBSD.
Device {
  ...
  # FreeBSD/NetBSD Specific Settings
  Hardware End of Medium = no
  BSF at EOM = yes
  Backward Space Record = no
  Fast Forward Space File = yes
  TWO EOF = yes
}

On FreeBSD version 6.0, it is reported that you can even set Backward Space Record = yes.

Finding your Tape Drives and Autochangers on FreeBSD

On FreeBSD, you can run camcontrol devlist as root to determine which drives and autochangers you have. For example:

undef# camcontrol devlist
    at scbus0 target 2 lun 0 (pass0,sa0)
    at scbus0 target 4 lun 0 (pass1,sa1)
    at scbus0 target 4 lun 1 (pass2)

From the above, you can determine that there is a tape drive on /dev/sa0 and another on /dev/sa1. In addition, since there is a second line for the same target as the drive on /dev/sa1 (target 4, lun 1), you can assume that it is the control device for the autochanger (i.e. /dev/pass2). This is also the control device name to use when invoking the tapeinfo program. For example:

tapeinfo -f /dev/pass2
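If you have many devices, the reasoning above can be automated. The following is a minimal sketch, assuming the devlist output format shown above (device names in the final parenthesized field). The classify_camcontrol function is hypothetical, and treating a line that lists only a passN device (or a chN changer device) as an autochanger is a heuristic, not something camcontrol reports directly.

```sh
#!/bin/sh
# classify_camcontrol (hypothetical helper): read "camcontrol devlist" output
# on stdin and report tape drives and likely autochanger control devices.
classify_camcontrol() {
    awk '
    {
        s = $NF                     # last field, e.g. "(pass0,sa0)"
        gsub(/[()]/, "", s)
        n = split(s, dev, ",")
        tape = ""; changer = ""; pass = ""
        for (i = 1; i <= n; i++) {
            if (dev[i] ~ /^sa[0-9]+$/) tape = dev[i]
            else if (dev[i] ~ /^ch[0-9]+$/) changer = dev[i]
            else if (dev[i] ~ /^pass[0-9]+$/) pass = dev[i]
        }
        # saN means a tape drive; a lone passN (or a chN) is taken as the changer
        if (tape != "")
            print "tape drive: /dev/" tape
        else if (pass != "" && (changer != "" || n == 1))
            print "possible changer control device: /dev/" pass
    }'
}

# Usage (as root on FreeBSD): camcontrol devlist | classify_camcontrol
```

Disk lines such as (pass3,da0) produce no output, so only tape-related devices are reported.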

Using the OnStream driver on Linux Systems

Bacula version 1.33 (not 1.32x) is now working and ready for testing with the OnStream kernel osst driver version 0.9.14 or above. Osst is available from: http://sourceforge.net/projects/osst/.

To make Bacula work, you must first load the new driver and then, as root, run:

  mt -f /dev/nosst0 defblksize 32768

You must also add the following to the Device resource in your Storage daemon's configuration file:

 Minimum Block Size = 32768
 Maximum Block Size = 32768

Here is a Device specification provided by Michel Meyers that is known to work:

Device {
  Name = "Onstream DI-30"
  Media Type = "ADR-30"
  Archive Device = /dev/nosst0
  Minimum Block Size = 32768
  Maximum Block Size = 32768
  Hardware End of Medium = yes
  BSF at EOM = no
  Backward Space File = yes
  Fast Forward Space File = yes
  Two EOF = no
  AutomaticMount = yes
  AlwaysOpen = yes
  Removable Media = yes
}

Hardware Compression on EXB-8900

To activate, check, or disable the hardware compression feature on an EXB-8900, use the Exabyte MammothTool. You can get it here: http://www.exabyte.com/support/online/downloads/index.cfm. There is a Solaris version of this tool. With option -C 0 or 1, you can disable or activate compression. Start the tool without any options for a brief usage summary.

Using btape to Simulate Filling a Tape

Because there are often problems with certain tape drives or systems when end-of-tape conditions occur, btape has a special command, fill, that writes random data to a tape until the tape fills. It then writes at least one more Bacula block to a second tape. Finally, it reads back both tapes to ensure that the data has been written in a way that Bacula can recover. Note that there is also a single-tape option, described below, which you should use rather than the two-tape test.

Filling a full tape can be an extremely time-consuming process (here it takes about 6 hours). Note that btape writes random data to the tape when filling it. This has two consequences: (1) it takes a bit longer to generate the data, especially on slow CPUs; (2) the total amount of data written is approximately the real physical capacity of your tape, regardless of whether the tape drive's compression is on or off, because random data does not compress very much.
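You can observe the effect of random data on compression without a tape drive. This minimal sketch compresses 1 MiB of random data and 1 MiB of zeros with gzip. gzip is only a stand-in assumption for a drive's hardware compression, which uses a different algorithm, but random input is essentially incompressible for any lossless compressor.

```sh
#!/bin/sh
# compressed_size: gzip 1 MiB read from a source device and print the
# compressed size in bytes (gzip stands in for hardware compression).
compressed_size() {
    head -c 1048576 "$1" | gzip -c | wc -c | tr -d ' '
}

random_size=$(compressed_size /dev/urandom)
zero_size=$(compressed_size /dev/zero)
echo "1048576 bytes in: random -> $random_size bytes, zeros -> $zero_size bytes"
```

The random data typically comes out slightly larger than it went in, while the zeros shrink to about a kilobyte, which is why fill ends up measuring roughly the physical capacity of the tape.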

To begin this test, enter the fill command and follow the instructions. There are two options: the simple single-tape option and the multiple-tape option. Please use only the simple single-tape option, because the multiple-tape option still does not work entirely correctly. If the single-tape option does not succeed, you should correct the problem before using Bacula.

Recovering Files Written With Fixed Block Sizes

If you have previously been running your tape drive in fixed block mode (default 512 bytes) and Bacula with variable blocks (the default), then in versions 1.32f-x and 1.34 and above, Bacula will fail to recover files: it uses block spacing to position the tape, and this does not work when the block sizes used by your tape drive and by Bacula do not agree.

The long-term solution is to run your drive in variable block mode as described above. However, if you have already written tapes using fixed block sizes, this can be a bit of a pain. The solution is as follows. When you restore from a tape written with a fixed block size, first ensure that your drive is set to the same fixed block size that was used when the tape was written. Then, when doing the restore command in the Console program, do not answer the yes/mod/no prompt right away. Instead, edit the bootstrap file (its location is listed in the prompt) with any text editor and remove all VolBlock lines. Once the file has been saved, answer the prompt; Bacula will then run without using block positioning, and it should recover your files.
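The bootstrap edit described above can also be done from a shell. This is a minimal sketch: strip_volblock is a hypothetical helper, and it assumes each VolBlock entry sits on a line of its own, as the instructions above imply. Run it on the file named in the prompt before answering the question.

```sh
#!/bin/sh
# strip_volblock (hypothetical helper): delete every VolBlock line from a
# Bacula bootstrap file, keeping the original as <file>.orig.
strip_volblock() {
    bsr=$1
    cp "$bsr" "$bsr.orig" || return 1
    # -v drops matching lines; -i matches "VolBlock" in any letter case
    grep -v -i '^[[:space:]]*volblock' "$bsr.orig" > "$bsr"
    # grep exits 1 when no lines are left; only status 2 is a real error
    [ $? -le 1 ]
}

# Usage: strip_volblock /path/shown/in/the/prompt.bsr
```

Note that running it twice overwrites the .orig copy with the already stripped file, so keep your own backup if you need the original.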

Tape Blocking Modes

SCSI tapes may be written with either variable or fixed block sizes. Newer drives support both modes, but some drives, such as QIC devices, always use fixed block sizes. Bacula attempts to fill and write complete blocks (default 65K), so in normal mode (variable block size), Bacula always writes blocks of the same size except for the last block of a Job. If Bacula is configured to write fixed block sizes, it pads the last block of the Job to the correct size.

Bacula expects variable block size tape drives to behave as follows: each write to the drive results in a single record being written to the tape, and each read returns a single record. If you request fewer bytes than are in the record, only that many bytes are returned, but the entire logical record will have been read (the next read retrieves the next record). Thus data from a single write is always returned in a single read, and sequentially written records are returned by sequential reads.
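The block count and padding for a Job written in fixed blocks can be sketched with shell arithmetic. The job_blocks helper is hypothetical, and the 65536-byte block size in the example is only an illustrative value, not necessarily Bacula's exact default.

```sh
#!/bin/sh
# job_blocks (hypothetical helper): for a Job of SIZE bytes written in
# fixed blocks of BLOCK bytes, print the number of blocks and the number
# of padding bytes added to the last block.
job_blocks() {
    size=$1 block=$2
    blocks=$(( (size + block - 1) / block ))   # round up to whole blocks
    pad=$(( blocks * block - size ))
    echo "$blocks $pad"
}

job_blocks 200000 65536    # prints "4 62144"
```

In variable block mode the same Job would instead end with a short final block of 3392 bytes (200000 - 3 * 65536) and no padding.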

Bacula expects fixed block size tape drives to behave as follows: if a write length is greater than the physical block size of the drive, the data is written as two or more blocks, each of the fixed physical size, so a single write may become multiple physical records on the tape (this is not a good situation). According to the documentation, one may never write an amount of data that is not an exact multiple of the block size (it is not specified whether an error occurs or the last record is padded). When reading, it is my understanding that each read request reads one physical record from the tape. Because of these complications, you should avoid fixed block size tape drives with Bacula if possible; otherwise, you must be ABSOLUTELY certain that the fixed block size used within Bacula corresponds to the physical block size of the tape drive. This ensures a one-to-one correspondence between what Bacula writes and the physical records on the tape.

Please note that Bacula will not function correctly if it writes a block and that block is split into two or more physical records on the tape. Bacula assumes that each write causes a single record to be written, and that it can sequentially recover each of the blocks it has written by using the same number of sequential reads as it had written.
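The one-to-one requirement can be checked with simple arithmetic. This is a simplified model of the fixed-block behavior described above, not a query of a real drive, and records_per_write is a hypothetical helper. Bacula needs the answer to be exactly 1, which is why its fixed block size must equal the drive's physical block size.

```sh
#!/bin/sh
# records_per_write (hypothetical helper): number of physical records that
# one write of LEN bytes becomes on a drive with a fixed block size of PHYS
# bytes. Lengths that are not a multiple of PHYS are rejected.
records_per_write() {
    len=$1 phys=$2
    if [ $(( len % phys )) -ne 0 ]; then
        echo "error: $len is not a multiple of $phys" >&2
        return 1
    fi
    echo $(( len / phys ))
}

records_per_write 512 512      # 1: one Bacula block per physical record
records_per_write 65536 512    # 128: one Bacula block split into 128 records
```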

Details of Tape Modes

Rudolf Cejka has provided the following information concerning certain tape modes and MTEOM.

Tape level
It is always possible to position by filemarks or blocks, whereas positioning to end-of-data is only an optional feature, although it is implemented very often. The SCSI specification also describes optional sequential filemarks, setmarks, and sequential setmarks, but these are implemented less often. Modern tape drives keep track of file positions in a built-in chip (AIT, LTO) or at the beginning of the tape (SDLT), so there is no speed difference whether end-of-data or filemarks are used. (I have heard that LTO-1 drives from all three manufacturers do not use the chip for file locations but use the tape, as in the SDLT case; I am not sure about LTO-2 and LTO-3.) However, there is one big difference: spacing to end-of-data ignores the file position, whereas spacing by filemarks returns the real number of files skipped, so the OS can track the current file number only in the filemarks case.

OS level
Solaris uses only SCSI SPACE Filemarks; it does not support SCSI SPACE End-of-data. When MTEOM is called, Solaris uses SCSI SPACE Filemarks with count = 1048576 in fast mode, and a combination of SCSI SPACE Filemarks with count = 1 and SCSI SPACE Blocks with count = 1 in slow mode, so that the EOD mark on the tape is not skipped on some older tape drives. The file number is always tracked for MTEOM.

Linux supports both SCSI SPACE Filemarks and End-of-data. When MTEOM is called in MT_ST_FAST_MTEOM mode, SCSI SPACE End-of-data is used; otherwise, SCSI SPACE Filemarks with count = 8388607 is used. There is no real slow mode as in Solaris; I would expect that on older tape drives Filemarks may be slower than End-of-data, but not as slow as the Solaris slow mode. The file number is tracked for MTEOM only without MT_ST_FAST_MTEOM; when MT_ST_FAST_MTEOM is used, it is not.

FreeBSD supports both SCSI SPACE Filemarks and End-of-data, but when MTEOD (MTEOM) is called, SCSI SPACE End-of-data is always used. FreeBSD never uses SCSI SPACE Filemarks for MTEOD. The file number is never tracked for MTEOD.
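The OS-level notes above can be condensed into a lookup table. This sketch only restates Rudolf Cejka's description as a shell case statement; mteom_behavior is a hypothetical helper. Its second argument is fast (Solaris fast mode, or Linux with MT_ST_FAST_MTEOM) or slow (Solaris slow mode, or Linux without MT_ST_FAST_MTEOM); FreeBSD ignores it.

```sh
#!/bin/sh
# mteom_behavior (hypothetical helper): OS MODE -> SCSI command used for
# MTEOM and whether the OS keeps track of the file number.
mteom_behavior() {
    case "$1:$2" in
        solaris:fast) echo "SPACE Filemarks count=1048576; file number tracked" ;;
        solaris:slow) echo "SPACE Filemarks count=1 + SPACE Blocks count=1; file number tracked" ;;
        linux:fast)   echo "SPACE End-of-data; file number not tracked" ;;
        linux:slow)   echo "SPACE Filemarks count=8388607; file number tracked" ;;
        freebsd:*)    echo "SPACE End-of-data; file number not tracked" ;;
        *)            echo "unknown OS or mode: $1 $2" >&2; return 1 ;;
    esac
}

mteom_behavior linux fast
```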

Bacula level
When Hardware End of Medium = Yes is used, MTEOM is called, but this does not mean that the hardware End-of-data must be used. When Hardware End of Medium = No, MTFSF with count = 32767 is used if Fast Forward Space File = Yes; otherwise a Block Read with count = 1 combined with a Forward Space File with count = 1 is used, which is really very slow.
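The Bacula-level decision reads naturally as a small function of the two Device directives. This sketch merely restates the paragraph above; eom_positioning is a hypothetical helper, not part of Bacula, and it expects lowercase yes or no for each argument.

```sh
#!/bin/sh
# eom_positioning (hypothetical helper): HW_EOM FAST_FSF (each yes or no)
# -> how Bacula moves to the end of the recorded data.
eom_positioning() {
    if [ "$1" = yes ]; then
        echo "MTEOM"
    elif [ "$2" = yes ]; then
        echo "MTFSF count=32767"
    else
        echo "Block Read count=1 + Forward Space File count=1 (very slow)"
    fi
}

eom_positioning no yes    # the FreeBSD settings recommended earlier
```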

Hardware End of Medium = Yes|No
The name of this option is misleading and is a source of confusion, because it is not the hardware EOM that is really being switched here.

If I use Yes, the OS must not use SCSI SPACE End-of-data, because Bacula expects the file number to be tracked, which the SCSI specification does not support for End-of-data. Instead, the OS has to use SCSI SPACE Filemarks.

If I use No, the action taken depends on Fast Forward Space File.

When I set Hardware End of Medium = no and Fast Forward Space File = no, file positioning on my LTO-3 was very slow (about ten to 100 minutes), but with Hardware End of Medium = no and Fast Forward Space File = yes, it is ten to 100 times faster (about one to two minutes).

Autochanger Errors

If you are getting errors such as:

3992 Bad autochanger "load slot 1, drive 1": ERR=Child exited with code 1.

and you are running your Storage daemon as non-root, then you most likely have a permissions problem with the control channel. As root, set permissions on /dev/sgX so that the userid and group of your Storage daemon can access the device. You need to ensure that you allow access to the proper control device; if you don't have any SCSI disk drives (including SATA drives), you might want to change the permissions on /dev/sg*.
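Before changing permissions, you can confirm the diagnosis. The check_changer_access helper below is hypothetical; run it as the user the Storage daemon runs as (for example with sudo -u, assuming sudo is installed), because as root the test will normally succeed. The device /dev/sg1 and the group tape in the comments are only examples.

```sh
#!/bin/sh
# check_changer_access (hypothetical helper): report whether the current
# user can both read and write DEVICE, e.g. /dev/sg1.
check_changer_access() {
    if [ -r "$1" ] && [ -w "$1" ]; then
        echo "ok: $(id -un) can read and write $1"
    else
        echo "no access: $(id -un) cannot read and write $1" >&2
        ls -l "$1" >&2
        return 1
    fi
}

# If access is missing, one possible fix as root (group name is an example):
#   chgrp tape /dev/sg1 && chmod g+rw /dev/sg1
```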

Syslog Errors

If you are getting errors such as:

: kernel: st0: MTSETDRVBUFFER only allowed for root

you are most likely running your Storage daemon as non-root, and Bacula is attempting to set the correct OS buffering to correspond to your Device resource. Most OSes allow only root to issue this ioctl command. In general, the message can be ignored, provided that you are sure your OS parameters are properly configured as described earlier in this manual. If you are running your Storage daemon as root, you should not be getting these system log messages; if you are, something is probably wrong.


What To Do When Bacula Crashes (Kaboom)

If you are running on a Linux system and have a set of working configuration files, it is very unlikely that Bacula will crash. As with all software, however, it may crash someday, particularly if you are running on another operating system or using a new or unusual feature.

This chapter explains what you should do if one of the three Bacula daemons (Director, File, Storage) crashes.

Traceback

Each of the three Bacula daemons has a built-in exception handler which, in case of an error, will attempt to produce a traceback. If successful, the traceback will be emailed to you.

For this to work, you need to ensure that a few things are set up correctly on your system:

  1. You must have an installed copy of gdb (the GNU debugger), and it must be on Bacula's path.
  2. The Bacula installed script file btraceback must be in the same directory as the daemon which dies, and it must be marked as executable.
  3. The script file btraceback.gdb must have the correct path to it specified in the btraceback file.
  4. You must have a mail program which is on Bacula's path.
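A quick way to check conditions 1, 2 and 4 from the list above is sketched below. check_traceback_prereqs is a hypothetical helper; its argument is whatever directory your daemons are installed in. Because the daemon's PATH may differ from your login shell's, run it in an environment matching the daemon's. Condition 3 still requires looking inside the btraceback file itself.

```sh
#!/bin/sh
# check_traceback_prereqs (hypothetical helper): DIR is the directory that
# holds the daemons and btraceback. Returns non-zero if anything is missing.
check_traceback_prereqs() {
    dir=$1 status=0
    for prog in gdb mail; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "ok: $prog found at $(command -v "$prog")"
        else
            echo "missing: $prog is not on PATH"; status=1
        fi
    done
    if [ -x "$dir/btraceback" ]; then
        echo "ok: $dir/btraceback is executable"
    else
        echo "missing: $dir/btraceback is absent or not executable"; status=1
    fi
    return $status
}

# Usage: check_traceback_prereqs /path/to/bacula/sbin
```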

If all the above conditions are met, the daemon that crashes will produce a traceback report and email it to you. If the above conditions are not true, you can either run the debugger by hand as described below, or you may be able to correct the problems by editing the btraceback file. I recommend not spending too much time on trying to get the traceback to work as it can be very difficult.

The changes that might be needed are to add a correct path to the gdb program, correct the path to the btraceback.gdb file, change the mail program or its path, or change your email address. The key line in the btraceback file is:

gdb -quiet -batch -x /home/kern/bacula/bin/btraceback.gdb \
     $1 $2 2>&1 | mail -s "Bacula traceback" your-address@xxx.com

Since each daemon has the same traceback code, a single btraceback file is sufficient if you are running more than one daemon on a machine.

Testing The Traceback

To test the traceback feature manually, simply start Bacula and then obtain the PID of the main daemon thread (there are multiple threads). Unfortunately, the output had to be split to fit on this page:

[kern@rufus kern]$ ps fax --columns 132 | grep bacula-dir
 2103 ?        S      0:00 /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2104 ?        S      0:00  \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2106 ?        S      0:00      \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2105 ?        S      0:00      \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf

which in this case is 2103. Then, while Bacula is running, call the btraceback program, giving it the path to the Bacula executable and the PID. In this case, the command is:

./btraceback /home/kern/bacula/k/src/dird 2103
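If the ps output is long, the main process can be picked out mechanically: it is the bacula-dir process whose parent is not another bacula-dir process. This sketch assumes a ps that accepts the POSIX -e and -o pid,ppid,comm options; main_pid is a hypothetical helper. On systems where threads do not appear as separate processes, it simply prints the single daemon PID.

```sh
#!/bin/sh
# main_pid NAME (hypothetical helper): read "PID PPID COMMAND" lines on stdin
# and print the PID of each NAME process whose parent is not a NAME process.
main_pid() {
    awk -v name="$1" '
        $3 == name { parent[$1] = $2 }
        END { for (p in parent) if (!(parent[p] in parent)) print p }
    '
}

# Usage: ps -eo pid,ppid,comm | main_pid bacula-dir
```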

It should produce an email showing you the current state of the daemon (in this case the Director), and then exit leaving Bacula running as if nothing happened. If this is not the case, you will need to correct the problem by modifying the btraceback script.

Typical problems might be that gdb is not on the default path. Fix this by specifying the full path to it in the btraceback file. Another common problem is that the mail program doesn't work or is not on the default path. On some systems, it is preferable to use Mail rather than mail.

Getting A Traceback On Other Systems

It should be possible to produce a similar traceback on systems other than Linux, using either gdb or some other debugger. Solaris with gdb installed works quite well. On other systems, you will need to modify the btraceback script to invoke the correct debugger, and possibly change the btraceback.gdb script to contain the appropriate commands for your debugger. If anyone succeeds in making this work with another debugger, please send us a copy of what you modified.

Manually Running Bacula Under The Debugger

If for some reason you cannot get the automatic traceback, or if you want to interactively examine the variable contents after a crash, you can run Bacula under the debugger. Assuming you want to run the Storage daemon under the debugger (the technique is the same for the other daemons, only the name changes), you would do the following:

  1. Start the Director and the File daemon. If the Storage daemon also starts, you will need to find its PID as shown above (ps fax | grep bacula-sd) and kill it with a command like the following:

          kill -15 PID
    

    where you replace PID by the actual value.

  2. At this point, the Director and the File daemon should be running but the Storage daemon should not.
  3. cd to the directory containing the Storage daemon
  4. Start the Storage daemon under the debugger:

        gdb ./bacula-sd
    

  5. Run the Storage daemon:

         run -s -f -c ./bacula-sd.conf
    

    You may replace the ./bacula-sd.conf with the full path to the Storage daemon's configuration file.

  6. At this point, Bacula will be fully operational.
  7. In another shell command window, start the Console program and do what is necessary to cause Bacula to die.
  8. When Bacula crashes, the gdb shell window will become active and gdb will show you the error that occurred.
  9. To get a general traceback of all threads, issue the following command:

           thread apply all bt
    

    After that you can issue any debugging command.

Getting Debug Output from Bacula

Each of the daemons normally has debug code compiled into the program, but disabled. There are two ways to enable the debug output. One is to add the -d nnn option on the command line when starting the daemon (or to the run command when running it under the debugger). The nnn is the debug level, and generally anything between 50 and 200 is reasonable. The higher the number, the more output is produced. The output is written to standard output.

The second way of getting debug output is to turn it on dynamically from the Console with the setdebug command. The full syntax of the command is:

 setdebug level=nnn client=client-name storage=storage-name dir

If none of the options are given, the command will prompt you. You can selectively turn debugging on or off in any or all of the daemons (i.e. it is not necessary to specify all the components of the above command).
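Because bconsole reads commands from standard input, setdebug can also be issued from a script. This is a sketch: setdebug_cmd is a hypothetical helper that only assembles the command line, and the storage name File is an example that must match a Storage resource in your Director configuration.

```sh
#!/bin/sh
# setdebug_cmd (hypothetical helper): LEVEL [TARGET...] -> a setdebug command
# line, where each TARGET is client=NAME, storage=NAME or dir.
setdebug_cmd() {
    cmd="setdebug level=$1"
    shift
    for target in "$@"; do
        cmd="$cmd $target"
    done
    echo "$cmd"
}

# Usage, assuming bconsole is on PATH and configured:
#   setdebug_cmd 100 storage=File | bconsole
```

Naming the daemon explicitly avoids the interactive prompt, which cannot be answered when the command arrives on a pipe.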

Kern Sibbald 2008-01-31