Without proper setup and maintenance, your catalog can keep growing as jobs run and data is backed up, and it can also become inefficient and slow. How quickly the catalog grows depends on the number of jobs and on the number of files they back up. Deleting records from the catalog frees space for the records of subsequent jobs. Regularly deleting old, expired records (older than the retention periods specify) keeps the catalog database at a roughly constant size.
You can start with the default configuration. It already contains sensible defaults for a small number of clients (fewer than 5), and in that case catalog maintenance will not be urgent as long as you have a few hundred megabytes of free disk space. Whatever your situation, some knowledge of the retention periods for the data in the catalog and on the Volumes is helpful.
Bacula uses three different retention periods: the File Retention (how long File records are kept), the Job Retention (how long Job records are kept), and the Volume Retention (how long Volume records are kept). Of these three, the File Retention is the one that determines how large the database will grow.
The File Retention and the Job Retention are specified in the Client resource, as described below. The Volume Retention is specified in the Pool resource; details are given in the next chapter of this manual.
Since File records make up about 80 percent of the catalog database size, you should carefully decide how long you want to keep them. Once the File records have been deleted, you can no longer restore individual files from those jobs with a restore job. However, Bacula versions 1.37 and later can still restore all files of a job from its Job record, as long as that Job record remains in the catalog.
Retention periods are specified in seconds, but for convenience a number of modifiers are available so that you can specify minutes, hours, days, weeks, months, quarters, and years. See the Configuration chapter of this manual for details on these modifiers.
The default File Retention is 60 days.
As mentioned above, once the File records have been removed from the catalog database, you can no longer restore individual files of a job. However, as long as the Job record is still in the catalog, you can still restore the complete job with all of its files (Bacula version 1.37 and later). It is therefore a good idea to keep Job records in the catalog longer than File records.
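The interplay of the two retention periods can be summarized as a small decision function. This is a hypothetical sketch, not Bacula code. The function `restore_options` and its parameters are invented for illustration. It assumes Bacula 1.37 or later (whole-job restore from the Job record) and that pruning a Job also removes its File records. The values 60 and 180 days are only example inputs.

```python
def restore_options(job_age_days: float, file_retention_days: float,
                    job_retention_days: float) -> str:
    """Describe what the catalog still allows restoring for one job (hypothetical helper)."""
    if job_age_days <= job_retention_days and job_age_days <= file_retention_days:
        # File records still exist: individual files can be selected.
        return "individual files"
    if job_age_days <= job_retention_days:
        # Only the Job record is left: restore the complete job (Bacula 1.37+).
        return "complete job only"
    # Neither record is left; the Volume would have to be scanned in again (e.g. with bscan).
    return "not from the catalog"

# Example values: 60 days File Retention, 180 days Job Retention
for age in (30, 90, 200):
    print(age, restore_options(age, 60, 180))
# 30 individual files
# 90 complete job only
# 200 not from the catalog
```

Keeping Job Retention longer than File Retention is what creates the middle case, in which a whole job can still be restored after its File records are gone.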
The default Job Retention is 180 days.
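To see what the modifiers amount to in seconds, here is a minimal conversion sketch. The helper `retention_to_seconds` is hypothetical and is not Bacula's own parser. It only accepts full unit names such as "days" or "months". It also assumes the unit lengths given in the Configuration chapter: a month counts as 30 days, a quarter as 91 days and a year as 365 days.

```python
import re

# Seconds per unit; month/quarter/year lengths are assumed (30, 91 and 365 days).
UNIT_SECONDS = {
    "second": 1,
    "minute": 60,
    "hour": 60 * 60,
    "day": 24 * 60 * 60,
    "week": 7 * 24 * 60 * 60,
    "month": 30 * 24 * 60 * 60,
    "quarter": 91 * 24 * 60 * 60,
    "year": 365 * 24 * 60 * 60,
}

def retention_to_seconds(spec: str) -> int:
    """Convert e.g. "60 days" or "1 year 2 months" to seconds (hypothetical helper)."""
    pairs = re.findall(r"(\d+)\s*([A-Za-z]+)", spec)
    if not pairs:
        return int(spec)  # a bare number is taken as seconds
    total = 0
    for number, unit in pairs:
        unit = unit.lower()
        if unit.endswith("s"):
            unit = unit[:-1]  # "days" -> "day"
        total += int(number) * UNIT_SECONDS[unit]
    return total

print(retention_to_seconds("60 days"))   # 5184000 (default File Retention)
print(retention_to_seconds("180 days"))  # 15552000 (default Job Retention)
```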
Over time, as noted above, your database will tend to grow. Even though Bacula regularly prunes File records, the MySQL database will keep getting larger. To avoid this, the database must be compacted. Large commercial databases such as Oracle usually provide specific commands to reclaim wasted disk space. MySQL has the OPTIMIZE TABLE command, and SQLite (version 2.8.4 and later) has the VACUUM command for this purpose. We leave it to you to judge how useful OPTIMIZE TABLE or VACUUM is in your installation.
All databases provide tools for writing their contents to a file in ASCII format and reading that file back in. Doing so re-creates the database, which results in a very compact database. Below we show how to do this for MySQL, SQLite, and PostgreSQL.
For a MySQL database, you can write the contents of the catalog database to an ASCII file (bacula.sql) and re-import it with the following commands:
mysqldump -f --opt bacula > bacula.sql
mysql bacula < bacula.sql
rm -f bacula.sql
Depending on the size of your database, this will take more or less time and a fair amount of disk space. For example, if I change to the directory where my MySQL database is stored (typically /var/lib/mysql) and run this command:
du bacula
I get the output 620,644, which means that the bacula directory occupies 620,644 blocks of 1024 bytes on disk, so my database contains about 635 MB of data. After running mysqldump, the resulting file bacula.sql is 174,356 blocks in size. After importing it back into the database with the command mysql bacula < bacula.sql, the database occupies only 210,464 blocks. In other words, the compacted version of my database, which had been in use for about a year, is only about one third of its previous size.
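The figures above can be checked with a few lines of arithmetic. The block counts are the ones from the example. The only assumption is the 1024-byte block size reported by du.

```python
BLOCK_SIZE = 1024  # du reported 1024-byte blocks in this example

before_blocks = 620_644  # bacula directory before compaction
dump_blocks = 174_356    # size of bacula.sql
after_blocks = 210_464   # bacula directory after reloading the dump

for label, blocks in (("before", before_blocks), ("dump", dump_blocks), ("after", after_blocks)):
    print(f"{label:>6}: {blocks * BLOCK_SIZE / 1e6:6.1f} MB")

ratio = after_blocks / before_blocks
print(f"after/before = {ratio:.2f}")  # after/before = 0.34, i.e. about one third
```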
It is therefore recommended that you keep an eye on the size of your database and compact it from time to time (every six months or once a year).
If you notice that writing to the MySQL database produces errors, or that the Director hangs when accessing the database, you should look at the MySQL database check and repair programs. Which program you should run depends on the table type (storage engine) you are using. If you are using the default, you will probably run myisamchk. For more information, see: http://dev.mysql.com/doc/refman/5.1/de/client-utility-programs.html.
If the errors you are getting are simple SQL warnings, you should first run Bacula's dbcheck program before using the MySQL repair programs. dbcheck can find orphaned records and fix other inconsistencies in the catalog database.
A typical cause of database problems is a partition filling up. In that case you must either add space or free some up before the database can be repaired with myisamchk.
Here is an example of how you might repair a corrupted database if, after a partition filled up, the problems cannot be fixed with myisamchk -r:
Copy the following lines into a shell script named repair:
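#!/bin/sh
# For every MyISAM data file: keep a copy, empty the table through MySQL,
# put the saved data back, and let myisamchk rebuild the table.
for i in *.MYD ; do
  mv "$i" "x$i"                       # keep the original data file as x<name>.MYD
  t=`echo "$i" | cut -f 1 -d '.' -`   # table name = file name without .MYD
  mysql bacula <<END_OF_DATA
set autocommit=1;
truncate table $t;
quit
END_OF_DATA
  cp "x$i" "$i"                       # put the saved data back
  chown mysql:mysql "$i"
  myisamchk -r "$t"                   # recover the table and rebuild its index
done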
Then run the script as follows:
cd /var/lib/mysql/bacula
chmod +x repair
./repair
Once you have verified that the database works correctly again, you can delete the old database files:
cd /var/lib/mysql/bacula
rm -f x*.MYD
If you get an error such as "The table 'File' is full ...", this is probably because MySQL 4.x limits table size to 4 GB by default and you have reached this limit. Information on the maximum possible table size is available at http://dev.mysql.com/doc/refman/5.1/de/table-size.html
You can display the maximum table size with:
mysql bacula
SHOW TABLE STATUS FROM bacula LIKE 'File';
If the Max_data_length column is about 4 GB, that is the problem. You can raise the maximum size with:
mysql bacula
ALTER TABLE File MAX_ROWS=281474976710656;
Alternatively, you can edit /etc/my.cnf before creating the Bacula tables and set the following in the [mysqld] section:
set-variable = myisam_data_pointer_size=6
The MyISAM data pointer size must be set before the Bacula catalog database or its tables are created in order to take effect.
The MAX_ROWS and pointer adjustments should not be necessary with MySQL 5.x and later. These changes are therefore only needed with MySQL 4.x, depending on the size of your catalog database.
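If the connection to the MySQL server is lost, for example with the message "MySQL server has gone away", see: http://dev.mysql.com/doc/refman/5.1/de/gone-away.html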
The same considerations described above for MySQL also apply to PostgreSQL. See the PostgreSQL documentation to find out how to repair your database. Also consider running Bacula's dbcheck program where appropriate (see above).
There are many ways to tune the various databases supported by Bacula to improve their performance. When writing or searching database records, a well-tuned database can be 100 times faster, or more, than a poorly tuned one.
For every database, you can expect significant improvements from adding more indexes. The comments in the Bacula make_xxx_tables scripts (e.g. make_postgres_tables) give some hints about which indexes are useful. See also below for details on how to check your indexes.
For MySQL, it is very important to review the my.cnf file (usually /etc/my.cnf). You may be able to improve performance considerably by using the my-large.cnf or my-huge.cnf configuration files from the MySQL source code.
For SQLite3, one important point is to make sure that the statement "PRAGMA synchronous = NORMAL;" is present. This increases the interval at which the database writes its in-memory cache to disk. There are other PRAGMA settings that can improve efficiency, but they also increase the risk of database corruption if the system crashes.
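The following sketch illustrates the pragma itself, not how Bacula issues it. It uses Python's standard sqlite3 module on a throwaway database file. In SQLite this setting applies to the connection rather than being stored in the database file, so it has to be issued each time the database is opened. Reading the pragma back returns a number instead of the keyword.

```python
import os
import sqlite3
import tempfile

# Throwaway database file, used only to demonstrate the pragma.
with tempfile.TemporaryDirectory() as tmpdir:
    conn = sqlite3.connect(os.path.join(tmpdir, "demo.db"))
    conn.execute("PRAGMA synchronous = NORMAL")
    # 0 = OFF, 1 = NORMAL, 2 = FULL, 3 = EXTRA
    level = conn.execute("PRAGMA synchronous").fetchone()[0]
    conn.close()

print(level)  # 1
```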
For PostgreSQL, you might consider turning off "fsync", but this too can cause database problems if the system crashes. There are many ways to improve PostgreSQL performance, and this web page explains a few of them: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html.
The PostgreSQL FAQ also contains hints on improving performance: http://www.postgresql.org/docs/faqs.FAQ.html.
For PostgreSQL, you should also pay attention to "effective_cache_size". For a system with 2 GB of RAM you can set it to 131072 (counted in 8 kB pages, i.e. 1 GB), but do not set it too high. In addition, "work_mem = 256000" and "maintenance_work_mem = 256000" (values in kB) are reasonable for systems with 2 GB of memory. Make sure that "checkpoint_segments" is set to at least 8.
The most important indexes for a fast database are the three indexes on the File table. The first is on FileId and is created automatically, because FileId is the primary key used to access the table. The other two are the JobId index and the (FilenameId, PathId) index. If either of these is missing, your database will lose a great deal of performance.
On PostgreSQL, you can check which indexes exist with:
psql bacula
select * from pg_indexes where tablename='file';
If the output does not show all three indexes, you can create the two additional indexes with:
psql bacula
CREATE INDEX file_jobid_idx on file (jobid);
CREATE INDEX file_fp_idx on file (filenameid, pathid);
On MySQL, you can check your indexes with:
mysql bacula
show index from File;
If indexes are missing, especially the JobId index, you can create them with:
mysql bacula
CREATE INDEX file_jobid_idx on File (JobId);
CREATE INDEX file_jpf_idx on File (JobId, FilenameId, PathId);
Although this is normally not a problem, you should make sure that your Filename and Path indexes are both set to 255 characters. Some users have reported problems with indexes set to 50 characters. To check, run:
mysql bacula
show index from Filename;
show index from Path;
For Filename, you must have an index with Key_name "Name" and Sub_part "255". For Path, you must have an index with Key_name "Path" and Sub_part "255". If either index does not exist or its Sub_part is less than 255, you can recreate it with the following commands:
mysql bacula
DROP INDEX Path on Path;
CREATE INDEX Path on Path (Path(255));
DROP INDEX Name on Filename;
CREATE INDEX Name on Filename (Name(255));
On SQLite, you can check your indexes with:
sqlite <path>bacula.db
select * from sqlite_master where type='index' and tbl_name='File';
If an index is missing, especially the JobId index, you can create it with the following commands:
sqlite <path>bacula.db
CREATE INDEX file_jobid_idx on File (JobId);
CREATE INDEX file_jfp_idx on File (JobId, FilenameId, PathId);
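The SQLite index check can also be scripted. This sketch runs the same sqlite_master query through Python's standard sqlite3 module against an in-memory database. The File table here is a hypothetical, heavily simplified stand-in for the real Bacula schema, which has more columns. The index names are the ones used in the commands above.

```python
import sqlite3

REQUIRED = {"file_jobid_idx", "file_jfp_idx"}

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the Bacula File table (illustration only).
conn.execute("CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER,"
             " PathId INTEGER, FilenameId INTEGER)")

def existing_indexes(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'File'")
    return {name for (name,) in rows}

print(sorted(REQUIRED - existing_indexes(conn)))  # ['file_jfp_idx', 'file_jobid_idx']

# Create whatever is missing; IF NOT EXISTS makes this safe to re-run.
conn.execute("CREATE INDEX IF NOT EXISTS file_jobid_idx ON File (JobId)")
conn.execute("CREATE INDEX IF NOT EXISTS file_jfp_idx ON File (JobId, FilenameId, PathId)")
missing = REQUIRED - existing_indexes(conn)
print(sorted(missing))  # []
conn.close()
```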
Over time, as noted above, your database will grow. Even though Bacula regularly prunes old records, you should use the PostgreSQL VACUUM command to compact the database. Alternatively, you may want to use the vacuumdb command, which can be run from cron.
As with the other databases, PostgreSQL provides tools for writing the data to an ASCII file and reading it back in. Doing so rebuilds the database completely and produces a more compact version. The following PostgreSQL example shows how to do this.
For a PostgreSQL database, you can write the data to an ASCII file and read it back in with these commands:
pg_dump -c bacula > bacula.sql
cat bacula.sql | psql bacula
rm -f bacula.sql
Depending on the size of your database, this will take more or less time and disk space. You should first change to your database's working directory (typically /var/lib/postgres/data) and check its size.
Some PostgreSQL users do not recommend the procedure above. In their view, it is not necessary with PostgreSQL to export the data and read it back in, because running VACUUM regularly is enough to keep the database performing well. If you want to be thorough, use the commands VACUUM FULL, REINDEX, and CLUSTER, which spare you the detour of exporting and re-importing the data.
Finally, you may want to look at the relevant PostgreSQL documentation at: http://www.postgresql.org/docs/8.2/interactive/maintenance.html.
Please first read the previous sections, which explain why a database needs to be compacted. SQLite version 2.8.4 and later have the VACUUM command for compacting the database:
cd working-directory
echo 'vacuum;' | sqlite bacula.db
Alternatively, you can use the following commands (adapted to your system):
cd working-directory
echo '.dump' | sqlite bacula.db > bacula.sql
rm -f bacula.db
sqlite bacula.db < bacula.sql
rm -f bacula.sql
Here, working-directory is the directory that you specified in your Director configuration. Note that with SQLite the old database file must be deleted completely before the compacted version can be created.
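You can observe the effect of VACUUM on a small scale. This sketch uses Python's standard sqlite3 module. It fills a throwaway database with a hypothetical File table, deletes most of the rows (roughly what pruning does), and compares the file size before and after VACUUM. It assumes SQLite's default settings, in which auto_vacuum is off. Under those settings, freed pages stay inside the file until VACUUM rebuilds it.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE File (FileId INTEGER PRIMARY KEY, JobId INTEGER, LStat TEXT)")
    conn.executemany("INSERT INTO File (JobId, LStat) VALUES (?, ?)",
                     ((i % 10, "x" * 100) for i in range(20_000)))
    conn.commit()

    conn.execute("DELETE FROM File WHERE JobId < 8")  # drop about 80% of the rows
    conn.commit()
    size_before = os.path.getsize(path)  # freed pages are only put on the free list

    conn.execute("VACUUM")               # rebuild the file without the free pages
    size_after = os.path.getsize(path)
    conn.close()

print(size_after < size_before)  # True
```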
If you started out using Bacula with SQLite, there are several reasons why you may later want to switch to MySQL. SQLite uses more disk space than MySQL for the same amount of data. If the database becomes corrupted, it is also harder to recover with SQLite than with MySQL or PostgreSQL. Many users have migrated successfully from SQLite to MySQL. They first exported the data, then converted it into a format suitable for importing into MySQL, for example with a Perl script. This is, however, not a simple process.
If the machine running your Bacula installation ever crashes and you need to restore it, one of the first steps will be to restore the database. Bacula will happily back up the database files if they are listed in the FileSet, but this is not a good approach, because Bacula modifies the database while it is being backed up. The backed-up database is therefore likely to be in an inconsistent state. Worse still, the database is backed up before Bacula has been able to apply all of its updates.
To avoid this problem, you must back up the database after all backup jobs have run. In addition, you will want to make a copy of the database while Bacula is not making any updates. To do this, you can use the two scripts make_catalog_backup and delete_catalog_backup that ship with your Bacula release. These files are generated automatically along with the other Bacula scripts. The first script creates an ASCII copy of your database named bacula.sql in the working directory specified in your configuration. The second script deletes bacula.sql again.
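The basic sequence for making everything work correctly is as follows. First run all of your nightly backup jobs. Then run a separate catalog backup job scheduled after the last of them. That job uses RunBeforeJob to create the ASCII copy with make_catalog_backup and RunAfterJob to delete it again with delete_catalog_backup.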
Assuming that all your backup jobs start at 01:05 each night, you can run the catalog backup with the following additional Director configuration:
# Catalog database backup (after the nightly backups)
Job {
  Name = "BackupCatalog"
  Type = Backup
  Client = rufus-fd
  FileSet = "Catalog"
  Schedule = "WeeklyCycleAfterBackup"
  Storage = DLTDrive
  Messages = Standard
  Pool = Default
  # WARNING!!! Passing the password on the command line is not secure.
  # Please read the comments in the make_catalog_backup file.
  RunBeforeJob = "/home/kern/bacula/bin/make_catalog_backup"
  RunAfterJob = "/home/kern/bacula/bin/delete_catalog_backup"
  Write Bootstrap = "/home/kern/bacula/working/BackupCatalog.bsr"
}
# This schedule runs the catalog backup after the other backups
Schedule {
  Name = "WeeklyCycleAfterBackup"
  Run = Level=Full sun-sat at 1:10
}
# The FileSet for the ASCII copy of the database
FileSet {
  Name = "Catalog"
  Include {
    Options {
      signature = MD5
    }
    File = <working_directory>/bacula.sql
  }
}
Make sure that a bootstrap file is written, as in the example, and preferably keep a copy of it on another computer. This allows a quick recovery of the database if necessary. Without a bootstrap file, recovery is still possible, but it requires more work and takes longer.
The make_catalog_backup script is provided as an example of how you can back up your Bacula database. We expect you to take precautions appropriate to your situation. make_catalog_backup is designed to pass the password on the command line. That is fine as long as only trusted users can log in to the system; otherwise it is unacceptable. Most database systems offer an alternative method that avoids passing the password on the command line.
The make_catalog_backup script contains some warnings about this. Please read the comments in the script.
For PostgreSQL, for example, you can use a password file (see .pgpass), and MySQL has the .my.cnf file.
We hope this has been of some help, but only you can judge what your situation requires.
As mentioned above, backing up database files while the database is running will probably leave the backed-up files in an inconsistent state.
The best solution is either to stop the database before the backup or to use database-specific tools to create a valid backup file, which Bacula can then write to the Volumes. If you are unsure how best to do this for your database, the Backup Central website may help. Its Free Backup and Recovery Software page has links to scripts that show how to back up most major databases.
If old records are not pruned from your catalog database automatically, the database will grow with every backup job that runs (see also above). Normally, you should decide how long you want to keep File records in the catalog and configure the File Retention accordingly. You can then either wait and see how large your catalog database becomes, or estimate its size roughly. For the estimate you need two things: each backed-up file takes about 154 bytes in the catalog database, and you must know how many files you will back up on how many computers.
An example: suppose you back up two computers, each with 100,000 files. Suppose further that you run a Full backup weekly and an Incremental backup every day, and that an Incremental backup typically saves 10,000 files. The approximate size of your database after one month can then be calculated as:
Size = 154 * number of computers * (100,000 * 4 + 10,000 * 26)
Here a month is taken as 30 days containing four weekly Full backups, which leaves 26 Incremental backups. This gives:
Size = 154 * 2 * (100,000 * 4 + 10,000 * 26)
or
Size = 308 * (400,000 + 260,000)
or
Size = 203,280,000 bytes
So for the two computers assumed above, we can expect the database to grow to about 200 megabytes. Of course, this value depends on how many files are actually backed up.
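The estimate can be written as a small function. It is a sketch based only on the figure of about 154 bytes per File record quoted above. It assumes that the File Retention is at least as long as the period being estimated, so that no records are pruned in between. The name `estimate_catalog_bytes` is hypothetical.

```python
BYTES_PER_FILE_RECORD = 154  # approximate catalog cost per backed-up file (from the text)

def estimate_catalog_bytes(systems: int, files_per_full: int, fulls: int,
                           files_per_incremental: int, incrementals: int) -> int:
    """Rough catalog size, assuming no File records are pruned in the period."""
    records_per_system = files_per_full * fulls + files_per_incremental * incrementals
    return BYTES_PER_FILE_RECORD * systems * records_per_system

size = estimate_catalog_bytes(systems=2, files_per_full=100_000, fulls=4,
                              files_per_incremental=10_000, incrementals=26)
print(f"{size:,} bytes, about {size / 1e6:.0f} MB")  # 203,280,000 bytes, about 203 MB
```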
Below are some statistics for a MySQL database containing Job records for 5 clients over 8.5 months and File records over 80 days (older File records have already been pruned). For these 5 clients, only the constantly changing user and system files were backed up. All other files are assumed to be easy to restore from the operating system's software packages.
In the listing, each file corresponds to a MySQL table. Files ending in .MYD contain the actual data, and files ending in .MYI contain the indexes.
You will notice that most of the records are in File.MYD (which contains the file attributes), and that this file also takes up the most disk space. The File Retention period is therefore essentially what determines how large the database becomes. A quick calculation shows that the database grows by about 154 bytes for each file backed up.
Size in bytes    Records  File name
=============  =========  ============
          168          5  Client.MYD
        3,072             Client.MYI
  344,394,684  3,080,191  File.MYD
  115,280,896             File.MYI
    2,590,316    106,902  Filename.MYD
    3,026,944             Filename.MYI
          184          4  FileSet.MYD
        2,048             FileSet.MYI
       49,062      1,326  JobMedia.MYD
       30,720             JobMedia.MYI
      141,752      1,378  Job.MYD
       13,312             Job.MYI
        1,004         11  Media.MYD
        3,072             Media.MYI
    1,299,512     22,233  Path.MYD
      581,632             Path.MYI
           36          1  Pool.MYD
        3,072             Pool.MYI
            5          1  Version.MYD
        1,024             Version.MYI
The database is about 450 megabytes in size.
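The total size and the per-file cost can be recomputed from the listing. The sizes below are copied from the table. The sketch divides the total size of all tables by the number of File records, which gives roughly 150 bytes per backed-up file. That is in line with the approximate figure of 154 bytes. The sketch also shows how much of the total the File table accounts for.

```python
# Table file sizes in bytes, copied from the listing above
sizes = {
    "Client.MYD": 168, "Client.MYI": 3_072,
    "File.MYD": 344_394_684, "File.MYI": 115_280_896,
    "Filename.MYD": 2_590_316, "Filename.MYI": 3_026_944,
    "FileSet.MYD": 184, "FileSet.MYI": 2_048,
    "JobMedia.MYD": 49_062, "JobMedia.MYI": 30_720,
    "Job.MYD": 141_752, "Job.MYI": 13_312,
    "Media.MYD": 1_004, "Media.MYI": 3_072,
    "Path.MYD": 1_299_512, "Path.MYI": 581_632,
    "Pool.MYD": 36, "Pool.MYI": 3_072,
    "Version.MYD": 5, "Version.MYI": 1_024,
}
file_records = 3_080_191  # records in File.MYD

total = sum(sizes.values())
file_share = (sizes["File.MYD"] + sizes["File.MYI"]) / total

print(f"total: {total:,} bytes ({total / 2**20:.0f} MiB)")   # total: 467,422,515 bytes (446 MiB)
print(f"File table share: {file_share:.0%}")                 # File table share: 98%
print(f"bytes per File record: {total / file_records:.0f}")  # bytes per File record: 152
```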
Had we used SQLite, determining the database size would have been much easier, because SQLite stores all data in a single file. However, we could then not have seen so easily which table takes up the most space.
SQLite databases can be up to 50% larger than MySQL databases holding the same data, because SQLite stores all data as ASCII strings. Even binary data is represented as ASCII strings, and this appears to increase space usage.
By default, once Bacula starts writing a Volume, it can append to the Volume, but it will not overwrite (and thereby destroy) the existing data. However, when Bacula recycles a Volume, the Volume becomes available for reuse, and Bacula may at some later time overwrite its previous contents, so all previous data on it will be lost. If the Volume is a tape, the tape will be rewritten from the beginning. If the Volume is a disk file, the file will be truncated before being rewritten.
You may not want Bacula to automatically recycle (reuse) tapes. This would require a large number of tapes though, and in such a case, it is possible to manually recycle tapes. For more on manual recycling, see the section entitled Manually Recycling Volumes below in this chapter.
Most people prefer to have a Pool of tapes that are used for daily backups and recycled once a week, another Pool of tapes that are used for Full backups once a week and recycled monthly, and finally a Pool of tapes that are used once a month and recycled after a year or two. With a scheme like this, the number of tapes in your pool or pools remains constant.
By properly defining your Volume Pools with appropriate Retention periods, Bacula can manage the recycling (such as defined above) automatically.
Automatic recycling of Volumes is controlled by three records in the Pool resource definition in the Director's configuration file. These three records are AutoPrune = yes, VolumeRetention = <time>, and Recycle = yes.
Automatic recycling of Volumes is performed by Bacula only when it wants a new Volume and no appendable Volumes are available in the Pool. It will then search the Pool for any Volumes with the Recycle flag set and whose Volume Status is Full. At that point, the recycling occurs in two steps. The first is that the Catalog for a Volume must be purged of all Jobs and Files contained on that Volume, and the second step is the actual recycling of the Volume. The Volume will be purged if the VolumeRetention period has expired. When a Volume is marked as Purged, it means that no Catalog records reference that Volume, and the Volume can be recycled. Until recycling actually occurs, the Volume data remains intact. If no Volumes can be found for recycling for any of the reasons stated above, Bacula will request operator intervention (i.e. it will ask you to label a new volume).
A key point mentioned above, which can be a source of frustration, is that Bacula recycles purged Volumes only if no other appendable Volume is available. Otherwise it always writes to an appendable Volume before recycling, even if some Volumes are marked Purged. The reason is that Bacula wants to preserve the data on your old tapes (even though purged from the catalog) for as long as possible before overwriting it. So if you wish to "force" Bacula to use a purged Volume, you must first ensure that no other Volume in the Pool is marked Append. If necessary, you can manually set a Volume to Full.
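The order of preference described above can be sketched as follows. This is a simplified, hypothetical model: the `Volume` class and `choose_volume` are invented for illustration and are not Bacula's implementation. The model leaves out the pruning pass that runs before Bacula looks for purged Volumes. It only shows the preference order: an appendable Volume first, then the oldest purged Volume with Recycle = yes, and otherwise operator intervention.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class Volume:
    name: str
    status: str         # e.g. "Append", "Full", "Used", "Purged"
    recycle: bool       # Recycle flag from the Media record
    last_written: date

def choose_volume(pool: List[Volume]) -> Optional[Volume]:
    """Return the Volume to write next, or None if the operator must label one."""
    # 1. Any appendable Volume is used first, even if purged Volumes exist.
    for vol in pool:
        if vol.status == "Append":
            return vol
    # 2. Otherwise the oldest purged Volume that may be recycled.
    purged = [v for v in pool if v.status == "Purged" and v.recycle]
    if purged:
        return min(purged, key=lambda v: v.last_written)
    # 3. Nothing usable: ask the operator to label a new Volume.
    return None

pool = [
    Volume("Vol001", "Full", True, date(2002, 5, 25)),
    Volume("Vol002", "Purged", True, date(2002, 5, 26)),
    Volume("Vol003", "Append", True, date(2002, 5, 27)),
]
print(choose_volume(pool).name)  # Vol003: the appendable Volume wins
pool[2].status = "Full"          # like manually marking it Full
print(choose_volume(pool).name)  # Vol002
```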
As Bacula writes files to tape, it keeps a list of files, jobs, and volumes in a database called the catalog. Among other things, the database helps Bacula to decide which files to back up in an incremental or differential backup, and helps you locate files on past backups when you want to restore something. However, the catalog will grow larger and larger as time goes on, and eventually it can become unacceptably large.
Bacula's process for removing entries from the catalog is called Pruning. The default is Automatic Pruning, which means that once an entry reaches a certain age (e.g. 30 days old) it is removed from the catalog. Once a job has been pruned, you can still restore it from the backup tape, but one additional step is required: scanning the volume with bscan. The alternative to Automatic Pruning is Manual Pruning, in which you explicitly tell Bacula to erase the catalog entries for a volume. You'd usually do this when you want to reuse a Bacula volume, because there's no point in keeping a list of files that USED TO BE on a tape. Or, if the catalog is starting to get too big, you could prune the oldest jobs to save space. Manual pruning is done with the prune command in the console. (thanks to Bryce Denney for the above explanation).
There are three pruning durations. All apply to catalog database records and not to the actual data in a Volume. The pruning (or retention) durations are for: Volumes (Media records), Jobs (Job records), and Files (File records). The durations inter-depend a bit because if Bacula prunes a Volume, it automatically removes all the Job records, and all the File records. Also when a Job record is pruned, all the File records for that Job are also pruned (deleted) from the catalog.
Having the File records in the database means that you can examine all the files backed up for a particular Job. They take the most space in the catalog (probably 90-95% of the total). When the File records are pruned, the Job records can remain, and you can still examine what Jobs ran, but not the details of the Files backed up. In addition, without the File records, you cannot use the Console restore command to restore the files.
When a Job record is pruned, the Volume (Media record) for that Job can still remain in the database, and if you do a "list volumes", you will see the volume information, but the Job records (and its File records) will no longer be available.
In each case, pruning removes information about where older files are, but it also prevents the catalog from growing too large. You choose the retention periods based on how many files you are backing up, how long you want to keep those records online, and the size of the database. You can always re-insert the records (with 98% of the original data) by using bscan to scan in a whole Volume or any part of it that you want.
By setting AutoPrune to yes you will permit Bacula to automatically prune all Volumes in the Pool when a Job needs another Volume. Volume pruning means removing records from the catalog. It does not shrink the size of the Volume or affect the Volume data until the Volume gets overwritten. When a Job requests another volume and there are no Volumes with Volume Status Append available, Bacula will begin volume pruning. This means that all Jobs that are older than the VolumeRetention period will be pruned from every Volume that has Volume Status Full or Used and has Recycle set to yes. Pruning consists of deleting the corresponding Job, File, and JobMedia records from the catalog database. No change to the physical data on the Volume occurs during the pruning process. When all files are pruned from a Volume (i.e. no records in the catalog), the Volume will be marked as Purged implying that no Jobs remain on the volume. The Pool records that control the pruning are described below.
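The pruning pass can be modeled with a toy in-memory catalog. Everything here is hypothetical: plain Python objects stand in for the real Job, File, and JobMedia tables, and ages are given in days. The model follows only the rules stated above. Jobs older than the Volume Retention are removed from Volumes that are Full or Used and have Recycle set. Their File and JobMedia records go with them. A Volume left without Jobs is marked Purged, while its data stays untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Job:
    job_id: int
    age_days: int
    files: List[str] = field(default_factory=list)  # stands in for File records

@dataclass
class Volume:
    name: str
    status: str                  # "Append", "Full", "Used", "Purged"
    recycle: bool
    volume_retention_days: int
    jobs: Dict[int, Job] = field(default_factory=dict)  # stands in for JobMedia records

def prune_volumes(pool: List[Volume]) -> None:
    for vol in pool:
        if vol.status not in ("Full", "Used") or not vol.recycle:
            continue
        # Dropping a Job entry also drops its File records and its JobMedia link.
        vol.jobs = {jid: job for jid, job in vol.jobs.items()
                    if job.age_days <= vol.volume_retention_days}
        if not vol.jobs:
            vol.status = "Purged"  # no catalog records left; Volume data is untouched

pool = [
    Volume("Vol001", "Full", True, 30, {1: Job(1, 45, ["/etc/hosts"])}),
    Volume("Vol002", "Full", True, 30, {2: Job(2, 45), 3: Job(3, 10)}),
    Volume("Vol003", "Full", False, 30, {4: Job(4, 45)}),
]
prune_volumes(pool)
print([(v.name, v.status, sorted(v.jobs)) for v in pool])
# [('Vol001', 'Purged', []), ('Vol002', 'Full', [3]), ('Vol003', 'Full', [4])]
```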
When the Volume Retention period expires, AutoPrune is set to yes, and a new Volume is needed but no appendable Volume is available, Bacula will prune (remove) Job records that are older than the specified Volume Retention period.
The Volume Retention period takes precedence over any Job Retention period you have specified in the Client resource. Note also that the Volume Retention period is read from the catalog database Media record rather than from the Pool resource record. This means that if you change the VolumeRetention in the Pool resource record, you must make the corresponding change in the catalog by using the update pool command. Doing so ensures that any new Volumes are created with the changed Volume Retention period. Each existing Volume has its own copy of the Volume Retention period, which can only be changed Volume by Volume using the update volume command.
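The point that each Volume carries its own copy of the Volume Retention is easy to miss. The following hypothetical model shows why changing the Pool, even followed by update pool, only affects Volumes created afterwards. The `Pool` and `MediaRecord` classes and `create_volume` are illustrative and do not reflect Bacula's catalog schema.

```python
from dataclasses import dataclass

DAY = 24 * 60 * 60

@dataclass
class Pool:
    volume_retention: int   # seconds

@dataclass
class MediaRecord:
    name: str
    volume_retention: int   # copied from the Pool when the Volume is created

def create_volume(pool: Pool, name: str) -> MediaRecord:
    return MediaRecord(name, pool.volume_retention)  # a value copy, no link back to the Pool

pool = Pool(volume_retention=365 * DAY)
old_vol = create_volume(pool, "Vol001")

pool.volume_retention = 30 * DAY  # Pool changed (in Bacula: edit the resource, run "update pool")
new_vol = create_volume(pool, "Vol002")
print(old_vol.volume_retention // DAY, new_vol.volume_retention // DAY)  # 365 30

old_vol.volume_retention = 30 * DAY  # what "update volume" does for a single Volume
```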
When all file catalog entries are removed from the volume, its VolStatus is set to Purged. The files remain physically on the Volume until the volume is overwritten.
Retention periods are specified in seconds, minutes, hours, days, weeks, months, quarters, or years on the record. See the Configuration chapter of this manual for additional details of time specification.
The default Volume Retention is 1 year.
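As an illustration of how a retention specification maps to the seconds shown in catalog listings (the VolRet column below shows 4h as 14400), here is a minimal Python sketch. The helper retention_seconds and its unit suffixes are hypothetical and cover only a few units; Bacula's own parser accepts its own set of names and abbreviations, which the Configuration chapter documents authoritatively. The year is assumed to be 365 days.

```python
import re

# Unit suffixes accepted by this sketch only (assumption: a year is 365 days).
UNIT_SECONDS = {"s": 1, "min": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def retention_seconds(spec: str) -> int:
    """Convert a spec such as '4h', '30d' or '1 y' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]+)\s*", spec.lower())
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported retention spec: {spec!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


print(retention_seconds("4h"))  # 14400
```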
It is also possible to "force" pruning of all Volumes in the Pool associated with a Job by adding Prune Files = yes to the Job resource.
After all Volumes of a Pool have been pruned (as mentioned above, this happens when a Job needs a new Volume and no appendable Volumes are available), Bacula will look for the oldest Volume that is Purged (all Jobs and Files expired), and if the Recycle flag is on (Recycle=yes) for that Volume, Bacula will relabel it and write new data on it.
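The pruning and recycling steps described above can be sketched in Python. This is a simplified model under stated assumptions, not Bacula's implementation: the Volume class and the functions prune_volumes and next_volume are hypothetical, times are plain seconds, and the sketch ignores Recycle Oldest Volume, Recycle Current Volume, Maximum Volumes and the creation of new Volumes. It shows the two rules from the text: only Full or Used Volumes with Recycle = yes are pruned, and the oldest Purged, recyclable Volume is chosen for reuse.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Volume:
    """Hypothetical stand-in for a catalog Media record."""
    name: str
    status: str        # "Append", "Full", "Used", "Purged", ...
    recycle: bool      # the per-Volume Recycle flag
    retention: int     # Volume Retention in seconds
    last_written: int  # time of the last write, in seconds
    job_times: List[int] = field(default_factory=list)  # Jobs still in the catalog


def prune_volumes(volumes: List[Volume], now: int) -> None:
    """Remove catalog Jobs older than each Volume's retention; empty Volumes become Purged."""
    for vol in volumes:
        if vol.status not in ("Full", "Used") or not vol.recycle:
            continue  # only Full/Used Volumes with Recycle = yes are pruned
        vol.job_times = [t for t in vol.job_times if now - t <= vol.retention]
        if not vol.job_times:
            vol.status = "Purged"  # data stays on the Volume; only catalog entries are gone


def next_volume(volumes: List[Volume], now: int) -> Optional[Volume]:
    """Simplified choice of the Volume to write next."""
    for vol in volumes:
        if vol.status == "Append":
            return vol  # an appendable Volume exists, so no pruning is needed
    prune_volumes(volumes, now)
    purged = [v for v in volumes if v.status == "Purged" and v.recycle]
    if not purged:
        return None  # no candidate; this sketch stops here
    return min(purged, key=lambda v: v.last_written)  # oldest Purged Volume is recycled


HOUR = 3600
vols = [
    Volume("File0001", "Full", True, 4 * HOUR, last_written=0, job_times=[0]),
    Volume("File0002", "Full", True, 4 * HOUR, last_written=2 * HOUR, job_times=[2 * HOUR]),
    Volume("File0003", "Full", False, 4 * HOUR, last_written=1 * HOUR, job_times=[1 * HOUR]),
]
chosen = next_volume(vols, now=5 * HOUR)
print(chosen.name)  # File0001
```

At the fifth hour, File0001's only Job is older than 4 hours and is pruned, File0002 is still within its retention, and File0003 is skipped because its Recycle flag is off.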
The full algorithm that Bacula uses when it needs a new Volume is:
The above occurs when Bacula has finished writing a Volume or when no Volume is present in the drive.
On the other hand, if you have inserted a different Volume after the last job, and Bacula recognizes the Volume as valid, it will request authorization from the Director to use this Volume. In this case, if you have set Recycle Current Volume = yes and the Volume is marked as Used or Full, Bacula will prune the volume and if all jobs were removed during the pruning (respecting the retention periods), the Volume will be recycled and used. The recycling algorithm in this case is:
This permits users to manually change the Volume every day and load tapes in an order different from what is in the catalog, and if the volume does not contain a current copy of your backup data, it will be used.
Each Volume inherits the Recycle status (yes or no) from the Pool resource record when the Media record is created (normally when the Volume is labeled). This Recycle status is stored in the Media record of the Catalog. Using the Console program, you may subsequently change the Recycle status for each Volume. For example in the following output from list volumes:
+----------+-------+--------+---------+------------+--------+-----+
| VolumeNa | Media | VolSta | VolByte | LastWritte | VolRet | Rec |
+----------+-------+--------+---------+------------+--------+-----+
| File0001 | File  | Full   | 4190055 | 2002-05-25 | 14400  | 1   |
| File0002 | File  | Full   | 1896460 | 2002-05-26 | 14400  | 1   |
| File0003 | File  | Full   | 1896460 | 2002-05-26 | 14400  | 1   |
| File0004 | File  | Full   | 1896460 | 2002-05-26 | 14400  | 1   |
| File0005 | File  | Full   | 1896460 | 2002-05-26 | 14400  | 1   |
| File0006 | File  | Full   | 1896460 | 2002-05-26 | 14400  | 1   |
| File0007 | File  | Purged | 1896466 | 2002-05-26 | 14400  | 1   |
+----------+-------+--------+---------+------------+--------+-----+
all the volumes are marked as recyclable, and the last Volume, File0007 has been purged, so it may be immediately recycled. The other volumes are all marked recyclable and when their Volume Retention period (14400 seconds or 4 hours) expires, they will be eligible for pruning, and possibly recycling. Even though Volume File0007 has been purged, all the data on the Volume is still recoverable. A purged Volume simply means that there are no entries in the Catalog. Even if the Volume Status is changed to Recycle, the data on the Volume will be recoverable. The data is lost only when the Volume is re-labeled and re-written.
To modify Volume File0001 so that it cannot be recycled, you use the update volume pool=File command in the console program, or simply update and Bacula will prompt you for the information.
+----------+-------+--------+---------+-------------+--------+-----+
| VolumeNa | Media | VolSta | VolByte | LastWritten | VolRet | Rec |
+----------+-------+--------+---------+-------------+--------+-----+
| File0001 | File  | Full   | 4190055 | 2002-05-25  | 14400  | 0   |
| File0002 | File  | Full   | 1897236 | 2002-05-26  | 14400  | 1   |
| File0003 | File  | Full   | 1896460 | 2002-05-26  | 14400  | 1   |
| File0004 | File  | Full   | 1896460 | 2002-05-26  | 14400  | 1   |
| File0005 | File  | Full   | 1896460 | 2002-05-26  | 14400  | 1   |
| File0006 | File  | Full   | 1896460 | 2002-05-26  | 14400  | 1   |
| File0007 | File  | Purged | 1896466 | 2002-05-26  | 14400  | 1   |
+----------+-------+--------+---------+-------------+--------+-----+
In this case, File0001 will never be automatically recycled. The same effect can be achieved by setting the Volume Status to Read-Only.
Most people will want Bacula to fill a tape and, when it is full, have a new tape mounted, and so on. However, as an extreme example, it is possible for Bacula to write on a single tape and rewrite it every night. To get this to work, you must do two things: first, set the VolumeRetention to less than your save period (one day); second, make Bacula mark the tape as full after using it once. This is done using UseVolumeOnce = yes. If this latter record is not used and the tape is not full after the first time it is written, Bacula will simply append to the tape and eventually request another volume. Using the tape only once forces the tape to be marked Full after each use, and the next time Bacula runs, it will recycle the tape.
An example Pool resource that does this is:
Pool {
  Name = DDS-4
  Use Volume Once = yes
  Pool Type = Backup
  AutoPrune = yes
  VolumeRetention = 12h # expire after 12 hours
  Recycle = yes
}
This example is meant to show you how one could define a fixed set of volumes that Bacula will rotate through on a regular schedule. There are an infinite number of such schemes, all of which have various advantages and disadvantages.
We start with the following assumptions:
We start the system by doing a Full save to one of the weekly volumes or one of the monthly volumes. The next morning, we remove the tape and insert a Daily tape. Friday evening, we remove the Daily tape and insert the next tape in the Weekly series. Monday, we remove the Weekly tape and re-insert the Daily tape. On the first Friday of the next month, we insert the next Monthly tape in the series rather than a Weekly tape, then continue. When a Daily tape finally fills up, Bacula will request the next one in the series, and the next day when you notice the email message, you will mount it and Bacula will finish the unfinished incremental backup.
What does this give? Well, at any point, you will have the last complete Full save plus several Incremental saves. For any given file you want to recover (or your whole system), you will have a copy of that file every day for at least the last 14 days. For older versions, you will have at least 3 and probably 4 Friday full saves of that file, and going back further, you will have a copy of that file made on the beginning of the month for at least a year.
So you have copies of any file (or your whole system) for at least a year, but as you go back in time, the time between copies increases from daily to weekly to monthly.
What would the Bacula configuration look like to implement such a scheme?
Schedule {
  Name = "NightlySave"
  Run = Level=Full Pool=Monthly 1st sat at 03:05
  Run = Level=Full Pool=Weekly 2nd-5th sat at 03:05
  Run = Level=Incremental Pool=Daily tue-fri at 03:05
}
Job {
  Name = "NightlySave"
  Type = Backup
  Level = Full
  Client = LocalMachine
  FileSet = "File Set"
  Messages = Standard
  Storage = DDS-4
  Pool = Daily
  Schedule = "NightlySave"
}
# Definition of file storage device
Storage {
  Name = DDS-4
  Address = localhost
  SDPort = 9103
  Password = XXXXXXXXXXXXX
  Device = FileStorage
  Media Type = 8mm
}
FileSet {
  Name = "File Set"
  Include = signature=MD5 {
    fffffffffffffffff
  }
  Exclude = { *.o }
}
Pool {
  Name = Daily
  Pool Type = Backup
  AutoPrune = yes
  VolumeRetention = 10d # recycle in 10 days
  Maximum Volumes = 10
  Recycle = yes
}
Pool {
  Name = Weekly
  Use Volume Once = yes
  Pool Type = Backup
  AutoPrune = yes
  VolumeRetention = 30d # recycle in 30 days
  Recycle = yes
}
Pool {
  Name = Monthly
  Use Volume Once = yes
  Pool Type = Backup
  AutoPrune = yes
  VolumeRetention = 365d # recycle in 1 year
  Recycle = yes
}
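To see which Pool and Level the NightlySave schedule selects on a given night, the three Run lines can be mirrored in a short Python sketch. This is only an illustration: nightly_save is a hypothetical helper that reproduces these three Run lines, not Bacula's schedule parser, and it ignores the 03:05 run time. The nth occurrence of a weekday in a month is computed as (day - 1) // 7 + 1, so "1st sat" is a Saturday falling on days 1 to 7.

```python
from datetime import date
from typing import Optional, Tuple


def nightly_save(day: date) -> Optional[Tuple[str, str]]:
    """Return (level, pool) for the NightlySave Run lines, or None if nothing runs."""
    weekday = day.weekday()        # Monday == 0 ... Saturday == 5, Sunday == 6
    nth = (day.day - 1) // 7 + 1   # 1st, 2nd, ... occurrence of this weekday in the month
    if weekday == 5:               # "1st sat" and "2nd-5th sat"
        return ("Full", "Monthly") if nth == 1 else ("Full", "Weekly")
    if 1 <= weekday <= 4:          # "tue-fri"
        return ("Incremental", "Daily")
    return None                    # Sunday and Monday: no Run line matches


print(nightly_save(date(2024, 6, 1)))  # ('Full', 'Monthly')
```

June 1, 2024 is the first Saturday of the month, so it selects the Monthly pool; the following Saturdays select Weekly.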
Perhaps the best way to understand the various resource records that come into play during automatic pruning and recycling is to run a Job that goes through the whole cycle. If you add the following resources to your Director's configuration file:
Schedule {
  Name = "30 minute cycle"
  Run = Level=Full Pool=File Messages=Standard Storage=File hourly at 0:05
  Run = Level=Full Pool=File Messages=Standard Storage=File hourly at 0:35
}
Job {
  Name = "Filetest"
  Type = Backup
  Level = Full
  Client=XXXXXXXXXX
  FileSet="Test Files"
  Messages = Standard
  Storage = File
  Pool = File
  Schedule = "30 minute cycle"
}
# Definition of file storage device
Storage {
  Name = File
  Address = XXXXXXXXXXX
  SDPort = 9103
  Password = XXXXXXXXXXXXX
  Device = FileStorage
  Media Type = File
}
FileSet {
  Name = "Test Files"
  Include = signature=MD5 {
    fffffffffffffffff
  }
  Exclude = { *.o }
}
Pool {
  Name = File
  Use Volume Once = yes
  Pool Type = Backup
  LabelFormat = "File"
  AutoPrune = yes
  VolumeRetention = 4h
  Maximum Volumes = 12
  Recycle = yes
}
Where you will need to replace the ffffffffff's by the appropriate files to be saved for your configuration. For the FileSet Include, choose a directory that has one or two megabytes maximum since there will probably be approximately 8 copies of the directory that Bacula will cycle through.
In addition, you will need to add the following to your Storage daemon's configuration file:
Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /tmp
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}
With the above resources, Bacula will start a Job every half hour that saves a copy of the directory you chose to /tmp/File0001 ... /tmp/File0012. After 4 hours, Bacula will start recycling the backup Volumes (/tmp/File0001 ...). You should see this happening in the output produced. Bacula will automatically create the Volumes (Files) the first time it uses them.
To turn it off, either delete all the resources you've added, or simply comment out the Schedule record in the Job resource.
Although automatic recycling of Volumes is implemented in version 1.20 and later (see the Automatic Recycling of Volumes chapter of this manual), you may want to manually force reuse (recycling) of a Volume.
Assuming that you want to keep the Volume name, but you simply want to write new data on the tape, the steps to take are:
Once the Volume is marked Purged, it will be recycled the next time a Volume is needed.
If you wish to reuse the tape by giving it a new name, follow the following steps:
Please note that the relabel command applies only to tape Volumes.
For Bacula versions prior to 1.30 or to manually relabel the Volume, use the instructions below:
mt -f /dev/nst0 rewind
mt -f /dev/nst0 weof
where you replace /dev/nst0 with the appropriate device name on your system.
Please be aware that the delete command can be dangerous. Once it is done, to recover the File records, you must either restore your database as it was before the delete command, or use the bscan utility program to scan the tape and recreate the database entries.
This chapter presents almost all the features needed to do Volume management. Most of the concepts apply equally well to both tape and disk Volumes. However, the chapter was originally written to explain backing up to disk, so you will see that it is slanted in that direction, but all the directives presented here apply equally well whether your Volume is disk or tape.
If you have a lot of hard disk storage or you absolutely must have your backups run within a small time window, you may want to direct Bacula to backup to disk Volumes rather than tape Volumes. This chapter is intended to give you some of the options that are available to you so that you can manage either disk or tape volumes.
Getting Bacula to write to disk rather than tape in the simplest case is rather easy. In the Storage daemon's configuration file, you simply define an Archive Device to be a directory. For example, if you want your disk backups to go into the directory /home/bacula/backups, you could use the following:
Device {
  Name = FileBackup
  Media Type = File
  Archive Device = /home/bacula/backups
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}
Assuming you have the appropriate Storage resource in your Director's configuration file that references the above Device resource,
Storage {
  Name = FileStorage
  Address = ...
  Password = ...
  Device = FileBackup
  Media Type = File
}
Bacula will then write the archive to the file /home/bacula/backups/<volume-name> where <volume-name> is the volume name of a Volume defined in the Pool. For example, if you have labeled a Volume named Vol001, Bacula will write to the file /home/bacula/backups/Vol001. Although you can later move the archive file to another directory, you should not rename it or it will become unreadable by Bacula. This is because each archive has the filename as part of the internal label, and the internal label must agree with the system filename before Bacula will use it.
Although this is quite simple, there are a number of problems. The first is that unless you specify otherwise, Bacula will always write to the same volume until you run out of disk space. This problem is addressed below.
In addition, if you want to use concurrent jobs that write to several different volumes at the same time, you will need to understand a number of other details. An example of such a configuration is given at the end of this chapter under Concurrent Disk Jobs.
Some of the options you have, all of which are specified in the Pool record, are:
UseVolumeOnce = yes.
Maximum Volume Jobs = nnn.
Maximum Volume Bytes = mmmm.
Volume Use Duration = ttt.
Note that although you probably would not want to limit the number of bytes on a tape as you would on a disk Volume, the other options can be very useful in limiting the time Bacula will use a particular Volume (be it tape or disk). For example, the above directives can allow you to ensure that you rotate through a set of daily Volumes if you wish.
As mentioned above, each of those directives is specified in the Pool or Pools that you use for your Volumes. In the case of Maximum Volume Jobs, Maximum Volume Bytes, and Volume Use Duration, you can actually specify the desired value on a Volume by Volume basis. The value specified in the Pool record becomes the default when labeling new Volumes. Once a Volume has been created, it gets its own copy of the Pool defaults, and subsequently changing the Pool will have no effect on existing Volumes. You can either manually change the Volume values, or refresh them from the Pool defaults using the update volume command in the Console. As an example of the use of one of the above, suppose your Pool resource contains:
Pool {
  Name = File
  Pool Type = Backup
  Volume Use Duration = 23h
}
then if you run a backup once a day (every 24 hours), Bacula will use a new Volume for each backup, because each Volume it writes can only be used for 23 hours after the first write. Note, setting the use duration to 23 hours is not a very good solution for tapes unless you have someone on-site during the weekends, because Bacula will want a new Volume and no one will be present to mount it, so no weekend backups will be done until Monday morning.
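The effect of Volume Use Duration on a daily schedule can be checked with a small Python simulation. This is a sketch under assumptions: can_append and volumes_used are hypothetical helpers, each backup is treated as instantaneous, and the boundary case (a Job starting exactly when the duration ends) is assumed to require a new Volume. With 23 hours, every 24-hourly backup lands on a fresh Volume; with a 25-hour duration, consecutive backups pair up on the same Volume.

```python
HOUR = 3600


def can_append(first_write: int, now: int, use_duration: int) -> bool:
    """A Volume accepts new Jobs only within use_duration of its first write."""
    return now - first_write < use_duration


def volumes_used(backup_times, use_duration):
    """Count Volumes used when each backup either appends or starts a new Volume."""
    first_writes = []  # first-write time of each Volume, in creation order
    for t in backup_times:
        if not first_writes or not can_append(first_writes[-1], t, use_duration):
            first_writes.append(t)  # current Volume's use duration has expired
    return len(first_writes)


daily = [day * 24 * HOUR for day in range(7)]  # one backup every 24 hours for a week
print(volumes_used(daily, 23 * HOUR))  # 7
print(volumes_used(daily, 25 * HOUR))  # 4
```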
Use of the above records brings up another problem: that of labeling your Volumes. For automated disk backup, you can either manually label each of your Volumes, or you can have Bacula automatically label new Volumes when they are needed. The automatic Volume labeling in version 1.30 and prior is a bit simplistic, but it does allow for automation; the features added in version 1.31 permit automatic creation of a wide variety of labels, including information from environment variables and special Bacula Counter variables. In version 1.37 and later, it is probably much better to use Python scripting and the NewVolume event, since generating Volume labels in a Python script is much easier than trying to figure out Counter variables. See the Python Scripting chapter of this manual for more details.
Please note that automatic Volume labeling can also be used with tapes, but it is not nearly so practical since the tapes must be pre-mounted. This requires some user interaction. Automatic labeling from templates does NOT work with autochangers since Bacula will not access unknown slots. There are several methods of labeling all volumes in an autochanger magazine. For more information on this, please see the Autochanger chapter of this manual.
Automatic Volume labeling is enabled by making a change to both the Pool resource (Director) and to the Device resource (Storage daemon) shown above. In the case of the Pool resource, you must provide Bacula with a label format that it will use to create new names. In the simplest form, the label format is simply the Volume name, to which Bacula will append a four digit number. This number starts at 0001 and is incremented for each Volume the pool contains. Thus if you modify your Pool resource to be:
Pool {
  Name = File
  Pool Type = Backup
  Volume Use Duration = 23h
  LabelFormat = "Vol"
}
Bacula will create Volume names Vol0001, Vol0002, and so on when new Volumes are needed. Much more complex and elaborate labels can be created using variable expansion defined in the Variable Expansion chapter of this manual.
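The naming rule for the simplest LabelFormat (the format string followed by a four-digit number that starts at 0001 and increases with each Volume in the Pool) can be expressed in one line of Python. auto_label is a hypothetical helper for illustration; it does not cover the variable expansion forms mentioned above.

```python
def auto_label(label_format: str, volumes_in_pool: int) -> str:
    """Simplest LabelFormat form: the format string plus a four-digit counter."""
    return f"{label_format}{volumes_in_pool + 1:04d}"


print([auto_label("Vol", n) for n in range(3)])  # ['Vol0001', 'Vol0002', 'Vol0003']
```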
The second change that is necessary to make automatic labeling work is to give the Storage daemon permission to automatically label Volumes. Do so by adding LabelMedia = yes to the Device resource as follows:
Device {
  Name = File
  Media Type = File
  Archive Device = /home/bacula/backups
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
  LabelMedia = yes
}
You can find more details of the Label Format Pool record in Label Format description of the Pool resource records.
Automatic labeling discussed above brings up the problem of Volume management. With the above scheme, a new Volume will be created every day. If you have not specified Retention periods, your Catalog will continue to fill keeping track of all the files Bacula has backed up, and this procedure will create one new archive file (Volume) every day.
The tools Bacula gives you to help automatically manage these problems are the following:
The first three records (File Retention, Job Retention, and AutoPrune) determine the amount of time that Job and File records will remain in your Catalog, and they are discussed in detail in the Automatic Volume Recycling chapter of this manual.
Volume Retention, AutoPrune, and Recycle determine how long Bacula will keep your Volumes before reusing them, and they are also discussed in detail in the Automatic Volume Recycling chapter of this manual.
The Maximum Volumes record can also be used in conjunction with the Volume Retention period to limit the total number of archive Volumes (files) that Bacula will create. By setting an appropriate Volume Retention period, a Volume will be purged just before it is needed and thus Bacula can cycle through a fixed set of Volumes. Cycling through a fixed set of Volumes can also be done by setting Recycle Oldest Volume = yes or Recycle Current Volume = yes. In this case, when Bacula needs a new Volume, it will prune the specified volume.
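Whether a fixed set of Volumes cycles cleanly comes down to simple arithmetic. Assume one Volume per Job (Use Volume Once = yes), a Job every interval seconds, and negligible Job run time. After Maximum Volumes Jobs the Pool is full, and the next Job finds the oldest Volume aged Maximum Volumes times the interval. That age must exceed the Volume Retention for the Volume to be pruned and recycled. The helper cycles_cleanly below is a hypothetical sketch of that check, not a Bacula feature.

```python
MINUTE, HOUR, DAY = 60, 3600, 86400


def cycles_cleanly(max_volumes: int, interval: int, retention: int) -> bool:
    """True if the oldest Volume has expired by the time the full Pool needs another one."""
    return max_volumes * interval > retention


print(cycles_cleanly(15, DAY, 13 * DAY))         # True: 15 daily Volumes, 13-day retention
print(cycles_cleanly(12, 15 * MINUTE, 2 * HOUR))  # True: 12 Volumes, four Jobs per hour
print(cycles_cleanly(5, DAY, 7 * DAY))           # False: Volumes still retained when needed
```

If the check fails, Bacula cannot reuse any Volume when the Pool is full, so the retention period, the number of Volumes, or the schedule has to change.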
Now suppose you want to use multiple Pools, which means multiple Volumes, or suppose you want each client to have its own Volume and perhaps its own directory such as /home/bacula/client1 and /home/bacula/client2 ... With the single Storage and Device definition above, neither of these two is possible. Why? Because Bacula disk storage follows the same rules as tape devices. Only one Volume can be mounted on any Device at any time. If you want to simultaneously write multiple Volumes, you will need multiple Device resources in your bacula-sd.conf file, and thus multiple Storage resources in your bacula-dir.conf.
OK, so now you should understand that you need multiple Device definitions in the case of different directories or different Pools, but you also need to know that the catalog data that Bacula keeps contains only the Media Type and not the specific storage device. This permits a tape, for example, to be re-read on any compatible tape drive, compatibility being determined by the Media Type. The same applies to disk storage. Since a Volume that is written by a Device in, say, directory /home/bacula/backups cannot be read by a Device with an Archive Device definition of /home/bacula/client1, you will not be able to restore all your files if you give both those devices Media Type = File. During the restore, Bacula will simply choose the first available device, which may not be the correct one. If this is confusing, just remember that the Director has only the Media Type and the Volume name. It does not know the Archive Device (or the full path) that is specified in the Storage daemon. Thus you must explicitly tie your Volumes to the correct Device by using the Media Type.
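The reason distinct Media Types matter can be shown with a tiny Python model of the restore-time lookup. This is a sketch, not Bacula code: devices_for_restore is a hypothetical helper, and the Device name Client1Dir is invented for illustration. Because the Director knows only a Volume's Media Type, every Device sharing that type is a candidate; with a shared type there are two, and only one of them holds the Volume.

```python
from typing import Dict, List


def devices_for_restore(volume_media_type: str, devices: Dict[str, str]) -> List[str]:
    """Return every Device whose Media Type matches the Volume's Media Type."""
    return [name for name, media_type in devices.items() if media_type == volume_media_type]


shared = {"FileBackup": "File", "Client1Dir": "File"}     # same Media Type, different directories
distinct = {"RecycleDir": "File", "RecycleDir1": "File1"}  # one Media Type per directory
print(devices_for_restore("File", shared))     # ['FileBackup', 'Client1Dir']
print(devices_for_restore("File1", distinct))  # ['RecycleDir1']
```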
The example shown below shows a case where there are two clients, each using its own Pool and storing their Volumes in different directories.
The following example is not very practical, but can be used to demonstrate the proof of concept in a relatively short period of time. The example consists of two clients, each backed up to its own set of 12 archive files (Volumes) in a separate directory on the Storage machine. Each Volume is used (written) only once, and four Full saves are done every hour (so the whole thing cycles around after three hours).
What is key here is that each physical device on the Storage daemon has a different Media Type. This allows the Director to choose the correct device for restores ...
The Director's configuration file is as follows:
Director {
  Name = my-dir
  QueryFile = "~/bacula/bin/query.sql"
  PidDirectory = "~/bacula/working"
  WorkingDirectory = "~/bacula/working"
  Password = dir_password
}
Schedule {
  Name = "FourPerHour"
  Run = Level=Full hourly at 0:05
  Run = Level=Full hourly at 0:20
  Run = Level=Full hourly at 0:35
  Run = Level=Full hourly at 0:50
}
Job {
  Name = "RecycleExample"
  Type = Backup
  Level = Full
  Client = Rufus
  FileSet= "Example FileSet"
  Messages = Standard
  Storage = FileStorage
  Pool = Recycle
  Schedule = FourPerHour
}
Job {
  Name = "RecycleExample2"
  Type = Backup
  Level = Full
  Client = Roxie
  FileSet= "Example FileSet"
  Messages = Standard
  Storage = FileStorage1
  Pool = Recycle1
  Schedule = FourPerHour
}
FileSet {
  Name = "Example FileSet"
  Include = compression=GZIP signature=SHA1 {
    /home/kern/bacula/bin
  }
}
Client {
  Name = Rufus
  Address = rufus
  Catalog = BackupDB
  Password = client_password
}
Client {
  Name = Roxie
  Address = roxie
  Catalog = BackupDB
  Password = client1_password
}
Storage {
  Name = FileStorage
  Address = rufus
  Password = local_storage_password
  Device = RecycleDir
  Media Type = File
}
Storage {
  Name = FileStorage1
  Address = rufus
  Password = local_storage_password
  Device = RecycleDir1
  Media Type = File1
}
Catalog {
  Name = BackupDB
  dbname = bacula; user = bacula; password = ""
}
Messages {
  Name = Standard
  ...
}
Pool {
  Name = Recycle
  Use Volume Once = yes
  Pool Type = Backup
  LabelFormat = "Recycle-"
  AutoPrune = yes
  VolumeRetention = 2h
  Maximum Volumes = 12
  Recycle = yes
}
Pool {
  Name = Recycle1
  Use Volume Once = yes
  Pool Type = Backup
  LabelFormat = "Recycle1-"
  AutoPrune = yes
  VolumeRetention = 2h
  Maximum Volumes = 12
  Recycle = yes
}
and the Storage daemon's configuration file is:
Storage {
  Name = my-sd
  WorkingDirectory = "~/bacula/working"
  Pid Directory = "~/bacula/working"
  MaximumConcurrentJobs = 10
}
Director {
  Name = my-dir
  Password = local_storage_password
}
Device {
  Name = RecycleDir
  Media Type = File
  Archive Device = /home/bacula/backups
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}
Device {
  Name = RecycleDir1
  Media Type = File1
  Archive Device = /home/bacula/backups1
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}
Messages {
  Name = Standard
  director = my-dir = all
}
With a little bit of work, you can change the above example into a weekly or monthly cycle (take care about the amount of archive disk space used).
Bacula can, of course, use multiple disks, but in general, each disk must be a separate Device specification in the Storage daemon's conf file, and you must then select what clients to backup to each disk. You will also want to give each Device specification a different Media Type so that during a restore, Bacula will be able to find the appropriate drive.
The situation is a bit more complicated if you want to treat two different physical disk drives (or partitions) logically as a single drive, which Bacula does not directly support. However, it is possible to back up your data to multiple disks as if they were a single drive by linking the Volumes from the first disk to the second disk.
For example, assume that you have two disks named /disk1 and /disk2. If you then create a standard Storage daemon Device resource for backing up to the first disk, it will look like the following:
Device {
  Name = client1
  Media Type = File
  Archive Device = /disk1
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}
Since there is no way to get the above Device resource to reference both /disk1 and /disk2 we do it by pre-creating Volumes on /disk2 with the following:
ln -s /disk2/Disk2-vol001 /disk1/Disk2-vol001
ln -s /disk2/Disk2-vol002 /disk1/Disk2-vol002
ln -s /disk2/Disk2-vol003 /disk1/Disk2-vol003
...
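If you need many such links, the commands can be generated rather than typed. The following Python sketch only builds the command strings (it does not touch the filesystem); link_commands is a hypothetical helper, and the three-digit numbering simply follows the Disk2-vol001 pattern used above. Review the output before running it in a shell.

```python
import posixpath
from typing import List


def link_commands(real_dir: str, device_dir: str, prefix: str, count: int) -> List[str]:
    """Build 'ln -s' commands that expose Volumes stored on real_dir inside device_dir."""
    commands = []
    for n in range(1, count + 1):
        volume = f"{prefix}{n:03d}"                   # Disk2-vol001, Disk2-vol002, ...
        target = posixpath.join(real_dir, volume)     # where the data will really live
        link = posixpath.join(device_dir, volume)     # what the Device's Archive Device sees
        commands.append(f"ln -s {target} {link}")
    return commands


for command in link_commands("/disk2", "/disk1", "Disk2-vol", 3):
    print(command)
```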
At this point, you can label the Volumes as Volume Disk2-vol001, Disk2-vol002, ... and Bacula will use them as if they were on /disk1 but actually write the data to /disk2. The only minor inconvenience with this method is that you must explicitly name the disks and cannot use automatic labeling unless you arrange to have the labels exactly match the links you have created.
An important thing to know is that Bacula treats disks like tape drives as much as it can. This means that you can only have a single Volume mounted at one time on a disk as defined in your Device resource in the Storage daemon's conf file. You can have multiple concurrent jobs running that all write to the one Volume that is being used, but if you want to have multiple concurrent jobs that are writing to separate disk drives (or partitions), you will need to define separate Device resources for each one, exactly as you would do for two different tape drives. There is one fundamental difference, however. The Volumes that you create on the two drives cannot be easily exchanged as they can for a tape drive, because they are physically resident (already mounted in a sense) on the particular drive. As a consequence, you will probably want to give them different Media Types so that Bacula can distinguish what Device resource to use during a restore. An example would be the following:
Device {
  Name = Disk1
  Media Type = File1
  Archive Device = /disk1
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}
Device {
  Name = Disk2
  Media Type = File2
  Archive Device = /disk2
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}
With the above device definitions, you can run two concurrent jobs each writing at the same time, one to /disk1 and the other to /disk2. The fact that you have given them different Media Types will allow Bacula to quickly choose the correct Storage resource in the Director when doing a restore.
If we take the above example and add a second Client, here are a few considerations:
In this example, we have two clients, each with a different Pool and a different number of archive files retained. They also write to different directories with different Volume labeling.
The Director's configuration file is as follows:
Director {
  Name = my-dir
  QueryFile = "~/bacula/bin/query.sql"
  PidDirectory = "~/bacula/working"
  WorkingDirectory = "~/bacula/working"
  Password = dir_password
}
# Basic weekly schedule
Schedule {
  Name = "WeeklySchedule"
  Run = Level=Full fri at 1:30
  Run = Level=Incremental sat-thu at 1:30
}
FileSet {
  Name = "Example FileSet"
  Include = compression=GZIP signature=SHA1 {
    /home/kern/bacula/bin
  }
}
Job {
  Name = "Backup-client1"
  Type = Backup
  Level = Full
  Client = client1
  FileSet= "Example FileSet"
  Messages = Standard
  Storage = File1
  Pool = client1
  Schedule = "WeeklySchedule"
}
Job {
  Name = "Backup-client2"
  Type = Backup
  Level = Full
  Client = client2
  FileSet= "Example FileSet"
  Messages = Standard
  Storage = File2
  Pool = client2
  Schedule = "WeeklySchedule"
}
Client {
  Name = client1
  Address = client1
  Catalog = BackupDB
  Password = client1_password
  File Retention = 7d
}
Client {
  Name = client2
  Address = client2
  Catalog = BackupDB
  Password = client2_password
}
# Two Storage definitions with different Media Types
# permit different directories
Storage {
  Name = File1
  Address = rufus
  Password = local_storage_password
  Device = client1
  Media Type = File1
}
Storage {
  Name = File2
  Address = rufus
  Password = local_storage_password
  Device = client2
  Media Type = File2
}
Catalog {
  Name = BackupDB
  dbname = bacula; user = bacula; password = ""
}
Messages {
  Name = Standard
  ...
}
# Two pools permit different cycling periods and Volume names
# Cycle through 15 Volumes (two weeks)
Pool {
  Name = client1
  Use Volume Once = yes
  Pool Type = Backup
  LabelFormat = "Client1-"
  AutoPrune = yes
  VolumeRetention = 13d
  Maximum Volumes = 15
  Recycle = yes
}
# Cycle through 8 Volumes (1 week)
Pool {
  Name = client2
  Use Volume Once = yes
  Pool Type = Backup
  LabelFormat = "Client2-"
  AutoPrune = yes
  VolumeRetention = 6d
  Maximum Volumes = 8
  Recycle = yes
}
and the Storage daemon's configuration file is:
Storage {
  Name = my-sd
  WorkingDirectory = "~/bacula/working"
  Pid Directory = "~/bacula/working"
  MaximumConcurrentJobs = 10
}
Director {
  Name = my-dir
  Password = local_storage_password
}
# Archive directory for Client1
Device {
  Name = client1
  Media Type = File1
  Archive Device = /home/bacula/client1
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}
# Archive directory for Client2
Device {
  Name = client2
  Media Type = File2
  Archive Device = /home/bacula/client2
  LabelMedia = yes;
  Random Access = Yes;
  AutomaticMount = yes;
  RemovableMedia = no;
  AlwaysOpen = no;
}
Messages {
  Name = Standard
  director = my-dir = all
}
Bacula allows you to specify that you want to write to DVD. However, this feature is implemented only in version 1.37 or later. You may in fact write to DVD+RW, DVD+R, DVD-R, or DVD-RW media. The actual process used by Bacula is to first write the image to a spool directory, then when the Volume reaches a certain size or, at your option, at the end of a Job, Bacula will transfer the image from the spool directory to the DVD. The actual work of transferring the image is done by a script dvd-handler, and the heart of that script is a program called growisofs which allows creating or adding to a DVD ISO filesystem.
You must have dvd+rw-tools loaded on your system for DVD writing to work. Please note that the original dvd+rw-tools package does NOT work with Bacula. You must apply a patch which can be found in the patches directory of Bacula sources with the name dvd+rw-tools-5.21.4.10.8.bacula.patch.
The fact that Bacula cannot use the OS to write directly to the DVD makes the whole process a bit more error prone than writing to a disk or a tape, but nevertheless, it does work if you use some care to set it up properly. However, at the current time (28 October 2005) we still consider this code to be experimental and of BETA quality. As a consequence, please do careful testing before relying on DVD backups in production.
The remainder of this chapter explains the various directives that you can use to control the DVD writing.
The following directives are added to the Storage daemon's Device resource.
Most frequently, you will define it as follows:
Mount Command = "/bin/mount -t iso9660 -o ro %a %m"
Most frequently, you will define the Unmount Command as follows:
Unmount Command = "/bin/umount %m"
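Bacula substitutes the % codes in these commands before running them. Judging from the examples, %a stands for the Archive Device and %m for the mount point. The sketch below mimics that substitution so you can check what a given command line expands to. `expand_command` is a hypothetical helper, and the device and mount-point paths are illustrative only. Bacula performs the real substitution inside the Storage daemon.

```python
from __future__ import annotations

import re


def expand_command(template: str, values: dict[str, str]) -> str:
    """Replace each %x code with values["x"]; unknown codes are left unchanged."""
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), m.group(0)), template)


# Illustrative values: adjust to your own device and mount point.
codes = {"a": "/dev/dvd", "m": "/mnt/cdrom"}
print(expand_command("/bin/mount -t iso9660 -o ro %a %m", codes))
# /bin/mount -t iso9660 -o ro /dev/dvd /mnt/cdrom
print(expand_command("/bin/umount %m", codes))
# /bin/umount /mnt/cdrom
```

Codes the helper does not know about, such as %e and %v in the Write Part Command, pass through unchanged unless you add them to the dictionary.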
For the Write Part Command on a DVD, you will most frequently specify the Bacula supplied dvd-handler script as follows:
Write Part Command = "/path/dvd-handler %a write %e %v"
Where /path is the path to your scripts install directory, and dvd-handler is the Bacula supplied script file. This command will already be present, but commented out, in the default bacula-sd.conf file. To use it, simply remove the comment (#) symbol.
For the Free Space Command on a DVD, you will most frequently specify the Bacula supplied dvd-handler script as follows:
Free Space Command = "/path/dvd-handler %a free"
Where /path is the path to your scripts install directory, and dvd-handler is the Bacula supplied script file. If you want to specify your own command, please look at the code in dvd-handler to see what output Bacula expects from this command. This command will already be present, but commented out, in the default bacula-sd.conf file. To use it, simply remove the comment (#) symbol.
If you do not set it, Bacula will expect there is always free space on the device.
In addition to the directives specified above, you must also specify the other standard Device resource directives. Please see the sample DVD Device resource in the default bacula-sd.conf file. Be sure to specify the raw device name for Archive Device. It should be a name such as /dev/cdrom or /media/cdrecorder or /dev/dvd depending on your system. It will not be a name such as /mnt/cdrom.
The following directives are added to the Director's Job resource.
This directive should be set to yes when writing to devices that require a mount (for example, a DVD). This ensures that the current part, containing this job's data, is written to the device, and that no data is left in the temporary file on the hard disk. However, on some media, such as DVD+R and DVD-R, a lot of space (about 10MB) is lost every time a part is written. So if you run several jobs one after another, you could set this directive to no for all jobs except the last one. This avoids wasting too much space while still ensuring that the data is written to the medium when all the jobs have finished.
This directive is ignored for devices other than DVDs.
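The trade-off above is simple arithmetic. This sketch uses the roughly 10MB-per-part loss quoted for DVD+R and DVD-R. It assumes that when only the last job writes its part, the data of all jobs fits into that one part. `wasted_mb` is a hypothetical name.

```python
PART_OVERHEAD_MB = 10  # approximate space lost per written part on DVD+R / DVD-R (from the text)


def wasted_mb(jobs: int, write_part_after_every_job: bool) -> int:
    """Approximate space lost to part overhead over a series of consecutive jobs.

    Assumes that, when only the last job writes its part, the data of all
    jobs fits into that single part.
    """
    parts = jobs if write_part_after_every_job else 1
    return parts * PART_OVERHEAD_MB


print(wasted_mb(5, True))   # 50
print(wasted_mb(5, False))  # 10
```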
To retrieve the current mode of a DVD-RW, run:
dvd+rw-mediainfo /dev/xxx
where you replace xxx with your DVD device name. The Mounted Media line of the output shows the current mode.
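If you check the mode from a script, the only thing you need from the dvd+rw-mediainfo output is the value on the Mounted Media line. This sketch extracts it. `mounted_media` is a hypothetical helper. The sample text is a hypothetical, abridged output made up for illustration, not a capture from a real drive.

```python
from __future__ import annotations


def mounted_media(mediainfo_output: str) -> str | None:
    """Return the value of the "Mounted Media" line, or None if it is absent."""
    for line in mediainfo_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Mounted Media":
            return value.strip()
    return None


# Hypothetical, abridged output used only for illustration.
SAMPLE = """INQUIRY:                [VENDOR  ][DVD-WRITER      ][1.00]
GET [CURRENT] CONFIGURATION:
 Mounted Media:         13h, DVD-RW Restricted Overwrite
"""

print(mounted_media(SAMPLE))  # 13h, DVD-RW Restricted Overwrite
```

To feed it real output, you could capture it with `subprocess.run(["dvd+rw-mediainfo", "/dev/xxx"], capture_output=True, text=True).stdout`, replacing xxx with your device name as above.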
To set the device to Restricted Overwrite mode, run:
dvd+rw-format /dev/xxx
If you want to set it back to the default Incremental Sequential mode, run:
dvd+rw-format -blank /dev/xxx
To blank the medium quickly, run:
dd if=/dev/zero bs=1024 count=512 | growisofs -Z /dev/xxx=/dev/fd/0
Then try to mount the device. If it cannot be mounted, Bacula will consider it blank. If it can be mounted, try a full blank (see below).
To do a full blank, run:
growisofs -Z /dev/xxx=/dev/zero
where you replace xxx with your DVD device name. Note, however, that this blanks the whole DVD, which takes quite a long time (16 minutes on mine).
If you manage 5 or 10 machines and have a nice tape backup, you don't need Pools, and you may wonder what they are good for. In this chapter, you will see that Pools can help you optimize disk storage space. The same techniques can be applied to a shop that has multiple tape drives, or that wants to mount various different Volumes to meet their needs.
The rest of this chapter will give an example involving backup to disk Volumes, but most of the information applies equally well to tape Volumes.
A site that I administer (a charitable organization) had a DDS-3 tape drive that was failing. The exact reason for the failure is still unknown. Worse yet, their full backup size is about 15GB, whereas the capacity of their broken DDS-3 was at best 8GB (rated 6/12). A new DDS-4 tape drive and the necessary cassettes were more expensive than their budget could handle.
They want to maintain 6 months of backup data, and be able to access the old files on a daily basis for a week, a weekly basis for a month, then monthly for 6 months. In addition, offsite capability was not needed (well perhaps it really is, but it was never used). Their daily changes amount to about 300MB on the average, or about 2GB per week.
As a consequence, the total volume of data they need to keep to meet their needs is about 100GB: 15GB x 6 + 2GB x 5 + 0.3GB x 7 = 102.1GB.
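The same sizing can be written as a small calculation you can rerun with your own figures. It uses the text's round numbers in MB, taking 1GB as 1000MB. `retained_mb` and the constant names are illustrative only.

```python
# Round figures from the text, in MB (1GB taken as 1000MB).
FULL_MB = 15_000   # one Full backup
WEEKLY_MB = 2_000  # one week of changes
DAILY_MB = 300     # one day of changes


def retained_mb(fulls: int, weeks: int, days: int) -> int:
    """Total data kept for the given numbers of monthly, weekly and daily restore points."""
    return fulls * FULL_MB + weeks * WEEKLY_MB + days * DAILY_MB


total = retained_mb(fulls=6, weeks=5, days=7)
print(total)            # 102100 (MB), i.e. about 102.1GB
print(total < 120_000)  # True: it fits on a 120GB disk
```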
The chosen solution was to buy a 120GB hard disk for next to nothing -- far less than 1/10th the price of a tape drive and the cassettes to handle the same amount of data, and to have Bacula write to disk files.
The rest of this chapter will explain how to set up Bacula so that it automatically manages a set of disk files with minimal intervention on my part. The system has been running from 22 January 2004 until today (08 April 2004) with no intervention. Since we have not yet crossed the six-month boundary, we still lack some data to be sure the system performs as desired.
Getting Bacula to write to disk rather than tape in the simplest case is rather easy, and is documented in the previous chapter. In addition, all the directives discussed here are explained in that chapter. We'll leave it to you to look at the details there. If you haven't read it and are not familiar with Pools, you probably should at least read it once quickly for the ideas before continuing here.
Consider what happens if we have only a single large Bacula Volume defined on our hard disk. Everything works fine until the Volume fills, at which point Bacula will ask you to mount a new Volume. The same problem applies to tape Volumes if your tape fills. With a hard disk, and the only one you have, this will be a bit of a problem. It should be obvious that it is better to use a number of smaller Volumes and arrange for Bacula to recycle them automatically so that the disk storage space can be reused. The other problem with a single Volume is that at the current time (1.34.0), Bacula does not seek within a disk Volume, so restoring a single file can take more time than one would expect.
As mentioned, the solution is to have multiple Volumes, or files on the disk. To do so, we need to limit the use, and thus the size, of a single Volume by time, by number of jobs, or by size. Any of these would work, but we chose to put a single job in each Volume. The exception is Volumes containing Incremental backups, which will hold 6 jobs (a week's worth of data) each. The details of this will be discussed shortly.
The next problem to resolve is recycling of Volumes. As you noted from above, the requirements are to be able to restore monthly for 6 months, weekly for a month, and daily for a week. So to simplify things, why not do a Full save once a month, a Differential save once a week, and Incremental saves daily. Now since each of these different kinds of saves needs to remain valid for differing periods, the simplest way to do this (and possibly the only) is to have a separate Pool for each backup type.
The decision was to use three Pools: one for Full saves, one for Differential saves, and one for Incremental saves, and each would have a different number of volumes and a different Retention period to accomplish the requirements.
Putting a single Full backup on each Volume will require six Full save Volumes and a retention period of six months. The Pool needed to do that is:
Pool {
  Name = Full-Pool
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 6 months
  Accept Any Volume = yes
  Maximum Volume Jobs = 1
  Label Format = Full-
  Maximum Volumes = 6
}
Since these are disk Volumes, no space is lost by having separate Volumes for each backup (done once a month in this case). The items to note are:
- the retention period of six months (i.e. the Volumes are recycled after 6 months);
- there is one job per Volume (Maximum Volume Jobs = 1);
- the Volumes will be labeled Full-0001, ... Full-0006 automatically.

One could have labeled these manually from the start, but why not use the features of Bacula?
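The automatic names follow a simple pattern. This sketch reproduces Full-0001 through Full-0006 for a Label Format of `Full-`. It assumes the name is the format string followed by a zero-padded four-digit counter (the number of Volumes already in the Pool, plus one), which is consistent with the names quoted above. `next_volume_name` is a hypothetical helper, not a Bacula function.

```python
def next_volume_name(label_format: str, volumes_in_pool: int) -> str:
    """Format string followed by (number of Volumes in the Pool + 1) as four digits."""
    return f"{label_format}{volumes_in_pool + 1:04d}"


names = [next_volume_name("Full-", n) for n in range(6)]
print(names)
# ['Full-0001', 'Full-0002', 'Full-0003', 'Full-0004', 'Full-0005', 'Full-0006']
```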
For the Differential backup Pool, we choose a retention period of a bit longer than a month and ensure that there is at least one Volume for each of the maximum of five weeks in a month. So the following works:
Pool {
  Name = Diff-Pool
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 40 days
  Accept Any Volume = yes
  Maximum Volume Jobs = 1
  Label Format = Diff-
  Maximum Volumes = 6
}
As you can see, the Differential Pool can grow to a maximum of six volumes, and the Volumes are retained 40 days and thereafter they can be recycled. Finally there is one job per volume. This, of course, could be tightened up a lot, but the expense here is a few GB which is not too serious.
Finally, here is the resource for the Incremental Pool:
Pool {
  Name = Inc-Pool
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 20 days
  Accept Any Volume = yes
  Maximum Volume Jobs = 6
  Label Format = Inc-
  Maximum Volumes = 5
}
We keep the data for 20 days rather than just the week the requirements call for. To reduce the proliferation of volume names, we keep a week's worth of data (6 incremental backups) in each Volume. In practice, the retention period could be set to just a bit more than a week, keeping only two or three Volumes instead of five. Again, the loss is very small. As the system reaches its full steady state, we can adjust these values so that the total disk usage doesn't exceed the disk capacity.
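A rough way to check the Maximum Volumes values is to count how many older Volumes are still inside their retention period when a new one is needed. This is a simplified model, not Bacula's algorithm. It assumes retention is counted from a Volume's last write, and that one Volume is filled per fixed interval, so the k-th older Volume was last written k intervals ago. It also treats Differentials as strictly weekly, which slightly overestimates since none runs on the first Sunday. `volumes_needed` is a hypothetical name.

```python
def volumes_needed(retention_days: int, interval_days: int) -> int:
    """Volumes a Pool needs so that a free one exists when a new Volume is started.

    Simplified model: one Volume is filled every interval_days, and the k-th
    older Volume was last written k * interval_days ago.
    """
    protected = 0
    k = 1
    # An older Volume cannot be recycled while it is still within its retention period.
    while k * interval_days < retention_days:
        protected += 1
        k += 1
    return protected + 1  # plus the Volume being written now


print(volumes_needed(40, 7))  # 6  (Diff-Pool: weekly Volumes, 40-day retention)
print(volumes_needed(20, 7))  # 3  (Inc-Pool: one Volume per week, 20-day retention)
```

Under this model the Differential Pool needs its six Volumes, while the Incremental Pool gets by with three, in line with the "two or three volumes instead of five" estimate above.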
The following example shows you the actual files used, with only a few minor modifications to simplify things.
The Director's configuration file is as follows:
Director {                    # define myself
  Name = bacula-dir
  DIRport = 9101
  QueryFile = "/home/bacula/bin/query.sql"
  WorkingDirectory = "/home/bacula/working"
  PidDirectory = "/home/bacula/working"
  Maximum Concurrent Jobs = 1
  Password = " "
  Messages = Standard
}
# By default, this job will back up to disk in /tmp
Job {
  Name = client
  Type = Backup
  Client = client-fd
  FileSet = "Full Set"
  Schedule = "WeeklyCycle"
  Storage = File
  Messages = Standard
  Pool = Default
  Full Backup Pool = Full-Pool
  Incremental Backup Pool = Inc-Pool
  Differential Backup Pool = Diff-Pool
  Write Bootstrap = "/home/bacula/working/client.bsr"
  Priority = 10
}
# List of files to be backed up
FileSet {
  Name = "Full Set"
  Include = signature=SHA1 compression=GZIP9 {
    /
    /usr
    /home
  }
  Exclude = {
    /proc
    /tmp
    /.journal
    /.fsck
  }
}
Schedule {
  Name = "WeeklyCycle"
  Run = Full 1st sun at 1:05
  Run = Differential 2nd-5th sun at 1:05
  Run = Incremental mon-sat at 1:05
}
Client {
  Name = client-fd
  Address = client
  FDPort = 9102
  Catalog = MyCatalog
  Password = " "
  AutoPrune = yes             # Prune expired Jobs/Files
  Job Retention = 6 months
  File Retention = 60 days
}
Storage {
  Name = File
  Address = localhost
  SDPort = 9103
  Password = " "
  Device = FileStorage
  Media Type = File
}
Catalog {
  Name = MyCatalog
  dbname = bacula; user = bacula; password = ""
}
Pool {
  Name = Full-Pool
  Pool Type = Backup
  Recycle = yes               # automatically recycle Volumes
  AutoPrune = yes             # Prune expired volumes
  Volume Retention = 6 months
  Accept Any Volume = yes     # write on any volume in the pool
  Maximum Volume Jobs = 1
  Label Format = Full-
  Maximum Volumes = 6
}
Pool {
  Name = Inc-Pool
  Pool Type = Backup
  Recycle = yes               # automatically recycle Volumes
  AutoPrune = yes             # Prune expired volumes
  Volume Retention = 20 days
  Accept Any Volume = yes
  Maximum Volume Jobs = 6
  Label Format = Inc-
  Maximum Volumes = 5
}
Pool {
  Name = Diff-Pool
  Pool Type = Backup
  Recycle = yes
  AutoPrune = yes
  Volume Retention = 40 days
  Accept Any Volume = yes
  Maximum Volume Jobs = 1
  Label Format = Diff-
  Maximum Volumes = 6
}
Messages {
  Name = Standard
  mailcommand = "bsmtp -h mail.domain.com -f \"\(Bacula\) %r\" -s \"Bacula: %t %e of %c %l\" %r"
  operatorcommand = "bsmtp -h mail.domain.com -f \"\(Bacula\) %r\" -s \"Bacula: Intervention needed for %j\" %r"
  mail = root@domain.com = all, !skipped
  operator = root@domain.com = mount
  console = all, !skipped, !saved
  append = "/home/bacula/bin/log" = all, !skipped
}
and the Storage daemon's configuration file is:
Storage {                     # definition of myself
  Name = bacula-sd
  SDPort = 9103               # Director's port
  WorkingDirectory = "/home/bacula/working"
  Pid Directory = "/home/bacula/working"
}
Director {
  Name = bacula-dir
  Password = " "
}
Device {
  Name = FileStorage
  Media Type = File
  Archive Device = /files/bacula
  LabelMedia = yes;           # lets Bacula label unlabeled media
  Random Access = Yes;
  AutomaticMount = yes;       # when device opened, read it
  RemovableMedia = no;
  AlwaysOpen = no;
}
Messages {
  Name = Standard
  director = bacula-dir = all
}
Kern Sibbald 2008-01-31