TSM / ISP Backup
The LRZ offers a backup and archive solution from IBM called TSM (Tivoli Storage Manager), now rebranded as IBM Spectrum Protect (ISP).
Structure
We use the LRZ backup service to protect the data on our Ceph storage cluster. This data includes all application data (wiki, inventory, orders, icinga, etc.), VM image backups, datasets, and employee and student data. For performance reasons, the LRZ policy allows only 20 TB and 10 million files per backup node. To protect the whole Ceph cluster, several nodes therefore need to be requested and configured.
The CephFS structure and the corresponding backup nodes are split in the following way:
Ceph Folder Structure
---------------------
admin ---|
         |             |-- one_vms
         |-- backup ---|-- wiki_cm
         |             |-- etc.
         |
---------------------------------------------------------------
^ Node 1 - 10 TB (50 GB x 100 VMs = 5 TB, + 5 TB application data)
         |
         |-- internal
         |             |-- <thesis1>
student -|-------------|-- <thesis2>
         |             |-- etc.
user ----|-- data
         |
---------------------------------------------------------------
^ Node 2 - 15 TB (student + employee data, simulations, etc.)
         |
         |-- UF
         |
---------------------------------------------------------------
^ Node 3 - 10 TB (dataset with 8.2 TB, 8,300,000 files)
         |
         |             |-- <set1>
         |-- datasets -|-- <set2>
         |
---------------------------------------------------------------
^ Node 4 - 20 TB (several "small" datasets)
         |
         |-- <...>
---------------------------------------------------------------
^ Node N - X TB (future datasets)
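As a sanity check against the 20 TB per-node limit, the Node 1 estimate above can be reproduced with a bit of shell arithmetic (numbers taken from the tree; adjust when the VM count changes):

```shell
# Rough capacity estimate for backup node 1 (numbers from the tree above).
vms=100          # number of VM images
gb_per_vm=50     # average image size in GB
app_tb=5         # application data in TB
vm_tb=$(( vms * gb_per_vm / 1000 ))   # 100 x 50 GB = 5 TB
total_tb=$(( vm_tb + app_tb ))
echo "Node 1 needs ~${total_tb} TB"   # prints: Node 1 needs ~10 TB
```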
Usage
This section describes the usage of the TSM backup client. Further documentation can be found on the LRZ website.
After the installation and first configuration the config files can be found here: /opt/tivoli/tsm/client/ba/bin/
- dsm.sys → primary configuration file
- dsm.opt → additional options file; specifies files to back up, etc.
- dsmsched.log → log file tracking the runs of the TSM scheduler
- dsmerror.log → Important! Logs errors and warnings during backup
- /etc/adsm/TSM.PWD → stores the encrypted node password
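For illustration, a minimal dsm.opt can contain nothing more than the name of the server stanza to use; on Unix clients this option selects the matching stanza in dsm.sys. The exact content below is a hypothetical sketch, not our production file:

```
* dsm.opt - client user-options file (illustrative sketch)
* Selects the server stanza from dsm.sys that this client uses.
servername ceph01
```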
All backup commands are also listed and explained on this LRZ website. To execute most of the following commands, you need to cd into the config directory and run them from there:
cd /opt/tivoli/tsm/client/ba/bin
Query Backup
- Query backup for all files
dsmc query backup -subdir=yes "/etc/ssh/*"
dsmc query backup -subdir=yes -detail "/etc/ssh/"
# also show files that were deleted but still exist in the backup
dsmc query backup -inactive -subdir=yes "/etc/ssh/*"
- Show all filespaces / virtual mount points that are backed up
dsmc query filespace
- Show management classes, retention policies, etc.
dsmc query mgmtclass -detail
Init Backup
- Start backup manually
dsmc incremental
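After a manual (or scheduled) backup, dsmerror.log is the place to look for problems. The helper below is our own convenience sketch, not part of the TSM client; it relies on the fact that TSM client error messages carry IDs of the form ANSnnnnE:

```shell
# Sketch: report TSM client errors (message IDs ANSnnnnE) from an error log.
# Not part of the TSM client - just a grep wrapper for convenience.
check_tsm_log() {
    log="$1"   # e.g. /var/log/backup/ceph01_error.log
    if grep -Eq 'ANS[0-9]{4}E' "$log"; then
        echo "errors found in $log"
        return 1
    fi
    echo "no errors in $log"
}
```

Usage: `check_tsm_log /var/log/backup/ceph01_error.log` (exit status 1 signals errors, so it can be wired into monitoring).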
Restore Files
- Restore whole folders
dsmc restore -replace=no -subdir=yes "/etc/ssh/*" /etc/ssh/
- Restore, overwriting existing files only if the backup copy is more recent
dsmc restore -replace=yes -subdir=yes -ifnewer "/etc/ssh/*" /etc/ssh/
If the restore process is aborted (by the user or a network outage), a pending restartable restore session remains. Until it is resolved, no further restore or backup operations can be performed!
- Show all pending restore sessions
dsmc query restore
- Restart the restore session, choose the right session in the dialog
dsmc restart restore
Config File
Config file used for the Ceph cluster backup. After making changes to the config, make sure everything works (no typos) by executing a query:
dsmc query filespace
The command will throw an error if the config file is corrupted. Nevertheless, check the logs in detail on the day/backup after changing the config! VirtualMountPoints are shown by the filespace query (dsmc query filespace), but only after a backup has been executed.
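The validation step can be wrapped in a small helper so it is easy to repeat after every config edit. The wrapper is generic and our own sketch (not a TSM feature): it runs whatever command it is given and reports whether that command succeeded, relying on dsmc exiting non-zero when it cannot parse the config:

```shell
# Sketch: run a harmless client query after editing dsm.sys; dsmc exits
# non-zero if it rejects the config. Generic wrapper, our own helper.
check_config() {
    if "$@" > /dev/null 2>&1; then
        echo "config OK"
    else
        echo "config rejected - check dsmerror.log"
        return 1
    fi
}
# usage:
#   cd /opt/tivoli/tsm/client/ba/bin
#   check_config dsmc query filespace
```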
ISP Config Ceph Backup - dsm.sys
*****************************************************************
* Ceph Storage Backup                                           *
*                                                               *
* LRZ Tivoli Storage Manager Config                             *
*****************************************************************

* ceph01 ********************************************************
* ###############################################################
servername           ceph01
tcpserveraddress     s<nn>.abs.lrz.de
tcpport              <1234>
nodename             ceph01.cm
passwordaccess       generate
managedservices      schedule
ERRORLOGRETENTION    7 D
SCHEDLOGRETENTION    7 D
errorlogname         /var/log/backup/ceph01_error.log
schedlogname         /var/log/backup/ceph01_sched.log
subdir               yes

VirtualMountPoint    /ceph/backup
domain               /ceph/backup

exclude              *
include              /ceph/backup/.../*

* ceph02 ********************************************************
* ###############################################################
servername           ceph02
tcpserveraddress     s<nn>.abs.lrz.de
tcpport              <1234>
nodename             ceph02.cm
passwordaccess       generate
managedservices      schedule
ERRORLOGRETENTION    7 D
SCHEDLOGRETENTION    7 D
errorlogname         /var/log/backup/ceph02_error.log
schedlogname         /var/log/backup/ceph02_sched.log
subdir               yes

VirtualMountPoint    /ceph/internal
VirtualMountPoint    /ceph/students
VirtualMountPoint    /ceph/data
domain               /ceph/internal
domain               /ceph/students
domain               /ceph/data

exclude              *
include              /ceph/internal/.../*
include              /ceph/students/.../*
include              /ceph/data/.../*