By Andy Feather
I often hear from customers who are sitting on scores of legacy tapes whose contents are unknown beyond a generic “business data” label; for 99+ percent of those tapes, nothing is known at a granular level. As we all know too well, disaster recovery backups have morphed into unintentional data archives over the past 10 to 15 years, thanks to litigation and government regulatory investigations, along with general business obligations to retain certain records. The duty to preserve has forced businesses to keep a backup tape if even one file on it might be under some form of preservation obligation. IT staff almost never have the equipment or human resources to perform targeted restores of data under preservation and stack it together with other similar data, so they take the easy way out: buy more tape and retain existing tapes rather than overwrite their contents. Companies change backup software providers and migrate to newer backup platforms, then get stuck paying maintenance and support for software and hardware they no longer use but might one day need.
An additional problem is that companies are waking up to the fact that, while tape is a great value as a storage medium, the real estate and other costs of parking and retaining tapes in mass quantities add up. In response, companies like Seagate and TapeArk offer to move large volumes of data into the cloud, but does this provide value to the customer? Why pay to migrate thousands of tapes to the cloud on the chance that you might one day need to access them?
So I came across a neat solution to this problem from a service provider/software developer named SullivanStrickler out of Atlanta. They recognized the gap between the status quo and the cloud and created TRACS/TDF and TRACS/TSF. TRACS stands for Tape Restoration and Cataloging System, TDF for Tape Duplicate File and TSF for Tape Session File. TDF and TSF files are both file containers that hold data from legacy backup tapes, regardless of the source tape type and backup software format. TDF and TSF give customers a catalog of the contents of the tape and the ability to immediately restore the contents of what was once a backup tape and is now a TDF or TSF file, and/or to stack and store the TDF/TSF files onto newer, higher capacity media using LTFS or a backup application.
The economics of tape stacking have been explored for years, but the exercise provided little ROI until 6.0 TB LTO-7 tapes arrived. Replacing 60 LTO-1 (100 GB) tapes with one LTO-7 tape reduces storage costs; add the value of discovering the contents of long-forgotten backups and of never again paying licensing and support fees for technologies you no longer use, and businesses have the justification to begin exploring a stacking/migration effort.
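To make the stacking arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the native capacities quoted above (100 GB for LTO-1, 6.0 TB for LTO-7). The function name and the `fill_ratio` parameter are my own illustrative assumptions, not part of any vendor tool, and the sketch ignores compression and container overhead.

```python
import math

# Native (uncompressed) capacities in GB, as quoted in the article.
NATIVE_CAPACITY_GB = {"LTO-1": 100, "LTO-7": 6000}


def tapes_needed(source_count, source_gen, target_gen, fill_ratio=1.0):
    """Estimate how many target tapes hold the data from source_count source tapes.

    fill_ratio is an assumed fraction of each source tape that actually
    holds data (1.0 means every source tape is full).
    """
    total_gb = source_count * NATIVE_CAPACITY_GB[source_gen] * fill_ratio
    return math.ceil(total_gb / NATIVE_CAPACITY_GB[target_gen])


if __name__ == "__main__":
    # 60 full LTO-1 tapes fit on a single LTO-7 tape.
    print(tapes_needed(60, "LTO-1", "LTO-7"))
    # A vault of 6,000 full LTO-1 tapes shrinks to about 100 LTO-7 tapes.
    print(tapes_needed(6000, "LTO-1", "LTO-7"))
```

In practice legacy tapes are rarely full, so a `fill_ratio` below 1.0 would push the consolidation ratio even higher than 60 to 1.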
Some customers ask, “But if I am going to undertake this effort, why do I need to migrate everything instead of only what I need to keep?” This is a very valid question, and is a good segue into the differences between TRACS/TDF and TRACS/TSF files.
TDF, or Tape Duplicate File, is a byte-for-byte copy of the source tape, with a catalog of the tape contents appended to the file. Anywhere from one file to every file can be restored from a TDF file, and as a bonus the conversion process is reversible. This means that customers who convert from tape to TDF format can later write the data back out to tape so that it can once again be used by the backup software that originally created the tape, should there ever be a need.
TSF, or Tape Session File, differs slightly from a TDF file. Whereas a TDF file is a duplicate copy of an entire tape in one logical volume container, a TSF file is an individual logical session container from a tape. A TSF file can be created for one backup session, up to all of the backup sessions on the tape. TSF files are exciting because of the business value they provide. TDF files provide great value due to the stacking and cataloging elements, but TSF files allow users to pick and choose which backup sessions to retain and which can be deleted. If a company’s preservation requirements are such that they need to retain all backups of their email system and their file servers, but not their domain controllers, print servers, departmental databases, etc., then TSF files allow them to do this by breaking up the “if I need one file I need to keep the entire tape” limitation. This process results in an even larger business value than TDF through the reduction in risk associated with retaining data which need not be retained, and since not all sessions will be retained by customers, the reduction in data volume is multiplied.
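The session-level selection described above can be illustrated with a small Python sketch. To be clear, this is not the TRACS software or its catalog format; the `Session` record, its fields and the `split_by_retention` helper are hypothetical names I made up to show the idea of keeping only the sessions whose source systems fall under a preservation requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One backup session from a tape catalog (hypothetical structure)."""
    tape_id: str
    session_no: int
    source_system: str  # e.g. "email", "file_server", "domain_controller"
    size_gb: float


def split_by_retention(sessions, retained_systems):
    """Split sessions into (keep, drop) based on their source system."""
    keep = [s for s in sessions if s.source_system in retained_systems]
    drop = [s for s in sessions if s.source_system not in retained_systems]
    return keep, drop


if __name__ == "__main__":
    catalog = [
        Session("T0001", 1, "email", 40.0),
        Session("T0001", 2, "domain_controller", 10.0),
        Session("T0001", 3, "file_server", 35.0),
        Session("T0001", 4, "print_server", 5.0),
    ]
    keep, drop = split_by_retention(catalog, {"email", "file_server"})
    print("retain:", [(s.tape_id, s.session_no) for s in keep])
    print("eligible for deletion:", [(s.tape_id, s.session_no) for s in drop])
    print("GB retained:", sum(s.size_gb for s in keep))
```

The point of the sketch is the unit of decision: once the session rather than the tape is the thing you keep or delete, one email backup no longer forces you to retain the print server backup sitting next to it.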
Additionally, with one eye on the growing number of state, national and international regulations concerning data privacy and information governance, such as the EU’s General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA), TSF allows for the defensible deletion of files stored within backups, without impacting the remaining backed up files. This type of targeted deletion of data originating from tape is unusual, and it is all performed without restoring the data from a single tape.
Of course there are other solutions, but I like the simplicity and logic of TRACS/TDF and TRACS/TSF. Certainly it’s more practical and affordable than what Seagate and TapeArk propose!