FUJIFILM INSIGHTS BLOG

Data Storage

It’s Time to Wake up and Smell the Tape!

Reading Time: 2 minutes

By Rich Gadomski

I just spent a full day at a meeting of the Active Archive Alliance, and as I was flying home it occurred to me that it’s time for data storage managers to rise up from the sleepy status quo of buying more disk arrays to address runaway data growth. It’s time to wake up and smell the sweet aroma of freshly made modern data tape (sort of like that new car smell, if you don’t know).

Why do that, you ask? Because best practices and undeniable facts say so. Consider the following:

Data goes through a lifecycle from hot to cold, that is to say from a period of active use to a period of inactivity. This can happen in 30 days or less.

Inactive data should not stay on primary storage devices. It takes up space on expensive storage media, consumes more energy and adds to the backup burden.

What to do? Delete it? You probably can’t get permission to delete it; all data is now potentially valuable, with new artificial intelligence (AI) and analytics tools emerging to derive value from it. But you can move it and stop copying it!

Where do you move it to? Put it in an active archive consisting of low-cost disk cache and even lower-cost long-term storage, such as a high-density automated tape library. Storing one petabyte of data for 10 years in a tape library costs around $220,000, depending on your TCO variables. Alternatively, you could spend roughly $900,000 on HDD and around $1,300,000 for cloud. Need more capacity? Tape libraries scale easily by adding more slots and tapes. You can export full tapes and plug new ones in. Move the full tapes offsite and you get the benefit of an air gap, since the data is physically isolated from other networks. At least you know that data can’t be accessed and held for ransom.
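
For a rough sense of how those numbers compare, here is a minimal back-of-the-envelope sketch in Python. The 10-year totals are the rounded figures cited above, and the derived per-terabyte-per-month rates are illustrative assumptions only; your own TCO variables will shift them.

```python
# Back-of-the-envelope 10-year cost comparison for 1 PB of archival data.
# Totals are the rounded figures cited in this post (assumed, not quoted).
ten_year_cost_per_pb = {
    "tape library": 220_000,    # USD
    "HDD array":    900_000,    # USD
    "public cloud": 1_300_000,  # USD
}

TB_PER_PB = 1_000
MONTHS = 10 * 12

for tier, total in ten_year_cost_per_pb.items():
    per_tb_month = total / TB_PER_PB / MONTHS
    multiple = total / ten_year_cost_per_pb["tape library"]
    print(f"{tier:13s}  ${total:>9,}  ~${per_tb_month:.2f}/TB/month  "
          f"({multiple:.1f}x the cost of tape)")
```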

Suddenly getting end-user access requests for that data? Move it back to disk cache and serve it from there. When done, move it back to the tape library. Tape is super fast, up to 360 MB per second, and file access is made easier and faster with LTFS.

How do you orchestrate all this? Intelligent data management solutions help move data automatically, leveraging metadata and AI tools to analyze files and move them off primary storage if they don’t belong there.
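
As a concrete illustration of what such a policy might look like, here is a minimal Python sketch that flags files untouched for 90 days on a primary volume and moves them to an archive tier. The mount points, the 90-day threshold and the use of last-access time are assumptions for illustration, not a reference to any particular vendor’s product.

```python
# Minimal sketch of a metadata-driven tiering policy: files untouched for
# 90+ days on primary storage are candidates to move to the archive tier.
import os
import shutil
import time

PRIMARY = "/mnt/primary"          # hypothetical primary (disk) tier
ARCHIVE = "/mnt/active_archive"   # hypothetical archive tier (e.g., LTFS mount)
COLD_AFTER_DAYS = 90

def find_cold_files(root: str, cold_after_days: int):
    """Yield files whose last access time is older than the threshold."""
    cutoff = time.time() - cold_after_days * 86_400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getatime(path) < cutoff:
                yield path

def archive_file(path: str):
    """Move a cold file to the archive tier, preserving its relative path."""
    rel = os.path.relpath(path, PRIMARY)
    dest = os.path.join(ARCHIVE, rel)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(path, dest)

if __name__ == "__main__":
    for cold in find_cold_files(PRIMARY, COLD_AFTER_DAYS):
        archive_file(cold)
```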

Does this sound like a tiered storage strategy? It is, and it’s also known as an active archive. This is a best practice used by the biggest and most advanced data-generating companies in the industry. If it works for them, it will work for you too.

There’s a lot of hype in the storage industry, with lots of folks looking for new and better ways to do things. But some things are tried and true, like tape, with the benefits of constantly evolving capacity, performance, reliability and long-term archivability. So wake up and smell the tape: put your data where it belongs and get on with your day!


Video: How CERN Migrated 100PB of Data

Reading Time: < 1 minute

For over five decades, CERN has used tape for its archival storage. In this Fujifilm Summit video, Vladimir Bahyl of CERN explains how they increased the capacity of their tape archive by reformatting certain types of tape cartridges at a higher density.

 

 


Storage Switzerland Video: Reintroducing Tape to Disaster Recovery

Reading Time: < 1 minute

Previously, Storage Switzerland blogged about the merits of employing a tape storage hierarchy to cut backup storage costs. Tape media can also add value as a tier in a broader disaster recovery strategy.

As Lead Analyst George Crump explains in a recent video, applications are not all created equal when it comes to recovery time objectives (RTOs), the amount of time it takes to get an application back up and running following an outage.

Check out George’s blog for more details and to view the video:

Reintroducing Tape to Disaster Recovery

 


A Neat Solution for Tape Stacking and Migration

Reading Time: 4 minutes

By Andy Feather

I often hear from customers sitting on scores of legacy tapes whose contents are unknown beyond a generic “business data” level; 99+ percent are not known at a granular level. As we all know too well, disaster recovery backups have morphed into unintentional data archives over the past 10 to 15 years, thanks to litigation and government regulatory investigations, along with general business obligations to retain certain records. The duty to preserve has forced businesses to keep backup tapes if even one file on a tape might be under some form of preservation obligation. IT staff almost never have the equipment or human resources to perform targeted restores of data under preservation and stack it together with other similar data, so they take the easy way out: buy more tape and retain existing tapes rather than overwriting their contents. Companies change backup software providers, migrate to newer backup platforms, and get stuck paying maintenance and support for software and hardware they no longer use but might need one day.

An additional problem is that companies are waking up and realizing that while tape as a storage medium is a great value, the real estate and costs associated with parking and retaining tapes in mass quantities can add up. In response, companies like Seagate and TapeArk offer to move large volumes of data into the cloud, but does this provide value to the customer? Why pay to migrate thousands of tapes to the cloud on the chance that you might one day need to access them?

So I came across a neat solution to this problem from a service provider and software developer named SullivanStrickler, out of Atlanta. They recognized the gap between the status quo and the cloud and created TRACS/TDF and TRACS/TSF. TRACS stands for Tape Restoration and Cataloging System, TDF for Tape Duplicate File and TSF for Tape Session File. TDF and TSF files are both file containers holding data from legacy backup tapes, regardless of the source tape type and backup software format. They provide customers with a catalog of each tape’s contents, the ability to immediately restore data from what was once a backup tape and is now a TDF or TSF file, and the option to stack and store the TDF/TSF files on newer, higher-capacity media using LTFS or other backup software.

The economics of tape stacking have been explored for years, but the exercise provided little ROI until 6.0 TB LTO-7 tapes arrived. Replacing 60 LTO-1 (100 GB) tapes with a single LTO-7 tape cuts storage costs, and combined with the value of discovering the contents of long-forgotten backups and never again paying licensing and support fees for technologies you no longer use, it gives businesses the justification to begin exploring a stacking/migration effort.
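
To make the consolidation math concrete, here is a small Python sketch using native LTO capacities; the legacy tape counts are made-up examples, and real-world results depend on how full the source tapes are and how sessions are packed.

```python
# Rough consolidation math for stacking legacy tapes onto LTO-7.
# Capacities are native (uncompressed) figures in GB; counts are invented.
import math

LTO7_GB = 6_000
legacy_pool = {            # cartridge type: (native capacity GB, count)
    "LTO-1 (100 GB)": (100, 600),
    "LTO-3 (400 GB)": (400, 300),
}

total_gb = sum(cap * count for cap, count in legacy_pool.values())
total_cartridges = sum(count for _cap, count in legacy_pool.values())
lto7_needed = math.ceil(total_gb / LTO7_GB)

print(f"{total_cartridges} legacy cartridges hold {total_gb / 1_000:.0f} TB")
print(f"That fits on {lto7_needed} LTO-7 cartridges, "
      f"roughly a {total_cartridges // lto7_needed}:1 slot reduction")
```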

Some customers ask, “But if I am going to undertake this effort, why do I need to migrate everything instead of only what I need to keep?” This is a very valid question, and it is a good segue into the differences between TRACS/TDF and TRACS/TSF files.

A TDF, or Tape Duplicate File, is a byte-for-byte copy of the source tape with a catalog of the tape’s contents appended to the file. Anywhere from one file to every file can be restored from a TDF, and as a bonus the conversion process is reversible. This means that customers who convert from tape to TDF format can ultimately write the data back out to tape so that it can once again be used by the backup software that originally created the tape, should the need ever arise.

A TSF, or Tape Session File, differs slightly from a TDF. Whereas a TDF is a duplicate copy of an entire tape in one logical volume container, a TSF is an individual logical session container from a tape. A TSF can be created for a single backup session or for every backup session on the tape. TSF files are exciting because of the business value they provide. TDF files deliver great value through stacking and cataloging, but TSF files let users pick and choose which backup sessions to retain and which can be deleted. If a company’s preservation requirements dictate retaining all backups of its email system and file servers, but not its domain controllers, print servers, departmental databases and so on, TSF files make that possible by breaking the “if I need one file, I need to keep the entire tape” limitation. This delivers even greater business value than TDF by reducing the risk of retaining data that need not be retained, and since not every session will be kept, the reduction in data volume is multiplied.
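
To illustrate the session-level idea (and only the idea; TDF and TSF are SullivanStrickler’s own formats, and the catalog structure below is purely hypothetical), here is a short Python sketch that keeps the backup sessions under a preservation obligation and flags the rest for defensible deletion.

```python
# Hypothetical illustration of session-level retention, not SullivanStrickler's
# actual API: given a catalog of backup sessions from one tape, keep only the
# sources under a preservation obligation and flag the rest for deletion.
from dataclasses import dataclass

@dataclass
class Session:
    session_id: int
    source: str      # e.g., "exchange", "fileserver01", "print-server"
    size_gb: float

# Made-up catalog for one legacy tape.
catalog = [
    Session(1, "exchange", 320.0),
    Session(2, "fileserver01", 510.5),
    Session(3, "print-server", 12.2),
    Session(4, "dept-database", 95.0),
]

PRESERVE_SOURCES = {"exchange", "fileserver01"}  # assumed legal-hold scope

keep = [s for s in catalog if s.source in PRESERVE_SOURCES]
drop = [s for s in catalog if s.source not in PRESERVE_SOURCES]

print(f"Retain {sum(s.size_gb for s in keep):.1f} GB in session files; "
      f"defensibly delete {sum(s.size_gb for s in drop):.1f} GB")
```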

Additionally, with one eye on the growing number of state, national and international regulations concerning data privacy and information governance, such as the EU’s General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA), TSF allows for the defensible deletion of files stored within backups without impacting the remaining backed-up files. This type of targeted deletion of data originating from tape is unique, and it is performed without restoring data from a single tape.

Of course there are other solutions, but I like the simplicity and logic of TRACS/TDF and TRACS/TSF. It’s certainly more practical and affordable than what Seagate and TapeArk propose!


Using Tape in Active Archive to Store Scientific Data

Reading Time: < 1 minute

Brookhaven National Laboratory (BNL) has grown from 60 PB of data archived in 2015 to 145 PB archived in 2018. In this Fujifilm Summit video, David Yu explains how BNL is using tape storage to cost-effectively manage this data growth. In addition, BNL uses an active archive system to provide easy access to data that is frequently needed by the BNL data center and other research institutions.


The Advantages of an Active Archive in Today’s Data-Flooded World

Reading Time: 4 minutes

The vast volumes of data created daily, coupled with the opportunity to derive value from that data, are making active archives an increasingly important part of organizations’ data management game plans across the globe.

In this Q&A, Active Archive Alliance Chairman Peter Faulhaber of FUJIFILM Recording Media, U.S.A., Inc. shares his perspective on the role of active archives in managing the data deluge.

Q: What are some of the key trends driving the shift to active archive?

A: I would say the relentless rate of data growth and how to manage it. The answer lies in proper data classification and moving data to the right tier of storage at the right time. Analysts say that 60% of data becomes archival after 90 days or less. So there is a need to cost-effectively store, search for and retrieve enormous volumes of rapidly growing archival content.

Q: So what exactly does an active archive enable?

A: An active archive enables online access to data throughout its lifecycle, regardless of which tier of the storage hierarchy it resides in. Active archive file systems span all pools of storage, whether SSD, HDD, tape or cloud. But tape is a key enabler: since tape has the lowest total cost of ownership for long-term data retention, you can cost-effectively maintain online access to all of your data in an active archive.

Q: Speaking of tape, is cloud killing tape?

A: That’s a misconception, as cloud storage providers such as Microsoft Azure have publicly stated their use of tape as part of their deep archive service offerings. The main reason is economics, supported by tape’s high reliability, long life and future areal density roadmap. I also think the industry is settling on a sensible balance of on-premises and off-site storage, where tape has a role in both. So no, cloud is not killing tape; rather, it’s an opportunity.

Q: How is an archive different than a backup?

A: Backup and archive are entirely different processes with different objectives. Think of the backup process as a “copy” of your data for recovery purposes. Backups are cycled and updated frequently to account for and protect the latest versions of important data assets. Archiving, by contrast, is a “move” of your fixed data to a new, more cost-effective tier for long-term retention. But end users don’t want their data sitting on a shelf offline. They want it online, searchable and readily available, and that’s what an active archive provides.

Q: What is the market opportunity for active archives?

A: The market opportunity is significant due to the volume of archival data, the value of that data and the velocity or speed of access that’s required today. A recent ESG Research survey indicated that less than 40% of corporations have a dedicated archive strategy in place, yet every organization has archival data! The market is ready for modern, leading-edge archiving concepts like active archive.

Q: How is an active archive implemented?

A: There are numerous software and hardware solutions, ranging from stand-alone active archive appliances to intelligent data management software that includes active archiving among other capabilities. End users can leverage their existing storage systems to implement an active archive strategy. Most active archive solutions allow customers to repurpose existing tape libraries to create an active archive partition that looks to users like another disk volume. When this is combined with the open-standard LTFS (Linear Tape File System), the active archive is free from vendor lock-in and ensures data portability and copy management for long-term archives.
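
Because an LTFS partition presents the tape to users as an ordinary file system, writing into the active archive can look like a plain file copy. The sketch below assumes a hypothetical LTFS mount point; it is an illustration of the concept, not any vendor’s tooling.

```python
# Archive a file by copying it onto an LTFS-mounted tape volume.
# The mount point and source path are hypothetical examples.
import shutil
from pathlib import Path

LTFS_MOUNT = Path("/mnt/ltfs/archive")   # assumed LTFS mount point

def archive(src: str) -> Path:
    """Copy a file onto the LTFS volume and return its new location."""
    dest = LTFS_MOUNT / Path(src).name
    shutil.copy2(src, dest)               # ordinary copy; LTFS handles the tape
    return dest

if __name__ == "__main__":
    free_gb = shutil.disk_usage(LTFS_MOUNT).free / 1e9
    print(f"{free_gb:.0f} GB free on the archive volume")
    print("Archived to:", archive("/data/projects/q4_results.csv"))
```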

Q: What are some of the advantages of an active archive?

A: An active archive enables users to easily find and use archived data while removing complexity and operational load from IT administrators. By automating archiving so that data doesn’t get stranded on an inaccessible “shelf” somewhere, the value of the data increases without tying up the most expensive storage resources. This improves storage performance, lowers total cost of ownership, and reduces the risk of non-compliance and data loss.

Q: You mention simplified data storage and ease of use, how is that achieved?

A: An active archive tames complexity by leveraging an intelligent data management layer. Access to and management of data are getting more complex, so we need modern strategies with intelligent data management techniques that are automated and policy based. Classifying data by its value at creation and automatically updating its performance and capacity requirements over its lifecycle will put the right data in the right place at the right time.

Q: Can active archives be implemented in the cloud?

A: Yes. An active archive can combine onsite, offsite and cloud environments. Most, if not all, cloud providers offer archival data services, including active archiving. An active archive brings the same benefits to public clouds as it does to on-premises solutions.

Q: What’s in store for active archives in the future?

A: As organizations fully embrace digital transformation, they are quickly learning the value of analyzing large amounts of previously dormant archival data, and that makes quick and affordable access to that data all the more important. New tools and use cases such as artificial intelligence, machine learning, big data, IoT and video surveillance will drive increased demand for active archives from organizations that need to effectively manage data from terabytes to exabytes across multiple storage tiers.

Finally, I would say that organizations archive their data either because they want to preserve the value of the content or because they have to, such as for compliance. Either way, the magnitude of archival storage requirements will be a major challenge. With the amount of archival data exploding with no end in sight, active archives will play a vital role in optimizing data storage, not only to reduce costs but also to ensure archived data is accessible and protected. That’s an attractive value proposition, so the future is bright for active archives.

Originally published in Storage Newsletter, January 14, 2019.

 


Active Archive Alliance Members Share Top Data Storage and Active Archive Predictions for 2019

Reading Time: 3 minutes

Explosive data growth continues to be a top challenge for today’s organizations, and this growth is only going to increase in the future. In fact, according to analyst firm IDC, worldwide data will grow 61% to 175 zettabytes by 2025, with as much of the data residing in the cloud as in data centers.

New technologies and approaches are continually being created to help address this data storage deluge. Members of the Active Archive Alliance from Fujifilm Recording Media, U.S.A., Inc., Spectra Logic, StrongBox Data and Quantum recently shared their insights into what the future looks like for active archives and data storage in 2019. Here are some of their top predictions:

Artificial Intelligence Creates Demand for Active Archives
The evolution of deep learning, machine learning and artificial intelligence will continue across every industry in 2019 as the digital transformation wave produces an explosion of big data. With these AI tools, organizations will be able to extract more value from their data than ever before, giving rise to an insatiable demand for more data, more analytics and more competitive advantage. A dramatic increase in storage, and specifically in active archives, will be required to cost-effectively and efficiently provide accessibility to big data at scale.

Flash Will Gain Wide-Scale Adoption, But a Need to Store Everything Will Make Secondary Storage More Important Than Ever
In the coming year we will see wide-scale adoption of flash storage. Organizations of all sizes will adopt solid-state drives (SSDs) for greater performance, energy savings, space efficiency and reduced management. New technologies like integrated data protection, storage federation/automation, policy-based provisioning, tiered data movement and public cloud integration will be built on top of this flash foundation.

With the increased adoption of flash, organizations will also face the challenge of affordably storing data that is not mission critical but still has value and therefore cannot be deleted. With the move to flash, organizations will use a secondary storage tier to affordably manage all of the organization’s data, and this will happen through intelligent data management software designed to move data to a more affordable tier without sacrificing access to and searchability of the data.

Shift From Managing Your Storage to Managing Your Data
Data, not the underlying physical storage, is what matters. However, traditional storage systems are “big dumb buckets” that provide precious little insight into what data is growing, what applications or users are accessing it, or what is consuming storage performance and why.

Next-generation storage systems are “data-aware,” with real-time analytics built directly into the storage itself, delivering real-time information on data and performance at massive scale. As organizations better understand their data (how it is being generated, at what pace, by whom and for what project), they are better informed about how to plan and budget for future data growth and how to move data to different tiers based on customized policies.

Cross-platform Storage Automation Reduces Costs, Increases Productivity
The reality is that there is no “one-size-fits-all” storage solution that addresses the multiple requirements faced by most organizations. The result is that large environments typically rely on multiple storage vendors and point solutions to address the different performance and cost profiles needed for their data. The problem is that this adds complexity for IT managers, requiring them to do more with static or shrinking operational budgets. This trend is driving demand for solutions that automate data and storage resource management across any storage type from any vendor. Such solutions leverage policy engines and management tools driven by multiple types of metadata about the files and their business value as they evolve over time. These automation tools help data managers know what they have and give them control of cross-platform data migration, tiering, active archiving and protection without interrupting users. This type of metadata-driven automation will be an increasing trend over the next few years because it provides demonstrable ROI, reducing OPEX and complexity for IT and breaking storage vendor lock-in while increasing storage utilization efficiency and user productivity.

Rich Media Content Will Grow Exponentially, Across Many Industries
Video now constitutes 50% of all data. Rich media comprises video surveillance; consumer images, voice and video; medical imagery; IoT; entertainment; and social media. Large, unstructured data sets are often 50 times bigger than the average corporate database. Video is unique in that it is not typically a good fit for traditional backup: it cannot be compressed or deduplicated, it doesn’t work well with replication, snaps or clones, and ingest speed is critical. Rich media is projected to surpass 100 zettabytes worldwide by 2020. Expect enterprise data services to be increasingly optimized for large or rich media data sets, with infrastructure optimized for ingest processing and full lifecycle management of all forms of rich media.


Leveraging Artificial Intelligence in an Active Archive

Reading Time: < 1 minute

In this Fujifilm Summit presentation, Molly Presley, founder of the Active Archive Alliance, explains how an active archive can provide visibility into your applications and machine-generated data with actively assigned metadata, no matter what tier it is stored on.

Watch the video to learn more:

 


Breaking Down Data Silos — Highlights From SC18

Reading Time: 3 minutes

By Kevin Benitez

I had the opportunity to attend SC18 last month in Dallas. Every year the Supercomputing Conference brings together the latest in supercomputing technology and the most brilliant minds in HPC. People from all over the world and from different backgrounds converged this year for the 30th Supercomputing Conference.

As you can imagine, some of the demonstrations were absolutely mind-blowing and worth sharing. For starters, power consumption in data centers is becoming more of a challenge as data rates continue to surge. Fortunately, 3M was live on the trade show floor tackling this issue by demonstrating immersion cooling for data centers, which has the potential to slash energy use and cost by up to 97%. As this technology continues to evolve, we could see huge gains in performance and in reducing environmental impact.

The race to dominate quantum computing continues! IBM’s 50-qubit quantum computer made an appearance at this year’s show. What does it mean to have a computer with 50 qubits working perfectly? (Side note: in quantum computing, a qubit is the basic unit of quantum information.) According to Robert Schoelkopf, a Yale professor, if you had 50 or 100 qubits you could “do unfathomable calculations that can’t be replicated on any classical machine, now or ever.” Although the quantum computer churns out enough computational power to rank within the top ten supercomputers in the world, the device can only compute for 100 milliseconds due to the short lifetime of its fragile quantum states.

StrongBox Data’s flagship product, StrongLink, was demonstrated on the show floor as a way to store and contain the vast amounts of data that research universities and laboratories are producing. StrongLink is a software solution that simplifies and reduces the cost of managing multi-vendor storage environments. It provides multi-protocol access across any file system, object storage, tape and cloud in a global namespace. Users maintain a constant view of files regardless of where they are stored, which optimizes their storage environment for performance and cost.

Recently, the University of Southampton’s supercomputer Iridis 5 teamed up with StrongLink to get more value out of its data. Oz Parchment, Director of the University’s iSolutions IT support division, commented in March: “One way StrongLink interested us was its cognitive component, the ability to look at and match up metadata at scale, which gets interesting when you combine that with different data infrastructures. Our setup currently includes large-scale tape stores, large-scale disc stores, some of that being active data, some of that being nearline data, some being effectively offline data. But then, by linking these into the [Iridis] framework, which StrongLink allows us to do, we can connect these various data lakes that we have across the research side of the organization, and begin to create an open data space for our community where people in one discipline can look through data and see what kinds of data are available in other communities.”

Never has HPC been more crucial. As we say here at Fujifilm, “Never Stop Transforming Ourselves and the World.”

