FUJIFILM INSIGHTS BLOG

Data Storage

There is New Value in Old Data Amid AI/ML Boom

Reading Time: 7 minutes

Executive Q & A with Chuck Sobey, Chief Scientist of ChannelScience

Q1: Welcome, Chuck, to this Fujifilm Insights Executive Q & A! Please tell us a bit about your role and responsibility as Chief Scientist at ChannelScience.

Ans: Thank you, Rich – I appreciate the opportunity to talk with you.

ChannelScience is the consulting firm I started in 1996 to provide R&D services for emerging memory and storage technologies. At that time, the Internet boom was just starting and the fear of Y2K was building. My initial focus was hard disk drive (HDD) technology. This grew from my prior experience as a designer of thin film magnetic recording heads for HDDs at Applied Magnetics, near Santa Barbara; and then as a read channel architect at Texas Instruments, in Dallas. Data storage technology was growing even faster than semiconductors at that time, so it was an exciting field.

To promote my consulting work, I wrote storage technology classes for KnowledgeTek and taught them at practically every company related to data storage over the next two decades. This enabled me to engage with large and small companies to help them develop a wide variety of novel storage innovations. These spanned from ever-smaller HDDs, to laser optical tape, to solid-state memory and storage.

My current responsibilities at ChannelScience include staying current with the state-of-the-art in storage and memory, signal processing, and error correction coding (ECC), and connecting with customers to help develop their new technologies and prepare them for the market. We are also early proponents of semiconductor chiplets and offer pathfinding and strategy consulting on this rapidly growing technology.

Q2: Can you tell us about the breakthrough work you are doing in recovery of old data from obsolete tape stock?

Ans: The “state-of-the-art” of recovering legacy data formats is to locate several vintage drives (often on eBay) and scavenge/refurbish them to make one working drive.  There are several challenges with this, in addition to finding and refurbishing the drives. A sufficient supply of vintage heads and rollers must be secured, because these wear out with use. Operators and technicians must be trained to work with a wide variety of drives that do not have support.

I often point out that if you refurbish a 1970s tape drive, when you have done a perfect job what you have is a 1970s tape drive. Unfortunately, the 50-year-old vintage tapes are no longer in original condition, so this performance level can be insufficient. That said, it is remarkable how well properly-cared-for tapes have held up. It is my belief that some of what we are learning about decaying magnetic patterns on tapes can be used to continue the improvement of modern tape and drive development.

Based on this state-of-the-art, we recognized that a modern, multi-format tape reader that could read vintage tapes better than the original equipment would address all of these issues. With my background in head design and read channel signal processing, the answer was clear to me: Use modern, sensitive magnetoresistive (MR) heads and pair them with the latest signal processing algorithms for data detection. Furthermore, with such sensitive heads, we believed we could have minimal contact between the head and tape and still get sufficient signal fidelity for improved detection. Minimal contact means we are gentler with delicate tapes, and the system may need less-frequent head cleaning.

Furthermore, ChannelScience had already developed methods for extreme recoveries for HDDs, DVDs, and solid-state drives (SSDs) and flash (see links below).

[http://www.channelscience.com/files/Drive-Independent_Data_Recovery.pdf

http://www.channelscience.com/files/Drive%20Independent%20Data%20Recovery%20Sobey%20Orto%20Sakaguchi%20TMRC%202005%20D5%20PREPRINT.pdf ]

Q3: What was the genesis of your multi-format “Do-No-Harm” legacy tape reader?

Ans: During the pandemic, the Department of Energy (DOE) published a Funding Opportunity Announcement (FOA, DE-FOA-0002360, issued December 14, 2020) seeking proposals for “Digitizing and Analyzing Legacy Seismo-Acoustic Data.” It came from DOE’s Office of Defense Nuclear Nonproliferation Research and Development.

The Comprehensive Nuclear Test Ban Treaty was signed in 1996, and the last tests in the US were conducted in 1992. A wealth of seismic data was recorded for each test. This information went to two places: Paper graphs and 9-track tape. These test results now represent irreproducible scientific data. Other types of irreproducible data are from scientific instruments that no longer exist, such as specific particle accelerators, telescopes, or seismic exploration of no-longer-accessible locations.

These data sets have new value now – more than they did decades ago – for a simple reason: AI/ML. It is now possible to examine the entire corpus of data for a range of experiments and train and refine new machine learning (ML) models to do new science and make better predictions and classifications. For example, the ability to distinguish a rogue nuclear detonation from an earthquake or a mine excavating explosion can be vastly improved.

We are grateful for the support of the Department of Energy for our tape reader project. They awarded us three SBIR (Small Business Innovation Research) grants to develop our breakthrough technologies. We received a Phase I award (DE-SC0021850) to apply machine learning to waveforms from damaged tapes. And we received Phase I and Phase II awards (DE-SC0021879) to develop the prototype of our multi-format legacy tape reader. DOE also provided excellent business training through their Energy I-corps and Phase Shift programs.

We are now seeking first customers to fund the productization of our prototype. What we currently have is a scientific instrument that is operated by Ph.D. scientists. Our next step is to turn this into a robust product that can be shipped and used by adequately trained operators and technicians. If we are successful with our product, we can “make obsolete media obsolete!”

[At right is the current ChannelScience Multi-format Legacy Tape Reader prototype, shown with 1” analog instrumentation tape mounted.]

Q4: Beyond the value for AI/ML, what other applications are there and what type of organizations might be interested in this unique capability?

Ans: The ability to train and refine AI/ML models on rare data sets is certainly the driver for this funding. There are many types of irreproducible experiments that organizations want to use data from. In addition to nuclear weapons tests, these include particle accelerator data, telemetry from space missions, medical records, demographics, business records, and many others.

Another area that I believe may have even more potential is audio and video tape. Although there are many vintage units still available, they are getting scarcer, and key components are wearing out. As always, the data is deteriorating, so better signal fidelity and signal processing than the vintage equipment can provide are needed. The image below shows the resolution we are able to get out of our prototype system. We can resolve individual transitions and the inter-track gaps.

[Magnetic force microscope-like image of a 9-track ½” digital data tape, created from ChannelScience’s prototype multi-format tape reader.]

Surprisingly, international diplomacy is another area where we’ve discovered unique opportunities. For example, there is a wealth of under-utilized data in former Eastern Bloc countries. It is stored in non-Western formats and there has been much less focus on recovering these rare data sets. With targeted development, we are confident that our tape reader can recover any such legacy data. Providing technology to access a country’s valuable legacy data is a diplomatic approach the US Department of State has used in the past.

Another unique opportunity, “sovereign AI,” was described by Jensen Huang (CEO of NVIDIA) in a recent interview. He envisions every country training its own large language model (LLM, like ChatGPT), based on its language, laws, customs, and unique history. This will need as much of each country’s legacy data as possible for training.

Q5: Where can readers get more information about this innovative solution?

Ans: A direct link to an overview slide deck is here.  We will be adding more information to our website over time. A YouTube video of my recent talk at the Vintage Computer Festival Southwest was just posted.

In addition, I share new information on LinkedIn. For example, I have posted some behind-the-scenes photos of my recent visits to George Blood, the Library of Congress, and the Smithsonian. I invite your readers to connect with me at https://linkedin.com/in/ChuckSobey  

Q6: You are also deeply involved in one of the largest IT Trade Shows out there, Flash Memory Summit, now known as “FMS: the Future of Memory and Storage.” What can you tell us about FMS and how FMS is evolving?

Ans: 2024 is the 18th year of FMS. I have been the General Chair since 2017 and an organizer and advisor for several years before that. Registration is open now for this August 6-8, 2024, event at the Santa Clara Convention Center (SCCC).

I’d like to thank you, Rich, for your help this year, and last, in putting together our cold data and archive sessions. People like you are helping us expand our scope beyond flash (hence, the name-change to simply “FMS”). Our coverage now includes DRAM, HDD, tape, and many other emerging nonvolatile memory technologies – from MRAM to DNA – as well as the applications, such as AI, that continue to drive their adoption. We believe FMS is a special show, where old friends and new meet to reconnect and move the industry forward. It is the best networking opportunity in the industry.

Coming out of the pandemic, I co-founded another growing IT event, Chiplet Summit. We will hold our 3rd annual event at SCCC on January 21-23, 2025. It is exciting to help this hardware development method expand and grow into a new ecosystem for the rapid development and deployment of leading-edge semiconductor process technologies.

I invite your readers to attend both of these events!

Q7: Finally, when you are not slaving away for ChannelScience or FMS or Chiplet Summit, what do you enjoy doing in your free time?

Ans: You are right that there is not much free time! However, when both time and Texas weather permit, I try to go mountain biking. That is harder than it sounds in Plano, which in Spanish means flat! I also love playing with and training our two wonderful German Shepherd Dogs.

[Ina and Lola preparing for another game of tag.]

Thanks for your time, Chuck, and we wish you a lot of success with your legacy tape reader, FMS, and Chiplet Summit!


Celebrating 70 Years of Data Storage With Tape Technology

Reading Time: 2 minutes

By Guest Blogger, Dr. Shawn O. Brume Sc. D., IBM Tape Evangelist and Strategist

According to a study by McKinsey, the average lifespan of companies listed in Standard & Poor’s is less than 18 years! That means that tape technology has already been in business almost 4 times longer than the average S&P company will survive. Tape technology celebrated 70 years young on May 21st. Tape has been and continues to be the most transforming data storage technology in history.

In the 50’s it was the only viable technology for storing data generated by the few computers in existence. In the 60’s tape took the world to the moon and preserved the data for usage nearly 40 years later when it was retrieved to assist in modern space explorations. By the 70’s tape was dominating storage, transforming the financial industry by providing the ability to access data on accounts with minimal human intervention. In the 80’s and 90’s tape continued the transformation of data availability by performing transactional data storage for ATMs, and it was also key in the investigation of the space shuttle Challenger disaster, an investigation enhanced by the durability of tape even when submerged in saltwater.

Today tape lives in the data center, preserving Zettabytes of data. Data is being preserved and utilized across nearly every industry. Examples:

Healthcare – Data preserved on tape is being utilized to develop new predictive health services. Digital medical records can be retained for the life of patients and shared across organizations.

Financial – Online transaction retention ensures customers’ valuable financial data is protected in the eventuality of a cyber-attack. Mortgage loans are preserved without fear of tampering.

Cloud – Data stored in public clouds is growing at a 30% faster rate than traditional storage. Cloud providers rely on tape to provide data durability and low-cost storage subscriptions.

Tape’s popularity has often been driven by the low cost of storage, but modern data storage requires so much more, including cyber-resiliency, data durability and low carbon footprints that enable sustainable IT.

Cyber Resiliency – Tape is the only true airgap data storage solution available.

Data Durability – Tape has a native single copy durability of 11 nines. This means the likelihood of a single bit failure is 1 in 100 Petabytes.

Sustainability – At scale, tape technology has a 96% lower carbon footprint than highly dense HDD storage (when comparing OCP Bryce Canyon and IBM tape technology with 27PB of data).
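
As a rough illustration of how a durability figure like the one above is usually read, the short Python sketch below converts a count of nines into a failure probability and an expected number of losses. The function names, and the assumption that each stored unit fails independently with the same probability, are illustrative only and do not come from IBM; the post does not say which unit of data or time period the 11 nines figure is measured over.

```python
from fractions import Fraction


def nines_to_failure_probability(nines: int) -> Fraction:
    """Failure probability implied by a durability of `nines` nines.

    Example: 3 nines means 99.9% durability, i.e. a 1-in-1000 failure chance.
    """
    if nines < 1:
        raise ValueError("nines must be a positive integer")
    return Fraction(1, 10 ** nines)


def expected_losses(units: int, nines: int) -> Fraction:
    """Expected number of lost units, assuming (hypothetically) that every
    unit fails independently with the same probability."""
    return units * nines_to_failure_probability(nines)


if __name__ == "__main__":
    p = nines_to_failure_probability(11)
    print("failure probability per unit:", p)
    print("expected losses across one billion units:", expected_losses(10 ** 9, 11))
```

Under these assumptions, 11 nines corresponds to a failure probability of 1 in 100 billion per stored unit, whatever that unit is taken to be.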

If preserving data, in a cyber-resilient solution, at low cost, with relatively low carbon impact meets your business outcomes, then why wait? Clearly tape is here to stay and surging in usage across nearly every business use case.

Happy 70-years to an amazing technology!

For more information about technology since tape’s introduction, check out this post from my colleague Mike Doran.

For more information on current tape products see the IBM product page.

 


2019 State of Active Archive Report Outlines Modern Strategies for Data Management

Reading Time: < 1 minute

Archival data is piling up faster than ever as organizations are quickly learning the value of analyzing vast amounts of previously untapped digital data. Industry studies consistently find that the vast majority of all digital data is rarely, if ever, accessed again after it is stored. However, this is changing now with the emergence of big data analytics made possible by Machine Learning (ML) and Artificial Intelligence (AI) tools that bring data back to life and tap its enormous value for improved efficiency and competitive advantage.

The need to securely store, search for, retrieve and analyze massive volumes of archival content is fueling new and more effective advancements in archive solutions. These trends are further compounded as an increasing number of businesses are approaching hyperscale levels with significant archival capacity requirements.

An active archive resolves complexity by leveraging the benefits of an intelligent data management layer, and an increasing number of effective software products that address this are now available. Access to and management of data are getting more complex and require modern strategies with intelligent data management techniques. These strategies and techniques are now being enhanced by AI to further improve and automate data management.

Download the Active Archive Alliance 2019 State of Active Archive Report to learn more about the state of the archive market and the expanding role the active archive plays.

