There is New Value in Old Data Amid AI/ML Boom

Reading Time: 7 minutes

Executive Q & A with Chuck Sobey, Chief Scientist of ChannelScience

Q1: Welcome, Chuck, to this Fujifilm Insights Executive Q & A! Please tell us a bit about your role and responsibility as Chief Scientist at ChannelScience.

Ans: Thank you, Rich – I appreciate the opportunity to talk with you.

ChannelScience is the consulting firm I started in 1996 to provide R&D services for emerging memory and storage technologies. At that time, the Internet boom was just starting and the fear of Y2K was building. My initial focus was hard disk drive (HDD) technology. This grew out of my prior experience as a designer of thin-film magnetic recording heads for HDDs at Applied Magnetics, near Santa Barbara, and then as a read channel architect at Texas Instruments in Dallas. Data storage technology was advancing even faster than semiconductors at that time, so it was an exciting field.

To promote my consulting work, I wrote storage technology classes for KnowledgeTek and taught them at practically every company related to data storage over the next two decades. This enabled me to work with large and small companies to help them develop a wide variety of novel storage innovations, ranging from ever-smaller HDDs to laser optical tape to solid-state memory and storage.

My current responsibilities at ChannelScience include staying abreast of the state of the art in storage and memory, signal processing, and error correction coding (ECC), and working with customers to help develop their new technologies and prepare them for the market. We are also early proponents of semiconductor chiplets and offer pathfinding and strategy consulting on this rapidly growing technology.

Q2: Can you tell us about the breakthrough work you are doing in recovery of old data from obsolete tape stock?

Ans: The “state of the art” in recovering legacy data formats is to locate several vintage drives (often on eBay) and scavenge and refurbish them to make one working drive. Beyond finding and refurbishing the drives, this approach has several challenges. A sufficient supply of vintage heads and rollers must be secured, because these wear out with use. Operators and technicians must also be trained to work with a wide variety of drives that are no longer supported.

I often point out that if you refurbish a 1970s tape drive, when you have done a perfect job, what you have is a 1970s tape drive. Unfortunately, 50-year-old vintage tapes are no longer in their original condition, so that level of performance can be insufficient. That said, it is remarkable how well properly cared-for tapes have held up. I believe that some of what we are learning about decaying magnetic patterns on tapes can also be applied to keep improving modern tape and drive technology.

Given this state of the art, we recognized that a modern, multi-format tape reader that could read vintage tapes better than the original equipment would address all of these issues. With my background in head design and read channel signal processing, the answer was clear to me: use modern, sensitive magnetoresistive (MR) heads and pair them with the latest signal processing algorithms for data detection. Moreover, with such sensitive heads, we believed we could keep head-to-tape contact minimal and still get sufficient signal fidelity for improved detection. Minimal contact means we are gentler with delicate tapes, and the system may need less frequent head cleaning.

In addition, ChannelScience had already developed methods for extreme data recovery from HDDs, DVDs, solid-state drives (SSDs), and flash memory (see links below).

http://www.channelscience.com/files/Drive-Independent_Data_Recovery.pdf

http://www.channelscience.com/files/Drive%20Independent%20Data%20Recovery%20Sobey%20Orto%20Sakaguchi%20TMRC%202005%20D5%20PREPRINT.pdf

Q3: What was the genesis of your multi-format “Do-No-Harm” legacy tape reader?

Ans: During the pandemic, the Department of Energy (DOE) published a Funding Opportunity Announcement (FOA, DE-FOA-0002360, issued December 14, 2020) seeking proposals for “Digitizing and Analyzing Legacy Seismo-Acoustic Data.” It came from DOE’s Office of Defense Nuclear Nonproliferation Research and Development.

The Comprehensive Nuclear-Test-Ban Treaty was signed in 1996, and the last US nuclear tests were conducted in 1992. A wealth of seismic data was recorded for each test, and it went to two places: paper graphs and 9-track tape. These test results now represent irreproducible scientific data. Other irreproducible data come from scientific instruments that no longer exist, such as specific particle accelerators and telescopes, or from seismic surveys of locations that are no longer accessible.

These data sets have new value now – more than they did decades ago – for a simple reason: AI/ML. It is now possible to examine the entire corpus of data from a range of experiments and to train and refine new machine learning (ML) models that enable new science and better predictions and classifications. For example, the ability to distinguish a rogue nuclear detonation from an earthquake or a mining explosion could be vastly improved.

We are grateful for the Department of Energy’s support of our tape reader project. DOE awarded us three SBIR (Small Business Innovation Research) grants to develop our breakthrough technologies: a Phase I award (DE-SC0021850) to apply machine learning to waveforms from damaged tapes, and Phase I and Phase II awards (DE-SC0021879) to develop the prototype of our multi-format legacy tape reader. DOE also provided excellent business training through its Energy I-Corps and Phase Shift programs.

We are now seeking first customers to fund the productization of our prototype. What we currently have is a scientific instrument that is operated by Ph.D. scientists. Our next step is to turn this into a robust product that can be shipped and used by adequately trained operators and technicians. If we are successful with our product, we can “make obsolete media obsolete!”

[The current ChannelScience multi-format legacy tape reader prototype, shown with 1” analog instrumentation tape mounted.]

Q4: Beyond the value for AI/ML, what other applications are there and what type of organizations might be interested in this unique capability?

Ans: The ability to train and refine AI/ML models on rare data sets is certainly the driver for this funding. Organizations want to use data from many kinds of irreproducible experiments and records. In addition to nuclear weapons tests, these include particle accelerator data, telemetry from space missions, medical records, demographics, business records, and many others.

Another area that I believe may have even more potential is audio and video tape. Although many vintage units are still available, they are getting scarcer, and key components are wearing out. Meanwhile, the tapes themselves continue to deteriorate, so better signal fidelity and signal processing than the vintage equipment can provide are needed. The image below shows the resolution we are able to get from our prototype system: we can resolve individual transitions and the inter-track gaps.

[Magnetic force microscope-like image of a 9-track ½” digital data tape, created from ChannelScience’s prototype multi-format tape reader.]

Surprisingly, international diplomacy is another area where we’ve discovered unique opportunities. For example, there is a wealth of underutilized data in former Eastern Bloc countries. It is stored in non-Western formats, and there has been much less focus on recovering these rare data sets. With targeted development, we are confident that our tape reader can recover such legacy data. Providing technology to access a country’s valuable legacy data is a diplomatic approach the US Department of State has used in the past.

Another unique opportunity, “sovereign AI,” was described by Jensen Huang (CEO of NVIDIA) in a recent interview. He envisions every country training its own large language model (LLM), like the models behind ChatGPT, based on its language, laws, customs, and unique history. This will require as much of each country’s legacy data as possible for training.

Q5: Where can readers get more information about this innovative solution?

Ans: An overview slide deck is available, and we will be adding more information to our website over time. A YouTube video of my recent talk at the Vintage Computer Festival Southwest was also just posted.

In addition, I share new information on LinkedIn. For example, I have posted some behind-the-scenes photos of my recent visits to George Blood, the Library of Congress, and the Smithsonian. I invite your readers to connect with me at https://linkedin.com/in/ChuckSobey  

Q6: You are also deeply involved in one of the largest IT trade shows, Flash Memory Summit, now known as “FMS: the Future of Memory and Storage.” What can you tell us about FMS and how it is evolving?

Ans: 2024 is the 18th year of FMS. I have been General Chair since 2017 and was an organizer and advisor for several years before that. Registration is now open for this year’s August 6-8, 2024, event at the Santa Clara Convention Center (SCCC).

I’d like to thank you, Rich, for your help this year and last in putting together our cold data and archive sessions. People like you are helping us expand our scope beyond flash (hence the name change to simply “FMS”). Our coverage now includes DRAM, HDD, tape, and many emerging nonvolatile memory technologies – from MRAM to DNA – as well as the applications, such as AI, that continue to drive their adoption. We believe FMS is a special show, where old and new friends meet to reconnect and move the industry forward. It is the best networking opportunity in the industry.

Coming out of the pandemic, I co-founded another growing IT event, Chiplet Summit. We will hold our 3rd annual event at SCCC on January 21-23, 2025. It is exciting to help this hardware development method grow into a new ecosystem for the rapid development and deployment of leading-edge semiconductor process technologies.

I invite your readers to attend both of these events!

Q7: Finally, when you are not slaving away for ChannelScience, FMS, or Chiplet Summit, what do you enjoy doing in your free time?

Ans: You are right that there is not much free time! However, when both time and Texas weather permit, I try to go mountain biking. That is harder than it sounds in Plano, whose name means “flat” in Spanish! I also love playing with and training our two wonderful German Shepherd Dogs.

[Ina and Lola preparing for another game of tag.]

Thanks for your time, Chuck, and we wish you a lot of success with your legacy tape reader, FMS, and Chiplet Summit!

Rich Gadomski

Head of Tape Evangelism

As Head of Tape Evangelism for FUJIFILM North America Corp., Data Storage Solutions, Rich is responsible for driving industry awareness and end-user understanding of the purpose and value proposition of modern tape technology. Rich joined Fujifilm in 2003 as Director of Product Management, Computer Products Division, where he oversaw marketing of optical, magnetic, and flash storage products. Previously, Rich held the position of Vice President of Marketing, Commercial Products, where he was responsible for the marketing of data storage products, value-added services, and solutions. Rich has more than 30 years of experience in the data storage industry. Before joining Fujifilm, Rich was Director of Marketing for Maxell Corp. of America, where he was responsible for the marketing of data storage products. Prior to that, Rich worked for the Recording Media Products Division of Sony Electronics. Rich participates in several industry trade associations, including the Active Archive Alliance, the Linear Tape-Open Consortium (LTO), and the Tape Storage Council. Rich also manages Fujifilm’s annual Global IT Executive Summit. Rich holds a BA from the University of Richmond and an MBA from Fordham University. FUJIFILM is the leading manufacturer of commercial data tape products for enterprise and midrange backup and archival applications.