Archival data may not require high performance computing, but it does need to stay accessible for productivity, legal, business value, analytics, and compliance. IT must secure that data against ransomware and other types of cyberattacks. And IT needs to do all this at reasonable costs, even given extraordinary projected data growth.
Active archiving technology is constantly evolving as marketplace demand increases. We recently released our 2022 Report: The Active Archiving Ecosystem: Building a Flexible Archival Repository Your Way, highlighting the increased demand for new data management strategies and the benefits and innovations of active archive solutions.
Top innovations within active archiving include Artificial Intelligence and Machine Learning (AI/ML), sustainability, analytics, and compliance.
Part 2: CHARACTERISTICS OF THE HYPERSCALE DATA CENTER
In Part 1 of this series, we explored the definition of hyperscale data centers. Now, we’ll take a look at some of their key characteristics.
HSDCs don’t publicly share an abundance of information about their infrastructure. For companies that operate HSDCs, cost may be the major barrier to entry, but ultimately it isn’t the biggest issue – automation is. HSDCs must focus heavily on automated, self-healing environments, using AI and ML wherever possible to overcome inevitable and unexpected failures and delays. Unlike many enterprise data centers, which rely on a large full-time staff across a range of disciplines, HSDCs employ fewer tech experts because they have automated so much of the overall management process. HSDC characteristics include:
Small footprint, dense racks–To achieve the smallest possible footprint, HSDCs squeeze servers, SSDs (solid-state drives) and HDDs (hard disk drives) directly into the rack itself, as opposed to separate SANs or DAS. HSDC racks are typically larger than standard 19” racks.
Automation–Hyperscale storage tends to be software-defined and benefits from AI, which delivers a higher degree of automation and self-healing that minimizes direct human involvement. AI will also support automated data migration between tiers to further optimize storage assets.
Users–The HSDC typically serves millions of users with only a few applications, whereas in a conventional enterprise there are fewer users but many more applications.
Virtualization–The facilities also implement very high degrees of virtualization, with as many operating system images running on each physical server as possible.
Tape storage adoption–Automated tape libraries are on the rise to complement SSDs and HDDs: they easily scale capacity, manage and contain out-of-control data growth, store archival and unstructured data, significantly lower infrastructure and energy costs, and provide hacker-proof cybercrime security via the tape air gap.
Fast scaling bulk storage–HSDCs require storage capacity that scales quickly and easily. One petabyte on 15 TB disk drives requires 67 drives, and one exabyte requires 66,700 of them. Tape scales capacity by adding media; disk scales by adding drives.
Minimal feature set–Hyperscale storage has a minimal, stripped-down feature set and may even lack redundancy as the goal is to maximize storage space and minimize cost.
Energy challenges–High power consumption and increasing carbon emissions have forced HSDCs to develop new energy sources and to reduce and more effectively manage energy expenses.
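To sanity-check the scaling arithmetic in the list above, here is a quick calculation (assuming decimal units, i.e. 1 PB = 1,000 TB and 1 EB = 1,000 PB):

```python
import math

DRIVE_TB = 15  # per-drive capacity in terabytes, as in the example above

def drives_needed(capacity_tb):
    """Whole 15 TB drives needed to reach a given capacity (decimal units)."""
    return math.ceil(capacity_tb / DRIVE_TB)

print(drives_needed(1_000))      # 1 PB -> 67 drives
print(drives_needed(1_000_000))  # 1 EB -> 66,667 drives (~66,700)
```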
In Part 3 of this series, we’ll take a look at how the value of tape is rapidly rising as hyperscale data centers grow. For more information on this topic, download our white paper: The Ascent to Hyperscale.
Hyperscale data centers have spread across the globe to meet unprecedented data storage requirements. In this three-part blog series, we take a look at how the industry is preparing for the next wave of hyperscale storage challenges.
The term “hyper” means extreme or excess. While there isn’t a single, comprehensive definition for HSDCs, they are significantly larger facilities than a typical enterprise data center. The Synergy Research Group Report indicated there were 390 hyperscale data centers worldwide at the end of 2017. An overwhelming majority of those facilities, 44%, are in the US, with China a distant second at 8%. Currently the world’s largest data center facility has 1.1 million square feet. To put this into perspective, the standard size for a professional soccer field is 60,000 square feet, making this facility the equivalent of about 18.3 soccer fields. Imagine needing binoculars to look out over an endless array of computer equipment in a single facility. Imagine paying the energy bill!
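The soccer-field comparison works out as follows, using the figures quoted above:

```python
facility_sq_ft = 1_100_000    # world's largest data center facility
soccer_field_sq_ft = 60_000   # standard professional soccer field

print(round(facility_sq_ft / soccer_field_sq_ft, 1))  # 18.3 fields
```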
Hyperscale refers to a computer architecture that massively scales compute power, memory, high-speed networking infrastructure, and storage resources, typically serving millions of users with relatively few applications. While most enterprises can rely on out-of-the-box infrastructures from vendors, hyperscale companies must personalize nearly every aspect of their environment. An HSDC architecture is typically made up of tens of thousands of small, inexpensive, commodity component servers or nodes, providing massive compute, storage and networking capabilities. HSDCs are implementing Artificial Intelligence (AI) and Machine Learning (ML) to help manage the load and are exploiting the storage hierarchy, including heavy tape usage for backup, archive, active archive and disaster recovery applications.
In Part 2 of this series, we’ll take a look at the characteristics of the hyperscale data center. For more information on this topic, download our white paper: The Ascent to Hyperscale.
Ransomware continues to threaten the security of enterprise IT infrastructures. In this Fujifilm Summit video, storage analyst George Crump talks to IBM’s Chris Bontempo about how artificial intelligence and machine learning are helping improve cybersecurity by identifying and stopping potential threats.
Explosive data growth continues to be a top challenge for today’s organizations, and this growth is only going to increase in the future. In fact, according to analyst firm IDC, worldwide data will grow 61% to 175 zettabytes by 2025, with as much of the data residing in the cloud as in data centers.
New technologies and approaches are continually being created to help address this data storage deluge. Members of the Active Archive Alliance from Fujifilm Recording Media, U.S.A., Inc., Spectra Logic, StrongBox Data and Quantum recently shared their insights into what the future looks like for active archives and data storage in 2019. Here are some of their top predictions:
Artificial Intelligence Creates Demand for Active Archives
The evolution of deep learning, machine learning and artificial intelligence will continue to expand in 2019 across every industry as the digital transformation wave produces an explosion of big data. With these AI tools, organizations will be able to extract more value from their data than ever before, giving rise to an insatiable demand for more data, more analytics…more competitive advantage. A dramatic increase in storage, and specifically active archives, will be required to cost-effectively and efficiently provide accessibility to big data at scale.
Flash Will Gain Wide-Scale Adoption, But a Need to Store Everything Will Make Secondary Storage More Important Than Ever
In the coming year we will see wide-scale adoption of flash storage. Organizations of all sizes will adopt solid-state drives (SSDs) for greater performance, energy savings, space efficiency, and reduced management. New technologies like integrated data protection, storage federation/automation, policy-based provisioning, tiered data movement, and public cloud integration will be built on top of this flash foundation.
With the increased adoption of flash, organizations will also face the challenge of how to affordably store data that is not mission critical but still has value and therefore cannot be deleted. As they move to flash, organizations will utilize a secondary storage tier to affordably manage all of the organization’s data, and this will happen through intelligent data management software designed to move data to a more affordable tier without sacrificing access to and searchability of the data.
Shift From Managing Your Storage to Managing Your Data
Data, not the underlying physical storage, is what matters. However, traditional storage systems are “big dumb buckets” that provide precious little insight into what data is growing, what applications or users are accessing it, or what is consuming storage performance and why.
Next-generation storage systems are “data-aware,” with real-time analytics built directly into the storage itself, delivering real-time information on data and performance at massive scale. As organizations better understand their data (how it is being generated, at what pace, by whom, for what project), they can plan and budget more effectively for future data growth and better understand how to move data to different tiers based on customized policies.
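As a rough illustration of the kind of metadata a “data-aware” system surfaces, the sketch below (a hypothetical example, not any vendor’s actual product) walks a directory tree and aggregates capacity by file owner and age bucket:

```python
import os
import time
from collections import defaultdict

def profile_tree(root, now=None):
    """Aggregate bytes under root, keyed by (owner uid, age bucket).

    A toy stand-in for the real-time reporting a data-aware system provides.
    """
    now = now or time.time()
    usage = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            age_days = (now - st.st_mtime) / 86400
            if age_days < 30:
                bucket = "hot (<30d)"
            elif age_days < 365:
                bucket = "warm (<1y)"
            else:
                bucket = "cold (1y+)"
            usage[(st.st_uid, bucket)] += st.st_size
    return dict(usage)
```

A report like this answers the questions posed above: which data is growing, who owns it, and how much of it has gone cold.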
Cross-platform Storage Automation Reduces Costs, Increases Productivity
The reality is that there is no “one-size-fits-all” storage solution that addresses the multiple requirements faced by most organizations. As a result, large environments typically rely on multiple storage vendors and point solutions to address the different performance and cost profiles of their data. The problem is that this adds complexity for IT managers, requiring them to do more with static or shrinking operational budgets. This trend is driving demand for solutions that automate data and storage resource management across any storage type from any vendor. Such solutions leverage policy engines and management tools driven by multiple types of metadata about the files and their business value as they evolve over time. These automation tools help data managers know what they have and give them control of cross-platform data migration, tiering, active archiving, and protection without interrupting users. This type of metadata-driven automation will be an increasing trend over the next few years because it provides demonstrable ROI, reducing OPEX and complexity for IT and breaking storage vendor lock-in while increasing storage utilization efficiency and user productivity.
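A metadata-driven placement policy of the kind described above can be pictured as a small rule engine. This is a minimal sketch under assumed tier names (`flash-primary`, `object-capacity`, `tape-archive`) and illustrative thresholds, not any vendor’s actual engine:

```python
import time
from dataclasses import dataclass

@dataclass
class FileMeta:
    path: str
    size_bytes: int
    last_access: float  # epoch seconds
    business_tag: str   # e.g. project name or compliance label

def choose_tier(meta, now=None):
    """First-match-wins placement rule; thresholds are illustrative only."""
    now = now or time.time()
    idle_days = (now - meta.last_access) / 86400
    if meta.business_tag == "legal-hold":
        return "tape-archive"      # compliance data goes to the air-gapped tier
    if idle_days > 365:
        return "tape-archive"      # untouched for a year: active archive on tape
    if idle_days > 30:
        return "object-capacity"   # warm data moves to cheap secondary storage
    return "flash-primary"         # hot data stays on flash
```

Real cross-platform tools evaluate rules like these against file metadata at scale, then drive the actual data movement through each vendor’s storage APIs rather than local paths.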
Rich Media Content Will Grow Exponentially, Across Many Industries
Video now constitutes 50% of all data. Rich media comprises our video surveillance; consumer images, voice and video; medical imagery; IoT; entertainment and social media. Large and unstructured data is often 50 times or larger than the average corporate database. Video is unique, and it is not typically a good fit for traditional backup: it cannot be compressed or deduplicated, it doesn’t work well with replication, snaps or clones, and ingest speed is critical. Rich media is projected to surpass 100 zettabytes worldwide by 2020. Expect enterprise data services to be increasingly optimized for large or rich media data sets, with infrastructure optimized for ingest processing and the full life cycle management of all forms of rich media.