Unstructured data storage: on-prem vs cloud vs hybrid

We look at storage for unstructured data on-premise, in the cloud and across multiple locations. There are benefits to a hybrid approach, but there can be hidden costs, too.

Stephen Pritchard


Published: 29 Jul 2022

Businesses need to store ever-larger volumes of information, in a growing number of formats.

Business data is no longer confined to structured data in orderly databases or enterprise applications. Instead, businesses may need to capture, store and use documents, emails, images, videos, audio recordings and even social media posts. All contain information that has the potential to improve decision-making.

But this presents challenges for systems that were designed with structured, rather than unstructured, data in mind.

That’s because technologies that efficiently store databases, for instance, are not suited to the larger file sizes, data volumes and long-term archival needs of unstructured data.

Industry analysts IDC and Gartner estimate that about 80% of new enterprise data is now unstructured. Clearly, there is a business benefit in being able to keep and analyse that data, and long-term storage may also be mandated for compliance reasons.

But traditional storage technologies weren’t designed for either the volume or the variety of such data.

As Cesar Cid de Rivera, international VP of systems engineering at supplier Commvault, points out, differing file sizes alone (say, a video file set against a text document) present issues for storage. And enterprises have to cope with what he describes as “dark pools” of data, generated or moved automatically from a central system to an end-user’s device, for instance.

Also, data is generated in other systems outside conventional IT, such as software-as-a-service (SaaS) applications and internet of things (IoT) endpoints, and potentially by machine learning and artificial intelligence (AI). This data must also be found, indexed and stored.

This puts pressure on storage infrastructure. And enterprises are increasingly discovering that a single method of storage, whether all on-premise or all-cloud, does not deliver the cost, flexibility and performance they want. This is leading to growing interest in hybrid solutions, as well as in technologies, such as Snowflake, that can be storage agnostic.

“The criteria to take into account are the volume, the data gravity (where it is being generated, where it is used, computed or consumed), security, bandwidth, regulations, latency, cost, change rate, transfer required and cost,” says Olivier Fraimbault, a board director at SNIA EMEA.

“The main issue I see isn’t so much storing massive amounts of unstructured data, but how to cope with the data management, as opposed to the storage management, of it,” Fraimbault adds.

Nonetheless, firms still need to consider conventional storage performance metrics, especially I/O and latency, as well as price, resilience and security, for each possible technology.

Managing unstructured data on-site

The traditional method of storing unstructured data on-site has been a hierarchical file system, delivered either through direct-attached storage in a server, or through dedicated network-attached storage (NAS).

Enterprises have responded to growing storage demands by moving to larger, scale-out NAS systems. The on-premise market here is well served, with suppliers Dell EMC, NetApp, Hitachi, HPE and IBM all offering large-capacity NAS technology with different combinations of cost and performance.

Generally, applications that need low latency (media streaming or, more recently, training AI systems) are well served by flash-based NAS hardware from the traditional suppliers.

But for very large datasets, and to ease movement between on-premise and cloud systems, suppliers now offer local versions of object storage.

The large cloud hyperscalers even offer on-premise, object-based technology so that firms can take advantage of object storage’s global namespace and data protection features, with the security and performance benefits of local storage. However, as SNIA warns, these systems typically lack interoperability between suppliers.

The main benefits of on-premise storage for unstructured data are performance, security, plus compliance and control: firms know their storage architecture, and can manage it in a granular way.

The disadvantages are costs, including upfront costs, a limited ability to scale (even scale-out NAS systems hit performance bottlenecks at very large volumes) and a lack of redundancy and, possibly, resilience.

Moving to the cloud?

This has led firms to look at cloud storage, for its lower initial costs and its ability to scale.

With object storage (and virtually all cloud storage is object-based) there is also the ability to handle large volumes of unstructured data efficiently. A global namespace, and the way metadata and data are kept separate, improve resilience.
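As a rough illustration of the object model mentioned above, the following Python sketch keeps every object in one flat key space and holds descriptive metadata apart from the payload. The `ObjectStore` class, its method names and the sample keys are hypothetical and exist only for this example. It does not model replication or resilience, and it is not how any particular cloud service is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Toy object store: one flat key space, metadata held apart from payloads."""
    _data: dict = field(default_factory=dict)      # key -> raw bytes
    _metadata: dict = field(default_factory=dict)  # key -> descriptive attributes

    def put(self, key: str, payload: bytes, **attrs) -> None:
        # The key looks like a path, but it is just a string in a flat namespace.
        self._data[key] = payload
        self._metadata[key] = {"size": len(payload), **attrs}

    def get(self, key: str) -> bytes:
        return self._data[key]

    def find(self, **attrs) -> list:
        """Search the metadata only; payloads are never read."""
        return sorted(
            key for key, meta in self._metadata.items()
            if all(meta.get(name) == value for name, value in attrs.items())
        )


store = ObjectStore()
store.put("mail/2022/0001.eml", b"Subject: Q2 results", kind="email")
store.put("video/site-a/cam1.mp4", b"\x00" * 2048, kind="video")
store.put("mail/2022/0002.eml", b"Subject: invoice", kind="email")

# Small text objects and larger binary objects share one namespace and one index.
print(store.find(kind="email"))
```

Because searches touch only the metadata index, a query across emails and video files costs the same regardless of how large the payloads are.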

Also, performance is moving closer to that of local storage. In fact, cloud object storage is now good enough for many business applications where I/O and especially latency are less critical.

Cloud storage cuts the (up-front) cost of hardware and permits potentially unlimited long-term storage. Nor do firms need to build redundant systems for data protection. This can be done within the cloud provider’s services or, with the right architecture, by splitting data across multiple suppliers’ clouds.

Because data is already in the cloud, it is relatively straightforward to relink it to new systems, such as in a disaster recovery scenario, or to connect it to new client applications via application programming interfaces (APIs). With Amazon’s S3 the de facto object storage technology, it is easier than ever for business applications to connect to cloud data stores.

And with data in the cloud, users should see little or no practical performance hit as they move around their organisation or work remotely.

Disadvantages of cloud storage include lower performance than on-premise storage, especially for I/O-heavy or latency-intolerant applications, potential management difficulties (anyone can spin up cloud storage) and potential hidden costs.

Although the cloud is often seen as a way to cut costs, hidden costs such as data egress charges can quickly erode those savings. And, as SNIA EMEA’s Fraimbault cautions, although it is now fairly easy to move containers between clouds, this becomes harder if they have their own data attached.
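To see how egress charges can change the picture, the short Python sketch below totals a month of storage and data transfer costs at different egress levels. The function name and both prices are assumptions chosen purely for illustration. They are not any provider’s real tariff, and real pricing usually has tiers and free allowances that this ignores.

```python
def monthly_cloud_cost(stored_gb, egress_gb, storage_price_per_gb, egress_price_per_gb):
    """Return (storage_cost, egress_cost, total) for one month."""
    storage_cost = stored_gb * storage_price_per_gb
    egress_cost = egress_gb * egress_price_per_gb
    return storage_cost, egress_cost, storage_cost + egress_cost


# Hypothetical figures for illustration only; not any provider's real prices.
STORAGE_PRICE = 0.02  # currency units per GB stored per month (assumed)
EGRESS_PRICE = 0.09   # currency units per GB transferred out (assumed)
STORED_GB = 100_000   # roughly 100 TB held in the cloud (assumed)

# Compare months in which 0%, 10% and 50% of the stored data is read back out.
for egress_share in (0.0, 0.1, 0.5):
    storage, egress, total = monthly_cloud_cost(
        STORED_GB, STORED_GB * egress_share, STORAGE_PRICE, EGRESS_PRICE
    )
    print(f"egress {egress_share:.0%} of data: storage {storage:,.0f}, "
          f"egress {egress:,.0f}, total {total:,.0f}")
```

With these assumed prices, pulling back half of the data in a month costs more than storing all of it, which is the kind of hidden cost the article describes.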

Hybrid options

Consequently, a growing number of suppliers now offer hybrid technologies that combine the benefits of local, on-premise storage with object technology and the scalability of cloud resources.

This attempt to create the best of both worlds is well suited to unstructured data because of its diverse nature, varied file sizes, and the way it may be accessed by multiple applications.

A system that can handle relatively small text files, such as emails, alongside large imaging files, and make them available to business intelligence, AI systems and human users with equal efficiency is very attractive to CIOs and data management professionals.

Organisations also want to future-proof their storage technologies to support developments such as containers. SNIA’s Fraimbault sees the way hybrid cloud is moving to containers, rather than virtual machines, as a key driver for storing unstructured data in object storage systems.

Hybrid cloud offers the potential to optimise storage systems according to their workloads, retaining scale-out NAS, as well as direct-attached and SAN storage, where the application and performance need it.

But lower-performance applications can access data in the cloud, and data can move to the cloud for long-term storage and archiving. Eventually, data could move seamlessly to and from the cloud, and between cloud providers, without either the application or the end-user noticing.
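The workload-based placement described here could be sketched as a simple rule that looks at how much latency an application tolerates and how often its data is read. The `Workload` fields, the thresholds and the tier names below are all assumptions made for this example. A real policy would also weigh cost, compliance and data gravity.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    max_latency_ms: float  # slowest response the application tolerates
    reads_per_day: float   # how often the data is accessed on average


def choose_tier(w: Workload) -> str:
    """Pick a storage tier using illustrative, assumed thresholds."""
    if w.max_latency_ms < 10:
        # Latency-sensitive work stays close to the application.
        return "on-premise flash NAS"
    if w.reads_per_day >= 1:
        # Regularly read data that tolerates some delay can sit in the cloud.
        return "cloud object storage"
    # Rarely read data moves to long-term storage.
    return "cloud archive"


workloads = [
    Workload("AI training set", max_latency_ms=2, reads_per_day=1000),
    Workload("BI reporting extracts", max_latency_ms=200, reads_per_day=50),
    Workload("Compliance email archive", max_latency_ms=60_000, reads_per_day=0.01),
]

for w in workloads:
    print(f"{w.name}: {choose_tier(w)}")
```

Keeping the rule in one small function makes it easy to adjust the thresholds as cloud performance and pricing change.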

This is already happening through data storage technologies such as Snowflake, which makes use of local and cloud storage and last year upgraded its product to support unstructured data.

Meanwhile, other suppliers are increasing their support for hybrid storage, such as Microsoft through its Azure Data Factory data integration service.

Best of all worlds?

However, the idea of truly location-neutral storage still has some way to go, not least because cloud business models depend on data transfer charges. This, the Enterprise Storage Forum warns, can result in bloated costs.

Indeed, a recent survey by supplier Aptum found that almost half of organisations expect to increase their use of conventional cloud storage. For now, there is no one-size-fits-all technology for unstructured data.
