vipulvajpayee blog: CAS (Content Addressed Storage)

CAS

What is CAS? CAS stands for Content Addressed Storage.

As data ages, it becomes fixed: it rarely changes, but users and applications still access it. Such data is called content data or fixed content.

Initially, this type of data was stored on tape, but some companies keep it on production storage for many years. I have seen companies keep 10 to 11 year old data in their production environment. They told me the data is sometimes needed, and retrieving it from tape or other storage media takes a lot of manual work, so they leave it in production. As a result, they spend a lot of money storing data that is rarely used.

This is where CAS (Content Addressed Storage) comes into the picture.

CAS (Content Addressed Storage)

CAS is an object-based system that has been purposely built for storing fixed content data. It is designed for secure online storage and retrieval of fixed content. Unlike file-level and block-level data access, which use file names and the physical location of data for storage and retrieval, CAS stores user data and its attributes as separate objects. The stored object is assigned a globally unique address known as a content address (CA). This address is derived from the object's binary representation. CAS provides an optimized and centrally managed storage solution that can support single-instance storage (SiS) to eliminate multiple copies of the same data.
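To make the idea of a content address concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hashing algorithm, which is my choice for illustration only; real CAS products may use a different algorithm. The function name content_address and the sample byte strings are hypothetical.

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive an address from the object's bytes alone.

    SHA-256 is an assumption made for this sketch.
    """
    return hashlib.sha256(data).hexdigest()

# Hypothetical fixed-content objects.
xray_a = b"chest x-ray, patient 1001"
xray_b = b"chest x-ray, patient 1001"  # same bytes, written by another user
xray_c = b"chest x-ray, patient 1002"  # different bytes

addr_a = content_address(xray_a)
addr_b = content_address(xray_b)
addr_c = content_address(xray_c)

print(addr_a == addr_b)  # identical content gives the same address
print(addr_a == addr_c)  # different content gives a different address
```

Notice that no file name or disk location is involved: the address depends only on the bytes of the object.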

Types of Data

Now let us understand which types of data are called fixed content. Companies create large amounts of data every day. Some data changes frequently, such as online data. Some data does not typically change but is allowed to change when required, for example bills of materials and design data. The other type is fixed content, which is not allowed to change, such as X-ray images, and data that must be kept unchanged for a specific period of time because of government regulations and legal obligations, such as emails, web pages, and digital media.

Features and Benefits of CAS

CAS has emerged as an alternative to tape and optical solutions because it overcomes many of their obvious deficiencies. CAS also meets the demand to improve data accessibility and to properly protect, dispose of, and ensure service-level agreements for archived data. The features and benefits of CAS include the following:

Content authenticity: It assures the genuineness of stored content. This is achieved by generating a unique content address and automating the process of continuously checking and recalculating the content address for stored objects. Content authenticity is assured because the address assigned to each piece of fixed content is as unique as a fingerprint. Every time an object is read, CAS uses a hashing algorithm to recalculate the object's content address as a validation step and compares the result to its original content address. If the object fails validation, it is rebuilt from its mirrored copy.
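The read-time validation described above can be sketched as follows. This is only an illustration under my own assumptions: the class MirroredStore, its two in-memory dictionaries standing in for the primary and mirrored copies, and the use of SHA-256 are all hypothetical.

```python
import hashlib

def content_address(data: bytes) -> str:
    # SHA-256 is an assumption for this sketch.
    return hashlib.sha256(data).hexdigest()

class MirroredStore:
    """Toy store that keeps a primary and a mirrored copy of each object."""

    def __init__(self):
        self.primary = {}  # content address -> bytes
        self.mirror = {}   # content address -> bytes

    def write(self, data: bytes) -> str:
        addr = content_address(data)
        self.primary[addr] = data
        self.mirror[addr] = data
        return addr

    def read(self, addr: str) -> bytes:
        data = self.primary[addr]
        # Validation step: recalculate the address and compare.
        if content_address(data) != addr:
            mirrored = self.mirror[addr]
            if content_address(mirrored) != addr:
                raise IOError("both copies failed validation")
            # Rebuild the primary copy from the mirror.
            self.primary[addr] = mirrored
            data = mirrored
        return data

store = MirroredStore()
addr = store.write(b"signed contract v1")
store.primary[addr] = b"signed contract v1 (tampered)"  # simulate corruption
recovered = store.read(addr)
print(recovered)
```

The corrupted primary copy no longer hashes to its address, so the read falls back to the mirror and repairs the primary copy.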

Content integrity: Refers to the assurance that the stored content has not been altered. The use of a hashing algorithm for content authenticity also ensures content integrity in CAS. If the fixed content is altered, CAS assigns a new address to the altered content rather than overwriting the original fixed content, providing an audit trail and maintaining the fixed content in its original state. As an integral part of maintaining data integrity and audit trail capabilities, CAS supports parity RAID protection in addition to mirroring. Every object in a CAS system is systematically checked in the background. Over time, every object is tested, guaranteeing content integrity even in the case of hardware failure, random error, or attempts to alter the content with malicious intent.
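A small sketch may help show why altered content never overwrites the original. The class AuditedCas, its audit_trail list, and the "previous" parameter are names I made up for illustration, and SHA-256 is again an assumption.

```python
import hashlib
from typing import Optional

class AuditedCas:
    """Toy store where altered content gets a new address and the old object stays."""

    def __init__(self):
        self.objects = {}      # content address -> bytes
        self.audit_trail = []  # (previous address or None, new address)

    def store(self, data: bytes, previous: Optional[str] = None) -> str:
        addr = hashlib.sha256(data).hexdigest()
        # setdefault never replaces an object that is already stored.
        self.objects.setdefault(addr, data)
        self.audit_trail.append((previous, addr))
        return addr

cas = AuditedCas()
original = cas.store(b"invoice total: 100")
altered = cas.store(b"invoice total: 900", previous=original)

print(original != altered)    # the altered content has its own address
print(cas.objects[original])  # the original is still in its original state
```

Because the address comes from the content, a change to the content cannot land on the old address, and the trail records which object the new one was derived from.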

Location independence: CAS uses a unique identifier that applications can leverage to retrieve data rather than a centralized directory, path names, or URLs. Using a content address to access fixed content makes the physical location of the data irrelevant to the application requesting the data. Therefore the location from which the data is accessed is transparent to the application. This yields complete content mobility to applications across locations.
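As a rough illustration of location independence, the sketch below keeps objects in two hypothetical sites and lets the application retrieve by content address only. The site names, the helper functions, and the use of SHA-256 are my assumptions, not part of any real CAS product.

```python
import hashlib

# Hypothetical sites, each holding content address -> bytes.
sites = {"site_a": {}, "site_b": {}}

def store(site: str, data: bytes) -> str:
    addr = hashlib.sha256(data).hexdigest()  # SHA-256 is an assumption
    sites[site][addr] = data
    return addr

def retrieve(addr: str) -> bytes:
    # The caller supplies only the address, never a path or a site name.
    for objects in sites.values():
        if addr in objects:
            return objects[addr]
    raise KeyError(addr)

def migrate(addr: str, src: str, dst: str) -> None:
    # Move the object; its address does not change.
    sites[dst][addr] = sites[src].pop(addr)

addr = store("site_a", b"archived web page")
before = retrieve(addr)
migrate(addr, "site_a", "site_b")
after = retrieve(addr)
print(before == after)
```

The application code that calls retrieve does not change when the object moves, which is the content mobility the paragraph describes.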

Single-instance storage (SiS): The unique signature is used to guarantee the storage of only a single instance of an object. This signature is derived from the binary representation of the object. At write time, the CAS system is polled to see if it already has an object with the same signature. If the object is already on the system, it is not stored again; only a pointer to the existing object is created. SiS simplifies storage resource management tasks, especially when handling hundreds of terabytes of fixed content.
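The write-time check for single-instance storage can be sketched like this. The class SingleInstanceStore and its reference counter are hypothetical, and SHA-256 stands in for whatever signature a real system uses.

```python
import hashlib

class SingleInstanceStore:
    """Toy store that keeps one copy per signature and counts pointers to it."""

    def __init__(self):
        self.blobs = {}       # signature -> bytes (stored once)
        self.references = {}  # signature -> number of pointers

    def write(self, data: bytes) -> str:
        signature = hashlib.sha256(data).hexdigest()
        if signature in self.blobs:
            # Already on the system: record a pointer, do not store again.
            self.references[signature] += 1
        else:
            self.blobs[signature] = data
            self.references[signature] = 1
        return signature

sis = SingleInstanceStore()
attachment = b"quarterly-report.pdf contents"
pointers = [sis.write(attachment) for _ in range(3)]  # three users archive it

print(len(sis.blobs))               # copies actually stored
print(sis.references[pointers[0]])  # pointers to that copy
```

Three writes of the same attachment leave one stored copy and three pointers to it.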

Retention enforcement: Protecting and retaining data objects is a core requirement of an archive system. CAS creates two immutable components for every object stored: a data object and a meta-object. The meta-object stores the object's attributes and data handling policies. For systems that support object-retention capabilities, the retention policies are enforced until the policies expire.
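Here is a minimal sketch of retention enforcement, assuming the retention policy is a simple time period stored in the meta-object. The names MetaObject, RetentionError, and delete, as well as the dates used, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)  # frozen: the meta-object cannot be modified
class MetaObject:
    content_address: str
    stored_at: datetime
    retention: timedelta

    def retention_expired(self, now: datetime) -> bool:
        return now >= self.stored_at + self.retention

class RetentionError(Exception):
    pass

def delete(objects: dict, meta: MetaObject, now: datetime) -> None:
    # The policy is enforced until it expires.
    if not meta.retention_expired(now):
        raise RetentionError("retention period has not expired")
    objects.pop(meta.content_address, None)

objects = {"abc123": b"archived email"}
meta = MetaObject("abc123", datetime(2012, 5, 24), timedelta(days=7 * 365))

try:
    delete(objects, meta, now=datetime(2015, 1, 1))
    blocked = False
except RetentionError:
    blocked = True

delete(objects, meta, now=datetime(2020, 1, 1))  # after the policy expires
print(blocked, "abc123" in objects)
```

The early delete is refused, and the same call succeeds once the retention period is over.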

Record-level protection and disposition: All fixed content is stored in CAS once and is backed up with a protection scheme. The array is composed of one or more storage clusters. Some CAS architectures provide an extra level of protection by replicating the content onto arrays located at a different location. The disposition of records also follows the stringent guidelines established by regulators for shredding and disposing of data in electronic formats.

Technology independence: The CAS system interface is impervious to technology changes. As long as the application server is able to map the original content address, the data remains accessible. Although hardware changes are inevitable, the goal of CAS hardware vendors is to ensure compatibility across platforms.

Fast record retrieval: CAS maintains all content on disks that provide subsecond "time to first byte" (200 ms–400 ms) in a single cluster. Random disk access in CAS enables fast record retrieval.

CAS Architecture

In the CAS architecture, a client accesses the CAS-based storage over a LAN through a server that runs the CAS API (application programming interface). The CAS API performs the functions that enable an application to store and retrieve data.

CAS architecture is a Redundant Array of Independent Nodes (RAIN). It contains storage nodes and access nodes networked as a cluster by using a private LAN that is internal to it. The internal LAN can be reconfigured automatically to detect configuration changes such as the addition of storage or access nodes. Clients access the CAS on a separate LAN, which is used for interconnecting clients and servers to the CAS. The nodes are configured with low-cost, high-capacity ATA HDDs. These nodes run an operating system with special software that implements the features and functionality required in a CAS system.

When the cluster is installed, the nodes are configured with a "role" defining the functionality they provide to the cluster. A node can be configured as a storage node, an access node, or a dual-role node. Storage nodes store and protect data objects. They are sometimes referred to as back-end nodes. Access nodes provide connectivity to application servers through the customer's LAN. They establish connectivity through a private LAN to the storage nodes in the cluster. The number of access nodes is determined by the amount of throughput users require from the cluster. If a node is configured solely as an "access node," its disk space cannot be used to store data objects. This configuration is generally found in older installations of CAS. Storage and retrieval requests are sent to the access node via the customer's LAN. Dual-role nodes provide both storage and access node capabilities. This node configuration is more typical than a pure access node configuration.
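The effect of node roles on a cluster can be sketched with a small model. The Role and Node types, the node names, and the 4 TB disk sizes are all made up for illustration.

```python
from dataclasses import dataclass
from enum import Flag, auto

class Role(Flag):
    STORAGE = auto()
    ACCESS = auto()
    DUAL = STORAGE | ACCESS  # dual-role node: both capabilities

@dataclass
class Node:
    name: str
    role: Role
    disk_tb: float

def usable_capacity_tb(nodes):
    # Disk space on a pure access node cannot hold data objects.
    return sum(n.disk_tb for n in nodes if Role.STORAGE in n.role)

def client_facing(nodes):
    # Only nodes with the access role take requests from the customer's LAN.
    return [n.name for n in nodes if Role.ACCESS in n.role]

cluster = [
    Node("node1", Role.STORAGE, 4.0),
    Node("node2", Role.ACCESS, 4.0),
    Node("node3", Role.DUAL, 4.0),
]

print(usable_capacity_tb(cluster))
print(client_facing(cluster))
```

In this toy cluster the pure access node contributes no capacity, which shows why the dual-role configuration makes better use of the same hardware.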

Thursday, 24 May 2012
