CAS
What is CAS? CAS stands for Content Addressed Storage.
As data ages, it tends to become fixed: it rarely changes, but users and
applications still read it. Such data is called content data or fixed content.
Initially, this type of data was stored on tape, although some companies keep
it on production storage for many years. I have seen companies keep 10- to
11-year-old data in their production environment. They told me that the data is
occasionally needed, and that retrieving it from tape or other storage media
takes a lot of manual work, so they keep it in production. As a result, they
spend a lot of money on data that is rarely used.
This is where CAS (Content Addressed Storage) comes into the picture.
CAS (Content Addressed Storage)
CAS is an object-based system that has been purposely built for storing
fixed content data. It is designed for secure online storage and retrieval of fixed
content. Unlike file-level and block-level data access that use file names
and the physical location of data for storage and retrieval, CAS stores user
data and its attributes as separate objects. The stored object is assigned a
globally unique address known as a content address (CA). This address is
derived from the object’s binary representation. CAS provides an optimized and
centrally managed storage solution that can support single-instance storage
(SiS) to eliminate multiple copies of the same data.
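The idea of deriving an address from an object's binary representation can be illustrated with a minimal sketch. The function name `content_address` and the choice of SHA-256 are assumptions made for illustration; the text above does not say which hashing algorithm a particular CAS product uses.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Derive an address from the object's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


# Identical bytes always map to the same address; any change yields a new one.
a = content_address(b"chest x-ray, patient 1234")
b = content_address(b"chest x-ray, patient 1234")
c = content_address(b"chest x-ray, patient 1235")

print(a == b)  # True
print(a == c)  # False
```

Because the address depends only on the content, two applications that store the same bytes arrive at the same address, which is what makes single-instance storage possible.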
Types of Data
Now let us understand what types of data are called fixed content data.
Companies create large amounts of data every day. Some data changes
frequently, such as online data. Some data does not typically change but is
allowed to change when required, for example bills of materials and design
data. Other data is fixed content that is not allowed to change, such as X-ray
data, along with data that must be kept unchanged for a specific period due to
government regulations and legal obligations, such as emails, web pages, and
digital media.
Features and Benefits of CAS
CAS has emerged as an alternative to tape and optical
solutions because it overcomes many of their obvious deficiencies. CAS also
meets the demand to improve data accessibility and to properly protect, dispose
of, and ensure service‑level agreements for archived data. The features and
benefits of CAS include the following:
Content authenticity: It assures the genuineness of stored content. This is
achieved by generating a unique content address and automating the process of
continuously checking and recalculating the content address for stored objects.
Content authenticity is assured because the address assigned to each piece of
fixed content is as unique as a fingerprint. Every time an object is read, CAS
uses a hashing algorithm to recalculate the object’s content address as a
validation step and compares the result to its original content address. If the
object fails validation, it is rebuilt from its mirrored copy.
Content integrity: Refers to the assurance that the stored content has not been
altered. The use of a hashing algorithm for content authenticity also ensures
content integrity in CAS. If the fixed content is altered, CAS assigns a new
address to the altered content rather than overwriting the original fixed
content, providing an audit
trail and maintaining the fixed content in its original state. As an integral
part of maintaining data integrity and audit trail capabilities, CAS supports
parity RAID protection in addition to mirroring. Every object in a CAS system
is systematically checked in the background. Over time, every object is tested,
guaranteeing content integrity even in the case of hardware failure, random
error, or attempts to alter the content with malicious intent.
Location independence: CAS uses a unique
identifier that applications can leverage to retrieve data rather than a
centralized directory, path names, or URLs. Using a content address to access
fixed content makes the physical location of the data irrelevant to the
application requesting the data. Therefore the location from which the data is
accessed is transparent to the application. This yields complete content
mobility to applications across locations.
Single-instance storage (SiS): The unique signature is used to guarantee the
storage of only a single instance of an object. This signature is derived from
the binary representation of the object. At write time, the CAS system is
polled to see if it already has an object with the same signature. If the
object is already on the system, it is not stored again; instead, only a
pointer to that object is created. SiS simplifies storage resource management
tasks, especially when handling hundreds of terabytes of fixed content.
Retention enforcement: Protecting and retaining data objects is a core
requirement of an archive system. CAS creates two immutable components for
every object stored: a data object and a meta-object. The meta-object stores
the object’s attributes and data handling policies. For systems that support
object-retention capabilities, the retention policies are enforced until the
policies expire.
Record-level protection and disposition: All fixed content is stored in CAS
once and is backed up with a protection scheme. The array is composed of one or
more storage clusters. Some CAS architectures provide an extra level
of protection by replicating the content onto arrays located at a different
location. The disposition of records also follows the stringent guidelines
established by regulators for shredding and disposing of data in electronic
formats.
Technology independence: The CAS system interface is impervious to technology
changes. As long as the application server is able to map the original content
address, the data remains accessible.
Although hardware changes are inevitable, the goal of CAS hardware vendors is
to ensure compatibility across platforms.
Fast record retrieval: CAS maintains all content on disks that provide
subsecond “time to first byte” (200 ms–400 ms) in a single cluster. Random disk
access in CAS enables fast record retrieval.
CAS Architecture
In the CAS architecture, a client accesses the CAS-based
storage over a LAN through the server that runs the CAS API (application
programming interface). The CAS API is responsible for performing functions
that enable an application to store and retrieve the data.
The CAS architecture is a Redundant Array of Independent Nodes
(RAIN). It contains storage nodes and access nodes networked as a cluster by
using a private LAN that is internal to it. The internal LAN can be
reconfigured automatically to detect the configuration changes such as the
addition of storage or access nodes. Clients access the CAS on a separate LAN,
which is used for interconnecting clients and servers to the CAS. The nodes are
configured with low-cost, high-capacity ATA HDDs. These nodes run
an operating system with special software that implements the features and
functionality required in a CAS system.
When the cluster is installed, the nodes are configured with
a “role” defining the functionality they provide to the cluster. A node can be
configured as a storage node, an access node, or a dual-role node. Storage nodes store and protect data objects.
They are sometimes referred to as back-end nodes. Access nodes provide connectivity to
application servers through the customer’s LAN. They establish connectivity
through a private LAN to the storage nodes in the cluster. The number of access
nodes is determined by the amount of user-required throughput from the cluster.
If a node is configured solely as an “access node,” its disk space cannot be
used to store data objects. This configuration is generally found in older
installations of CAS. Storage and retrieval requests are sent to the access
node via the customer’s LAN. Dual-role
nodes provide both storage and access node capabilities. This node
configuration is more typical than a pure access node configuration.
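The three node roles described above can be modelled in a few lines. This is only a sketch: the names `Role`, `Node`, `serves_clients`, and `stores_objects`, and the example cluster, are hypothetical and do not come from any CAS product.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    STORAGE = "storage"  # back-end node: stores and protects data objects
    ACCESS = "access"    # connects application servers to the cluster
    DUAL = "dual"        # provides both capabilities


@dataclass(frozen=True)
class Node:
    name: str
    role: Role

    @property
    def serves_clients(self) -> bool:
        return self.role in (Role.ACCESS, Role.DUAL)

    @property
    def stores_objects(self) -> bool:
        # A pure access node's disk space is not used for data objects.
        return self.role in (Role.STORAGE, Role.DUAL)


cluster = [Node("n1", Role.ACCESS), Node("n2", Role.STORAGE), Node("n3", Role.DUAL)]
front_end = [n.name for n in cluster if n.serves_clients]
back_end = [n.name for n in cluster if n.stores_objects]

print(front_end)  # ['n1', 'n3']
print(back_end)   # ['n2', 'n3']
```

In this model, a cluster built only from dual-role nodes uses every node for both client connectivity and storage, which matches the text's remark that dual-role nodes are the more typical configuration.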