Avamar
While going through the EMC Avamar product, I thought of writing something about it on my blog. Backup is an evergreen technology that keeps improving day by day, and IT companies invest a lot of money in it. Symantec products such as Backup Exec and NetBackup used to be the best-known names in the backup world, but nowadays EMC is eating into their market with its Avamar, NetWorker and Data Domain products. EMC has 68% of the backup market, and these products are giving a tough fight to Symantec and the other backup vendors.
Let me introduce Avamar.
Avamar is a combined software and hardware appliance that backs up to disk. It is not sold by model but by capacity, for example 3.9 TB. Avamar does source-based deduplication, so it is easy to send data over a LAN or WAN at good speed. Another good thing about this product is that it takes a daily full backup, so restoring data is much faster than with traditional tape.
Because of virtualization, many physical servers have moved into the virtual world, and the complexity of backing up those virtual machines has increased; with Avamar we can back them up easily and quickly. The same goes for NAS devices: NDMP backups can be quite time consuming, and with Avamar they become fast and smooth and the backup window shrinks. One more thing I forgot to mention is laptop and desktop backup. In most IT companies and other companies we do not back up our laptops and desktops, so if a laptop is lost or crashes it is difficult to retrieve the data. Avamar is capable of backing up desktops and laptops too.
Encryption of data in flight and at rest is an added advantage from a security perspective, and centralized management makes protecting hundreds of remote offices easy. Avamar Data Transport can move the deduplicated data to tape for long-term retention. Finally, Avamar's grid architecture provides online scalability, and its patented redundant array of independent nodes (RAIN) technology provides high availability.
When I described these Avamar features to one of my friends, he laughed and asked what is new in this technology: other products also offer deduplication, so what is new in this deduplication? Right, what's new?
Other companies have their own deduplication algorithms, and their software uses those algorithms to deduplicate the data. I don't want to go deep into the algorithms, but I have worked with NetApp deduplication. It scans the data in fixed 4 KB blocks to find duplicates, which takes a lot of CPU, so NetApp deduplication is always recommended to run during non-production hours, such as after midnight or on Saturdays and Sundays. From this we can see that its dedup technology consumes a lot of CPU. I am not saying that NetApp deduplication is not good; my point is that deduplication depends heavily on the type of algorithm a vendor uses, in particular whether it scans the data in fixed blocks or in variable-length segments.
Variable vs. fixed-length data segments
Segment size is a key factor in eliminating redundant data at the segment or sub-file level. Fixed-block and fixed-length segments are commonly used by many deduplication technologies, but a small change to a dataset (for example, adding a single word at the beginning of a file) can change every fixed-length segment in the dataset, even though very little of the data has actually changed. Avamar uses an intelligent variable-length method that looks at the data itself to determine the logical boundary points, eliminating this inefficiency.
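
To see why the boundary method matters, here is a small Python sketch comparing the two approaches. It is only a toy and not Avamar's actual algorithm, which is not described in this post: the function names, the 64-byte fixed size, the 16-byte window and the 6-bit mask are all my own assumptions, chosen to keep the example small. The variable method cuts a segment wherever a hash of the last few bytes matches a pattern, so boundaries follow the content rather than the byte offset.

```python
import hashlib
import random

def fixed_chunks(data, size=64):
    """Cut the data every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data, window=16, mask=0x3F):
    """Cut after any byte where a hash of the last `window` bytes
    has its low six bits equal to zero (toy content-defined rule)."""
    chunks, start = [], 0
    for i in range(window - 1, len(data)):
        tail = data[i - window + 1:i + 1]
        h = int.from_bytes(hashlib.sha1(tail).digest()[:4], "big")
        if (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # whatever is left after the last cut
    return chunks

def shared(a, b):
    """Number of distinct segments that appear in both lists."""
    return len(set(a) & set(b))

# Pseudo-random test data, then the same data with one byte added in front.
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"X" + original

print("fixed-length segments still matching:   ",
      shared(fixed_chunks(original), fixed_chunks(edited)))
print("variable-length segments still matching:",
      shared(variable_chunks(original), variable_chunks(edited)))
```

With fixed-length segments the one-byte insert shifts every boundary, so no old segment matches any more. With the content-defined rule the boundaries move with the data, so only the first segment changes. A real product would also enforce minimum and maximum segment sizes, which this sketch leaves out.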
Logical segment determination
Avamar's algorithm analyzes the binary structure of a dataset (the 0s and 1s that make up the data) to determine segment boundaries that are context dependent, so that Avamar's client agent identifies exactly the same segments for the dataset no matter where it is stored in the enterprise. Avamar's variable-length segments average 24 KB in size and are then compressed to an average of just 12 KB. Because it analyzes the binary structure, Avamar's method works for all file types and sizes, including databases. For instance, if a paragraph is added at the beginning or in the middle of a text file, Avamar's algorithm identifies and backs up only the newly added segment rather than backing up the whole file again.
For each 24 KB segment, Avamar generates a 20-byte ID using the SHA-1 algorithm. This unique ID is like a fingerprint for that segment. The Avamar software then uses the ID to determine whether a data segment has been stored before, and only unique data is stored, eliminating the duplicates.
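
The fingerprint idea can be sketched in a few lines of Python. This is my own minimal illustration, not Avamar code: the `SegmentStore` class and the `backup` and `restore` helpers are hypothetical names, and a plain dictionary stands in for the backup server. The only parts taken from the description above are the 20-byte SHA-1 ID and the rule that a segment is stored only if its ID has not been seen before.

```python
import hashlib

class SegmentStore:
    """Toy stand-in for the backup server: maps segment ID -> segment bytes."""

    def __init__(self):
        self.segments = {}

    def put(self, segment):
        """Store the segment only if its SHA-1 ID is new.
        Returns (segment_id, was_new)."""
        segment_id = hashlib.sha1(segment).digest()  # 20 bytes
        was_new = segment_id not in self.segments
        if was_new:
            self.segments[segment_id] = segment
        return segment_id, was_new

def backup(store, segments):
    """Back up a list of segments. Returns the list of IDs needed to
    rebuild the data and the number of bytes that actually had to be stored."""
    recipe, new_bytes = [], 0
    for segment in segments:
        segment_id, was_new = store.put(segment)
        recipe.append(segment_id)
        if was_new:
            new_bytes += len(segment)
    return recipe, new_bytes

def restore(store, recipe):
    """Rebuild the original data from its list of segment IDs."""
    return b"".join(store.segments[segment_id] for segment_id in recipe)

store = SegmentStore()
monday = [b"header", b"customer table", b"order table"]
tuesday = [b"header", b"customer table", b"order table v2"]

recipe_mon, sent_mon = backup(store, monday)
recipe_tue, sent_tue = backup(store, tuesday)
print("bytes stored on Monday: ", sent_mon)
print("bytes stored on Tuesday:", sent_tue)
```

Tuesday's backup is still a full backup in the sense that its recipe lists every segment, yet only the one changed segment had to be stored. That is how a daily full backup and a small daily transfer can go together.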
Benefits of Avamar
1. Because of client-side deduplication there is a tremendous reduction in daily network bandwidth and backup data; the Avamar white papers claim up to a 500X reduction in bandwidth and backup data.
2. Because only changed data is sent, you can see 10X faster daily full backups.
3. Restoration is fast because of the daily full backup.
4. If you send the backup data over a LAN/WAN to a remote site, you see the same tremendous reduction in bandwidth there.
5. All the benefits of disk are added to your backup: end-to-end protection, fast restores, fast backups, a shorter backup window, and so on.
6. Deduplication is resource intensive, so you might expect the client to be heavily utilized while an Avamar backup is running. However, the Avamar client deduplication runs at low priority and does not take much CPU.
Grid Server Architecture
Avamar's Grid Server Architecture is another of the features that make Avamar a more reliable, scalable, performance-oriented, flexible and highly available solution.
Avamar's global deduplication also relies on the unique ID process I discussed above: with it, duplicate data is not copied again. To maintain this, though, there has to be good indexing technology. A centralized index is not very reliable: if the index file is corrupted, we cannot retrieve the data from tape or anywhere else, because we no longer know where the data is stored. So Avamar uses distributed indexing.
Distributed Indexing
As data volume increases, a centralized index becomes increasingly complex and difficult to manage, often introducing a bottleneck into backup operations. In addition, corruption of a centralized index can leave an organization unable to recover its backup data. Avamar uses distributed indexing to overcome this challenge. It uses the segment ID in a manner similar to a landline phone number: the area code gives the general area to which the call needs to be routed, and the number itself gives the exact location the call is targeted at. Likewise, Avamar uses one portion of each unique ID to determine which storage node a segment will be stored on, and another portion of the ID to identify where the data will be stored within that node. This makes locating the data easy, and hence restoration becomes faster and easier. Automatic load balancing distributes the load and the data across the storage nodes.
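
The phone number analogy can be sketched in Python as follows. This is only an illustration under my own assumptions: the post does not say which bits of the ID Avamar actually uses, so the 4-node grid, the 8 locations per node, the `locate` function and the choice of the first and second 4 bytes of the ID are all hypothetical.

```python
import hashlib
from collections import Counter

NUM_NODES = 4          # hypothetical grid size
STRIPES_PER_NODE = 8   # hypothetical number of locations inside one node

def locate(segment_id):
    """Map a 20-byte segment ID to (node, stripe).
    The first 4 bytes act like the area code and pick the node;
    the next 4 bytes act like the local number and pick the stripe."""
    node = int.from_bytes(segment_id[:4], "big") % NUM_NODES
    stripe = int.from_bytes(segment_id[4:8], "big") % STRIPES_PER_NODE
    return node, stripe

# 10,000 made-up segments, identified by their SHA-1 fingerprints.
ids = [hashlib.sha1(f"segment-{n}".encode()).digest() for n in range(10000)]

# Count how many segments land on each node.
load = Counter(locate(segment_id)[0] for segment_id in ids)
for node in range(NUM_NODES):
    print("node", node, "holds", load[node], "segments")
```

Because the location is computed from the ID itself, any client or node can work out where a segment lives without asking a central index. And because SHA-1 output is spread evenly, the segments end up spread roughly evenly across the nodes, which matches the load balancing described above.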
RAIN Architecture (Redundant Array of Independent Nodes)
Avamar also supports the RAIN architecture across its nodes to provide fault tolerance and failover, so if any node fails there is no effect on the data, and online reconstruction of that node's data starts. In addition, RAID 5 or RAID 1 protects against disk failures. Avamar also checks the data daily, so that backed-up data can be restored properly; because Avamar is a disk-based backup solution, block checksums are performed on disk to detect bad blocks.
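
The daily block check can be pictured with a short Python sketch. It is an assumption on my part that a CRC is a fair stand-in here: the post does not say which checksum Avamar uses, and `write_block` and `find_bad_blocks` are hypothetical helper names. The idea is simply to store a checksum next to each block at write time and recompute it later.

```python
import zlib

def write_block(data):
    """Store a block together with the checksum computed at write time."""
    return {"data": data, "crc": zlib.crc32(data)}

def find_bad_blocks(blocks):
    """Recompute each checksum and return the indexes of blocks that
    no longer match what was recorded when they were written."""
    return [index for index, block in enumerate(blocks)
            if zlib.crc32(block["data"]) != block["crc"]]

disk = [write_block(b"alpha"), write_block(b"bravo"), write_block(b"charlie")]
print("bad blocks before corruption:", find_bad_blocks(disk))

# Simulate silent corruption of one block on disk.
disk[1]["data"] = b"brav0"
print("bad blocks after corruption: ", find_bad_blocks(disk))
```

A check like this only detects the damage. Repairing the block is where the redundancy from RAID and RAIN comes in.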
Flexible deployment options
Agent-only option: For smaller or remote offices you can install the agents on the servers, desktops or laptops, deduplicate the data at the source, and send it over the WAN to a centrally located Avamar server.
EMC-certified server: Avamar software is installed on an EMC-certified server running Red Hat Enterprise Linux, from vendors including Dell, HP and IBM.
EMC Avamar Virtual Edition for VMware: This is the industry's first deduplication virtual appliance, meaning you can install it as a virtual machine on an ESX Server host, leveraging the existing server CPU and disk storage.
EMC Avamar Data Store: An all-in-one packaged solution, meaning hardware plus software. It comes in two models: a scalable model and a single-node model. The scalable model can be placed in a centralized datacenter, where it stores the data coming from all the remote sites and can grow up to petabytes of storage. A single-node Avamar solution is ideal for remote offices that require faster local recovery performance; it provides up to 1 TB, 2 TB or 3.3 TB of deduplicated backup capacity, which under a typical tape and disk backup solution could take up tens of terabytes of space.
Conclusion
Avamar is one of the best backup solutions for enterprise customers who face a lot of problems backing up their data, but it is a costly solution for some SMB customers. So although Avamar is best in the industry for desktop/laptop backup and VMware backup, it still faces a lot of challenges when it comes to cost, and because of the cost Data Domain is taking a lot of the backup market in the SMB sector. Avamar is best known for its desktop/laptop backup and VMware backup, and because it does only full backups, restoration is much faster.
One of the good things about this solution is that it is end to end, meaning software and hardware, so if we implement it in our environment we get end-to-end support from EMC. If a backup fails or some issue comes up, I don't need to call my backup software vendor and then my hardware vendor separately. That is what usually happens when the backup software comes from one vendor and the hardware from another: you end up logging two different support calls whenever something goes wrong with a backup.