Friday, 30 November 2012

Avamar


While going through the EMC Avamar product, I thought I would write something about it on my blog. Backup is an evergreen technology that keeps improving, and IT companies invest a lot of money in it. Symantec products such as Backup Exec and NetBackup used to be the best-known names in the backup world, but nowadays EMC is eating into their market with its Avamar, NetWorker and Data Domain products. EMC has 68% of the backup market, and these products are giving Symantec and the other backup vendors a tough fight.

Let me introduce Avamar.
Avamar is a software plus hardware appliance that backs up to disk. It is not sold by model; it is sold by capacity, for example 3.9 TB. Avamar performs source-based deduplication, so data can be sent over the LAN or WAN at good speed. Another good thing about the product is that it takes a full backup every day, so restoring data is much faster than from traditional tape.

Because of virtualization, many physical servers have moved into the virtual world, and backing up those virtual machines has become more complex. With Avamar we can back up virtual machines easily and quickly. The same goes for NAS devices: NDMP backups can be very time consuming, and with Avamar they become fast and smooth, and the backup window shrinks.

I almost forgot to mention laptop and desktop backup. In most companies, IT or otherwise, we don't back up our laptops and desktops, and if a laptop is lost or crashes it becomes difficult to retrieve the data. Avamar is able to back up desktops and laptops as well.

Encryption of data in flight and at rest is an added advantage from a security perspective, and centralized management makes protecting hundreds of remote offices easy. Avamar Data Transport can move the deduplicated data to tape for long-term retention. Finally, Avamar's grid architecture provides online scalability, and its patented redundant array of independent nodes (RAIN) technology provides high availability.
When I was describing these Avamar features to one of my friends, he laughed and asked what was new in this technology. Other products also offer deduplication, so what is new in this one? Right, what's new?
Other companies have their own deduplication algorithms, which their software uses to deduplicate data. I don't want to go deep into the algorithms, but I have worked with NetApp deduplication, which scans the data in fixed 4 KB blocks to find duplicates. That uses a lot of CPU, which is why NetApp deduplication is always recommended to be run outside production hours, such as after midnight or on Saturdays and Sundays. I am not saying NetApp deduplication is not good. My point is that deduplication depends heavily on the algorithm a vendor uses, for example whether it scans the data in fixed blocks or in variable-length segments.

Variable vs. fixed-length data segments
Segment size is a key factor in eliminating redundant data at the segment or sub-file level. Fixed-block or fixed-length segments are commonly used by many deduplication technologies. The problem is that a small change to a dataset (for example, adding a short word at the beginning of a file) can change every fixed-length segment in the dataset, even though very little of the data has actually changed. Avamar uses an intelligent variable-length method that looks at the data itself to determine logical boundary points for its segments, which eliminates this inefficiency.
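To make the fixed-length problem concrete, here is a minimal sketch in Python. It is only an illustration, not any vendor's code: the 64-byte block size, the random sample data and the helper names fixed_chunks and fingerprint are all my own assumptions. It inserts a single byte at the front of a dataset and counts how many fixed-length blocks still match.

```python
import hashlib
import random

def fixed_chunks(data, size):
    """Split data into consecutive blocks of `size` bytes (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def fingerprint(chunk):
    """Hex SHA-1 digest used to compare blocks."""
    return hashlib.sha1(chunk).hexdigest()

# Assumed sample data: 4096 pseudo-random bytes (seeded, so runs are repeatable).
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
modified = b"X" + original  # one byte inserted at the very beginning

known = {fingerprint(c) for c in fixed_chunks(original, 64)}
new_chunks = fixed_chunks(modified, 64)
unchanged = sum(fingerprint(c) in known for c in new_chunks)

print(f"{unchanged} of {len(new_chunks)} blocks unchanged after a 1-byte insert")
```

With random sample data like this, essentially no block survives the shift, so a fixed-block scheme would treat the whole dataset as new.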

Logical segment determination
Avamar's algorithm analyzes the binary structure of a dataset (the 0s and 1s that make up the data) to determine segment boundaries that are context dependent, so that Avamar's client agents identify exactly the same segments for a dataset no matter where it is stored in the enterprise. Avamar's variable-length segments average 24 KB in size and are then compressed to an average of just 12 KB. Because it analyzes the binary structure, Avamar's method works for all file types and sizes, including databases. For instance, if a paragraph is added at the beginning or in the middle of a text file, Avamar's algorithm identifies and backs up only the newly added segments rather than backing up the whole file again.
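Avamar's actual boundary algorithm is patented and not described here, so the following is only a rough sketch of the general idea of content-defined segmentation, under my own assumptions: a segment ends wherever a CRC32 of the last 16 bytes is divisible by 64. The window size, the divisor and the function name cdc_chunks are hypothetical, and the segments are far smaller than Avamar's 24 KB average.

```python
import random
import zlib

def cdc_chunks(data, window=16, divisor=64):
    """Cut a segment wherever the CRC32 of the trailing `window` bytes is
    divisible by `divisor`, so boundaries depend only on nearby content."""
    chunks = []
    start = 0
    for i in range(window, len(data) + 1):
        if zlib.crc32(data[i - window:i]) % divisor == 0:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])  # trailing bytes after the last boundary
    return chunks

# Assumed sample data: 4096 pseudo-random bytes (seeded for repeatability).
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
modified = b"X" + original  # one byte added at the beginning

orig_chunks = cdc_chunks(original)
mod_chunks = cdc_chunks(modified)
known = set(orig_chunks)
reused = sum(c in known for c in mod_chunks)

print(f"{reused} of {len(mod_chunks)} segments already known after a 1-byte insert")
```

Because each boundary depends only on the bytes just before it, the boundaries move along with the data, and every segment after the first boundary is found again unchanged.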
For each 24 KB segment, Avamar generates a 20-byte ID using the SHA-1 algorithm. This unique ID is like a fingerprint for the segment. The Avamar software uses the ID to determine whether a data segment has been stored before, and only unique data is stored, which eliminates the duplicates.
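The fingerprint idea can be sketched in a few lines of Python. This is a toy under my own assumptions, not Avamar's implementation: the SegmentStore class and its put and get methods are hypothetical names, and an in-memory dictionary with zlib compression stands in for the real storage.

```python
import hashlib
import zlib

class SegmentStore:
    """Toy deduplicating store keyed by the 20-byte SHA-1 ID of each segment."""

    def __init__(self):
        self._segments = {}  # SHA-1 ID -> compressed segment bytes

    def put(self, segment):
        """Store a segment only if its ID is new; return (id, was_new)."""
        seg_id = hashlib.sha1(segment).digest()
        was_new = seg_id not in self._segments
        if was_new:
            self._segments[seg_id] = zlib.compress(segment)
        return seg_id, was_new

    def get(self, seg_id):
        """Return the original segment bytes for a stored ID."""
        return zlib.decompress(self._segments[seg_id])

store = SegmentStore()
segments = [b"alpha" * 100, b"beta" * 100, b"alpha" * 100]  # third repeats the first

recipe = []      # ordered list of IDs needed to rebuild the data
new_count = 0    # how many segments actually had to be stored
for seg in segments:
    seg_id, was_new = store.put(seg)
    recipe.append(seg_id)
    new_count += was_new

restored = b"".join(store.get(seg_id) for seg_id in recipe)
print(f"stored {new_count} of {len(segments)} segments, ID length {len(recipe[0])} bytes")
```

The backup keeps only the ordered list of IDs (the recipe); a repeated segment costs one more 20-byte ID rather than another copy of the data.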

Benefits of Avamar
1. Because of client-side deduplication, there is a tremendous reduction in daily network bandwidth and backup data. The Avamar whitepapers claim a reduction of up to 500X in bandwidth and backup data.
2. Because only changed data is sent, daily full backups can be up to 10X faster.
3. Restoration is fast because every daily backup is a full backup.
4. If you send backup data over a LAN/WAN to a remote site, you see the same tremendous reduction in bandwidth there.
5. All the benefits of disk now apply to your backups: end-to-end protection, fast restores, fast backups, a shorter backup window, and so on.
6. Deduplication is resource intensive, so you might expect the client to be heavily loaded while an Avamar backup runs. However, the Avamar client's deduplication runs at low priority and does not use much CPU.
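To get a feel for what the whitepaper's best-case figure would mean on the wire, here is a back-of-the-envelope calculation in Python. The 1 TB full backup and the 100 Mbit/s link are my own assumed numbers, the 500X factor is the "up to" claim quoted above, and protocol overhead is ignored.

```python
def transfer_seconds(size_gb, link_mbps):
    """Seconds needed to move size_gb (decimal gigabytes) over a link of
    link_mbps megabits per second, ignoring protocol overhead."""
    megabits = size_gb * 1000 * 8
    return megabits / link_mbps

FULL_BACKUP_GB = 1000  # assumed size of one full backup (1 TB)
LINK_MBPS = 100        # assumed WAN link speed
REDUCTION = 500        # best-case reduction factor claimed in the whitepapers

sent_gb = FULL_BACKUP_GB / REDUCTION
full_seconds = transfer_seconds(FULL_BACKUP_GB, LINK_MBPS)
dedup_seconds = transfer_seconds(sent_gb, LINK_MBPS)

print(f"without dedup: {full_seconds / 3600:.1f} hours")
print(f"with dedup:    {sent_gb:.0f} GB sent in {dedup_seconds / 60:.1f} minutes")
```

Real reduction factors depend on how much of the data changes each day, so treat this as the optimistic end of the range.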

Grid Server Architecture
The Avamar grid server architecture is another feature that makes Avamar a more reliable, scalable, performance-oriented, flexible and available solution.
Avamar's global deduplication also relies on the unique ID process I described above, so duplicate data is never copied again. To maintain this, there has to be good indexing technology. A centralized index is not very reliable: if the index file is corrupted, we cannot retrieve the data from tape or anywhere else, because we no longer know where the data is stored. So Avamar uses distributed indexing.

Distributed Indexing
As data volume increases, a centralized index becomes increasingly complex and difficult to manage, and it often becomes a bottleneck for backup operations. In addition, corruption of a centralized index can leave an organization unable to recover its backup data. Avamar uses distributed indexing to overcome this challenge. It uses the segment ID in a manner similar to a landline phone number: the area code gives the general area where the call needs to be routed, and the number itself gives the exact location the call is targeted at. Avamar uses one portion of each unique ID to determine which storage node a segment will be stored on, and another portion of the ID to identify where the data will be stored within that node. This makes locating data easy, and hence restoration becomes faster and easier. Automatic load balancing distributes the load and the data across the storage nodes.
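The phone number analogy can be sketched in Python as follows. Which bytes of the ID Avamar really uses is not stated here, so the split below (first four bytes for the node, next four for the position inside the node), the node and slot counts, and the function name locate are all assumptions for illustration.

```python
import hashlib
from collections import Counter

NUM_NODES = 4          # assumed number of storage nodes
SLOTS_PER_NODE = 1024  # assumed number of index slots inside each node

def locate(segment):
    """Derive (segment ID, node, slot) from the SHA-1 ID of a segment."""
    seg_id = hashlib.sha1(segment).digest()
    node = int.from_bytes(seg_id[:4], "big") % NUM_NODES        # the "area code"
    slot = int.from_bytes(seg_id[4:8], "big") % SLOTS_PER_NODE  # the "local number"
    return seg_id, node, slot

# Any client computes the same location for the same segment, with no central index.
first = locate(b"example segment")
second = locate(b"example segment")

# Hash-based routing also spreads segments across the nodes.
per_node = Counter(locate(f"segment-{i}".encode())[1] for i in range(1000))
print(dict(sorted(per_node.items())))
```

Since the location is computed from the ID itself, there is no single index file to corrupt, and the hash spreads segments fairly evenly over the nodes.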

RAIN Architecture (Redundant Array of Independent Nodes)
Avamar also supports a RAIN architecture across its nodes to provide fault tolerance and failover. If any node fails, the data is not affected, and online reconstruction of that node's data begins. RAID 5 or RAID 1 additionally covers disk failures. Avamar also checks the data daily so that what has been backed up can be restored properly. Because Avamar is a disk-based backup solution, block checksums are computed on disk to detect bad blocks.
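How RAIN lays out its redundancy is not described here, so the sketch below uses plain XOR parity, the same idea RAID 5 uses across disks, purely as an assumption to show how a failed node's data could be rebuilt from the survivors. CRC32 stands in for the unspecified block checksum, and the block contents and names are made up.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Assumed layout: three data nodes hold one equal-length block each,
# and a fourth node holds their XOR parity.
node_data = [b"segment-A", b"segment-B", b"segment-C"]
checksums = [zlib.crc32(block) for block in node_data]  # recorded at backup time
parity = xor_blocks(node_data)

# Simulate losing node 1 and rebuilding its block from the others plus parity.
failed = 1
survivors = [block for i, block in enumerate(node_data) if i != failed]
rebuilt = xor_blocks(survivors + [parity])

# The stored checksum confirms that the rebuilt block is intact.
verified = zlib.crc32(rebuilt) == checksums[failed]
print(f"rebuilt node {failed}: {rebuilt!r}, checksum ok: {verified}")
```

The same checksum comparison is what a daily integrity check would run against every stored block to catch silent corruption before a restore is needed.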

Flexible deployment options
Agent-only option: For smaller or remote offices, you can install the agents on servers, desktops or laptops, deduplicate the data at the source, and send it over the WAN to a centrally located Avamar server.
EMC-certified server: Avamar software installed on an EMC-certified server running Red Hat Enterprise Linux, from vendors including Dell, HP and IBM.
EMC Avamar Virtual Edition for VMware: This is the industry's first deduplication virtual appliance, meaning you can install it as a virtual machine on an ESX server host and leverage the existing server CPU and disk storage.

EMC Avamar Data Store: An all-in-one packaged solution, meaning hardware plus software. It comes in two models, a scalable model and a single-node model. The scalable model can be placed in a centralized datacenter, where it stores the data coming from all the remote sites and can grow up to petabytes of storage.
A single-node Avamar solution is ideal for remote offices that require faster local recovery performance. It provides up to 1 TB, 2 TB or 3.3 TB of deduplicated backup capacity, which under a typical tape-and-disk backup solution could take up tens of terabytes of space.

Conclusion
Avamar is one of the best backup solutions for enterprise customers who face a lot of problems backing up their data, but it becomes a costly solution for some SMB customers. So although Avamar is the best in the industry for desktop/laptop backup and VMware backup, it still faces a lot of challenges when it comes to cost, and because of the cost, Data Domain is taking a lot of the backup market in the SMB sector.
Avamar is best known for its desktop/laptop and VMware backup, and because it only takes full backups, restoration is much faster.
One good thing about this solution is that it is end to end, meaning software and hardware together, so if we implement it in our environment we get end-to-end support from EMC. If a backup fails or some issue comes up, I don't need to call my backup software vendor and my hardware vendor separately. That is what usually happens when the backup software comes from one vendor and the hardware from another: you have to log two different support calls if something goes wrong with a backup.