Monday, February 4, 2013

Desperate need for alternative storage medium

A lot of my friends reading this would argue that awesome, high-performance storage is already available from EMC, HDS and NetApp. An EMC fan would point to the VMAX, and an HDS fan to the VSP. Still, I am not overly happy with how these storage arrays have performed compared to their compute counterparts.

Can you guess what kind of storage a supercomputer uses?
An EMC VMAX, an HDS VSP or a NetApp FAS6200? Probably, or probably not.
Do point me to any use cases of these products in HPC (high-performance computing).

The storage medium is still the hard disk, or to a certain extent the (very expensive) flash drive. So purchasing one of the above arrays might satisfy the need for capacity, but it may not deliver the performance. In an enterprise storage array, flash drive capacity is low. Even with enough capital to invest in flash drives, we might fill the whole array without reaching a petabyte of storage. So how do we manage performance and capacity at the same time?

Auto-tiering is one option, but it might not be the solution. There is another bottleneck in the front-end adapters, whose throughput is limited. The ratio of high-performance disks also plays a vital role. When the 'hot data' is larger than the available high-performance capacity, the overflow is served from slower disks.
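To make the 'hot data' problem concrete, here is a minimal sketch of a greedy tiering policy. It assumes the array counts I/Os per extent over a monitoring window and that the flash tier holds a fixed number of extents. The function names and counts are hypothetical, and real auto-tiering engines use their own, more elaborate policies.

```python
from typing import Dict, List, Tuple


def place_extents(access_counts: Dict[str, int], flash_slots: int) -> Tuple[List[str], List[str]]:
    """Greedy placement: rank extents by I/O count and fill the flash tier first."""
    ranked = sorted(access_counts, key=lambda extent: access_counts[extent], reverse=True)
    return ranked[:flash_slots], ranked[flash_slots:]


def flash_hit_ratio(access_counts: Dict[str, int], flash_slots: int) -> float:
    """Fraction of all I/Os that land on the flash tier after placement."""
    flash, _ = place_extents(access_counts, flash_slots)
    total = sum(access_counts.values())
    if total == 0:
        return 0.0
    return sum(access_counts[extent] for extent in flash) / total


if __name__ == "__main__":
    # Hypothetical I/O counts per extent over one monitoring window
    counts = {"ext-a": 500, "ext-b": 300, "ext-c": 150, "ext-d": 50}
    for slots in (1, 2, 4):
        print(slots, flash_hit_ratio(counts, slots))
```

Once the hot set outgrows the flash tier, the hit ratio drops and the remaining I/Os fall through to spinning disks.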

A company named DataDirect Networks (DDN) manufactures high-density, high-performance storage arrays. They are in the HPC arena. Unless one of our clients is a major stock exchange or a research organization, the chances of working on these arrays are slim.

Here's what DDN does to boost performance. First, it uses high-density enclosures: more drives per enclosure, and hence more storage per rack. That takes care of capacity. For performance, DDN believes there is no fundamentally faster storage medium yet. It therefore relies on striping data across many drives to increase IOPS, improving performance and capacity at the same time. The front-end adapter bottleneck is addressed with 16 x 56Gb/s InfiniBand ports (the VMAX tops out at 10Gb/s FCoE ports; 16Gb/s FC should be possible in the near future). Further reading: DDN SFA 12K.
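Striping is what turns many slow drives into one fast volume. The sketch below shows simple round-robin (RAID 0 style) striping: consecutive stripe units land on consecutive disks, so random I/Os spread across all spindles. The stripe unit, the disk count and the ~180 IOPS per 15K RPM drive are illustrative assumptions, not DDN's actual layout or figures.

```python
from typing import Tuple


def locate(lba: int, stripe_unit: int, n_disks: int) -> Tuple[int, int]:
    """Map a logical block address to (disk index, block offset on that disk)
    for round-robin striping without parity (RAID 0 style)."""
    stripe_no = lba // stripe_unit   # which stripe unit the block falls in
    disk = stripe_no % n_disks       # stripe units rotate across the disks
    row = stripe_no // n_disks       # full rows of stripe units before this one
    offset = row * stripe_unit + lba % stripe_unit
    return disk, offset


def ideal_iops(n_disks: int, per_disk_iops: int) -> int:
    """Upper bound on random IOPS if every disk is busy and nothing else limits."""
    return n_disks * per_disk_iops


if __name__ == "__main__":
    # Hypothetical layout: 8 disks, stripe unit of 128 blocks
    for lba in (0, 127, 128, 1024, 1030):
        print(lba, locate(lba, stripe_unit=128, n_disks=8))
    # Assumed ~180 random IOPS per 15K RPM drive (a rough rule of thumb)
    print(ideal_iops(8, 180), ideal_iops(600, 180))
```

The ideal figure grows linearly with the number of drives, which is why packing more drives per enclosure helps performance as well as capacity. In practice, controllers and front-end ports can cap it before that point.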

Moore's law observes that 'the number of transistors on integrated circuits doubles approximately every two years'. For data storage there is no such prediction, or even a satisfactory theory or promise, that storage technology will keep up with current data growth.

To understand the depth of the subject, it is important to know the fundamentals of data storage, so I am going to rush through them at a glance. First of all, the need for binary. We use the binary system because two states, 0 or 1, are easy to store (among many other reasons). Physically, these states correspond to:
the two polarities of magnetic flux on a hard disk drive or tape;
pits and lands on optical media (CD/DVD/Blu-ray);
the charge states of cells in Single- or Multi-Level Cell (SLC/MLC) flash memory, where an MLC cell distinguishes more than two charge levels so that it can store more than one bit.

Let's wind the clock back a few decades, to when digital data was not yet commercially available. We still watched movies and spoke over the phone.

Movies and music were stored on magnetic tape, and we used video cassette players (VCP/VCR) to watch them. Long-distance telephone calls were switched automatically, without the need for an operator. Basically, everything was analog. The transition to digital led to the search for devices that could store digital data.

The history of data storage is shaped less by how data is written than by how it is read back electronically. The first requirement is being able to read the contents. The next is being able to modify them (and how many times?). Finally, we must be able to read them again.

Even though DDN has managed to deliver performance and capacity together to a certain extent, it still depends on HDDs. Unless we find an alternative storage medium, we will simply be working around the problem rather than solving it.

So here are a few alternatives that are still under development. They will need a company with a lot of money to invest in them.

Holographic Storage/Nanotechnology: Numerous research efforts are underway in nanotechnology, some under serious consideration and some still in the early phases. The good news is that nanotech storage has a real chance of becoming the next generation of storage.

The Blu-ray disc is the commercially available high-density portable storage medium. Its capacity of 50GB (dual layer; up from 700MB for a CD and 4.7GB for a DVD) was achieved with a blue laser. Its shorter wavelength allows information to be stored at a greater density than the longer-wavelength red laser used for DVDs.
Holographic storage takes this to the next level by using two beams (a signal beam and a reference beam) to record data throughout the volume of the medium, further increasing density. In my opinion, it won't be able to replace tape. Its capacity is currently limited to 300GB, while an LTO-4 tape holds 1.6TB (compressed; 800GB native). Rewritability is always a concern with optical media, and holographic storage is no exception. It could end up as more of a WORM (write once, read many) device than an alternative to existing storage media.
Further reading: Holographic data storage, Holographic Optical Storage, Quantum Holographic Storage
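To put these capacities side by side, here is a quick count of how many media it would take to hold one petabyte. It assumes decimal units (1 PB = 10^6 GB). It uses the 50GB Blu-ray and 300GB holographic figures, plus LTO-4's 800GB native and 1.6TB compressed capacities.

```python
# Capacities in GB (decimal units)
MEDIA_GB = {
    "Blu-ray (dual layer)": 50,
    "Holographic disc": 300,
    "LTO-4 (native)": 800,
    "LTO-4 (2:1 compressed)": 1600,
}


def units_needed(total_gb: int, unit_gb: int) -> int:
    """Ceiling division: how many media are needed to hold total_gb."""
    return -(-total_gb // unit_gb)


if __name__ == "__main__":
    petabyte_gb = 1_000_000  # 1 PB = 10**6 GB in decimal units
    for name, size in MEDIA_GB.items():
        print(f"{name}: {units_needed(petabyte_gb, size)}")
```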

DNA Storage: I truly believe in this promising technology. We all acknowledge that we carry traces of our ancestors, and that much detail fits inside minute cells. Each cell holds an enormous amount of information about us!
Back to basics again: the goal of a storage medium is to read the data, modify it, and then read it again. The same applies to DNA storage. Reading is largely solved: biologists can already analyze DNA and determine the ancestral traits of individuals. Scientists have also stored audio and text on fragments of DNA and retrieved them with near-perfect fidelity. This technique may eventually provide a way to handle the overwhelming data of the digital age.

DNA, the molecule that contains the genetic instructions for all living things, is stable, durable and dense. Because DNA isn't alive, it could sit passively in a storage device for thousands of years. Durability would therefore be excellent (no more failed HDDs, so to speak). The DNA method can reportedly store around 125TB of information in one cubic millimetre of DNA. That's some super-dense capacity to meet future needs. My only worries would be the seek time (or whatever the equivalent term would be) and the throughput available for reading and writing this much capacity.
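As a quick back-of-the-envelope check of what 125TB per cubic millimetre means, assume decimal units (1 EB = 10^6 TB). One exabyte would then occupy:

$$
\frac{1\ \text{EB}}{125\ \text{TB/mm}^3} = \frac{10^{6}\ \text{TB}}{125\ \text{TB/mm}^3} = 8000\ \text{mm}^3 = 8\ \text{cm}^3
$$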

DNA codes its genetic instructions in four chemical bases: adenine (A), guanine (G), cytosine (C) and thymine (T).
To store data, each zero is written as either an A or a C, and each one as either a G or a T.
Like magnetic tape, this type of storage is sequential, so seek times could be high.
Further reading: Future of data storage, Future of data
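The bit-to-base mapping above can be sketched in a few lines of Python. Which of the two candidate bases to use is a free choice. As an assumption for illustration, this sketch picks whichever one differs from the previous base. This avoids long runs of the same base, which are generally considered harder to synthesize and sequence accurately. The function names are hypothetical, and this is not necessarily the exact scheme used in the published experiments.

```python
ZERO_BASES = "AC"  # a 0 bit may be written as A or C
ONE_BASES = "GT"   # a 1 bit may be written as G or T


def encode(data: bytes) -> str:
    """Encode bytes as a DNA base string, one base per bit."""
    bits = "".join(f"{byte:08b}" for byte in data)
    strand = []
    for bit in bits:
        choices = ZERO_BASES if bit == "0" else ONE_BASES
        # Assumed tie-break: never repeat the previous base (no runs like "AAAA")
        base = choices[1] if strand and strand[-1] == choices[0] else choices[0]
        strand.append(base)
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Invert encode(): A/C -> 0, G/T -> 1, then regroup the bits into bytes."""
    bits = []
    for base in strand:
        if base in ZERO_BASES:
            bits.append("0")
        elif base in ONE_BASES:
            bits.append("1")
        else:
            raise ValueError(f"not a DNA base: {base!r}")
    bit_string = "".join(bits)
    if len(bit_string) % 8:
        raise ValueError("strand length must be a multiple of 8")
    return bytes(int(bit_string[i:i + 8], 2) for i in range(0, len(bit_string), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)
    print(decode(strand))
```

This encoding stores one bit per base, half of the two bits per base that four symbols could carry in theory. That redundancy is what buys the freedom to avoid repeats.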

Overall, none of these technologies will be available overnight, or perhaps even in the next five years. SSDs took a long while to arrive in the enterprise, and they are still outrageously expensive.
Hopefully, technologies like DNA storage will fundamentally change the storage medium into a much faster, more reliable, more durable and less expensive alternative. They should resolve most of the problems faced in data center storage today.

Disclaimer: The views published here are those of the author. They do not necessarily reflect his company's views, nor does he get any benefit from the companies owning the products mentioned.

Thursday, March 15, 2012

Dropbox - The best 'syncing' Cloud Storage

When at home, how many times have you wanted access to a document you were editing at the office? Are you among those who keep emailing attachments to themselves to preserve a copy online? Have you ever wished that the photos on your phone were uploaded automatically, so you never had to worry about backing them up? Now imagine a world where you use one simple folder that is automatically synchronized with all of your machines: office desktop, work laptop, personal laptop and even your mobile phone.

I can't stop appreciating this simple yet useful cloud storage tool: Dropbox. It ensures that your important data remains available on all of your machines, across locations, at all times. All you have to do is copy the data you are interested in into the Dropbox folder on your computer, and voilà! It gets uploaded to the cloud and synced to all the machines that belong to you. So when you get home, you don't have to worry if you forgot to add some important lines to your document.
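Dropbox's actual sync protocol is its own. As a hypothetical illustration of the general idea, here is a minimal sketch that fingerprints every file in a folder with SHA-256 and compares the result against a manifest of the 'cloud' copy. It assumes the local copy wins for files present on both sides. It ignores deletions and conflicts, which a real sync service must handle.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def manifest(folder: Path) -> Dict[str, str]:
    """Map each file's path (relative to `folder`) to the SHA-256 of its contents."""
    result = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            relative = path.relative_to(folder).as_posix()
            result[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def sync_plan(local: Dict[str, str], remote: Dict[str, str]) -> Dict[str, List[str]]:
    """Decide what to transfer, assuming the local copy wins on conflicts.

    Files that are new or changed locally are uploaded; files that exist only
    remotely are downloaded. Deletions are deliberately not handled here.
    """
    return {
        "upload": sorted(p for p, digest in local.items() if remote.get(p) != digest),
        "download": sorted(p for p in remote if p not in local),
    }
```

Only the files whose fingerprints differ need to travel, which is why a change made at the office can show up at home shortly afterwards.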

Of course, Microsoft's SkyDrive gives you 25GB of storage vs. Dropbox's free 2GB. In my opinion, though, it isn't cloud storage in this sense: it is more like having your own Megaupload (the file-hosting site shut down by the FBI) in your account. You have to keep uploading and downloading your files manually. The same goes for Amazon's S3: not very impressive for the general public, though useful for SMBs. Let's not even talk about EMC Atmos, and we aren't discussing Apple products (iCloud).

I am quite sure this will be a very useful service if you work on many devices a day and have always wished to have your important files on all of them, equally synced. I can confidently say that in a few years this will replace \\fileserver\share. In other words, NAS will be replaced by 'shared-folder storage', officially called cloud storage.

Try Dropbox.

Disclaimer: The views published here are those of the author. They do not necessarily reflect his company's views, nor does he get any benefit from the companies owning the products mentioned.

Friday, April 1, 2011

Quick Start the Public Cloud - Harvesting Amazon's EC2 Cloud in minutes

"Cloud computing? Not again! that's too much of hype which I haven't had a chance to touch and feel"
- Most users who know of (or have heard of) the cloud but haven't 'touched' it

How to Use Amazon's EC2 Public Cloud

Touch the Cloud

I am not going to explain what cloud computing is, because you can find that on many websites by simply googling it. That said, no one has a proper definition.
Here is how Amazon, one of the leading cloud computing providers, defines its EC2 range:

"Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers."

Getting started

The goal is to get "compute" capacity from the "internet" which is "fast to provision" and "scalable".

Goal: access a Microsoft Windows Server 2008 SP2 Datacenter edition server from your own PC over the internet, and use it. The server is created exclusively for you on Amazon's cloud, to your specs.

You will need:

  1. Amazon account (with access to AWS management console).
  2. Any Windows machine, or any machine that can use Terminal Services (RDP).
  3. Hybridfox for Mozilla Firefox (optional).
You can register with Amazon Web Services (AWS) for free, and everything I did here has been free so far.
P.S.: Anyone who has entered credit card information can use the AWS Management Console. New customers can run Micro instances for free for one year (750 hours per month under the AWS Free Tier).

Kick-off in a nutshell


  1. Be sure you have all your Amazon credentials; they are all available under Account>Security Credentials>
  2. Create a key pair for your instances and download its private key, which will usually be in the form of 'example.pem'.
  3. Keep your 'Access Keys' handy; they are on the same page. You will need the 'Access Key ID' and the 'Secret Access Key' to connect via Hybridfox/Elasticfox.
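If you later want to script against AWS instead of typing the keys into Hybridfox, a common practice is to keep them out of your code entirely. The sketch below reads them from the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables (the names the AWS SDKs and CLI look for). It also masks the secret for display. The helper names are hypothetical.

```python
import os
from typing import Mapping, Tuple

KEY_ID_VAR = "AWS_ACCESS_KEY_ID"
SECRET_VAR = "AWS_SECRET_ACCESS_KEY"


def load_aws_keys(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (access key ID, secret access key), failing loudly if either is missing."""
    missing = [name for name in (KEY_ID_VAR, SECRET_VAR) if not env.get(name)]
    if missing:
        raise KeyError("missing AWS credentials: " + ", ".join(missing))
    return env[KEY_ID_VAR], env[SECRET_VAR]


def mask(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters (everything if the value is short)."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]
```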
Create the instance
  1. Log in to your AWS Console using your credentials.
  2. Select the EC2 tab for compute infrastructure (servers).
  3. Select 'Launch Instance' to see the available instances.
  4. You'll find three tabs: Quick Start, My AMIs and Community AMIs.
  5. To keep things easy to follow, I will select Windows Server 2008 under Quick Start. Many images are available, ranging from FreeBSD to Red Hat.
  6. On the next page, you'll find options to select the number of instances and their type. 'Micro' has the lowest configuration (little RAM, 1 core, 1 CPU unit), while 'High-CPU' types offer much larger configurations (more RAM, more cores, more CPU units).
  7. Select the availability zone based on your location (in the US).
  8. The next page displays advanced options such as monitoring and termination protection. Set them as appropriate.
  9. Add a key-value tag to your instance for reference. Tags are a form of metadata: each consists of a case-sensitive key/value pair, is stored in the cloud and is private to your account. You can create user-friendly names that help you organize, search and browse your resources. For example, you could define a tag with key = Name and value = Webserver.
  10. Select your existing key pair, the one from step 2 of 'Kick-off in a nutshell' (above).
  11. Select or create your security group. You might want to create one with RDP, SSH, SMTP etc. enabled.
  12. That's all! The next page shows a summary of your instance and a Launch button. It might take a few minutes for the instance to become available. You will see its state as 'pending' and then, a few minutes later, 'running'.
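The console steps above map fairly directly onto the EC2 API. Here is a hedged sketch using boto3, the AWS SDK for Python (which postdates this post). It builds the same request: AMI, type, count, Name tag, key pair, and a security group opening RDP, SSH and SMTP. The AMI ID, key pair name, group ID, CIDR range, region and the t2.micro default are placeholders or assumptions to replace with your own values. Only the boto3 calls themselves are real API.

```python
from typing import Dict, List

# Well-known TCP ports for the services mentioned in step 11
SERVICE_PORTS = {"RDP": 3389, "SSH": 22, "SMTP": 25}


def ingress_rules(services: List[str], cidr: str) -> List[Dict]:
    """Build IpPermissions entries that open one TCP port per service to `cidr`."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": SERVICE_PORTS[name],
            "ToPort": SERVICE_PORTS[name],
            "IpRanges": [{"CidrIp": cidr}],
        }
        for name in services
    ]


def launch_params(ami_id: str, key_name: str, group_id: str, name_tag: str,
                  instance_type: str = "t2.micro", count: int = 1) -> Dict:
    """Keyword arguments for EC2 run_instances, mirroring the console steps."""
    return {
        "ImageId": ami_id,               # step 5: the chosen image (AMI)
        "InstanceType": instance_type,   # step 6: instance type
        "MinCount": count,               # step 6: number of instances
        "MaxCount": count,
        "KeyName": key_name,             # step 10: existing key pair
        "SecurityGroupIds": [group_id],  # step 11: security group
        "TagSpecifications": [{          # step 9: Name tag
            "ResourceType": "instance",
            "Tags": [{"Key": "Name", "Value": name_tag}],
        }],
    }


def launch(params: Dict, rules: List[Dict], region: str = "us-east-1") -> str:
    """Open the ports, launch the instance and wait until it is 'running'."""
    import boto3  # imported here so the helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.authorize_security_group_ingress(
        GroupId=params["SecurityGroupIds"][0], IpPermissions=rules)
    response = ec2.run_instances(**params)
    instance_id = response["Instances"][0]["InstanceId"]
    # Step 12: block while the state moves from 'pending' to 'running'
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id


if __name__ == "__main__":
    # Placeholder values; nothing is sent to AWS unless launch() is called
    print(ingress_rules(["RDP"], "203.0.113.0/24"))
    print(launch_params("ami-xxxxxxxx", "example", "sg-xxxxxxxx", "Webserver"))
```

Keeping the request-building separate from the network call makes it easy to review exactly what will be launched before spending any money.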
Connecting to your instance
  1. You will need two things to connect to your server (instance):
    1. Link to connect to the instance via RDP.
    2. User name and password.
  2. Generating the password takes time; it will be ready in a few minutes (up to 30 minutes). You can retrieve it by right-clicking the instance and selecting 'Get Windows password'. You will have to provide the contents of your 'example.pem' file (the private key of the instance's key pair) from step 2 of 'Kick-off in a nutshell' above.
  3. Right-click the instance and choose the option that says 'connect' or 'connect to instance' in Hybridfox/Elasticfox. For a Windows instance, you will find the RDP address. Enter it into your RDP client and use the password generated above.
  4. That's all! Now start using your server!
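'Get Windows password' works because EC2 encrypts the generated Administrator password with the public half of your key pair. Decrypting it needs the matching private key (.pem). Here is a minimal sketch of the same step in Python. It assumes the `cryptography` package for the RSA (PKCS#1 v1.5) decryption and boto3's get_password_data call. The instance ID, file path and region are placeholders.

```python
import base64

from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import padding


def decrypt_windows_password(password_data_b64: str, private_key_pem: bytes) -> str:
    """Decrypt EC2 PasswordData (base64, RSA PKCS#1 v1.5) with the key pair's .pem."""
    key = serialization.load_pem_private_key(private_key_pem, password=None)
    ciphertext = base64.b64decode(password_data_b64)
    return key.decrypt(ciphertext, padding.PKCS1v15()).decode("utf-8")


def fetch_windows_password(instance_id: str, pem_path: str,
                           region: str = "us-east-1") -> str:
    """Fetch and decrypt the Administrator password for a Windows instance."""
    import boto3  # imported here so decrypt_windows_password works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    data = ec2.get_password_data(InstanceId=instance_id)["PasswordData"]
    if not data.strip():
        # EC2 returns an empty string until the password has been generated
        raise RuntimeError("password not ready yet; try again in a few minutes")
    with open(pem_path, "rb") as pem_file:
        return decrypt_windows_password(data, pem_file.read())
```

A call such as fetch_windows_password("i-xxxxxxxx", "example.pem") would then return the password to paste into your RDP client.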

Hybridfox is a Firefox add-on that tries to get the best of two popular cloud computing environments: Amazon EC2 (public) and Eucalyptus (private). The idea is to use one tool, Hybridfox, to switch seamlessly between Amazon and Eucalyptus accounts in order to manage your "Cloud Computing" environment.


Sign in to the AWS Console using your Amazon AWS credentials

Selecting and customizing your instance:

Select your instance type from a wide range of available instances

Select the number and type of instances

Provide a user-friendly name for your instance

Selecting the Key Pair

Select or create your security group

Summary of your instance. Click on Launch to Power it up

Find your instance running in your AWS Console

Same as above - but seen from Hybridfox

Paste your private key and you will find the option to decrypt the password. Note: this screen may take several minutes to appear.

Use the username/password to connect to the computer using RDP or any other equivalent tools.

The instance on your desktop in minutes!!