The Data Center of the Future is Data Driven
By Matthew Hardman, Data Solutions Group Director, Hitachi Vantara Asia Pacific
What a business craves is insights: insights on its performance, its customers, its competition, its offerings and more. These insights are driven by data. Most organisations are less concerned about 'how it's collected' or 'what it's hosted on,' and more focused on 'what data am I getting?' and 'how fast and easily can I get it?'
If you think like a business, you realise that the need for data, and hence the data itself, is going to be the core factor defining and driving the investments your organisation makes in the data center today and beyond. If you accept that, then what you need to understand is which technologies will best enable the ingestion, retention and accessibility of your data. And to understand that, you need to recognise that not all data is the same.
Structured vs Unstructured Data
It has always been easy to think of all data as 1s and 0s, and at the lowest level it can indeed be broken down that way; however, that format is not particularly useful to you. What is important to realise is that there are two main types of data today, and for the foreseeable future:
- Structured Data: Data that exists in structures like tables in a database, with a formal and enforced format it must follow, maintained in columns and rows. Think of it the way an Excel spreadsheet works.
- Unstructured Data: Data that has no formalised structure, in that it can’t be broken down into rows and columns. Think about data like pictures, call logs, videos, scans etc.
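To make the distinction above concrete, here is a minimal sketch (the table name, identifiers and tags are hypothetical, invented purely for illustration). Structured data fits a schema you can query; unstructured data is an opaque payload you can, at best, describe with a few metadata tags:

```python
import sqlite3

# Structured data: rows and columns with an enforced schema,
# like a database table or a spreadsheet.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue REAL)")
conn.execute("INSERT INTO sales VALUES ('APAC', 'Q1', 125000.0)")

# Queries work because every row follows the same column structure.
total = conn.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]

# Unstructured data: an opaque blob (an image, a call recording, a video).
# There are no rows or columns to query; at best we store it alongside
# descriptive metadata so it can be found later.
unstructured = {
    "object_id": "call-0001.wav",       # hypothetical identifier
    "tags": {"type": "call recording", "duration_sec": 312},
    "payload": b"\x52\x49\x46\x46",     # raw bytes, not queryable as a table
}
```

The point of the sketch: SQL can sum the revenue column instantly, but nothing in SQL can "sum" a call recording; extracting insight from the blob needs separate processing, which is exactly why the two types of data demand different infrastructure.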
What's really interesting is that the ratio between the two is heavily unbalanced. You might assume the majority of data captured is structured, because you know your organisation is running, and spending a lot of money on, systems like Oracle, SQL Server and DB2. Let me surprise you.
First, you need to understand that data is growing fast. Really fast; faster than we probably expected. How fast? Well, consider this quote:
“Between the dawn of civilization and 2003, we only created five exabytes; now we’re creating that amount every two days. By 2020, that figure is predicted to sit at 53 zettabytes (53 trillion gigabytes) — an increase of 50 times.” – Hal Varian, Chief Economist at Google
Where is all this data coming from? Well, let me give you a hint. Think about all those people out there with mobile phones, and what they are doing with them: pictures, chats, videos, emoticons. Every interaction is generating data, and that is just the beginning. We are seeing more and more machinery producing thousands of data points every second. All of this data is unstructured, and the growth curve of data reflects exactly that.
Houston, we have a problem. Organisations have been spending millions of dollars building systems that manage structured data, while the growth has been happening on the unstructured side. To see what is driving this, look at what Gartner is saying.
In fact, what Gartner sees happening, which aligns with our "Cambrian Explosion of Data" graph, is that:
“… from 2018 onwards, cross-industry devices, such as those targeted at smart buildings (including LED lighting, HVAC and physical security systems) will take the lead as connectivity is driven into higher-volume, lower cost devices.”
All of these devices are going to produce more and more data at ever shorter intervals. Even though many of them will likely be producing data from a consumer point of view (i.e. smart devices), the reality, as Gartner puts it, is that businesses will spend more, and that spend will ultimately reach $3 trillion by 2020. Businesses will want to capture all the data these devices create and start using it for analysis and insights, whether to optimise existing processes or to realise new opportunities.
This need will put pressure on IT, driven by the relentless pursuit of data to uncover insights. So while our data centers of tomorrow need to handle what we managed yesterday (structured data), they absolutely need to be ready to manage, and discover insights from, the deluge of data we are about to create tomorrow.