Getting to grips with Data Liquidity

Data is more liquid than we realise. For the purposes of a fun analogy, let’s compare data to water. Water is essential to sustaining life, just as data is essential to effective decision-making and organisational execution.

Data, like water, needs to be shared across the organisation. It needs to be protected from potential dangers, and if squandered it could fall into the wrong hands. Data needs to reach the right organisational audience at the right time, and it all needs to be consistent, pure and up to date, not stagnant or polluted.

Carrying on with the data-to-water analogy, and to highlight the importance of managing water, here are some world stats on water that should serve to frame the conversation:

  • There have been over 1,831 significant water-related human events in the last 50 years, often giving rise to conflict.
  • More than 200 international treaties have been negotiated since the 1960s.

Compared with a typical organisation’s need for data, could we observe similar trends?

For example, how many times have different departments needed to share data?

What organisational conflict has arisen over data access or the interpretation of data?

How often has data influenced an important strategic decision that has resulted in large-scale organisational restructuring, or even significant job losses in parts of an organisation?

So what is stopping us sharing?

Why the conflict?

Like water, this resource should be shared in an organisation. Like water, without this valuable resource whole organisational departments could very easily wither and die on the vine, as they may be executing completely the wrong activities.

Abundance of Data

One fact that we need to understand is that data is becoming more and more abundant. As of 2012, about 2.5 exabytes of data are created each day, and that number is doubling every 40 months or so. More data is created across the internet every second than was stored in the entire internet just 20 years ago. So that’s encouraging: we know there is lots of data out there, and seemingly we are collecting more and more of it!
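To get a feel for what "doubling every 40 months" means, here is a small back-of-the-envelope sketch. It takes the 2012 figure of 2.5 exabytes per day as a starting point and simply assumes the doubling rate stays constant, which is an assumption for illustration rather than a forecast. The function name `daily_volume_eb` and its parameters are made up for this example.

```python
def daily_volume_eb(months_elapsed, base_eb=2.5, doubling_months=40.0):
    """Exabytes created per day after `months_elapsed` months.

    Assumes (for illustration only) a constant doubling period,
    starting from the 2012 figure of 2.5 EB per day.
    """
    return base_eb * 2 ** (months_elapsed / doubling_months)


# Print the projected daily volume at a few points in time.
for years in (0, 5, 10):
    volume = daily_volume_eb(years * 12)
    print(f"{years:>2} years on: {volume:.1f} EB per day")
```

Ten years is 120 months, so three doublings, which takes 2.5 exabytes per day to 20 under this assumption.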

“Water water everywhere and not a drop to drink!”

So let’s take a look at the current situation in the organisation:

Data fragmentation

Often we see data as fragmented. Information resources are often the most important assets an enterprise holds. Information can make or break an organisation’s ability to align teams and deliver on its core functions, from sales and operations to the marketing and delivery of products or services, through to strategic objective setting. The enterprise must be able to access, gather, understand, analyse and successfully leverage information in the best way possible.

Unfortunately, as enterprises change and grow, data becomes increasingly difficult to pin down. Parts of the end-to-end transaction become locked in disparate systems with complicated access requirements, including access control, complex APIs and application constraints. Corporate acquisitions and mergers, new technologies, and changing business strategies only make this situation more difficult.

We therefore live in a constant state of data segregation, in which pockets of information become trapped in lower-level functional parts of the organisation. One could argue that this situation exists because knowledge workers are needed to operate, manage and control this data, which in turn improves quality. However, the risk of letting this situation persist is that the activities of each function are only observed in part, and not within the context of the wider process, service delivery or organisation.

Rise of Liquid Data and SOA

So how is data management relevant to modern-day application architecture? Enter SOA. Service-Oriented Architecture arose as a prevalent architecture for IT environments out of a need to become more agile and flexible in managing constant changes to services. The idea was to make web services more modular and self-describing, as components that can be used in a platform-independent way.

Traditionally, applications were built as large monolithic units. They dictate how services must be run, impose platform dependencies on users, and are not easily adaptable to new types of interactions. Web services break these applications into multiple reusable components that can interact with other independent components and services in meaningful ways. For example, instead of having a single financial application, an organization may deploy one or more services that expose the business activities of the application as self-contained, callable processes. That way other systems (such as Human Resource services or applications) can easily leverage financial computing resources. Thus, SOA breaks down barriers between applications. With SOA the IT infrastructure operates in an integrated, organic fashion, like a virtual application that spans the organization.

Viewed as a whole, a data service deployment constitutes a data abstraction layer between data users and sources. It normalizes diverse data and exposes a uniform, well-defined interface that data consumers can use to access data. Client developers do not have to know how to connect to a data source, what its native format is, or even where it comes from, whether from one or many underlying sources. The application can simply call a public function of a data service.
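To make the idea of a data abstraction layer a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the `Customer` entity, the two sources (`CrmSource` and `BillingSource`) with their different native formats, and the `CustomerDataService` that sits in front of them. It is not taken from any particular product.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class Customer:
    """The uniform shape every consumer sees, whatever the source looks like."""
    customer_id: str
    name: str
    email: str


class CustomerSource(Protocol):
    """Anything that can look up a customer and return the uniform shape."""
    def fetch(self, customer_id: str) -> Optional[Customer]: ...


class CrmSource:
    """Hypothetical source: records keyed by id, with its own field names."""
    def __init__(self, records: dict):
        self._records = records

    def fetch(self, customer_id: str) -> Optional[Customer]:
        rec = self._records.get(customer_id)
        if rec is None:
            return None
        # Normalise the source's native field names to the shared entity.
        return Customer(customer_id, rec["full_name"], rec["mail"])


class BillingSource:
    """Hypothetical source: plain (id, name, email) tuples."""
    def __init__(self, rows: list):
        self._rows = rows

    def fetch(self, customer_id: str) -> Optional[Customer]:
        for row_id, name, email in self._rows:
            if row_id == customer_id:
                return Customer(row_id, name, email)
        return None


class CustomerDataService:
    """The public interface: callers never see which source answered."""
    def __init__(self, sources: Iterable[CustomerSource]):
        self._sources = list(sources)

    def get_customer(self, customer_id: str) -> Optional[Customer]:
        # Ask each source in turn and return the first match.
        for source in self._sources:
            customer = source.fetch(customer_id)
            if customer is not None:
                return customer
        return None


service = CustomerDataService([
    CrmSource({"c-1": {"full_name": "Ada Lovelace", "mail": "ada@example.com"}}),
    BillingSource([("c-2", "Alan Turing", "alan@example.com")]),
])
print(service.get_customer("c-1"))
print(service.get_customer("c-2"))
```

The calling code only ever uses `get_customer`. If the billing system changed its format tomorrow, only `BillingSource` would need to change, not the applications that consume the service.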

SOA is considered a loosely coupled—not completely decoupled—architecture because, while the service provider has no knowledge of the interfaces or business concerns of its consumers, the consumers do include explicit references to the service provider interface, in the form of service calls.

The benefits of loose coupling include simplified data access, reusability, and adaptability. The effect of changes to the data source is localized, requiring only relatively few changes in the data services layer rather than in every application that uses the data source. New services can be exposed and used without requiring extensive changes to existing applications. The result is a data integration layer that is highly adaptive and change tolerant.

This all sounds great, so what’s the problem? Often SOA can’t work effectively unless it is extended into the data realm. Enter Liquid Data, which extends SOA by letting you expose information as data services. A data service represents a self-contained business entity (such as a customer, product, or order) that is reusable across the organization.

The problem now is that with more and more data, SOA applications suddenly need to be redesigned to handle the processing weight of it all.

The limitation of the schema and the rise of NoSQL

Aside from having to manage load, legacy applications are generally not geared up for ingesting new data types, primarily because they are based on relational technology. Relational technology has been the dominant approach to working with data for decades. Typically accessed using Structured Query Language (SQL), relational databases are incredibly useful. And as their popularity suggests, they can be applied in many different situations.

But relational technology isn’t always the best approach. Suppose you need to work with very large amounts of data, for example, too much to store on a single machine. Scaling relational technology to work effectively across many servers (physical or virtual) can be challenging. Or suppose your application works with data that’s not a natural fit for relational systems, such as JavaScript Object Notation (JSON) documents. Shoehorning the data into relational tables is possible, but a storage technology expressly designed to work with this kind of information might be simpler.
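Here is a rough sketch of what that "shoehorning" can look like. It uses Python’s built-in `sqlite3` module as a stand-in for a relational database, and a plain dictionary of JSON strings as a stand-in for a document store (not a real NoSQL product). The order document, table names and helper functions are all made up for illustration.

```python
import json
import sqlite3

# A hypothetical nested JSON-style document.
order_doc = {
    "order_id": 1001,
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "A-1", "qty": 2},
        {"sku": "B-7", "qty": 1},
    ],
}


def store_relational(conn, doc):
    """Split one document across two tables to fit the relational model."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer_name TEXT, customer_email TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_items "
        "(order_id INTEGER, sku TEXT, qty INTEGER)"
    )
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?)",
        (doc["order_id"], doc["customer"]["name"], doc["customer"]["email"]),
    )
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [(doc["order_id"], item["sku"], item["qty"]) for item in doc["items"]],
    )


def load_relational(conn, order_id):
    """Reassemble the original document from both tables."""
    name, email = conn.execute(
        "SELECT customer_name, customer_email FROM orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    items = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return {
        "order_id": order_id,
        "customer": {"name": name, "email": email},
        "items": [{"sku": sku, "qty": qty} for sku, qty in items],
    }


def store_document(store, doc):
    """Document-style storage: keep the whole document under one key."""
    store[doc["order_id"]] = json.dumps(doc)


def load_document(store, order_id):
    return json.loads(store[order_id])


conn = sqlite3.connect(":memory:")
store_relational(conn, order_doc)

doc_store = {}
store_document(doc_store, order_doc)

# Both approaches round-trip the same data; the effort involved differs.
print(load_relational(conn, 1001) == order_doc)
print(load_document(doc_store, 1001) == order_doc)
```

Both routes get the data back out intact. The difference is how much mapping code sits in between, and that mapping has to be revisited every time the shape of the document changes.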

NoSQL technologies have been created to address problems like these. As the name suggests, the label encompasses a variety of storage technologies that don’t use the familiar relational model. Yet because they can provide greater scalability, alternative data formats, and other advantages, NoSQL options can sometimes be the right choice. Relational databases have a good future, and they’re still the best choice in many situations. But NoSQL databases get more important every day.

The truth will set us free

So what’s the big deal about bringing all of this data together into one view? Here is a story from Kenneth Cukier that I like to use to describe the current situation. He asks us: what is America’s favourite pie? The answer, of course, is apple. But how do we know this? Because of data. It is possible to look at supermarket sales of 30-centimeter pies that are frozen, and apple wins, no contest. The majority of the sales are apple. But then supermarkets started selling smaller, 11-centimeter pies, and suddenly apple fell to fourth or fifth place. Why? What happened?

When you buy a 30-centimeter pie, the whole family has to agree, and apple is everyone’s second favourite. But when you buy an individual 11-centimeter pie, you can buy the one that you want. You can get your first choice. You have more data. You can see something that you couldn’t see when you only had smaller amounts of it. Now, the point here is that more data doesn’t just let us see more of the same thing we were looking at. More data allows us to see new things. It allows us to see better. It allows us to see different. In this case, it allows us to see that America’s favourite pie is not apple.
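The pie effect is easy to reproduce with a toy model. The households and rankings below are entirely made up, and the "family pie" rule (pick the pie with the lowest combined rank across the household) is just one simple way to model a compromise. It is a sketch of the reasoning, not real sales data.

```python
from collections import Counter

# Hypothetical households: each member ranks pies from most to least liked.
# Apple is everybody's second choice.
households = [
    [
        ["cherry", "apple", "pecan", "pumpkin"],
        ["pecan", "apple", "pumpkin", "cherry"],
        ["pumpkin", "apple", "cherry", "pecan"],
    ],
    [
        ["cherry", "apple", "pumpkin", "pecan"],
        ["pecan", "apple", "pumpkin", "cherry"],
    ],
]


def family_choice(members):
    """One big pie per household: pick the pie with the lowest combined rank."""
    pies = sorted(members[0])
    return min(pies, key=lambda pie: sum(r.index(pie) for r in members))


# Coarse data: one shared pie per household.
family_sales = Counter(family_choice(members) for members in households)

# Finer data: one small pie per person, so everyone buys their first choice.
individual_sales = Counter(
    ranking[0] for members in households for ranking in members
)

print("Family-size pies:", family_sales.most_common())
print("Individual pies: ", individual_sales.most_common())
```

In this toy data apple wins every family-size sale, yet it is nobody’s first choice, so it does not appear in the individual sales at all.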

So what does the future for big data hold?

Gartner highlights a number of trends around big data for the next few years:

Number One: By 2020, data will be used to completely reinvent 80% of business models and processes.

Number Two: By 2017, more than 30% of data will be accessed through information brokerage services that will act as intermediaries.

Number Three: By 2017, more than 20% of customer-facing analytic deployments will provide product tracking information leveraging the IoT.

What can we be doing now to get ahead?

So it sounds like big data is here to stay. How can we start getting ahead of this trend? First of all, we need to work towards understanding the truth of things and ensure that we get access to the “big picture” as quickly as possible. There is no hiding from strong conclusions when the data has been delivered from credible sources. Secondly, we need to understand the personal impact of data on our everyday lives. As more and more data is gathered from us, becoming aware of the digital fingerprint we leave behind is of extreme importance if we value our privacy. Finally, those who control the data and are custodians of the results will always win, so working hard to become that person in your organisation, department or function will become more and more important. Make sure you understand what the data is saying, and ensure you are making as positive an impact as possible!

Love to hear what you think!
