It has been said that our society is living through the fourth industrial revolution, the Society of Knowledge. In recent decades, the data industry has grown steadily as many traditional industries digitalise their processes and create new products and services. What they all have in common is the aim of improving the decision making process through objectivity based on real data, while minimizing uncertainty.
Data Concept Proliferation
Thanks to the boom of the data industry, many new terms have entered our daily language, such as Big Data, Open Data, machine learning, etc. Below, I list some of those concepts with a short definition for each.
Data: reality expressed in numbers, or reality documented in digital format.
Information: treated data, that is, data classified and organized in relation to some external criteria.
Data Set: a collection of data from a given source. The source can be sensors, social media, notes, etc.
Data Base: a collection of data sets under one managing organization. Data bases can be public or private, but there is always a managing organization behind them that is responsible for their maintenance.
Data Lake: a large data base or a collection of data bases.
Data Ocean: a hyper-large data base or a collection of data pools, for example, the Google search engine.
Big Data: a concept that refers to the data generated and collected in the Internet space.
Open Data: a concept that refers to public or private data bases which can be accessed under an open source licence.
Thick Data: a concept that refers to a small amount of data which contains important insights.
Data Analytics: software platforms which help with the creation of a data base, the treatment of the data and its visualization, in support of decision makers.
Machine Learning: an R&D field, together with the software it has produced, for the automation of data analysis.
Artificial Intelligence: a type of software which improves itself by learning from processing a huge amount of data.
If you want to know more, I personally like the extensive list of data concepts presented in Sean McKenna's post.
Concept differences
Next, I'm going to discuss some of the differences between the concepts defined above. It is crucial to understand them when we want to choose the type and sources of data that we need, the data application, or our business model.
For example, there is a huge difference in the aims of the different industries under the concepts of Big, Open and Thick Data. The same is true of the technologies behind the concepts of Data Analytics, Machine Learning and Artificial Intelligence.
Looking at the data industry, the common point is the organization of data in data bases under some classification criteria, while the differences come from the use that is made of that classified data. The Big Data market offers an enormous volume of data for prediction. The Open Data market aims to create new business opportunities for individuals. Thick Data is oriented towards identifying trends in consumer behaviour. Artificial Intelligence aims to discover new knowledge beyond the established data analysis methods.
Looking at the data technologies in general, their common point is automation, while the differences come from the results obtained and their applications. Data Analytics is designed to be a tool for the decision making process. Machine Learning is focused on fully automating tasks, which frees up human resources. Artificial Intelligence goes further in its objectives: not only to automate tasks but also to create new knowledge by learning.
Lastly, consider the data organization concepts: data set, base, lake, ocean. A common mistake is to think that the difference between them lies in the volume of stored data. In my opinion, the difference lies in the classification of the data, as I tried to point out in the definitions above. The concept changes when more classes of data are recorded. This difference in understanding comes from the application one intends for the stored data. I believe that thinking about those concepts in terms of diversity rather than volume can help you choose your data sources better.
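To make the distinction between volume and diversity concrete, here is a minimal sketch in Python. It is only an illustration: the `Record` type, its `source` and `data_class` labels, and the two example collections are hypothetical and invented for this example. It simply shows that a collection can be large in volume yet poor in diversity, and the other way round.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One stored item (hypothetical structure, for illustration only)."""
    source: str      # where the record came from, e.g. "sensor"
    data_class: str  # the class the record was filed under


def volume(records):
    """Volume: how many records are stored."""
    return len(records)


def diversity(records):
    """Diversity: how many distinct classes of data are recorded."""
    return len({r.data_class for r in records})


# Many records, but a single class of data.
sensor_log = [Record("sensor", "temperature") for _ in range(1000)]

# Few records, but several classes of data.
mixed_collection = [
    Record("sensor", "temperature"),
    Record("social media", "opinion"),
    Record("notes", "observation"),
]

print(volume(sensor_log), diversity(sensor_log))              # 1000 1
print(volume(mixed_collection), diversity(mixed_collection))  # 3 3
```

Measured by volume, the first collection looks like the bigger one; measured by diversity, the second one is richer, which is the sense in which I suggest comparing data sources.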
Key to a successful data business model
Clarity about the differences between the data concepts can help you find better data applications and design sustainable business models around the chosen data.
The key to a successful business model built on data is the optimization of the data's value versus its cost.
To optimize that ratio, one should choose well the data sources, the technology for creating value from the data and the path for delivering this value. I believe that understanding the data concepts is fundamental to this optimization.
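As a rough sketch of what optimizing value versus cost could look like in practice, the Python snippet below compares a few options by their value-to-cost ratio and picks the highest. Everything in it is an assumption made for illustration: the `DataOption` type, the option names and the numbers are hypothetical, and in a real case estimating the value and the cost is the hard part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataOption:
    """A candidate combination of data source, technology and delivery path.

    All fields are hypothetical estimates, expressed in the same unit.
    """
    name: str
    value: float  # estimated value delivered to the customer
    cost: float   # estimated cost of sourcing, processing and delivery

    def ratio(self) -> float:
        """Value obtained per unit of cost."""
        if self.cost <= 0:
            raise ValueError("cost must be positive")
        return self.value / self.cost


def best_option(options):
    """Return the option with the highest value-to-cost ratio."""
    return max(options, key=lambda option: option.ratio())


# Made-up figures, only to show the comparison.
options = [
    DataOption("open data portal", value=40.0, cost=10.0),
    DataOption("purchased big data feed", value=120.0, cost=60.0),
    DataOption("thick data interviews", value=30.0, cost=5.0),
]

best = best_option(options)
print(best.name, best.ratio())  # thick data interviews 6.0
```

Note that the option with the highest absolute value (the purchased feed) is not the one with the best ratio, which is why the ratio, and not the value alone, is what I propose to optimize.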