Today, the way we collect and use data directly shapes infrastructure needs and their cost, because one of the basic characteristics of the data industry is the need for data storage. How that need is satisfied greatly affects the viability of a business model.
The strong dependence between the viability of the business model and the data storage architecture comes from the cost of storage and the need for a clever optimization of the cost-to-performance ratio.
Here, I summarize our paper on building a sustainable database and the key factors behind it. In the paper, we discuss the human brain as a theoretical model of a sustainable database; however, I am not going to present that model here.
Sustainable Database
The success of the data industry faces a few challenges related to the treatment of data, which we propose to be the following:
- Exponential growth of the collected data volume, leading to exponential growth of the required infrastructure.
- The need to classify and organize the data according to certain criteria, which at the moment requires human resources.
- The extraction of valuable knowledge that can be monetized.
These challenges directly impact the income-to-cost ratio and determine whether the business models behind the data are viable.
In our paper, we proposed a conceptual solution, the so-called “sustainable database architecture”, to resolve the exponential growth of the data storage need.
We propose the following definition of a sustainable database:
A sustainable database should not grow, or should grow very little, over time, independently of the growth of the input data volume.
This definition suggests that if we imagine the database as a closed box, it should be able to store data with no limit on the input data volume. To resolve this spatial-volume challenge, the only solution is to overwrite old data with new data. Then, although input data can keep arriving indefinitely, the recorded volume can be kept constant.
This works if we assume a constant data flow, but it becomes an issue if the input data volume grows over time; then overwriting alone is not enough. In this second case, some way of compressing the data is needed.
A sustainable database therefore has two main mechanisms: overwriting old data with new data under a defined criterion, and compressing the stored data.
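These two mechanisms can be sketched in a few lines. The class below is my own minimal illustration, not an implementation from the paper: a fixed-capacity store whose oldest entries are overwritten by new ones (the overwrite criterion here is simply age), and whose entries are compressed on write.

```python
from collections import deque
import zlib

class SustainableStore:
    """Fixed-capacity store: old entries are overwritten by new ones,
    and every stored entry is compressed to keep the footprint small."""

    def __init__(self, capacity: int):
        # A deque with maxlen silently discards the oldest item when full,
        # which realizes the "overwrite old data with new data" rule.
        self._entries = deque(maxlen=capacity)

    def write(self, record: str) -> None:
        # Compress on entry so the stored volume stays bounded even
        # if individual records grow.
        self._entries.append(zlib.compress(record.encode("utf-8")))

    def read_all(self) -> list[str]:
        return [zlib.decompress(e).decode("utf-8") for e in self._entries]

store = SustainableStore(capacity=3)
for i in range(5):          # five inputs arrive, but capacity stays at 3
    store.write(f"record {i}")
print(store.read_all())     # only the three newest records survive
```

Whatever the concrete criterion, the point is the same: the stored volume is fixed by the capacity, not by how much input has ever arrived.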
To create a sustainable database that will not need to grow its infrastructure over time, we need to put some rules on its operation: the input, the different sections, the output, etc. Replicating the basic conclusions of our model of the human brain's data treatment process, the sustainable database has five components, shown in Figure 1:
- A raw data section, where the inputs are stored as they arrive.
- A first analysis zone, where the raw data is curated.
- An informational (operative) section, where the data can generate outputs.
- A second analysis zone, where the data is classified by importance.
- A memory section, where only important data is stored.
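The five components above can be sketched as a small pipeline. All function names, rules, and thresholds below are my own illustration of the concept, not definitions from the paper:

```python
def first_analysis(raw_record: dict):
    """Curation rules applied when an input arrives: drop malformed
    records, normalise the rest. Returns None for discarded records."""
    if "value" not in raw_record:
        return None
    return {"value": raw_record["value"],
            "source": raw_record.get("source", "unknown")}

def second_analysis(record: dict) -> bool:
    """Importance rules: decide whether a record deserves long-term
    memory. Here, an arbitrary threshold stands in for a real rule."""
    return record["value"] > 10

raw_section = [{"value": 5}, {"bad": True}, {"value": 42, "source": "sensor-1"}]
informational_section = []   # operative data that can generate outputs
memory_section = []          # only important data survives here

for raw in raw_section:
    curated = first_analysis(raw)
    if curated is None:
        continue                             # rejected in the first analysis zone
    informational_section.append(curated)    # enters the operative section
    if second_analysis(curated):
        memory_section.append(curated)       # promoted to memory

print(len(informational_section), len(memory_section))  # 2 1
```

In this toy run, one malformed record is filtered out by the first analysis zone, two reach the informational section, and only one passes the importance check into memory.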
These five components need well-defined rules for their operation. To evolve the concept of a common database into a sustainable one, the first step is to establish the first analysis zone as a set of rules that activates when an input arrives. With this, a common database becomes an informational one, as the stored data is treated to a given level.
Another common practice in the data industry is to record everything as historical records for further applications such as statistics and predictions. This practice threatens our effort to create a sustainable database, as the infrastructure needs will increase over time. At this point, we need to establish a compression mechanism for the informational database and to split it by creating the memory section. The informational database is split into two sections by the second analysis zone, which again is a set of rules.
The set of rules for the second analysis zone can range from a simple sorting of real-time and historical data to a learning mechanism with recognition capacity.
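At the simple end of that range, the rule can be a plain age check. The 24-hour cutoff below is an arbitrary value of my own, chosen only to make the sketch concrete:

```python
from datetime import datetime, timedelta, timezone

REAL_TIME_WINDOW = timedelta(hours=24)  # arbitrary illustrative cutoff

def classify(record_time: datetime, now: datetime) -> str:
    """Simplest second-analysis rule: split real-time from historical
    data by age. A learning classifier could replace this function."""
    return "real-time" if now - record_time <= REAL_TIME_WINDOW else "historical"

now = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(classify(datetime(2024, 1, 1, 12, tzinfo=timezone.utc), now))  # real-time
print(classify(datetime(2023, 12, 1, tzinfo=timezone.utc), now))     # historical
```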
The established compression mechanism can be applied constantly or periodically, depending on the infrastructure needs. With that, the database becomes sustainable.