Amit Sheth defined “Smart Data” as “realising productivity, efficiency, and effectiveness gains by using semantics to transform raw data into Smart Data.”
To clarify, the “Smart Data” concept can be understood as the automation of the process represented by the arrow linking the “Information” and “Knowledge” entities in the classic diagram:
Smart Data provides value by harnessing the challenges posed by the volume, velocity, variety, and veracity of big data, in turn providing actionable information and improving decision making. It is about extracting value by improving human involvement in data creation, processing, and consumption, resulting in an enhanced human experience.
In most cases, data of any kind, including big data, cannot become knowledge directly. It must first become information through purification, transformation and, most likely, pre-aggregation (with a consequent loss of granularity), so a direct transition from data to knowledge is probably possible only in relatively simple cases.
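The data-to-information step described above can be sketched in Python as a purely illustrative example. Raw records are purified (invalid rows are dropped), transformed (amounts are converted to one currency), and pre-aggregated by month, which discards per-transaction detail. The record layout, the made-up exchange rates, and the function names `month_of` and `to_information` are hypothetical assumptions rather than part of any particular system.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw transactions: (date, amount, currency); some rows are dirty.
RAW = [
    ("2023-01-05", "120.50", "EUR"),
    ("2023-01-17", "80.00", "USD"),
    ("2023-01-20", "", "EUR"),        # missing amount: removed during purification
    ("2023-02-02", "200.00", "EUR"),
    ("not-a-date", "10.00", "EUR"),   # malformed date: removed during purification
]

# Made-up conversion rates into EUR, for illustration only.
RATES_TO_EUR = {"EUR": 1.0, "USD": 0.5}


def month_of(date_str):
    """Return 'YYYY-MM' for a valid ISO date, or None if the date is malformed."""
    try:
        return datetime.strptime(date_str, "%Y-%m-%d").strftime("%Y-%m")
    except ValueError:
        return None


def to_information(raw):
    """Purify, transform and pre-aggregate raw rows into monthly EUR totals."""
    monthly = defaultdict(float)
    for date_str, amount, currency in raw:
        month = month_of(date_str)
        # Purification: drop rows with a bad date, no amount or an unknown currency.
        if month is None or not amount or currency not in RATES_TO_EUR:
            continue
        # Transformation: express every amount in a single currency.
        eur = float(amount) * RATES_TO_EUR[currency]
        # Pre-aggregation: keep only the month, losing per-transaction granularity.
        monthly[month] += eur
    return dict(monthly)


print(to_information(RAW))  # {'2023-01': 160.5, '2023-02': 200.0}
```

Once the monthly totals exist, the individual transactions that produced them can no longer be recovered from the result. This is the granularity loss mentioned above.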
In real-life cases, implementing “analytics” will most likely involve two consecutive processes: data transformation (the “T” in “ETL”) and semantic integration. With Big Data, the numerous pre-aggregations will most likely change this sequence: ontology alignment (assigning meaning to data entities) is required first, and only after that will the aggregation process deliver a useful outcome.
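Here is a minimal Python sketch of the "align first, aggregate second" ordering. It assumes two hypothetical source systems that label the same regions with different codes. The `ONTOLOGY` mapping, the field names and the helper functions are all invented for illustration. Each row is first mapped to a shared concept, and the aligned rows are then aggregated.

```python
from collections import defaultdict

# Hypothetical extracts from two systems that label regions differently.
CRM_ROWS = [{"region_code": "N", "revenue": 100}, {"region_code": "S", "revenue": 50}]
ERP_ROWS = [{"area": "NORTH", "sales_amt": 30}, {"area": "SOUTH_EAST", "sales_amt": 20}]

# Invented alignment dictionary: (source system, local code) -> shared concept.
ONTOLOGY = {
    ("crm", "N"): "North",
    ("crm", "S"): "South",
    ("erp", "NORTH"): "North",
    ("erp", "SOUTH_EAST"): "South",  # assumed to be agreed with business users
}


def align(source, code):
    """Map a source-specific code to the shared concept; fail loudly if unmapped."""
    try:
        return ONTOLOGY[(source, code)]
    except KeyError:
        raise ValueError(f"no ontology mapping for {source}:{code}") from None


def align_rows():
    """Step 1: ontology alignment, applied at full (row-level) granularity."""
    aligned = [(align("crm", r["region_code"]), r["revenue"]) for r in CRM_ROWS]
    aligned += [(align("erp", r["area"]), r["sales_amt"]) for r in ERP_ROWS]
    return aligned


def aggregate(aligned):
    """Step 2: aggregation over a single, shared vocabulary."""
    totals = defaultdict(int)
    for concept, amount in aligned:
        totals[concept] += amount
    return dict(totals)


print(aggregate(align_rows()))  # {'North': 130, 'South': 70}
```

Aligning before aggregating means every row is interpreted while its full detail is still available. If a source has already been pre-aggregated by its own codes, any mapping that depends on row-level detail can no longer be applied.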
In summary, successfully implementing the “Smart Data” concept in an enterprise requires adopting two technical processes: Data Transformation and Semantic Integration. Data Transformation on its own, which is what the word “Analytics” in the screenshot above stands for, would not be enough in 95% of cases. This puts pressure on the design phase of a project. First, additional time should be planned for data source research (including interaction with business users). Second, a framework for ontology alignment (dictionary format, terminology, approval procedure, etc.) should be prepared.
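The ontology-alignment framework mentioned above (dictionary format, terminology, approval procedure) could begin as something as simple as the following Python sketch. The entry fields, the three-state review workflow, the rule that only approved entries are used for lookups, and names such as `acct_4000` and `finance.lead` are all illustrative assumptions, not a prescribed standard.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DictionaryEntry:
    """A hypothetical dictionary row mapping a source term to a business concept."""
    source_system: str
    source_term: str
    concept: str
    definition: str
    status: Status = Status.PROPOSED
    reviewed_by: str = ""


class Dictionary:
    """Holds entries keyed by (source system, source term)."""

    def __init__(self):
        self._entries = {}

    def propose(self, entry):
        """New mappings always enter the dictionary as proposals."""
        entry.status = Status.PROPOSED
        self._entries[(entry.source_system, entry.source_term)] = entry

    def review(self, source_system, source_term, reviewer, accept):
        """Approval procedure: a named reviewer accepts or rejects a proposal."""
        entry = self._entries[(source_system, source_term)]
        entry.status = Status.APPROVED if accept else Status.REJECTED
        entry.reviewed_by = reviewer

    def lookup(self, source_system, source_term):
        """Return the concept for approved entries only, otherwise None."""
        entry = self._entries.get((source_system, source_term))
        if entry is not None and entry.status is Status.APPROVED:
            return entry.concept
        return None


glossary = Dictionary()
glossary.propose(DictionaryEntry("GL", "acct_4000", "Revenue", "Income from sales"))
print(glossary.lookup("GL", "acct_4000"))  # None: still awaiting approval
glossary.review("GL", "acct_4000", reviewer="finance.lead", accept=True)
print(glossary.lookup("GL", "acct_4000"))  # Revenue
```

Keeping unapproved mappings out of `lookup` is one way to make the approval procedure enforceable rather than advisory. The data transformation step simply cannot use a term until business users have signed it off.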
A new term for the dictionary:
Semantic Integration is the process of interrelating information from diverse sources:
- various accounting systems (budgeting, cash flow, GL, etc.),
- calendars and to-do lists,
- email archives,
- presence information (physical, psychological, and social),
- documents of all sorts,
- contacts (including social graphs),
- search results and/or extracted advertising and marketing relevance.
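To make "interrelating" concrete, here is a small Python sketch. It links records from three of the listed source types (contacts, calendar entries and an email archive) through one shared key, a normalised email address. The record shapes, the sample data and the choice of email as the join key are assumptions for illustration. A fuller approach might derive the linking rules from an ontology-alignment dictionary instead of a single hard-coded identifier.

```python
from collections import defaultdict

# Hypothetical records from three different sources describing the same person.
CONTACTS = [{"name": "Ann Lee", "email": "Ann.Lee@Example.com"}]
CALENDAR = [{"title": "Budget review", "attendees": ["ann.lee@example.com ", "bob@example.com"]}]
EMAILS = [{"subject": "Q3 cash flow", "from": "ANN.LEE@example.com"}]


def norm(address):
    """Normalise an email address so the same person matches across sources."""
    return address.strip().lower()


def integrate():
    """Build one profile per person by interrelating records from all sources."""
    people = defaultdict(lambda: {"name": None, "meetings": [], "emails": []})
    for contact in CONTACTS:
        people[norm(contact["email"])]["name"] = contact["name"]
    for event in CALENDAR:
        for attendee in event["attendees"]:
            people[norm(attendee)]["meetings"].append(event["title"])
    for msg in EMAILS:
        people[norm(msg["from"])]["emails"].append(msg["subject"])
    return dict(people)


profile = integrate()["ann.lee@example.com"]
print(profile)
# {'name': 'Ann Lee', 'meetings': ['Budget review'], 'emails': ['Q3 cash flow']}
```

Each source on its own says little about Ann Lee. Linked together, the three records show who she is, which meetings she attends and what she writes about.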