Data is the fuel of the digital economy, and data-centric organizations enjoy distinct competitive advantages. Every organization should therefore have a data management strategy that enables it to effectively capture, store, organize and analyze the data generated by its various business functions.
However, as regulations increase, businesses and organizations in India need to ensure that everything they do in terms of data protection and governance complies with new regulations and standards.
In particular, the Bureau of Indian Standards (BIS) has released the new IS 17428 standard, which comes in two parts: Part 1 sets out requirements, whereas Part 2 sets out guidelines. This standard matters in India because India is a sovereign state, and not all of the contents of the EU GDPR, which was designed to work across borders, make sense here.
The Indian government has also placed the Personal Data Protection (PDP) Bill before Parliament. It is currently in draft form but is expected to pass into law in late 2021 or early 2022.
Put simply, the aim of the PDP Bill is to protect the privacy of individuals with respect to their personal data.
The goal of an organization’s data management strategy should be to ensure that the data in its digital systems is accurate, protected and accessible to authorized consumers. However, creating a future-ready data management strategy is a very complex task, given the rapid advances in emerging technologies such as cloud and big data systems and the rapidly changing business requirements that call for real-time data, not yesterday’s data.
In order to comply with the new standards and proposed laws, inevitably some data must be made anonymous and the “right to be forgotten” must be enforced. More business users and regulators are requiring all the “factories” that provide them with data to be more transparent, which implies a more up-to-date catalog of data and metadata.
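As an illustration of the two obligations above, the following is a minimal Python sketch, not a compliance-ready implementation. It assumes records are simple dictionaries; the field names, the `SECRET_KEY` value and the helper names `pseudonymize`, `mask_record` and `forget_customer` are all hypothetical. Note that keyed hashing is pseudonymization rather than full anonymization, since whoever holds the key can still link records.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real system would load it from a key manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed SHA-256 digest (64 hex characters)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(record: dict, pii_fields: set) -> dict:
    """Return a copy of the record with the listed personal fields pseudonymized."""
    return {
        field: pseudonymize(str(value)) if field in pii_fields else value
        for field, value in record.items()
    }


def forget_customer(records: list, customer_id: str) -> list:
    """Handle an erasure request by dropping every record of the given customer."""
    return [r for r in records if r["customer_id"] != customer_id]


# Made-up sample data, for illustration only.
records = [
    {"customer_id": "C001", "email": "asha@example.com", "city": "Pune"},
    {"customer_id": "C002", "email": "ravi@example.com", "city": "Delhi"},
]

remaining = forget_customer(records, "C001")               # erasure request for C001
masked = [mask_record(r, {"email"}) for r in remaining]    # hide emails before analysis
```

In a real architecture the erasure step would also have to reach every replica, backup and downstream copy of the data, which is one reason an up-to-date data catalog matters.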
Let us understand in detail some of the critical challenges that must be considered for creating a holistic data management strategy and the data architecture that integrates it.
Access to real-time data: Organizations need access to real-time data to adapt quickly to market changes and to support real-time analytics use cases such as consumer behavior monitoring, ad optimization, product recommendations, and more. This means that the data must be available for analysis as soon as it is generated.
However, the data architecture in most organizations is not designed to support real-time analytics. The most common approach to business intelligence and analytics adopted by most organizations is to replicate data from source systems to intermediate storage solutions such as data warehouses and data lakes using various ETL processes. While this approach is suitable for regular business reporting, it does not support real-time analytics use cases.
Therefore, organizations should adopt alternative approaches that support both traditional forms of business reporting and advanced analytics such as real-time analytics and streaming.
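The difference between the two approaches can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a streaming platform: events are plain dictionaries with a hypothetical `product` field, and `RunningClickCounter` and `batch_report` are made-up names. The batch function recomputes a report over the whole replicated data set, while the counter updates its result as each event arrives.

```python
from collections import defaultdict


def batch_report(events: list) -> dict:
    """Batch style: recompute click counts over the full replicated data set."""
    counts = defaultdict(int)
    for event in events:
        counts[event["product"]] += 1
    return dict(counts)


class RunningClickCounter:
    """Streaming style: keep click counts current as each event arrives."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)

    def on_event(self, event: dict) -> None:
        # Constant work per event; no need to rescan historical data.
        self.counts[event["product"]] += 1

    def top_product(self) -> str:
        # Product with the highest count seen so far.
        return max(self.counts, key=self.counts.get)


# Made-up click events, for illustration only.
events = [{"product": "shoes"}, {"product": "bags"}, {"product": "shoes"}]

counter = RunningClickCounter()
for event in events:
    counter.on_event(event)  # the answer is usable after every single event
```

Both paths arrive at the same counts; the difference is that the streaming path has an answer available immediately after each event instead of after the next ETL run.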
Big Data: To perform advanced analysis, organizations must store and analyze a wide range of big data. This diversity includes, but is not limited to, text (e.g. contracts and social media messages), voice recordings (e.g. conversations between flight controllers and pilots), images (e.g. photographs of car crash incidents) and video (e.g. footage from security cameras at airports and retail stores).
Organizations also like to archive data resulting from monitoring new business programs that generate huge volumes of data. There are also cases of streaming data that needs to be transferred from the source to a real-time streaming application.
Data from wearables, in-game player activity and telemetry from connected devices all fall into this category. Regardless of the type of analytics a business wants to perform, the volume and diversity of big data will have a direct impact on the technology choices in its data architecture.
Cloud Platform Interoperability: Cloud computing technology is evolving faster than ever. Applications are becoming more portable, processing cycles can support workloads in real time, and data integration platforms simplify connectivity across platform boundaries, making hybrid and multi-cloud architectures the norm. Therefore, the new data architecture strategy must support cloud platform interoperability, which in turn enables reporting and analysis for business cases that require extracting data from multiple cloud platforms.
Data Science: Data science enables organizations to discover hidden patterns in data by creating analytical models. These analytical models are built using techniques such as statistics, deep learning, machine learning, and AI.
However, several studies suggest that data scientists often spend around 80% of their time on data preparation tasks such as data cleaning and exploration, and only around 20% on building predictive models. Therefore, a modern data architecture must include adequate tools that enable data scientists to focus on their core competencies.
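To give a sense of what such data preparation involves, here is a small, hedged Python sketch of a typical cleaning step. The field names and the `clean_rows` helper are hypothetical, and real pipelines would normally rely on dedicated data preparation tools rather than hand-written loops.

```python
def clean_rows(rows: list) -> list:
    """Normalize names and emails, skip rows without an email, drop duplicates."""
    seen_emails = set()
    cleaned = []
    for row in rows:
        name = (row.get("name") or "").strip().title()
        email = (row.get("email") or "").strip().lower()
        if not email:             # the key field is missing, so the row is unusable
            continue
        if email in seen_emails:  # same person entered twice with different casing
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Made-up raw input with inconsistent casing, stray whitespace and a missing value.
raw_rows = [
    {"name": "  asha rao ", "email": "Asha@Example.com"},
    {"name": "Asha Rao", "email": "asha@example.com "},
    {"name": "Ravi", "email": None},
]

cleaned_rows = clean_rows(raw_rows)
```

Even this toy case needs three separate rules for three rows, which hints at why preparation work consumes so much time on real data sets.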
Given the pace of change in the world, companies need agile data management strategies to adapt. What is needed now, therefore, is a logical architecture that is flexible enough to include all kinds of new sources with minimal reconfiguration and to cater to multiple users and consuming applications.
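One way to picture such a logical architecture is a thin catalog layer that sits between consumers and sources. The Python sketch below is purely illustrative and assumes every source can be exposed as a function returning rows; `LogicalCatalog`, its methods and the sample sources are hypothetical names, not part of any product.

```python
from typing import Callable, Dict, Iterable, List

# A reader is any zero-argument function that yields rows as dictionaries.
Reader = Callable[[], Iterable[dict]]


class LogicalCatalog:
    """Single access point that hides where and how each source is stored."""

    def __init__(self) -> None:
        self._sources: Dict[str, Reader] = {}

    def register(self, name: str, reader: Reader) -> None:
        # Adding a new source is one call; existing consumers are untouched.
        self._sources[name] = reader

    def sources(self) -> List[str]:
        return sorted(self._sources)

    def query(self, name: str, predicate: Callable[[dict], bool] = lambda row: True) -> List[dict]:
        # Rows are read from the source on demand rather than copied in advance.
        return [row for row in self._sources[name]() if predicate(row)]


def telemetry_reader() -> Iterable[dict]:
    """Stand-in for a streaming or cloud source; yields made-up readings."""
    yield {"device": "d1", "temp_c": 41}
    yield {"device": "d2", "temp_c": 78}


catalog = LogicalCatalog()
catalog.register("crm", lambda: [{"customer": "C001", "city": "Pune"}])
catalog.register("telemetry", telemetry_reader)

hot_devices = catalog.query("telemetry", lambda row: row["temp_c"] > 70)
```

Consumers only ever talk to the catalog, so a new source, whether on premises or on another cloud platform, can be added by registering one more reader.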