Resolving data challenges for transformational AI

Integrating data across multiple sources and resolving data quality issues are often the most important challenges in leveraging machine intelligence. They remain the biggest challenges even when one only wants to derive simple rules or insights from data.

Artificial Intelligence (AI) is possibly the most widely discussed technology trend. Views on its applications and benefits are strongly divided: some feel that it will bring about an apocalyptic day when machines control human beings, while others believe that AI will be extremely beneficial for humans. In the near term, AI and machine learning are becoming critical to solving important business and social problems. In many cases, machine intelligence is used to supplement humans, thereby “augmenting” human intelligence.

It is commonly believed that “mathematically complex” algorithms are the biggest challenge in leveraging the power of machine intelligence. However, a detailed discussion with any practitioner will reveal that this is mostly not the case. Integrating data across multiple sources and resolving data quality issues are often the most important challenges in leveraging machine intelligence. They remain the biggest challenges even when one only wants to derive simple rules or insights from data.

Due to data integration and data quality challenges, most data-driven initiatives within organisations take very long to demonstrate tangible results. This causes budget escalation and frustration among senior leadership.

Traditionally, organisations adopted a linear approach to data integration, in which detailed Extract, Transform and Load (ETL) processes were created to obtain data from various systems and to establish relationships between hundreds of data tables belonging to various source systems. These data warehousing initiatives used to take at least 18 to 24 months to complete. In addition, it was often difficult to incorporate unstructured data such as images, text files and streaming data from sensors.

It is not common for front-end applications to consume data directly from the warehouse. The warehouse is a single store for the entire data, so when a front-end application (e.g. a customer-facing app or a distributor portal) has to fetch data from that large store, the response time is usually very high, which is not acceptable for most front-end applications.

An agile approach to data integration attempts to change this paradigm by making data integration use-case-driven. The key element of this approach is the creation of a data lake. As opposed to a data warehouse, a data lake stores data in as-is format, without performing any transformation of the data. One performs only Extraction and Loading (EL) of the data, in as-is format, from the source system into a single repository. The data lake also stores unstructured data such as text and images. Because the source data is neither transformed nor summarized, there is no loss of information. The dramatic reduction in storage costs has made it possible to store data in as-is form.
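
As a rough illustration of the extract-and-load step described above, the following Python sketch copies a source file into a lake directory byte for byte and checks that nothing changed on the way in. The function names, the `raw/<source system>/<date>` folder layout and the use of a local file system are assumptions made for this example; a real data lake platform would have its own ingestion tooling.

```python
import hashlib
import shutil
from datetime import date
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def land_as_is(source_file: Path, lake_root: Path, source_system: str,
               load_date: date) -> Path:
    """Extract and load only: copy the file unchanged into the lake.

    Hypothetical layout: <lake_root>/raw/<source_system>/<YYYY-MM-DD>/<file name>
    """
    target_dir = lake_root / "raw" / source_system / load_date.isoformat()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source_file.name
    shutil.copy2(source_file, target)  # no parsing, no transformation
    # Confirm the landed copy is identical to the source.
    if sha256_of(source_file) != sha256_of(target):
        raise IOError(f"checksum mismatch while loading {source_file}")
    return target
```

Because the file is never parsed, the same function works for CSV extracts, images or scanned documents alike; interpretation is deferred until a use case needs it.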

Using the right platform, a data lake can be created very quickly (within 4-6 weeks). Once the data lake has been created, an agile approach can be used to combine the few tables that a particular use case requires. These tables are combined into a mini data mart. For example, one may need to identify the relationships between only 10 to 15 tables to create the data mart required to analyse the effectiveness of salespersons. Each data mart supports specific insight generation or machine learning model development needs. This approach ensures that organisations see initial results very quickly.
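
To make the idea of a mini data mart concrete, here is a minimal sketch that joins just two tables into a salesperson-effectiveness mart. An in-memory SQLite database stands in for the data lake, and the table names, column names and sample rows are all hypothetical.

```python
import sqlite3


def demo_lake() -> sqlite3.Connection:
    """Create a tiny stand-in for as-is source tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE salespersons (salesperson_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                             salesperson_id INTEGER, amount REAL);
        INSERT INTO salespersons VALUES (1, 'Asha'), (2, 'Ravi');
        INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0);
    """)
    return conn


def build_sales_mart(conn: sqlite3.Connection) -> None:
    """Combine only the tables this use case needs into one mart table."""
    conn.executescript("""
        DROP TABLE IF EXISTS mart_salesperson_effectiveness;
        CREATE TABLE mart_salesperson_effectiveness AS
        SELECT s.salesperson_id,
               s.name,
               COUNT(o.order_id)            AS orders_closed,
               COALESCE(SUM(o.amount), 0.0) AS revenue
        FROM salespersons AS s
        LEFT JOIN orders AS o ON o.salesperson_id = s.salesperson_id
        GROUP BY s.salesperson_id, s.name;
    """)
```

The `LEFT JOIN` keeps salespersons who have no orders yet, so the mart can also show who has not closed anything.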

Each data mart contains far fewer data elements than a warehouse; hence, it can provide data to front-end applications within prescribed response times. Data from the data marts and the results of machine learning models are exposed as APIs (application programming interfaces), which can be consumed by various front-end applications.
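
One possible way to expose a data mart as an API is sketched below using only the Python standard library. The `/salespersons/<id>` path, the `MART` dictionary and the port number are assumptions made for illustration; a production service would normally sit behind a proper web framework and read from the mart's actual store.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical, pre-aggregated mart rows keyed by salesperson id.
MART = {
    1: {"name": "Asha", "orders_closed": 2, "revenue": 350.0},
    2: {"name": "Ravi", "orders_closed": 0, "revenue": 0.0},
}


def route(path: str):
    """Map a request path to (HTTP status, JSON-serialisable payload)."""
    prefix = "/salespersons/"
    if path.startswith(prefix) and path[len(prefix):].isdecimal():
        row = MART.get(int(path[len(prefix):]))
        if row is not None:
            return 200, row
    return 404, {"error": "not found"}


class MartHandler(BaseHTTPRequestHandler):
    """Serve mart rows as JSON over HTTP GET."""

    def do_GET(self):
        status, payload = route(self.path)
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the API on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), MartHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. Because each lookup touches only a small, pre-aggregated structure, the response time does not depend on the size of the full lake.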

A key aspect of using data from the data mart in front-end applications is the frequency at which data from the source systems is refreshed into the data lake. An ideal solution is to perform a near-real-time refresh and to capture changes even where the source data is overwritten. Specific capabilities, such as change data capture and the ability to read updates from database logs, are critical for this purpose.
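
The sketch below illustrates the idea behind change data capture in a simplified form: change events, as they might be read from a source database log, are appended to a per-key history in the lake, so earlier values survive even after the source row is overwritten. The `ChangeEvent` shape and the function names are hypothetical and do not correspond to any particular CDC product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One log entry (hypothetical shape)."""
    sequence: int         # position in the log, increasing
    op: str               # "insert", "update" or "delete"
    key: int              # primary key of the affected row
    row: Optional[dict]   # new row image; None for deletes


def apply_changes(history: dict, events: list, last_applied: int) -> int:
    """Append unseen events to each key's history; return the new high-water mark."""
    for ev in sorted(events, key=lambda e: e.sequence):
        if ev.sequence <= last_applied:
            continue  # already captured in an earlier refresh
        if ev.op not in {"insert", "update", "delete"}:
            raise ValueError(f"unknown operation: {ev.op}")
        history.setdefault(ev.key, []).append(ev)
        last_applied = ev.sequence
    return last_applied


def current_view(history: dict) -> dict:
    """Latest row image per key, leaving out keys whose last event is a delete."""
    view = {}
    for key, versions in history.items():
        latest = versions[-1]
        if latest.op != "delete":
            view[key] = latest.row
    return view
```

Tracking the last applied sequence number makes each refresh incremental and safe to repeat, which is what allows refreshes to run frequently.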

The data lake approach allows different data marts to obtain different types of summarization of the same base data. As the granular as-is data is present in the data lake, one can perform different types of summarization on the same data. Within a data lake framework, it is critical to have utilities that can convert (or make sense of) unstructured data such as images and scanned PDFs, and combine it with traditional structured data.
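
As a small illustration of deriving different summarizations from the same granular data, the sketch below aggregates one set of raw sales rows in two ways, by region and by month. The rows and field names are invented for this example.

```python
from collections import defaultdict

# Hypothetical granular rows as they might sit in the lake.
RAW_SALES = [
    {"date": "2020-05-03", "region": "North", "amount": 120.0},
    {"date": "2020-05-19", "region": "South", "amount": 80.0},
    {"date": "2020-06-02", "region": "North", "amount": 50.0},
]


def summarize(rows, key_fn):
    """Total the amount per group, where key_fn picks the grouping key."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row["amount"]
    return dict(totals)


# Two data marts, two views of the same base data.
by_region = summarize(RAW_SALES, lambda r: r["region"])
by_month = summarize(RAW_SALES, lambda r: r["date"][:7])
```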

Once a data mart has been created for one use case, the next set of data marts can be created whenever needed: this is the agile data journey. The data lake can also use an existing data warehouse as a data source, thereby reusing existing investments in the data warehouse.

The Author is Head Product and Engineering at Actify Data Labs

DISCLAIMER: The views expressed are solely of the author and ETCIO.com does not necessarily subscribe to it. ETCIO.com shall not be responsible for any damage caused to any person/organisation directly or indirectly.
