Using Machine Learning to Speed Up Data Retrieval Hashing

A multi-institutional team of researchers, led by MIT, has used machine learning to discover a new method for accelerating data retrieval in massive datasets.

The researchers applied machine learning to create more effective hash functions. Hashing is a fundamental operation in online databases: it speeds up data retrieval by applying hash functions, which produce codes indicating where data is stored.
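As a rough illustration of that idea (not drawn from the MIT work), the sketch below builds a tiny hash table in Python: a hash function maps each key to a bucket index, so a lookup only has to scan the one bucket the code points to. The bucket count and the use of Python's built-in `hash()` are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the researchers' code): a hash function
# maps a key to a bucket index, which tells the table where the record lives.
NUM_BUCKETS = 8

def bucket_index(key: str) -> int:
    # Python's built-in hash() stands in for a generic hash function here.
    return hash(key) % NUM_BUCKETS

table = [[] for _ in range(NUM_BUCKETS)]

def insert(key: str, value: str) -> None:
    table[bucket_index(key)].append((key, value))

def lookup(key: str):
    # Only the one bucket indicated by the hash code needs to be scanned.
    for k, v in table[bucket_index(key)]:
        if k == key:
            return v
    return None

insert("alice", "record-1")
insert("bob", "record-2")
print(lookup("alice"))  # record-1
```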

Hash functions have the drawback of generating codes randomly, which can result in collisions when two pieces of data are hashed to the same value. When several pieces of data share the same hash value, searches become less efficient. Although there are specialized hash functions designed to reduce collisions, writing them takes more time and effort.
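To make the collision problem concrete, here is a toy sketch (illustrative only, with a deliberately simple hash function and made-up keys) showing how different keys can land in the same bucket, forcing a lookup to compare against every entry that shares it.

```python
# Toy illustration: keys whose character codes sum to the same value land in
# the same bucket, so a search there must compare against every colliding entry.
NUM_BUCKETS = 4

def simple_hash(key: str) -> int:
    return sum(ord(ch) for ch in key) % NUM_BUCKETS

buckets: dict[int, list[str]] = {}
for key in ["ab", "ba", "ac", "ad"]:   # "ab" and "ba" collide by design
    buckets.setdefault(simple_hash(key), []).append(key)

for idx in sorted(buckets):
    # Buckets holding more than one key are collisions; every extra entry
    # means one more comparison during a search.
    print(idx, buckets[idx])
```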

According to an MIT News article, the research team built machine learning models, created by running an algorithm on a dataset to capture its particular attributes, in order to decrease collisions in specific scenarios. The group found that these models were more computationally efficient than other kinds of hash functions.
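The article does not include code, but the general idea behind a learned hash function can be sketched as follows: approximate the cumulative distribution (CDF) of the keys with a simple model and use the predicted rank, scaled to the table size, as the hash value. The linear CDF model, the sample keys, and the bucket count below are assumptions for illustration, not the team's implementation.

```python
# Minimal sketch of the general idea: fit a simple model of the key
# distribution, then hash a key to its estimated position in that
# distribution. Keys that follow the learned distribution spread evenly
# over the buckets, producing fewer collisions than a random hash.
NUM_BUCKETS = 100

def fit_linear_cdf(sample_keys):
    # Linear approximation of the CDF: maps a key to the estimated
    # fraction of keys that are smaller than it.
    lo, hi = min(sample_keys), max(sample_keys)
    span = (hi - lo) or 1

    def cdf(key):
        return min(max((key - lo) / span, 0.0), 1.0)

    return cdf

sample = [3, 8, 15, 23, 42, 57, 61, 88, 95]  # stand-in for a real dataset
cdf = fit_linear_cdf(sample)

def learned_hash(key) -> int:
    # Scale the estimated CDF position to a bucket index.
    return min(int(cdf(key) * NUM_BUCKETS), NUM_BUCKETS - 1)

print([learned_hash(k) for k in sample])  # indices spread across the table
```

Evaluating the model costs a bit more than a single modular hash, but if it spreads keys more evenly, lookups touch fewer colliding entries, which is the trade-off described in the quote below.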

"What we discovered in our work is that, in some cases, we can find a better way to balance the collisions we would encounter with the computation of the hash function," said Ibrahim Sabek, a postdoc in the MIT Data Systems Group of the Computer Science and Artificial Intelligence Laboratory (CSAIL), in an article published by MIT News. "In these circumstances, the computation time for the hash function can be slightly increased, but at the same time its collisions can be reduced very significantly."

According to MIT News, the research team aims to use machine learning models to build hash functions for other types of data and is interested in learned hashing for databases that support inserting and deleting data.

Sabek remarked that they would like to encourage the community to use machine learning in more fundamental data structures and algorithms. There is still a lot of ground to cover, he noted, and an opportunity to use machine learning to capture data attributes and improve performance in any type of fundamental data structure.
