Using NLP and AI in the HPC Cloud

Customers and buyers have benefited greatly from advances in Internet connectivity in recent years. As a result of these developments, rapidly growing e-commerce companies have generated genuinely big data. The enormous popularity of social media allows buyers to express their opinions and views on a wide range of topics, such as the economic situation, to voice their dissatisfaction with certain products or services, or to express their pleasure with their purchases.

A large number of consumer and product reviews provide a wealth of useful information and have recently emerged as an important resource for consumers and businesses alike. Consumers frequently look for quality information in online reviews before purchasing a product, and many businesses use online reviews as crucial input for their products, marketing, and customer relationship management. Hence, understanding the psychology of online consumer behavior has become key to competing in today’s markets, which are characterized by increased competition and globalization.

Sentiment analysis and text analysis are applications of big data analysis aimed at aggregating and extracting emotions and sentiments from many types of reviews. This exponentially growing big data is mostly unstructured, which makes it impossible for humans to interpret manually. It is therefore crucial to combine machine learning with natural language processing (NLP), which focuses on gathering facts and opinions from the vast amount of information on the internet. In this project, a machine learning NLP model is applied to predict sentiment from consumer product reviews collected from social media and e-commerce websites. The NLP process consists of several steps:

1. Data preprocessing and feature extraction, which converts the review text into a consistent, analyzable format for the task and can also extract features that help reveal the structure of the review text. Part-of-speech tagging is one of the steps involved in data preprocessing and feature extraction.

2. Sentiment analysis is performed on each review, classifying it as, for example, excellent or poor, and the corresponding sentiment is then derived. The sentiment score is a function of polarity and subjectivity, both of which are extracted from the review text using NLP algorithms to capture the overall sentiment.
The sign of the polarity score is often used to infer whether the overall mood is positive, neutral, or negative. Polarity is a floating-point value in the range [-1, 1], where 1 represents a positive statement and -1 represents a negative statement. Subjectivity distinguishes subjective sentences, which generally express personal opinions, emotions, or judgments, from objective sentences, which convey factual information.
3. Topic modeling allows search engines to focus on the most important topics in documents. The Latent Dirichlet Allocation (LDA) algorithm, an unsupervised learning method that treats each document as a bag of words, is used to identify topics and to estimate the probability that each topic occurs in a document based on its words.
Of all the steps in the overall NLP process, topic modeling (the LDA algorithm) is by far the most computationally intensive. While the effort for the other steps (data cleansing and feature engineering, data visualization, sentiment analysis, and predictive analysis) is nearly independent of the number of reviews, the effort for topic modeling increases exponentially with the number of reviews. We therefore looked for a highly parallel version of the LDA algorithm that can run efficiently on on-premise HPC systems or in the HPC cloud (e.g., AWS, Azure, Google Cloud Platform, see below).
4. Developing algorithms or creating a predictive model that can predict and classify any input review statement, using machine learning techniques that compute sentiment scores with statistical methods and refine their own rules through repeated training on the provided training data. Accuracy and validation are becoming critical criteria for algorithm selection. Such a model is a useful resource for evaluating affective information on social platforms and e-commerce channels because it relies not only on domain-specific keywords but also on common sense, which makes it possible to extrapolate the cognitive and affective information associated with the text from natural language.
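As a minimal sketch of step 1, assuming plain-Python tooling rather than whatever libraries the project actually used, the functions below lowercase a review, strip punctuation, drop stop words, and turn the remaining tokens into bag-of-words counts. The stop-word list and the names `preprocess` and `bag_of_words` are hypothetical. Part-of-speech tagging would normally come from an NLP library and is not shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real pipeline would use a fuller list from an NLP library.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "it", "this", "was", "to", "of", "i"}


def preprocess(review):
    """Lowercase, replace non-letters with spaces, tokenize, drop stop words."""
    text = review.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # remove digits and punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bag_of_words(tokens):
    """Feature extraction: map each token to its count in the review."""
    return Counter(tokens)


if __name__ == "__main__":
    tokens = preprocess("The battery is GREAT, but the screen was dim!!")
    print(tokens)
    print(bag_of_words(tokens))
```

A bag of words discards word order, which is exactly the document representation LDA assumes in step 3.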
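The polarity and subjectivity scores described in step 2 can be illustrated with a toy lexicon-based scorer. This is a sketch under stated assumptions, not the project's algorithm: the `LEXICON` entries, the simple negation rule, and the `neutral_band` threshold are all hypothetical. Averaging word polarities keeps the review polarity inside [-1, 1], and its sign decides the label.

```python
# Hypothetical toy lexicon: word -> (polarity in [-1, 1], subjectivity in [0, 1]).
LEXICON = {
    "excellent": (1.0, 1.0),
    "great": (0.8, 0.75),
    "good": (0.7, 0.6),
    "poor": (-0.4, 0.6),
    "terrible": (-1.0, 1.0),
    "broken": (-0.4, 0.4),
}
NEGATIONS = {"not", "never", "no"}  # hypothetical negation words


def score(tokens):
    """Return (polarity, subjectivity) averaged over the lexicon words found."""
    polarities, subjectivities = [], []
    negate = False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in LEXICON:
            pol, subj = LEXICON[tok]
            polarities.append(-pol if negate else pol)  # "not good" flips sign
            subjectivities.append(subj)
        negate = False  # a negation only affects the next token
    if not polarities:
        return 0.0, 0.0  # no opinion words: neutral and objective
    polarity = sum(polarities) / len(polarities)
    subjectivity = sum(subjectivities) / len(subjectivities)
    return polarity, subjectivity


def label(polarity, neutral_band=0.05):
    """Infer the overall mood from the sign of the polarity score."""
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in (["great", "battery", "but", "terrible", "screen"],
                   ["not", "good"],
                   ["arrived", "on", "tuesday"]):
        polarity, subjectivity = score(review)
        print(review, round(polarity, 2), round(subjectivity, 2), label(polarity))
```

Library-based scorers work on the same principle but with much larger lexicons and richer rules for intensifiers and negation.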
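For step 3, one standard way to fit LDA is collapsed Gibbs sampling over bag-of-words documents. The sketch below is a small, single-process illustration; the function name `lda_gibbs` and the hyperparameter defaults (`alpha`, `beta`, `iterations`) are assumptions, and production work would typically rely on an optimized library. Every sweep visits every token in the corpus, which illustrates why topic modeling dominates the runtime as the number of reviews grows.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists (each document treated as a bag of words).
    Returns (doc_topic, topic_word, vocab), where the first two are count tables.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    # Count tables: document-topic, topic-word, and per-topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * V for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return doc_topic, topic_word, vocab


if __name__ == "__main__":
    reviews = [["battery", "charge", "battery", "power"],
               ["screen", "display", "bright", "screen"],
               ["battery", "power", "charge"],
               ["display", "screen", "bright"]]
    _, topic_word, vocab = lda_gibbs(reviews, num_topics=2)
    for k, row in enumerate(topic_word):
        top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
        print(k, [vocab[v] for v in top])
```

The normalized rows of `doc_topic` estimate the probability of each topic in a review, which is the quantity step 3 refers to.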
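Step 4 does not name a specific classifier. As one hypothetical choice, a multinomial naive Bayes model with add-one smoothing computes class scores from word statistics learned on training data, and accuracy on a held-out validation set can then guide algorithm selection. The class name `NaiveBayesSentiment`, the labels, and the tiny data set are illustrative only.

```python
import math
from collections import Counter


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        # Log prior: fraction of training reviews carrying each label.
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.word_counts[c].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, tokens, c):
        """Unnormalized log P(c | tokens)."""
        V = len(self.vocab)
        score = self.log_prior[c]
        for w in tokens:
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + V))
        return score

    def predict(self, tokens):
        return max(self.classes, key=lambda c: self.log_posterior(tokens, c))


def accuracy(model, docs, labels):
    """Share of validation reviews whose predicted label matches the true one."""
    hits = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return hits / len(labels)


if __name__ == "__main__":
    train_docs = [["great", "battery"], ["excellent", "screen"],
                  ["poor", "battery"], ["terrible", "screen"]]
    train_labels = ["positive", "positive", "negative", "negative"]
    model = NaiveBayesSentiment().fit(train_docs, train_labels)
    val_docs = [["great"], ["terrible", "battery"]]
    val_labels = ["positive", "negative"]
    print(accuracy(model, val_docs, val_labels))  # 1.0
```

Keeping the validation set separate from the training data is what makes the accuracy figure a fair basis for comparing candidate algorithms.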

Performance Benchmarking on Workstation and HPC Cloud

The NLP machine learning workflow for e-commerce is very computationally intensive, especially the LDA algorithm, as mentioned above. To complete the study, we first carried out a performance analysis on a high-performance desktop computer with 16 CPU cores and 32 GB of RAM. The performance analysis examined the computer system requirements for processing up to 20 million reviews, with the following benchmark results:

The effort involved in topic modeling increases exponentially because of the LDA algorithm. To overcome this disadvantage, we identified parallel LDA topic modeling methods based on the MapReduce architecture, a distributed programming model, namely a parallel implementation of the LDA topic model on the Hadoop parallel computing platform. The results show that with a large number of patches, this parallel approach can achieve near-linear speedup on both local HPC and cloud HPC resources. The HPC environment provides the Python-based Anaconda platform, which supports data analysis and predictive modeling. As we have shown, handling such large amounts of data is a real challenge for this NLP project and requires significant computing power. Ideally, data volumes of this size are handled by scaling the algorithm out to HPC in the cloud.
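The MapReduce idea can be sketched without Hadoop. In one common approximation (often called approximate distributed LDA, or AD-LDA), each map task runs a Gibbs sweep over its shard of reviews against a private copy of the global topic-word counts and emits the count changes it made; the reduce step sums those changes into the global table. This is an illustrative sketch, not the implementation used in the study: the function names, the round-robin sharding, and the use of a plain list comprehension in place of distributed map tasks are all assumptions.

```python
import random


def map_sweep(shard, topic_word, topic_total, K, V, alpha, beta, seed):
    """Map task: one Gibbs sweep over a shard against a private copy of the
    global topic-word counts. Returns the updated shard and its count changes."""
    rng = random.Random(seed)
    local_tw = [row[:] for row in topic_word]
    local_tt = topic_total[:]
    for d, doc in enumerate(shard["docs"]):
        for i, v in enumerate(doc):
            k = shard["z"][d][i]
            # Remove this token's current assignment.
            shard["dt"][d][k] -= 1
            local_tw[k][v] -= 1
            local_tt[k] -= 1
            weights = [(shard["dt"][d][t] + alpha) * (local_tw[t][v] + beta)
                       / (local_tt[t] + V * beta) for t in range(K)]
            k = rng.choices(range(K), weights=weights)[0]
            shard["z"][d][i] = k
            shard["dt"][d][k] += 1
            local_tw[k][v] += 1
            local_tt[k] += 1
    delta = [[local_tw[k][v] - topic_word[k][v] for v in range(V)] for k in range(K)]
    return shard, delta


def reduce_deltas(topic_word, topic_total, deltas):
    """Reduce step: add every shard's count changes to the global tables."""
    for delta in deltas:
        for k, row in enumerate(delta):
            for v, change in enumerate(row):
                topic_word[k][v] += change
                topic_total[k] += change


def parallel_lda(docs, K, num_shards=2, alpha=0.1, beta=0.01, iterations=50, seed=0):
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    rng = random.Random(seed)
    topic_word = [[0] * V for _ in range(K)]
    topic_total = [0] * K

    # Hypothetical round-robin sharding; random initial topic per token.
    shards = []
    for s in range(num_shards):
        part = [[word_id[w] for w in doc] for doc in docs[s::num_shards]]
        z = [[rng.randrange(K) for _ in doc] for doc in part]
        dt = [[0] * K for _ in part]
        for d, doc in enumerate(part):
            for v, k in zip(doc, z[d]):
                dt[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
        shards.append({"docs": part, "z": z, "dt": dt})

    for it in range(iterations):
        # Map phase: run sequentially here; on a cluster each call would be a task.
        results = [map_sweep(shard, topic_word, topic_total, K, V, alpha, beta,
                             seed + it * num_shards + s + 1)
                   for s, shard in enumerate(shards)]
        shards = [shard for shard, _ in results]
        # Reduce phase: merge all count changes into the global state.
        reduce_deltas(topic_word, topic_total, [delta for _, delta in results])
    return topic_word, vocab, shards


if __name__ == "__main__":
    reviews = [["battery", "charge", "power"], ["screen", "display", "bright"],
               ["battery", "power"], ["display", "screen"]]
    topic_word, vocab, _ = parallel_lda(reviews, K=2)
    for k, row in enumerate(topic_word):
        print(k, [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:3]])
```

Because shards sample against slightly stale counts during a sweep, the result approximates sequential Gibbs sampling; in exchange, the map tasks are independent and can run on separate nodes.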

Further experiments, carried out in the HPC cloud environment, will demonstrate the ability to remotely configure and run big data analytics and build AI models in the cloud. The configuration requirements for the AI machine learning model are pre-installed in the HPC application containers on the Uber Cloud Engineering Simulation Platform, which allows users to access and run the NLP workflow without any installation or pre-configuration.
