Researchers showcase new method for improving the detection of fake websites.
Machine learning models trained on visual representations of a website’s code can help improve the accuracy and speed of phishing website detection, according to a paper (PDF) from security researchers at the University of Plymouth and the University of Portsmouth, UK. They aim to address the shortcomings of existing detection methods, which are either too slow or not accurate enough.
Turning web code into images
The technique uses “binary visualization” libraries developed by the researchers to turn the markup and code on web pages into images. Using this method, they created a dataset of images representing legitimate and phishing websites.
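As a rough illustration of the idea, the sketch below maps the raw bytes of a page onto a fixed-size grid of grayscale values. It is a minimal sketch under stated assumptions, not the researchers' library: the function name `bytes_to_grid`, the square layout, the zero padding, and the one-byte-per-pixel mapping are all hypothetical choices made for this example.

```python
from typing import List


def bytes_to_grid(data: bytes, side: int = 64) -> List[List[int]]:
    """Map raw bytes onto a side x side grid of grayscale values (0-255).

    Hypothetical scheme for illustration only: one byte becomes one pixel,
    filled row by row.
    """
    n = side * side
    # Truncate long input and zero-pad short input so the grid is always full.
    padded = data[:n].ljust(n, b"\x00")
    return [list(padded[r * side:(r + 1) * side]) for r in range(side)]


# Example page source (made up for this sketch).
html = "<html><body><form action='login'></form></body></html>"
grid = bytes_to_grid(html.encode("utf-8"), side=8)
```

Each row of `grid` can then be treated as one row of pixels. Pages with different markup and scripts produce different pixel patterns, which is what an image classifier would learn from.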
The dataset was then used to train a machine learning model to distinguish legitimate from phishing websites based on differences in their binary visualizations. To test a new website, the target site’s code is transformed by binary visualization and run through the trained model. To improve the model’s performance, the researchers used MobileNet, a neural network optimized to run on devices with limited resources rather than on cloud servers. The system also gradually builds a database of legitimate and phishing websites so that it can avoid unnecessary inference.
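The role of that database can be sketched as a simple verdict cache placed in front of the classifier. This is a minimal sketch under assumptions the article does not confirm: the class name `CachedDetector`, the choice of the URL as the lookup key, and the stand-in `toy_classifier` are all hypothetical, and the toy rule is a placeholder, not the MobileNet model.

```python
from typing import Callable, Dict


class CachedDetector:
    """Remember earlier verdicts so known sites skip model inference."""

    def __init__(self, classify: Callable[[bytes], str]) -> None:
        self._classify = classify            # any function: page bytes -> label
        self._verdicts: Dict[str, str] = {}  # assumed key: the site's URL
        self.inference_calls = 0             # counts how often the model ran

    def check(self, url: str, page_source: bytes) -> str:
        # A site that was already classified is answered from the database.
        if url in self._verdicts:
            return self._verdicts[url]
        self.inference_calls += 1
        verdict = self._classify(page_source)
        self._verdicts[url] = verdict
        return verdict


def toy_classifier(page_source: bytes) -> str:
    # Placeholder rule for illustration only, NOT the researchers' model.
    return "phishing" if b"password" in page_source else "legitimate"


detector = CachedDetector(toy_classifier)
page = b"<input name='password'>"
first = detector.check("http://example.test/login", page)
second = detector.check("http://example.test/login", page)
```

The second call returns the stored verdict without invoking the classifier again, so `detector.inference_calls` stays at 1. A real deployment would also need a policy for expiring entries, since a site's content can change.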
Accurate detection of phishing websites
According to the researchers’ experiments, the model achieved 94% accuracy in detecting phishing websites. Because it uses a very small neural network, it can run on user devices and provide results in near real time.
“We have tested the technique with actual phishing and legit sites,” Stavros Shiaeles, one of the paper’s co-authors, told The Daily Swig.
This isn’t the first time binary visualization and machine learning have been used in cybersecurity. In 2019, Shiaeles, a professor of cybersecurity at the University of Portsmouth, co-authored research on another technique that uses ML and binary visualization to detect malware, with promising results. After testing the phishing website detection system, the team is now taking the next step to prepare the technology for adoption.
“We are working on a new extended method and we are trying to apply for a patent,” Shiaeles said. “Based on the results we initially have I don’t see the point not to be adopted. The accuracy is 100%.”