The latest evidence that this kind of attack on software developers is more than a passing trend is the upload of more than 400 malicious packages to PyPI (Python Package Index), the official code repository for the Python programming language.
Security company Phylum found that all 451 of the newly discovered packages carried nearly identical malicious payloads and were uploaded in closely spaced bursts. Once installed, the packages create a malicious JavaScript extension that loads every time a browser is opened on the infected device, giving the malware persistence across reboots.
The JavaScript monitors the infected developer’s clipboard for cryptocurrency addresses. When it finds one, the malware replaces it with an address belonging to the attacker. The goal is to intercept payments the developer intended to send to another party.
In November, Phylum identified dozens of packages, downloaded hundreds of times, that used heavily encoded JavaScript to covertly do the same thing. Specifically, it:
- Created a textarea on the page
- Pasted any clipboard contents into it
- Used a series of regular expressions to search for common cryptocurrency address formats
- Replaced any identified addresses with attacker-controlled addresses in the previously created textarea
- Copied the textarea to the clipboard
If a compromised developer copies a wallet address at any point, the malicious package replaces it with an attacker-controlled address, Phylum Chief Technical Officer Louis Lang wrote in the November post. This covert find/replace causes the user to unintentionally send funds to the attacker.
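From the defender's side, the same regular-expression matching can be used to notice a swap. The following is a minimal sketch, not code from the Phylum report: the two patterns are deliberately simplified assumptions covering only legacy Bitcoin and Ethereum address shapes, and the function names are hypothetical.

```python
import re

# Simplified, assumed patterns; real address formats are more varied than this.
ADDRESS_PATTERNS = {
    "bitcoin_legacy": re.compile(r"\b[13][a-km-zA-HJ-NP-Z1-9]{25,34}\b"),
    "ethereum": re.compile(r"\b0x[a-fA-F0-9]{40}\b"),
}


def find_addresses(text):
    """Return (kind, address) pairs for every address-shaped substring."""
    found = []
    for kind, pattern in ADDRESS_PATTERNS.items():
        found.extend((kind, match.group(0)) for match in pattern.finditer(text))
    return found


def address_was_swapped(copied, pasted):
    """True if both texts contain addresses but the addresses differ."""
    before = {address for _, address in find_addresses(copied)}
    after = {address for _, address in find_addresses(pasted)}
    return bool(before) and bool(after) and before != after
```

Comparing what was copied with what was actually pasted is the practical takeaway: a clipboard hijacker of this kind only succeeds if the user never checks the address they paste.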
Novel obfuscation technique
The most recent campaign not only greatly increases the number of malicious packages uploaded, it also significantly changes how it covers its tracks. Whereas the packages disclosed in November used encoding to conceal the behaviour of the JavaScript, the new packages write function and variable identifiers in what appear to be random 16-bit combinations of Chinese-language ideographs, listed in the following table:
| Unicode code point | Ideograph | Definition |
|---|---|---|
| 0x4eba | 人 | man; people; mankind; someone else |
| 0x5200 | 刀 | knife; old coin; measure |
| 0x53e3 | 口 | mouth; open end; entrance, gate |
| 0x5973 | 女 | woman, girl; feminine |
| 0x5b50 | 子 | child; fruit, seed of |
| 0x5c71 | 山 | mountain, hill, peak |
| 0x65e5 | 日 | sun; day; daytime |
| 0x6708 | 月 | moon; month |
| 0x6728 | 木 | tree; wood, lumber; wooden |
| 0x6c34 | 水 | water, liquid, lotion, juice |
| 0x76ee | 目 | eye; look, see; division, topic |
| 0x99ac | 馬 | horse; surname |
| 0x9a6c | 马 | horse; surname |
| 0x9ce5 | 鳥 | bird |
| 0x9e1f | 鸟 | bird |
Using this table, the line of code
Phylum researchers explained:
We can see a series of these kinds of calls oct.__str__()[-3 << 0]. The [-3 << 0] evaluates to [-3] and oct.__str__() evaluates to the string '<built-in function oct>'. Using Python’s index operator [] on a string with a -3 will grab the 3rd character from the end of the string, in this case '<built-in function oct>'[-3] will evaluate to 'c'. Continuing with this on the other 2 here gives us 'c' + 'h' + 'r' and simply evaluating the complex bitwise arithmetic tacked on to the end leaves us with:
The getattr(__builtins__, 'c' + 'h' + 'r') just gives us the built-in function chr and then it maps chr to the list of ints [119, 105, 110, 51, 50] and then joins it all together into a string ultimately giving us 'win32'. This technique is continued throughout the entirety of the code.
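The evaluation the researchers describe can be reproduced in a few lines of Python. This is a minimal sketch, not the malware's code: only the oct.__str__()[-3 << 0] fragment comes from the quoted explanation, the expressions that yield 'h' and 'r' are hypothetical stand-ins, the ideograph identifier is purely illustrative, and the builtins module is used in place of __builtins__ so the lookup behaves the same in any module.

```python
import builtins

# From the quoted explanation: str(oct) is '<built-in function oct>',
# and -3 << 0 is just -3, so this picks the third character from the end.
c = oct.__str__()[-3 << 0]   # 'c'

# Hypothetical stand-ins for the other two characters (assumed, not from the report).
h = hex.__str__()[-1 << 2]   # -1 << 2 == -4, so '<built-in function hex>'[-4] == 'h'
r = repr.__str__()[-1 << 1]  # -1 << 1 == -2, so '<built-in function repr>'[-2] == 'r'

# Python accepts ideographs as identifiers; this name is illustrative only.
水 = getattr(builtins, c + h + r)  # resolves to the built-in chr

# Map chr over the integer list and join the characters into a string.
platform_name = "".join(map(水, [119, 105, 110, 51, 50]))
print(platform_name)  # win32
```

Running the expressions piece by piece like this, in an isolated environment, is the kind of observation the researchers point to: the obscured names and arithmetic collapse into ordinary strings once evaluated.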
The researchers said that while the technique gives the appearance of highly obfuscated code, it is ultimately easy to defeat simply by observing what the code does when it executes.
The latest batch of malicious packages attempts to capitalise on typos developers make when downloading one of these legitimate packages:
- bitcoinlib
- ccxt
- cryptocompare
- cryptofeed
- freqtrade
- selenium
- solana
- vyper
- websockets
- yfinance
- pandas
- matplotlib
- aiohttp
- beautifulsoup
- tensorflow
- scrapy
- colorama
- scikit-learn
- pytorch
- pygame
- pyinstaller
Packages that target the legitimate vyper package, for instance, used 13 file names that omitted or duplicated a single character or transposed two characters of the correct name:
- yper
- vper
- vyer
- vype
- vvyper
- vyyper
- vypper
- vypeer
- vyperr
- yvper
- vpyer
- vyepr
- vypre
The researchers noted: “This method is trivially simple to automate using a script (we leave this as an exercise for the reader), and as the length of the legitimate package’s name increases, so do the possible typosquats. For instance, our system detected 38 typosquats of the cryptocompare package, published almost simultaneously by the user pinigin.9494.”
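The same three edits (omit a character, duplicate a character, transpose two adjacent characters) can be turned around to check whether a name about to be installed is a near miss of a known package. This is a minimal sketch under that assumption; the function names are hypothetical and nothing here comes from Phylum's tooling. For vyper it produces a superset of the 13 names listed above: it also yields vypr, which is not in the list.

```python
def typo_variants(name):
    """Return single-edit variants of name: omission, duplication, transposition."""
    variants = set()
    for i in range(len(name)):
        # Omit the character at position i.
        variants.add(name[:i] + name[i + 1:])
        # Duplicate the character at position i.
        variants.add(name[:i] + name[i] * 2 + name[i + 1:])
    for i in range(len(name) - 1):
        # Transpose the characters at positions i and i + 1.
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Swapping two identical neighbours reproduces the original name; drop it.
    variants.discard(name)
    return variants


def possible_typosquat_of(candidate, legitimate_names):
    """Return the legitimate names that candidate is a single-edit variant of."""
    return [name for name in legitimate_names if candidate in typo_variants(name)]
```

For example, possible_typosquat_of("vypre", ["vyper", "solana"]) returns ["vyper"], which is a cue to re-check the spelling before installing.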
Malicious packages with names closely resembling those of legitimate packages have appeared in legitimate code repositories since at least 2016, when a college student uploaded 214 booby-trapped packages to the PyPI, RubyGems, and NPM repositories under slightly altered names of legitimate packages. The result: the imposter code was executed more than 45,000 times on more than 17,000 separate domains, and more than half of those executions were granted full administrative rights. So-called typosquatting attacks have increased since then.
Anyone who intended to download one of the legitimate packages targeted should double-check that they didn’t inadvertently install a malicious lookalike.