
Microsoft leaked 38TB of confidential data

Microsoft researchers recently exposed 38TB of private data by unintentionally uploading it to the company’s GitHub page, where anybody could have viewed it. The trove included a backup of two former employees’ workstations, containing more than 30,000 internal Teams messages along with keys, passwords, and other confidential information.

According to cloud security company Wiz, which discovered the exposure, the data was unintentionally bundled with a batch of open-source training data and posted to Microsoft’s artificial intelligence (AI) GitHub repository. Because visitors were actively encouraged to download the data, it may well have ended up in the wrong hands more than once.

While data breaches come in many forms, one initiated by Microsoft’s own AI researchers is particularly damaging for the company. According to Wiz, Microsoft shared the data using a Shared Access Signature (SAS) token, an Azure feature that lets users grant access to data in Azure Storage accounts.
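For context, this is roughly what creating such a token looks like. The sketch below is illustrative only, using the Python azure-storage-blob SDK with placeholder account names and keys (none of these values are from the incident); it generates a conservatively scoped, read-only, short-lived token, the opposite of what was actually shared.

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import (
    AccountSasPermissions,
    ResourceTypes,
    generate_account_sas,
)

# Placeholder credentials -- purely illustrative, not from the incident.
ACCOUNT_NAME = "examplestorageaccount"
ACCOUNT_KEY = "<storage-account-key>"

# A conservatively scoped token: read-only, object-level, short-lived.
sas_token = generate_account_sas(
    account_name=ACCOUNT_NAME,
    account_key=ACCOUNT_KEY,
    resource_types=ResourceTypes(object=True),
    permission=AccountSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# The token travels as a query string appended to the storage URL;
# anyone holding the full URL gets the permissions baked into it.
url = f"https://{ACCOUNT_NAME}.blob.core.windows.net/training-data?{sas_token}"
print(url)
```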

The repository instructed visitors to download the training data from a specific URL. That URL, however, let visitors browse files and folders that were never meant to be public, not just the intended training data.
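To illustrate why that matters: if a SAS token is scoped to the whole storage account rather than a single container, anyone with the URL can enumerate everything the account holds. The sketch below is a hypothetical reconstruction (Python, azure-storage-blob, placeholder names), not the actual repository URL or token.

```python
from azure.storage.blob import BlobServiceClient

# Hypothetical over-scoped SAS token covering the entire account.
account_url = "https://examplestorageaccount.blob.core.windows.net"
sas_token = "<over-scoped-sas-token>"

service = BlobServiceClient(account_url=account_url, credential=sas_token)

# With service/container/list rights, the token exposes far more than
# the training data the repository pointed visitors at.
for container in service.list_containers():
    print("container:", container.name)
    for blob in service.get_container_client(container.name).list_blobs():
        print("  blob:", blob.name)
```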

Full control

It gets worse. Wiz found that the access token behind all of this was misconfigured to grant full-control permissions rather than more restrictive read-only access. In practice, that meant anyone who visited the URL could not only read the files but also delete and overwrite them.
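To make the misconfiguration concrete, the sketch below (again hypothetical, with placeholder names and the Python azure-storage-blob SDK) contrasts a read-only permission set with the kind of full-control grant Wiz describes, and shows what that grant allows anyone holding the URL to do.

```python
from azure.storage.blob import AccountSasPermissions, BlobClient

# Read-only: visitors can list and download, nothing else.
read_only = AccountSasPermissions(read=True, list=True)

# "Full control": a few extra flags, but they let anyone holding the
# URL modify or destroy data in the account.
full_control = AccountSasPermissions(
    read=True, list=True, write=True, delete=True
)

# With a full-control token, any visitor could do this:
blob = BlobClient(
    account_url="https://examplestorageaccount.blob.core.windows.net",
    container_name="training-data",
    blob_name="model-weights.bin",
    credential="<full-control-sas-token>",
)
blob.upload_blob(b"tampered contents", overwrite=True)  # overwrite a file
blob.delete_blob()                                      # or delete it
```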

Wiz argues the repercussions could have been serious. The repository held a wealth of AI training data that users were meant to download and feed into a script to improve their own AI models.

But because of the misconfigured permissions, the data was open to manipulation. As a result, Wiz claims, an attacker could have injected malicious code into every AI model in the storage account, infecting any user who trusted Microsoft’s GitHub repository.

Potential disaster

According to the research, an administrator has no way to know that a SAS token exists or where it circulates: the token is signed client-side using the storage account key, so creating one, and granting access to Azure Storage folders like this one, leaves no paper trail on Azure’s side. When a token carries full access permissions, as this one did, the outcome can be disastrous.

Fortunately, Wiz notes, it reported the problem to Microsoft in June 2023. Microsoft replaced the faulty SAS token in July and completed its internal investigation in August. The issue has only now been made public, to allow time for it to be thoroughly addressed.

It serves as a reminder that even seemingly harmless actions can lead to data breaches. Fortunately, the problem has been fixed, although it is not known whether any sensitive information was accessed by malicious actors before access was revoked.
