AI that automatically categorises documents

Ignite Microsoft’s SharePoint Syntex is a new feature of SharePoint Online that promises to extract metadata from documents automatically, making it easier to find and categorise information.

SharePoint Syntex, currently in preview but with general availability promised for 1 October, is the first product based on a wider technology unveiled at the 2019 Ignite event called Project Cortex.

The core idea is to use AI to parse content stored in Microsoft’s cloud, drawing not only on the words, images, and links in the documents, but also on other signals in the Microsoft Graph, such as who is engaging with the content and what departments they are in.

Syntex can drive document workflows such as approval after categorising documents through AI-powered analysis, presented in the Content Center

Microsoft said that after seeing how Project Cortex was used with preview customers, it has decided to have multiple projects based on the technology, rather than one. SharePoint Syntex is the first, a premium add-on for SharePoint Online which is focused on using AI for content understanding and automation, such as routing a document to the right person for approval.

This is not the first time we have seen AI applied to SharePoint content. Microsoft introduced Office Delve in 2014, also based on the Office Graph, the theory being that it automatically shows users the documents that are most relevant to them. Delve has had little impact – will Syntex be different?

It is early days, but Syntex is more ambitious than Delve. Delve was focused on surfacing relevant content for a user, whereas Syntex can add metadata to documents that in theory could save substantial manual effort. Syntex could parse a purchase order, for example, work out the monetary value, the customer, and the region where the customer is based, and another process could forward it to the appropriate team to progress the order.
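The purchase order example can be sketched in a few lines of Python. This is only an illustration of routing on extracted metadata, not Syntex's API: the `PurchaseOrderMetadata` fields, the team names, and the approval threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PurchaseOrderMetadata:
    """Hypothetical fields an extraction step might attach to a purchase order."""
    customer: str
    region: str
    value: float


# Assumed routing table (region -> team); the names are made up for illustration.
REGION_TEAMS = {
    "EMEA": "emea-orders",
    "APAC": "apac-orders",
    "AMER": "amer-orders",
}

# Assumed threshold above which an order needs senior sign-off.
HIGH_VALUE_THRESHOLD = 10_000.0


def route_order(meta: PurchaseOrderMetadata) -> str:
    """Pick a team for the order using only the extracted metadata."""
    if meta.value >= HIGH_VALUE_THRESHOLD:
        return "senior-approvals"
    # Fall back to a general queue when the region is not recognised.
    return REGION_TEAMS.get(meta.region, "general-orders")


order = PurchaseOrderMetadata(customer="Contoso", region="EMEA", value=2500.0)
print(route_order(order))
```

The point of the sketch is that once the metadata exists as structured fields, the downstream process needs no knowledge of the document itself.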

According to general manager Seth Patton, Syntex processes three different types of content: images, forms, and unstructured documents. It will tag images with “thousands of commonly recognized objects”, make tags by recognising handwritten text, and read the fields in forms including parsing of dates, numbers, names, and addresses.

Syntex documents are surfaced in a new Content Center, which sorts documents into libraries and shows the metadata it has extracted as columns. Syntex tagging can also be used for compliance, adding retention or sensitivity labels, and setting things like encryption, sharing restrictions, and conditional access policies.

Creating a custom model in Syntex by training it on files with labelled content that identifies the metadata

The most intriguing part of Syntex is the ability to train new models for extracting metadata from documents. Every business has its own terms and categories. Syntex has a model creation feature where you can define entities, such as “Contractor” or “Fee amount”, mark existing documents with labels identifying the values for these entities, and submit these to train a model that will enable AI to extract them automatically from new documents.
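To make the labelling step concrete, here is a minimal Python sketch of what a labelled training document might look like as data. It is not Syntex's storage format or API: the `LabelledDocument` class and `label_spans` helper are hypothetical, and the sample text is invented.

```python
from dataclasses import dataclass, field


@dataclass
class LabelledDocument:
    """A document plus the entity values a person has marked in it (hypothetical shape)."""
    text: str
    # Entity name -> the value exactly as it appears in the text.
    labels: dict = field(default_factory=dict)


def label_spans(doc: LabelledDocument) -> dict:
    """Return {entity: (start, end)} character offsets for each labelled value found."""
    spans = {}
    for entity, value in doc.labels.items():
        start = doc.text.find(value)
        if start != -1:  # skip labels whose value does not occur in the text
            spans[entity] = (start, start + len(value))
    return spans


doc = LabelledDocument(
    text="Contractor: Jane Doe. Fee amount: $4,500.",
    labels={"Contractor": "Jane Doe", "Fee amount": "$4,500"},
)
for entity, (start, end) in label_spans(doc).items():
    print(entity, "->", doc.text[start:end])
```

A trained extractor would learn from many such entity-and-span pairs; the sketch only shows the input side.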

As few as five files to train the model

Naomi Moneypenny, director of program management for Syntex, said at Ignite that as few as five files could be sufficient for training, particularly if users supply both positive and negative examples of a particular content type. Form processing, which should be the easiest type of content from which to extract metadata, has a specific form processing engine.
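Moneypenny's rule of thumb can be expressed as a simple pre-flight check on a training set. The Python below is a rough sketch under stated assumptions: the `TrainingFile` type and `training_set_summary` function are hypothetical, and the minimum of five comes from her remark, not from a documented product limit.

```python
from dataclasses import dataclass

# Taken from the "as few as five files" remark; an assumption, not a documented limit.
MIN_FILES = 5


@dataclass
class TrainingFile:
    name: str
    is_positive: bool  # True if the file is an example of the content type


def training_set_summary(files: list) -> dict:
    """Count positive and negative examples and flag obvious gaps."""
    positives = sum(1 for f in files if f.is_positive)
    negatives = len(files) - positives
    return {
        "total": len(files),
        "positives": positives,
        "negatives": negatives,
        "meets_minimum": len(files) >= MIN_FILES,
        "has_both_kinds": positives > 0 and negatives > 0,
    }


files = [TrainingFile(f"contract-{i}.docx", True) for i in range(4)]
files.append(TrainingFile("invoice-1.docx", False))
print(training_set_summary(files))
```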

Content processed by Syntex does not have to live in SharePoint, but can also be sucked in from other sources via Microsoft Graph content connectors. Examples of such sources include file shares, Azure SQL, Box, Amazon S3, Google Drive, SharePoint on-premises, and Salesforce.

Microsoft spoke at Ignite about new features planned for Syntex early next year, which include expanded model types, central model management, Syntex-based solutions for business processes, and more integration between Syntex and “knowledge improvements across Microsoft 365”.

All a bit vague, but you get the impression that the company sees AI-driven content analysis as a significant piece in its 365 offering.

Whereas Delve was free for licensed SharePoint users, Syntex is a paid-for service available to E3 or E5 subscribers to Microsoft 365. The pricing looks complex, being per-user and limited to “500 items indexed by content connector, pooled”, according to a slide presented at Ignite. Customers also get credits for form processing. Presumably additional fees apply if these limits are exceeded.

The question hanging over all of the above is whether the company is over-promising when it comes to the benefits of Syntex. Considering the complexity of the underlying data science, the company’s ability to simplify the usage of AI services, whether in Syntex or its wider Cognitive Services portfolio, is not in doubt.

AI is an inherently imperfect technology, though, which is worrying in a business context if organisations depend on it too much, for example, to decide whether or not a document is confidential. As a paid-for service, Syntex will have to deliver high enough accuracy to justify its cost.

Whether or not Syntex flies, you can bet Microsoft, like others in the document management area, will continue to apply AI technology in the hope of making better sense of these repositories of unstructured data.

This article has been published from a wire agency feed without modifications to the text. Only the headline has been changed.
