Updated: May 24, 2020
You may be asking, “Why would I want to index and classify content when these capabilities already exist within M365?” The real question is whether it is better to classify and then move, or to move and then classify. M365 has a number of classification capabilities that perform functions similar to those of file analysis tools. These classification mechanisms include finding content with:
Keywords or metadata values (keyword query language)
Previously identified patterns of sensitive information like social security, credit card, or bank account numbers (sensitive information types)
Recognition of an item because it is a variation on a template (document fingerprinting)
The presence of exact strings (exact data match)
“Trainable classifiers,” which are trained to identify an item based on what the item is rather than on elements within the item, as pattern matching does
Artificial intelligence, currently in preview as Cortex, which helps build training sets based on known document characteristics and structures
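To make the pattern-based approach in the list above more concrete, here is a minimal Python sketch of how a tool might flag social security and credit card numbers in extracted text. The regular expressions and the function names (`luhn_valid`, `find_sensitive`) are assumptions made for illustration; they are not the definitions that M365 sensitive information types or any particular file analysis product use. The checksum step shows one common way such tools cut down on false positives.

```python
import re

# Hypothetical patterns for illustration only; real products use richer
# definitions (supporting keywords, proximity rules, confidence levels).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return candidate SSNs and checksum-valid card numbers found in text."""
    ssns = SSN_PATTERN.findall(text)
    cards = [m.group() for m in CARD_PATTERN.finditer(text)
             if luhn_valid(m.group())]
    return {"ssn": ssns, "credit_card": cards}


sample = ("Card 4111 1111 1111 1111 and SSN 123-45-6789, "
          "bogus 1234 5678 9012 3456.")
hits = find_sensitive(sample)
# hits == {"ssn": ["123-45-6789"], "credit_card": ["4111 1111 1111 1111"]}
```

The second number in the sample looks like a card number but fails the checksum, so it is not reported. A format-only match such as the SSN pattern cannot be validated this way, which is one reason reviewers still need to check results.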
Indeed, if your content already resides in M365, these capabilities are available with the right license. The right approach comes down to three important factors: speed, scope and accuracy.
For speed, file analysis tools let you do many of these same things on shared drives and other ungoverned locations with much less time and effort. Indexing content in place with a file analysis tool is substantially faster and easier than migrating it to the cloud to get the same result. Any rules, queries or definitions you create in preparation for migration can also be easily adapted for use on a point-forward basis for new content in M365.
For scope, think about when and where you need to classify, both in your choice of classification tools and in your data mapping. Some business functions, such as eDiscovery, fraud investigations, FOIA requests and Subject Access Requests under GDPR, need to be performed across all your corporate data at the same time: structured and unstructured, governed and ungoverned. If your classification tools exist only in M365 in the cloud, you may be missing a lot. If, however, you only need to take a point-forward approach to classifying records in SharePoint Online, that may be a better way to go.
For accuracy, whether you classify first or migrate first depends on the method you intend to use to check and validate false positives and false negatives. File analysis tools often have robust and mature methods for distributed visual review of results. You will need to determine whether there is content that belongs in a retention category but that your first-pass attempt missed. Often your SMEs will be the only ones who know, so you need to make it easy for them to review.
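One way to put numbers on false positives and false negatives is to have reviewers mark a sample of items and then compute precision and recall for each retention category. The sketch below assumes a hypothetical `ReviewedItem` record holding the tool's decision and the reviewer's decision; neither the record nor the `review_metrics` function comes from M365 or any specific file analysis tool.

```python
from dataclasses import dataclass


@dataclass
class ReviewedItem:
    path: str
    predicted: bool  # the tool assigned the retention category
    actual: bool     # the reviewer says the item belongs in the category


def review_metrics(items):
    """Summarize a reviewed sample as error counts, precision and recall."""
    tp = sum(1 for i in items if i.predicted and i.actual)
    fp = sum(1 for i in items if i.predicted and not i.actual)  # false positives
    fn = sum(1 for i in items if not i.predicted and i.actual)  # false negatives
    # Guard against empty denominators when a sample has no hits at all.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"false_positives": fp, "false_negatives": fn,
            "precision": precision, "recall": recall}


sample = [
    ReviewedItem("a.docx", True, True),
    ReviewedItem("b.docx", True, True),
    ReviewedItem("c.docx", True, True),
    ReviewedItem("d.docx", True, False),   # tool was wrong: false positive
    ReviewedItem("e.docx", False, True),   # tool missed it: false negative
    ReviewedItem("f.docx", False, True),   # tool missed it: false negative
]
metrics = review_metrics(sample)
# precision is 3 / 4 = 0.75 and recall is 3 / 5 = 0.6 for this sample
```

Low recall points to content the first pass missed, which is exactly the content only reviewers can identify. Low precision points to rules that are too broad.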
Whatever tool you use, planned preparation work on your shared drives can help improve accuracy and reduce the effort involved in bringing shared drives into the corpus of enterprise information. These recommendations do not require technology, but the further in advance of an M365 project you start on these tasks, the better and more accurate your final result will be.