
  • Brian Tuemmler

How to plan a content migration with auto-categorization and indexing tools

Updated: Sep 6

A wealth of information about an organization's history, knowledge, and value resides on shared network drives and other ungoverned locations. Most organizations have no idea how to tap into that wealth to benefit the enterprise. When establishing an information governance program for O365, companies face a myriad of issues related to this legacy unstructured information:

  • How much content cannot or should not be migrated, to save cost and effort? In most organizations, 30-40% of the content being kept has no value.

  • What content has already been captured, what has not, and how compliant is it?

  • What can this information tell us about metadata, labels, usage, ownership, risk and other O365 design issues before the plan begins?

  • Based on actual data, what will the real costs and benefits of a good governance program be?

  • Where should they start and how should they prioritize deployment and migration?

Infotechtion Information Management services focus on content residing outside traditional governed locations such as O365. The primary goals are to assess the most appropriate ways to minimize data, organize it for effective classification, and move it to a more secure and protected environment, particularly O365. We use enterprise-class indexing and inventory tools and bring years of experience to bear when guiding organizations to better governance programs.


1. Requirements and scope definition

  • Review existing O365 governance, taxonomy, labels, content scope and functionality

  • Review coverage of existing or potential regulatory examinations, including PII, health, financial, energy audit or HR requirements

  • Review and revise policy authority, controls, and retention schedule definitions as necessary to ensure appropriate deletion activities are compliant

  • Review security policy and scope to identify definitions for valuable and “protectable” content

2. Set up, install and test indexing software tools

  • Establish the operational platform and locations, taking into account speed and performance, GDPR cross-border data requirements, and technical resources

  • Determine indexing speed requirements for license sizing

  • Gather and document credentials and access authorizations for shared drives, email, cloud accounts, social media and structured data as necessary

  • Install and test. Create a sample data collection based on the above requirements

  • Determine data sets and groupings for effective analysis
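
Indexing speed requirements largely come down to arithmetic: the volume to index divided by the hours the tool is allowed to run. The minimal Python sketch below assumes the volume estimate and the per-node indexing rate are figures you would measure during the sample-collection pilot. The names `SizingInput`, `required_throughput_gb_per_hour` and `nodes_needed` are hypothetical, and the node count assumes throughput scales linearly with nodes, which a given product may not do.

```python
import math
from dataclasses import dataclass


@dataclass
class SizingInput:
    total_tb: float       # estimated volume to index, in decimal terabytes
    window_days: float    # calendar days available for the indexing run
    hours_per_day: float  # hours per day the indexer is allowed to run


def required_throughput_gb_per_hour(s: SizingInput) -> float:
    """Indexing rate (GB/hour) needed to finish within the window."""
    available_hours = s.window_days * s.hours_per_day
    if available_hours <= 0:
        raise ValueError("indexing window must be positive")
    return s.total_tb * 1000 / available_hours  # 1 TB = 1000 GB (decimal)


def nodes_needed(required_gb_per_hour: float, node_gb_per_hour: float) -> int:
    """Whole indexing nodes needed, assuming throughput scales linearly per node."""
    return math.ceil(required_gb_per_hour / node_gb_per_hour)
```

For example, 48 TB indexed over 20 days at 10 hours per day requires 240 GB/hour; at a measured 100 GB/hour per node, that rounds up to 3 nodes.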

3. Configure indexing tools to meet requirements

  • Build, test and validate the query architecture

  • Add/create regular expressions, word lists, dashboards, scripts

  • Determine and program remediation policy, actions, and documentation
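
Regular expressions and word lists are easiest to trust when each rule is tested against known positive and negative samples before it runs across a whole share. The Python sketch below shows such a validation loop under stated assumptions: the rule names, patterns and word lists are illustrative, not any indexing product's built-in definitions, and word-list matching is plain case-insensitive substring search, which a production rule set would likely tighten.

```python
import re

# Illustrative rules only; names and patterns are assumptions, not a product's built-ins.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "iban": re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

# Word lists are matched as case-insensitive substrings for simplicity.
WORD_LISTS = {
    "hr": {"salary", "disciplinary", "performance review"},
    "health": {"diagnosis", "medical leave", "prescription"},
}


def match_rules(text: str) -> set[str]:
    """Names of every pattern and word list that fires on the text."""
    hits = {name for name, rx in PATTERNS.items() if rx.search(text)}
    lowered = text.lower()
    hits |= {name for name, words in WORD_LISTS.items()
             if any(w in lowered for w in words)}
    return hits


def validate(cases: list[tuple[str, set[str]]]) -> list[str]:
    """Run labelled samples through the rules; describe each mismatch."""
    problems = []
    for text, expected in cases:
        got = match_rules(text)
        if got != expected:
            problems.append(f"{text!r}: expected {sorted(expected)}, got {sorted(got)}")
    return problems
```

An empty list from `validate` means every labelled sample behaved as expected; any mismatch points to a rule that needs adjusting before the full index run.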

4. Run Index – Context (metadata-only) index with no text extraction on the first pass

  • Monitor high-level indexes of data sources

  • Analysis of file types, volumes, risk areas, ownership, file handling practices, duplicates, migratable content and any other issues

  • Tagging and remediation (exclusion, deletion, sequestering) of content as necessary

  • Recommendations for deep analysis
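
A context-only first pass can be approximated with a file-system walk that records counts and volumes per file type and groups exact duplicates. In the Python sketch below, duplicates are found by hashing only files that share a size; hashing reads file bytes but extracts no text. The function names and the returned dictionary layout are assumptions for illustration, and a full inventory would also capture ownership, permissions and dates.

```python
import hashlib
import os
from collections import Counter, defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash file bytes in chunks; no text is extracted."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def inventory(root: str) -> dict:
    """Metadata pass: file counts and bytes per extension, plus exact-duplicate groups."""
    count_by_ext = Counter()
    bytes_by_ext = Counter()
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                continue  # vanished or unreadable; a real run should log this
            ext = path.suffix.lower() or "(none)"
            count_by_ext[ext] += 1
            bytes_by_ext[ext] += size
            by_size[size].append(path)

    # Only files of identical size can be exact duplicates, so hash just those.
    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[sha256_of(path)].append(str(path))
            except OSError:
                continue
        duplicates.extend(sorted(g) for g in by_hash.values() if len(g) > 1)

    return {
        "count_by_ext": dict(count_by_ext),
        "bytes_by_ext": dict(bytes_by_ext),
        "duplicates": duplicates,
    }
```

Grouping by size first keeps the expensive hashing limited to likely candidates, which matters on multi-terabyte shares.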

5. Run Index – Selected full text and recommendations

  • Deep index of files to extract text, multimedia details, audio details, OCR text, sensitive patterns, etc.

  • Analysis of formats, clusters, word lists, classifiers, topics, discussions, timelines

  • Clarify and validate findings with stakeholders

  • Review analysis and results, and present to management

  • Provide recommendations for O365 governance planning including optimal site functionality, technology, deployment priority, metadata and labeling, volume and quota metrics, security, ownership and responsibility
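
Once the deep index has extracted text (including OCR output), per-document sensitive-pattern counts and frequent terms across the corpus are two simple inputs to this analysis. The Python sketch below assumes text extraction has already happened and that results are available as a mapping from a document ID to its text; the `SENSITIVE` patterns and `STOPWORDS` list are illustrative assumptions, and term frequency is only a rough stand-in for the clustering and topic features of an indexing tool.

```python
import re
from collections import Counter

# Illustrative patterns; a real deployment would reuse its validated rule set.
SENSITIVE = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}
STOPWORDS = {"the", "and", "for", "with", "from", "this", "that"}
TOKEN = re.compile(r"[a-z]{3,}")  # lowercase words of 3+ letters


def deep_scan(docs: dict[str, str], top_n: int = 10):
    """Return per-document sensitive-pattern counts, IDs of documents with
    any hit, and the most frequent non-stopword terms across the corpus."""
    per_doc = {}
    flagged = []
    terms = Counter()
    for doc_id, text in docs.items():
        counts = {name: len(rx.findall(text)) for name, rx in SENSITIVE.items()}
        per_doc[doc_id] = counts
        if any(counts.values()):
            flagged.append(doc_id)
        terms.update(t for t in TOKEN.findall(text.lower()) if t not in STOPWORDS)
    return per_doc, flagged, terms.most_common(top_n)
```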

6. Classify content

  • Selected retention categories

  • Selected regulatory and security categories

  • Tag and label content for disposition, loss protection and migration

  • Disposition reports usually include:

  1. Content to be deleted/confirmed in accordance with policy

  2. Content to be deleted/confirmed after departmental review

  3. Content to be deleted/confirmed by individuals if appropriate
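
The three disposition reports above can be driven by a retention schedule plus a few attributes per file. The Python sketch below is a simplified assumption of how such routing might work: `SCHEDULE`, the `owner_type` field and the routing order (legal hold first, then expiry, then whether policy permits deletion without review) are hypothetical, and age is approximated as 365 days per year.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical schedule: category -> (retention years, deletable under policy without review)
SCHEDULE = {
    "transitory": (1, True),
    "project": (7, False),
    "hr_personnel": (10, False),
}


@dataclass
class FileRecord:
    path: str
    category: str        # retention category assigned during classification
    last_modified: date
    owner_type: str      # hypothetical: "department", "individual" or "unknown"
    legal_hold: bool = False


def disposition_report(rec: FileRecord, today: date) -> Optional[int]:
    """Return report 1, 2 or 3 for a deletion candidate, or None to keep the file."""
    if rec.legal_hold or rec.category not in SCHEDULE:
        return None  # held or unclassified content is never routed to deletion
    years, policy_delete = SCHEDULE[rec.category]
    if (today - rec.last_modified).days <= years * 365:  # approximate years
        return None  # still inside its retention period
    if policy_delete:
        return 1  # delete/confirm in accordance with policy
    if rec.owner_type == "individual":
        return 3  # delete/confirm by the individual owner
    return 2  # delete/confirm after departmental review (also covers unknown owners)


def build_reports(records: list[FileRecord], today: date) -> dict[int, list[str]]:
    """Group deletion candidates by report number."""
    reports = {1: [], 2: [], 3: []}
    for rec in records:
        report = disposition_report(rec, today)
        if report is not None:
            reports[report].append(rec.path)
    return reports
```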

7. Develop and export load file details for migration

  • Create load scripts based on site functionality and deployment schedule

  • Include value-added labels and additional file metadata
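
A load file is typically a delimited list pairing each source file with its target location and the labels and metadata to apply. The Python sketch below writes a CSV for one deployment wave. The column names are hypothetical, since each migration tool defines its own load-file format, so they would need mapping to the tool's documented schema.

```python
import csv
import io

# Hypothetical column names; map them to the migration tool's load-file schema.
FIELDS = ["SourcePath", "TargetSite", "TargetLibrary",
          "RetentionLabel", "SensitivityLabel", "Owner", "Modified"]


def build_load_file(rows: list[dict], wave_sites: set[str]) -> str:
    """Return CSV text for rows whose target site is in the current deployment wave.
    Missing fields are written as empty strings; keys not in FIELDS are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        if row.get("TargetSite") in wave_sites:
            writer.writerow(row)
    return buf.getvalue()
```

Filtering by `wave_sites` mirrors the idea of generating load scripts per site according to the deployment schedule.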

8. Respond to additional IG-specific file management activities relating to unstructured and ungoverned data sets

  • GDPR subject access requests

  • Mergers and divestitures

  • Litigation

  • Internal security or fraud investigations

Tasks 4 and 5 are iterative and ongoing, based on volumes and priorities.

Tasks 6 and 7 are iterative and follow the site deployment schedule.


Feel free to contact me if you need help establishing a content migration strategy and plan.

© Infotechtion