How do you plan a content migration with auto-categorization and indexing tools?
Updated: Sep 6, 2020
A wealth of information about an organization's history, knowledge, and value resides on shared network drives and other ungoverned locations. Most organizations have no idea how to tap into that wealth to benefit the enterprise. When establishing an information governance program for O365, companies face a myriad of issues related to this legacy unstructured information:
How much content cannot or should not be migrated, to save cost and effort? Most organizations are keeping 30-40% useless content.
What content has already been captured in governed systems, what has not, and how compliant is it?
What can this information tell us about metadata, labels, usage, ownership, risk and other O365 design issues before the plan begins?
Based on actual data, what will the real costs and benefits of a good governance program be?
Where should they start and how should they prioritize deployment and migration?
Infotechtion Information Management services focus on content residing outside traditional governed locations such as O365. The primary goals are to assess the most appropriate ways to minimize data, organize it for effective classification, and move it to a more secure and protected environment, particularly O365. We use enterprise-class indexing and inventory tools and bring years of experience to bear when guiding organizations to better governance programs.
1. Requirements and scope definition
Review existing O365 governance, taxonomy, labels, content scope and functionality
Review coverage of existing or potential regulatory examinations, including PII, health, financial, energy audit or HR requirements
Review and revise policy authority, controls, and retention schedule definitions as necessary to ensure appropriate deletion activities are compliant
Review security policy and scope to identify definitions for valuable and “protectable” content
2. Set up, install and test indexing software tools
Establish the operational platform and locations, taking into account speed and performance, GDPR cross-border data transfer requirements and technical resources
Determine indexing speed requirements for license sizing
Gather and document credentials and access authorization for shared drives, email, cloud accounts, social media and structured data as necessary
Install and test the tools, and create a sample data collection based on the above requirements
Determine data sets and groupings for effective analysis
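Sizing indexing speed for licensing is mostly arithmetic once a per-node throughput has been measured on the sample collection. A minimal sketch, assuming throughput is constant and scales linearly with the number of indexing nodes; the function names and the figures in the example are illustrative, not any vendor's licensing model.

```python
import math

def indexing_days(volume_tb, throughput_gb_per_hour, hours_per_day=24.0):
    """Estimated elapsed days for ONE indexing node to process `volume_tb`.

    Assumes 1 TB = 1024 GB and a constant, measured throughput.
    """
    if throughput_gb_per_hour <= 0 or not 0 < hours_per_day <= 24:
        raise ValueError("throughput must be positive and hours_per_day in (0, 24]")
    return volume_tb * 1024 / (throughput_gb_per_hour * hours_per_day)

def nodes_needed(volume_tb, throughput_gb_per_hour, deadline_days, hours_per_day=24.0):
    """Smallest node count that finishes within `deadline_days`, assuming linear scaling."""
    if deadline_days <= 0:
        raise ValueError("deadline_days must be positive")
    one_node = indexing_days(volume_tb, throughput_gb_per_hour, hours_per_day)
    return math.ceil(one_node / deadline_days)

if __name__ == "__main__":
    # Illustrative figures only: 50 TB of shares at 100 GB/hour per node.
    print(round(indexing_days(50, 100), 1))        # 21.3 days on one node
    print(nodes_needed(50, 100, deadline_days=7))  # 4 nodes for a one-week window
```

A metadata-only pass and a full-text pass usually run at very different rates, so it is worth sizing each pass separately with its own measured throughput.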
3. Configure indexing tools to meet requirements
Build, test and validate the query architecture
Add/create regular expressions, word lists, dashboards, scripts
Determine and program remediation policy, actions, and documentation
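Regular expressions and word lists are easier to validate against known positive and negative samples before they are encoded in the indexing tool. A minimal sketch, assuming Python's `re` module stands in for the tool's own pattern engine; the rule names, patterns and word lists are hypothetical examples, and an SSN-shaped string is not necessarily an SSN, so real PII rules need far more testing.

```python
import re

def word_list_rule(words):
    """Compile a word list into one case-insensitive, whole-word regex."""
    alternation = "|".join(re.escape(w) for w in sorted(words))
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

# Hypothetical rule set: names, expressions and word lists are illustrative
# assumptions, not any indexing product's built-in definitions.
RULES = {
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn_like": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "hr_terms": word_list_rule({"payroll", "salary", "disciplinary", "grievance"}),
    "health_terms": word_list_rule({"diagnosis", "prescription", "medical leave"}),
}

def match_rules(text):
    """Return the set of rule names that fire on `text`."""
    return {name for name, rx in RULES.items() if rx.search(text)}

# Known positives and negatives: each sample lists exactly the rules it should trigger.
TEST_CASES = [
    ("Contact jane.doe@example.com", {"email_address"}),
    ("SSN on file: 123-45-6789", {"us_ssn_like"}),
    ("Salary review for Q3", {"hr_terms"}),
    ("Salaryman film night", set()),  # whole-word matching avoids this false positive
    ("Meeting moved to 3pm", set()),
]

def validate():
    """Return (text, expected, actual) for every case where the rules misbehave."""
    return [(t, exp, match_rules(t)) for t, exp in TEST_CASES if match_rules(t) != exp]

if __name__ == "__main__":
    print(validate() or "all rule checks passed")
```

Keeping the sample cases next to the rules means every change to a pattern or word list can be re-checked before the next indexing run.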
4. Run Index – Context index with no text on the first pass
Monitor high-level index(es) of data sources
Analysis of file types, volumes, risk areas, ownership, file-handling practices, duplicates, migratable content and any other issues
Tagging and remediation (exclusion, deletion, sequestering) of content as necessary
Recommendations for deep analysis
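A first-pass inventory that reads file-system metadata but extracts no text can be approximated with the Python standard library. A minimal sketch, assuming the share is reachable as a local or mounted path; exact duplicates are found by hashing only files that share a size. Enterprise indexing tools work at far larger scale and also report ownership and access data, which this sketch omits.

```python
import hashlib
import os
from collections import Counter, defaultdict
from pathlib import Path

def _sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def inventory(root):
    """Return (count per extension, bytes per extension, exact duplicates) under `root`."""
    count_by_ext, bytes_by_ext = Counter(), Counter()
    paths_by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                size = path.stat().st_size
            except OSError:
                continue  # broken link or no permission; a real crawl would log this
            ext = path.suffix.lower() or "<none>"
            count_by_ext[ext] += 1
            bytes_by_ext[ext] += size
            paths_by_size[size].append(path)
    # Only files of identical size can be exact duplicates, so hash just those.
    paths_by_hash = defaultdict(list)
    for paths in paths_by_size.values():
        if len(paths) > 1:
            for p in paths:
                paths_by_hash[_sha256(p)].append(str(p))
    duplicates = {h: ps for h, ps in paths_by_hash.items() if len(ps) > 1}
    return count_by_ext, bytes_by_ext, duplicates
```

Grouping by size first keeps the expensive hashing limited to plausible duplicate candidates, which matters on multi-terabyte shares.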
5. Run Index – Selected full text and recommendations
Deep index of files to extract text, multimedia details, audio details, OCR text, sensitive patterns, etc.
Analysis of formats, clusters, word lists, classifiers, topics, discussions, timelines
Clarify and validate findings with stakeholders
Review analysis and results, and present to management
Provide recommendations for O365 governance planning including optimal site functionality, technology, deployment priority, metadata and labeling, volume and quota metrics, security, ownership and responsibility
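Cluster analysis inside indexing products is usually proprietary. As a simpler stand-in, word shingles compared with Jaccard similarity can group near-duplicate versions (drafts, copies with minor edits) that exact hashing misses. A minimal sketch, assuming text has already been extracted into a name-to-text mapping; the threshold of 0.6 and shingle size of 3 are arbitrary starting points, not recommended values.

```python
import re
from itertools import combinations

def shingles(text, k=3):
    """Return the set of k-word shingles from lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicate_pairs(docs, threshold=0.6, k=3):
    """Return (name1, name2, similarity) for document pairs at or above `threshold`.

    docs: mapping of document name -> extracted text.
    Pairwise comparison is O(n^2): fine for a sample, not for a whole file share.
    """
    sigs = {name: shingles(text, k) for name, text in docs.items()}
    pairs = []
    for (n1, s1), (n2, s2) in combinations(sorted(sigs.items()), 2):
        sim = jaccard(s1, s2)
        if sim >= threshold:
            pairs.append((n1, n2, round(sim, 3)))
    return pairs
```

For full-scale use, techniques such as MinHash avoid the pairwise comparison, but the exact version is easier to reason about when validating findings with stakeholders.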
6. Classify content
Selected retention categories
Selected regulatory and security categories
Tag and label content for disposition, loss protection, migration.
Disposition reports usually include:
Content to be deleted/confirmed in accordance with a policy
Content to be deleted/confirmed after departmental review
Content to be deleted/confirmed by individuals, if appropriate
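The three disposition lists can be produced by checking each classified item against a retention schedule that also records who must confirm deletion. A minimal sketch, assuming a hypothetical schedule, hypothetical category names, and that retention is counted from the last-modified date (real schedules often use an event or creation date instead).

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention schedule: category -> (retention years, who confirms deletion).
# "policy" = delete under policy, "department" = departmental review,
# "individual" = the content owner confirms.
RETENTION_SCHEDULE = {
    "transitory": (1, "policy"),
    "finance": (7, "department"),
    "personal_workspace": (3, "individual"),
}

@dataclass
class Item:
    path: str
    category: str
    last_modified: date  # assumption: retention runs from last modification

def add_years(d, years):
    """Shift a date by whole years, mapping 29 February to 28 February when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def disposition_report(items, today):
    """Group items into the three disposition lists plus a 'retain' list."""
    report = {"policy": [], "department": [], "individual": [], "retain": []}
    for item in items:
        rule = RETENTION_SCHEDULE.get(item.category)
        if rule is None:
            # Unclassified content is never proposed for deletion automatically.
            report["retain"].append(item.path)
            continue
        years, reviewer = rule
        if add_years(item.last_modified, years) <= today:
            report[reviewer].append(item.path)
        else:
            report["retain"].append(item.path)
    return report
```

Routing unclassified items to "retain" keeps the report conservative: nothing is proposed for deletion unless a schedule entry explicitly covers it.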
7. Develop and export load file details for migration
Create load scripts based on site functionality and deployment schedule
Include value-added labels and additional file metadata
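A load file is typically a delimited list that maps each source file to its target location and carries the labels and metadata gathered during classification. A minimal sketch, assuming a hypothetical CSV layout with one file per deployment wave; the column names, the `wave` key and the example paths are illustrative and must be mapped to whatever input format your migration tool actually documents.

```python
import csv
import io

# Hypothetical load-file layout: rename or reorder these columns to match
# the input format your migration tool actually documents.
FIELDNAMES = ["source_path", "target_site", "target_library",
              "retention_label", "department", "sensitivity"]

def write_load_file(rows, wave, out):
    """Write the rows scheduled for deployment `wave` to `out` as CSV; return the row count.

    rows: dicts keyed by FIELDNAMES plus a "wave" key (missing fields are written empty).
    Rows with no target_site (e.g. tagged for deletion) are skipped.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for row in rows:
        if row.get("wave") != wave or not row.get("target_site"):
            continue
        writer.writerow(row)
        written += 1
    return written

if __name__ == "__main__":
    rows = [
        {"source_path": r"\\fs01\finance\budget.xlsx", "wave": 1,
         "target_site": "https://example.sharepoint.com/sites/finance",
         "target_library": "Documents", "retention_label": "Finance-7y"},
        {"source_path": r"\\fs01\temp\old.tmp", "wave": 1, "target_site": ""},
    ]
    buf = io.StringIO()
    print(write_load_file(rows, wave=1, out=buf))  # 1
    print(buf.getvalue())
```

Writing one file per wave keeps each load aligned with the site deployment schedule. When writing to disk, open the file with `newline=""`, as the `csv` module documentation recommends.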
8. Respond to additional information governance (IG) file management activities relating to unstructured and ungoverned data sets
GDPR subject access requests
Mergers and divestitures
Internal security or fraud investigations
Tasks 4 and 5 are iterative and ongoing, based on volumes and priorities.
Tasks 6 and 7 are iterative and based on site deployment schedule.
Feel free to contact me if you need help establishing a content migration strategy and plan.