Digital Transformation and Cloud Migration – The Trouble with PST files*
In the midst of a shared drive digital transformation project or an M365 cloud migration project, one of the biggest and most confounding pieces you will come across is the PST file. This article addresses the issues of bulk migration to a SharePoint Online (rather than email) environment, but the lessons apply to other types of migrations. If your organization has just adopted M365, SharePoint Online, or OneDrive and is using the lift-and-shift approach to get content into the cloud, from common shares or personal drives for example, please be conscious of PST file issues.
If you are not aware, the term PST comes from the file extension of an email archive file found in many places in today’s corporations. The PST archive contains emails, calendar events, contacts, notes and other things.
In many organizations in the past, when your email inbox became full, the IT department would let you know to delete your emails, or they would simply limit your inbox size. A common solution for many people was to begin archiving old emails into a PST file. The file was stored on common network shares or personal shares. The email application would periodically move older emails into the PST without any limits.
Your three options for cleaning or migrating PST content to the cloud are:
Open each PST and migrate each email to an Exchange environment. Assuming you can easily segregate the PSTs from the shared drives, that you are migrating and not just cleaning, and that you have a well-governed M365 environment in the cloud, this could be a reasonable but time-consuming option for end users.
Understand your PST content with file analysis tools, clean and cull on-premises, then migrate the good stuff.
Lift-and-shift all files found on a network share to SharePoint. This is the most dangerous option.
There are a number of serious problems (and solutions) with PST files:
1. They can become HUGE over time. Archiving to a PST does not solve the problem of email bloat; it just shifts it out of Exchange to a different, more hidden location. It also makes the PST a file system migration issue rather than an Exchange migration issue.
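A first step in sizing the problem is simply finding the PSTs and ranking them by size. The following is a minimal sketch using only the Python standard library; the function name `find_pst_files` and the demo folder layout are hypothetical, and a real file analysis tool would do far more than this.

```python
import tempfile
from pathlib import Path


def find_pst_files(root):
    """Return (path, size_in_bytes) for every .pst file under root, largest first."""
    results = []
    for path in Path(root).rglob("*"):
        # Match the extension case-insensitively: .pst, .PST, .Pst ...
        if path.is_file() and path.suffix.lower() == ".pst":
            results.append((path, path.stat().st_size))
    return sorted(results, key=lambda item: item[1], reverse=True)


# Demo against a throwaway folder that stands in for a network share.
with tempfile.TemporaryDirectory() as tmp:
    share = Path(tmp)
    (share / "users" / "alice").mkdir(parents=True)
    (share / "users" / "alice" / "archive.PST").write_bytes(b"x" * 2048)
    (share / "users" / "alice" / "notes.docx").write_bytes(b"x" * 10)
    (share / "old.pst").write_bytes(b"x" * 512)
    inventory = [(path.name, size) for path, size in find_pst_files(share)]

print(inventory)
```

Pointing `find_pst_files` at a real share would give a rough inventory of where the bloat lives, before any decision is made about what to migrate.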
2. All of this hugeness will appear to be extremely “active” on the file system. For example, a 20-year-old email inside a PST file that was updated just last week will look active according to the file system. At one client, this issue made it appear that 72% of ALL the information stored on the network share was current within the last 4 weeks (see the chart).
The last two bars on this chart contained many old and expired records. The actual item dates are visible only when you dig into the PST. If your M365 migration strategy includes filters for leaving behind older content, those filters will not catch this archived data.
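The gap between the file date and the item dates can be illustrated with a short sketch. It assumes the item dates have already been extracted from the PST by some analysis tool; the function names, the 28-day window, the 7-year cutoff, and the sample dates are all illustrative assumptions rather than recommendations.

```python
from datetime import datetime, timedelta


def looks_active(file_modified, now, window_days=28):
    """File-level test: was the PST container itself touched within the window?"""
    return now - file_modified <= timedelta(days=window_days)


def share_of_old_items(item_dates, now, cutoff_years=7):
    """Item-level test: fraction of items inside the PST older than the cutoff."""
    if not item_dates:
        return 0.0
    cutoff = now - timedelta(days=365 * cutoff_years)
    old = sum(1 for item_date in item_dates if item_date < cutoff)
    return old / len(item_dates)


now = datetime(2024, 6, 1)
pst_modified = datetime(2024, 5, 27)   # one new email was archived last week
item_dates = [                          # hypothetical dates of items inside the PST
    datetime(2004, 3, 15),
    datetime(2009, 11, 2),
    datetime(2015, 1, 20),
    datetime(2024, 5, 27),
]

print(looks_active(pst_modified, now))       # True
print(share_of_old_items(item_dates, now))   # 0.75
```

A migration filter that looks only at the file date would carry this whole PST forward as current content, even though three of its four items are past the cutoff.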
3. PSTs contain records. The PST format inhibits or prevents records management categorization, aging identification, disposition, and compliance. For autoclassification to work in M365, you will need to migrate individual email items and not just the higher-level PST files. Some organizations created project-based email boxes, which was smart from the RIM perspective: when the content relating to that project reaches the end of its retention period, the entire PST can be disposed of. Grouping communications by topic is also one of the benefits of a Teams channel. In general, ensure your digital transformation or cloud migration project can examine and categorize the content of PST files so that you minimize the load and effort. If you have 50,000 PST files (or Lotus Notes databases, or old SharePoint sites for that matter), an examination of the content may allow you to exclude a large number from needing any action other than disposal.
4. Duplicates are prolific. Within the PST format, you will find around 20% duplicates, usually in threads and in “Sent Items.” Between the PST and external systems, the numbers can be even greater, and there it is the attachments that are duplicated. Someone would create a PowerPoint record, store it appropriately, then email a copy, which eventually gets archived into a PST. Your project needs to look within PSTs and compare duplicates with external systems like network shares, legacy ECM, and older SharePoint sites.
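One common way to compare attachments against external systems is to hash the content and group identical hashes. The sketch below assumes the attachment bytes have already been extracted from the PSTs; the location strings and the `find_duplicates` helper are hypothetical. Exact hashing finds only byte-identical copies, so near-duplicates such as re-saved or edited files would need a different technique.

```python
import hashlib
from collections import defaultdict


def content_hash(data):
    """SHA-256 of the raw bytes; identical content gives an identical hash."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(items):
    """items: iterable of (location, bytes) pairs.

    Returns a list of groups, each group being the locations that hold
    byte-identical content. Content that appears only once is left out.
    """
    groups = defaultdict(list)
    for location, data in items:
        groups[content_hash(data)].append(location)
    return [locations for locations in groups.values() if len(locations) > 1]


# Hypothetical sample: one deck stored on the share and emailed twice.
items = [
    ("share:/projects/q3/deck.pptx", b"slide deck bytes"),
    ("pst:alice.pst/Sent Items/msg-17/deck.pptx", b"slide deck bytes"),
    ("pst:bob.pst/Inbox/msg-4/deck.pptx", b"slide deck bytes"),
    ("share:/projects/q3/budget.xlsx", b"budget bytes"),
]

duplicate_groups = find_duplicates(items)
for group in duplicate_groups:
    print(group)
```

In this sample the copy on the share is the record, and the two PST copies are candidates for disposal rather than migration.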
5. PSTs contain risky data. Not so long ago, sending your credit card number in an email was a common way to pay for something like corporate travel. A few minutes browsing through the Enron Email sample data also brings up a lot of content that can be harmful and risky for a number of reasons.
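As an illustration of how credit card numbers in email text might be flagged, here is a minimal sketch that combines a regular expression with the Luhn checksum used by payment card numbers. The pattern and function names are assumptions for this example; the sample text uses a well-known test number, and a production scan would need more care with false positives and other kinds of sensitive data.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:      # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text):
    """Return digit strings in text that look like card numbers and pass Luhn."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


body = (
    "Please charge my card 4111 1111 1111 1111 for the flight. "
    "Booking reference 1234 5678 9012 3456."
)
print(find_card_numbers(body))   # ['4111111111111111']
```

The booking reference has the right shape but fails the checksum, which is why the Luhn step is worth adding on top of the pattern match.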
6. PSTs contain a lot of junk. A duplicate thread analysis, non-business domain analysis, subscription analysis, and Out of Office analysis are common ways to cull the garbage out of an email data set. Also, consider what to do with content stored in “Deleted Items” – I’d recommend deleting it. Build some defensible culling practices into your digital transformation project.
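Some of these culling rules can be sketched as a simple function that returns the reason a message was culled, so that every exclusion is documented. The message fields, the domain list, and the subject prefixes below are hypothetical; duplicate thread analysis is left out because it needs comparison across messages rather than a per-message rule.

```python
# Hypothetical rule data; a real project would agree on these lists with
# records management and legal before applying them.
NON_BUSINESS_DOMAINS = {"newsletters.example.com", "deals.example.net"}
OUT_OF_OFFICE_PREFIXES = ("out of office", "automatic reply")


def cull_reason(message):
    """Return a short reason string if the message can be culled, else None."""
    if message.get("folder", "").lower() == "deleted items":
        return "deleted item"
    subject = message.get("subject", "").strip().lower()
    if subject.startswith(OUT_OF_OFFICE_PREFIXES):
        return "out of office"
    domain = message.get("sender", "").rpartition("@")[2].lower()
    if domain in NON_BUSINESS_DOMAINS:
        return "non-business domain"
    if message.get("list_unsubscribe"):   # bulk mail usually carries this header
        return "subscription"
    return None


messages = [
    {"sender": "pat@client.example.org", "subject": "Contract draft", "folder": "Inbox"},
    {"sender": "sam@client.example.org", "subject": "Automatic reply: Contract draft", "folder": "Inbox"},
    {"sender": "offers@deals.example.net", "subject": "50% off", "folder": "Inbox"},
    {"sender": "news@vendor.example.com", "subject": "Monthly digest", "folder": "Inbox", "list_unsubscribe": True},
    {"sender": "lee@client.example.org", "subject": "Lunch?", "folder": "Deleted Items"},
]

reasons = [cull_reason(message) for message in messages]
print(reasons)
```

Recording the reason alongside each culled item is one way to keep the practice defensible if the disposal is ever questioned.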
7. All of this information stored on your shared drives is ironically the epitome of "un-sharable" content. Nobody knows what is inside another’s PST, and nobody would ever want to open one. Users cannot collaborate on content stored in PSTs. When an employee leaves the organization, that data often becomes a large abandoned collection of useless risk.
Closing your eyes to the harmful and risky data does not absolve you of wasting storage resources and exposing your organization to risk. You shouldn’t ignore PST files, you shouldn’t lift-and-shift them, and you shouldn’t migrate them without some good triage and filtering.
If you are trying to figure out a strategy for dealing with PST files, please reach out to our consulting team.
* Trouble with a capital "T" and that rhymes with "P" and that stands for PST!