Cross-posted on my company’s blog:
http://www.mimosasystems.com/blog/archiving/data-management-challenges-with-sharepoint/
As a veteran Microsoft Exchange and SharePoint expert, I have witnessed the exponential growth of SharePoint over the last few years. As a consultant, before I joined Mimosa, I strongly encouraged my clients to stop using file shares and to stop emailing 20 MB PowerPoint attachments that will be out of date within moments. SharePoint is a great collaboration product, and I’ve seen some amazing solutions built around it.
Microsoft did an excellent job with WSS 3.0 and MOSS 2007, but it missed a few key things from a data management standpoint. We finally got a Recycle Bin in the 2007 version to help with “oops, I didn’t mean to delete that” issues. However, if you’re a SharePoint site admin and you delete a site, or items have been purged from the Recycle Bin, they are gone. You could recover a deleted site from a site-level export (somewhat like a brick-level backup in Exchange), but in most cases one won’t exist. In those cases, someone in IT has to restore the entire SharePoint farm and then export the site or individual item. This is not an easy process: first an entire server must be built, then the IIS settings must all be recovered, then the SharePoint database, and finally SharePoint must be reinstalled and configured before any of the data can be accessed. And for list items, the only built-in recovery option is to copy and paste their contents.
SharePoint also has great versioning, but we all know people don’t clean up their digital data well. While SharePoint can configure a document library to keep only X versions, this requires whoever created the library to know to set that option in the first place. So if you were a good e-mail citizen and saved that 20 MB PPT to SharePoint, and then 10 people made minor edits, that “one” PPT is now taking up roughly 220 MB in SharePoint (the original plus 10 full copies). This is because SharePoint doesn’t store only the deltas between versions; each version is a complete copy. In addition, SharePoint doesn’t support single instancing, so if 10 people save the same 20 MB document to 10 different document libraries, sites, or even folders in the same library, 200 MB is used. Files in document libraries may also be attached to list items, which is another source of duplication. To make this storage challenge even bigger, files are stored in the SQL database along with all of SharePoint’s other metadata. So as more files and versions are created, you can expect your backup and recovery times to increase. Adding to the issue, SharePoint provides no easy way to clean up old or retired data, or to remove rarely used files from SQL without removing all traces of them. So all those versions of the PPT will continue to take up space in SQL long after the project they were created for is finished and forgotten.
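To make the storage arithmetic concrete, here is a back-of-the-envelope model in Python. It is a sketch under our own simplifying assumptions, not a description of SharePoint internals: every saved version is counted as a full copy, an identical file saved to several locations is counted once per location, and the hypothetical `max_versions` parameter stands in for a library's “keep only X versions” setting. Counting the original upload as well as the 10 edited versions gives 11 full copies.

```python
from typing import Optional

# Back-of-the-envelope model. Assumptions (not SharePoint internals):
#   - each saved version is stored as a full copy (no deltas)
#   - identical files in different locations are each stored in full
#   - max_versions approximates a library's "keep only X versions" limit


def versioned_storage_mb(file_mb: float, edits: int, max_versions: Optional[int] = None) -> float:
    """Storage for one document: the original upload plus one full copy per edit."""
    versions = 1 + edits
    if max_versions is not None:
        versions = min(versions, max_versions)  # older versions are trimmed
    return file_mb * versions


def duplicated_storage_mb(file_mb: float, copies: int) -> float:
    """Same file saved to `copies` libraries, sites, or folders with no single instancing."""
    return file_mb * copies


def single_instance_storage_mb(file_mb: float, copies: int) -> float:
    """Hypothetical single-instance store: one physical copy, however many references."""
    return file_mb if copies > 0 else 0.0


if __name__ == "__main__":
    ppt_mb = 20.0
    print(versioned_storage_mb(ppt_mb, edits=10))                  # 220.0
    print(versioned_storage_mb(ppt_mb, edits=10, max_versions=5))  # 100.0
    print(duplicated_storage_mb(ppt_mb, copies=10))                # 200.0
    print(single_instance_storage_mb(ppt_mb, copies=10))           # 20.0
```

Under these assumptions, a version limit of 5 would cap the deck at 100 MB, which is why it matters whether the library creator knew to set it.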
Lastly, SharePoint does not support replication. Until Microsoft adds this support, I think it’s going to be very hard to completely replace Public Folders in Exchange, which do support replication. This is a major challenge for geographically dispersed organizations with limited bandwidth at some locations. To address this issue today, those organizations are forced to keep using Exchange, file shares, or local SharePoint servers to share this data. This, of course, adds to data duplication and to the challenges of managing distributed data.
Today, Mimosa can help address some of these data duplication challenges with our Exchange Archiving and File System Archiving solutions, built around Mimosa NearPoint. Both solutions support data de-duplication through global single instancing when archiving and stubbing attachments and files.
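The general idea behind global single instancing with stubbing can be illustrated with a content-addressed store. This is our own minimal sketch, not Mimosa NearPoint’s implementation: it assumes a SHA-256 content hash identifies duplicate files, and the `Stub` and `SingleInstanceArchive` names, as well as the example site paths, are hypothetical. Each archived file leaves behind a small stub pointing at its hash, and identical content is stored only once no matter how many locations reference it.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Stub:
    """Hypothetical pointer left in place of an archived file."""
    path: str
    content_hash: str


@dataclass
class SingleInstanceArchive:
    """Content-addressed archive: each unique blob is stored exactly once."""
    blobs: Dict[str, bytes] = field(default_factory=dict)

    def archive(self, path: str, data: bytes) -> Stub:
        # Identical content always hashes to the same key (SHA-256 assumed).
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only the first time this content is seen.
        self.blobs.setdefault(digest, data)
        return Stub(path=path, content_hash=digest)

    def restore(self, stub: Stub) -> bytes:
        # Follow the stub back to the single stored copy.
        return self.blobs[stub.content_hash]

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


if __name__ == "__main__":
    archive = SingleInstanceArchive()
    deck = b"x" * (20 * 1024 * 1024)  # stand-in for a 20 MB presentation
    stubs = [
        archive.archive(f"/sites/team{i}/Shared Documents/deck.pptx", deck)
        for i in range(10)
    ]
    print(len(stubs), archive.stored_bytes() // (1024 * 1024))  # 10 20
```

Ten locations end up holding ten small stubs, while the archive keeps a single 20 MB copy of the content.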
For organizations that really invest in SharePoint, these data management challenges will usually be an unexpected cost. But even with those costs, it’s still much better than using file shares and sending large attachments in email.
What other data management challenges are out there with SharePoint? Add a comment and let us know your thoughts.