The growth of dark data

Published by Chris Butterworth
on 06/07/2022

Gartner defines dark data as “the information assets organisations collect, process and store during regular business activities, but generally fail to use for other purposes”.

Comparisons can be drawn with dark matter in physics, which makes up roughly 85% of all matter in the universe. Dark data can be even more prevalent: it can represent 90% of a business’s data, according to Carnegie Mellon University’s Heinz College. IDC backs this up, stating that dark data could make up as much as 97% of all data, and that the figure could pass 99% if the trend continues.

The types of dark data 

Dark data can take many forms and come from many sources, whether from running a digital product or service or from general day-to-day operations. It can include:

  • Logs: Pretty much every interaction with a digital service gets recorded in some way, whether it’s a request for a file on a web server, a page load, an event tracked by an analytics service, or something similar. These logs can be incredibly useful for measuring the success of campaigns and deciding how best to shape your product or service, but they are rarely used to their full potential.
  • Messages and emails: Every instant message you’ve ever sent is most likely saved somewhere. Imagine going to send a quick “ok” to a colleague and being able to scroll back through an entire two-year conversation. With near-unlimited storage now common in online office services, the same is true of emails. While a handful of these messages may be useful, saving those few in a separate file, database or note-taking app is more practical, simply because they are easier to find there.
  • Old files/notes: While we’ve just said that notes are better than emails, notes can still become dark data once they are no longer relevant. Other files can quickly become dark data too: old presentations, contracts and other files become redundant and take up space once they are no longer needed. Duplicate files and video call recordings also belong in this category. Having multiple copies of the same file can cause real confusion, and often means more than one copy is marked or named as the final version. Recording video calls so that nothing is missed, so they can be transcribed later, or so that anyone who missed the call can catch up is a great idea, but nobody decides to delete the recording afterwards, so it stays there forever (or at least until space runs out or there’s an overhaul).
  • Other data: Other types of dark data can include call records, survey data, financial records, geolocation data (if any is collected), or even surveillance footage of your office building, if you record any. Again, most of this is collected, stored and kept long after it has become redundant. Surveillance footage may be an exception, for example where a closed-circuit system reuses its storage so that old footage is overwritten rather than kept.
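Duplicate files are one of the easier kinds of dark data to find automatically. As a rough illustration, here is a minimal Python sketch that groups files with byte-for-byte identical contents by comparing file sizes first and then SHA-256 hashes. The function names `find_duplicates` and `file_digest` are hypothetical, chosen for this example, and the script only reports duplicates. Deciding which copy to keep is still a human call.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> List[List[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical."""
    # First pass: bucket files by size, since files of different sizes can't match.
    by_size: Dict[int, List[Path]] = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Second pass: hash only the files that share a size with at least one other file.
    by_hash: Dict[str, List[Path]] = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for p in paths:
                by_hash[file_digest(p)].append(p)

    # Keep only hash groups that contain more than one file.
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates("/path/to/shared/drive")` returns a list of groups, each listing the paths that hold the same content. Checking sizes first means most files never need to be read in full.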

The benefits of reducing dark data 

For emails and files, streamlining offers a practical benefit: removing old and redundant files leaves fewer files to search through, which makes things easier to find.

Dark data is also becoming a big storage issue. The world’s data is predicted to reach 180 zettabytes (180,000,000,000,000,000,000,000 bytes) soon, and storing it requires huge amounts of resources and energy. Given how much of that data is dark, much of that effort is wasted.

While efforts are under way to develop storage methods that need less physical space and fewer resources, having policies and procedures for deleting or archiving (taking offline) these files is the most effective way to reduce dark data.
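A retention policy usually starts with knowing which files haven’t been touched in a long time. The following minimal Python sketch lists files under a folder that haven’t been modified for more than a given number of days, as candidates for review, archiving or deletion. The function name `stale_files` and the 365-day threshold used below are illustrative assumptions, not a recommended policy. The sketch only lists files and never deletes anything.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86_400


def stale_files(root: str, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """List files under `root` not modified for more than `max_age_days` days.

    Uses modification time (st_mtime) rather than access time, because many
    filesystems are mounted with relatime or noatime, which makes access
    times unreliable.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.stat().st_mtime < cutoff
    )
```

For example, `stale_files("/path/to/team/share", 365)` would list everything untouched for over a year. The resulting list can feed whatever review or archiving step your policy defines.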

Streamline your organisation’s cloud storage and you’ll not only make the organisation more efficient, but you’ll also reduce the carbon emissions that its storage causes.

It’s time to step out of the dark.