In this insight, we look at how best to prevent redacted text from being ‘unredacted’ by certain software tools, and at what researchers advise based on recent experiments.
The Problem
For businesses and organisations, the growing need to share data and/or make some data public can mean that certain (sensitive) parts of documents need to be obscured/obfuscated/censored for legal or security purposes (and to prevent data leaks and fines). There are several ways to do this in a document, including blurring, swirling, or pixelating letters and images. The issue is that some of these methods may not be effective enough and could allow the text to be recovered/de-obfuscated using certain tools and methods, e.g., the Depix tool or the ‘Unredacter’ tool. Depix, for example, is a Python program designed to recover censored text to a readable format via a simple command, and in the wrong hands this type of tool could lead to a security breach.
Challenge Issued
Researchers have been testing how secure pixelated text really is for some time. For example, researchers at a company called Jumpsec tested whether the Depix tool could recover text that had been pixelated. The results broadly showed that:
– Using the supplied examples, recovering the redacted text with Depix was possible to a reasonable degree.
– Using original content (rather than the author’s supplied examples), Depix ran for a long time and still failed to recover the obfuscated text.
It was concluded that the Depix tool poses minimal risk to security at present, as it requires specific criteria to be met to be effective, but that there is a small chance users could depixelate images with it.
In 2021, Jumpsec then issued an Internet challenge to develop a tool that could effectively recover censored text to a readable format.
Bishop Fox Research
The challenge was accepted by Dan Petro, Lead Researcher at US security company Bishop Fox. Mr Petro built his own ‘Unredacter’ tool and tested it in a similar way to the Depix tool.
Mr Petro noted that pixelation tools use an algorithm to divide an image into a grid of a given block size (e.g. 8×8). For each block, the redacted image’s colour is set equal to the average colour of the original over that same area. This “smears” the image’s information across each block. Recovering text from it can work, but it presents several problems. These include characters not lining up with the blocks and bleeding over into neighbouring ones, problems with white spacing, variable-width fonts, and font inconsistency.
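To make the block-averaging step concrete, here is a minimal sketch in Python. It assumes a greyscale image stored as a list of rows of integers from 0 (black) to 255 (white), and it rounds each block’s mean to a whole number. Real tools work on colour images and may handle rounding and edge blocks differently. The function name `pixelate` is hypothetical and does not come from any of the tools mentioned.

```python
from typing import List

Image = List[List[int]]  # assumed: greyscale rows, 0 = black, 255 = white


def pixelate(image: Image, block: int) -> Image:
    """Replace every block x block tile with the (rounded) mean of its pixels.

    Tiles at the right and bottom edges are smaller than `block` when the
    image dimensions are not multiples of the block size.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]  # copy so the original is left untouched
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            values = [image[r][c] for r in rows for c in cols]
            mean = round(sum(values) / len(values))
            # Every pixel in the tile takes the tile's average value.
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


img = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 255],
]
print(pixelate(img, 2))
# [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 64, 64], [255, 255, 64, 64]]
```

Each output block keeps a summary (the average) of the pixels beneath it. This is why pixelation is a weaker form of redaction than removing the pixels outright.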
The ‘Unredacter’ Tool
The ‘Unredacter’ tool created by the Bishop Fox researchers solved many of the problems that the Depix tool had encountered, and was able to recover the text in a test image to a reasonable degree.
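This article does not describe how Unredacter works internally. The sketch below shows one general way pixelation can be attacked, and it should not be read as Unredacter’s implementation. Pixelation is deterministic, so an attacker can pixelate candidate guesses with the same block size and keep the guess whose result best matches the redacted image. The toy 4×4 “glyphs” `L` and `T` and the helper names are hypothetical. A real attack must also guess the font, size, spacing and alignment to the block grid, which are the problems listed above.

```python
from typing import Dict, List

Image = List[List[int]]  # assumed: greyscale rows, 0 = black, 255 = white


def pixelate(image: Image, block: int) -> Image:
    """Replace every block x block tile with the rounded mean of its pixels."""
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]
    for top in range(0, height, block):
        for left in range(0, width, block):
            rows = range(top, min(top + block, height))
            cols = range(left, min(left + block, width))
            values = [image[r][c] for r in rows for c in cols]
            mean = round(sum(values) / len(values))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def distance(a: Image, b: Image) -> int:
    """Sum of absolute pixel differences between two same-sized images."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def best_guess(redacted: Image, candidates: Dict[str, Image], block: int) -> str:
    """Pixelate each candidate and return the one closest to the redacted image."""
    return min(candidates, key=lambda name: distance(pixelate(candidates[name], block), redacted))


# Hypothetical toy renderings of two characters.
L = [[0, 255, 255, 255], [0, 255, 255, 255], [0, 255, 255, 255], [0, 0, 0, 0]]
T = [[0, 0, 0, 0], [255, 0, 0, 255], [255, 0, 0, 255], [255, 0, 0, 255]]

redacted = pixelate(L, 2)  # what the "secret" looks like after pixelation
print(best_guess(redacted, {"L": L, "T": T}, 2))  # L
```

The search only works because the average in each block still carries information about what was underneath it.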
The Conclusions
The Jumpsec Labs and Bishop Fox text recovery experiments reached the same conclusions. Both advise that, when redacting text, you should use only black bars that cover the whole text and never other methods such as pixelisation, blurring, fuzzing, or swirling, and that you should edit the text as an image. Bishop Fox’s Mr Petro also warns that setting black text on a black background in a Word document does not hide it, because the text can still be read simply by highlighting it. This makes it an insecure redaction technique that could lead to the accidental leak of sensitive information.
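The advantage of a solid black bar can be shown with a minimal sketch. It uses the same assumed greyscale list-of-rows image format as a plain pixel array. The function `black_bar` is a hypothetical name, not part of any tool mentioned. Once the covered pixels are overwritten with black, two different secrets produce identical output, so no guess can be confirmed against it. A bar that only partly covers the text leaves the uncovered pixels unchanged. The sketch assumes the bar is burned into the image’s pixels. A bar drawn as a separate layer over live text (for example in a PDF or Word file) may not remove the text underneath, which is why the researchers advise editing the text as an image.

```python
from typing import List

Image = List[List[int]]  # assumed: greyscale rows, 0 = black, 255 = white


def black_bar(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return a copy of `image` with the given rectangle overwritten with black (0)."""
    out = [row[:] for row in image]
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = 0  # original value is discarded, not averaged
    return out


secret_a = [[255, 0, 255, 0], [0, 255, 0, 255]]
secret_b = [[0, 0, 255, 255], [255, 255, 0, 0]]

# Covering the whole text region leaves identical pixels for both secrets.
print(black_bar(secret_a, 0, 0, 2, 4) == black_bar(secret_b, 0, 0, 2, 4))  # True

# Covering only part of it leaves the rest of the original visible.
print(black_bar(secret_a, 0, 0, 2, 2)[0])  # [0, 0, 255, 0]
```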
What Does This Mean For Your Business?
There are now many ways in which a data security breach can happen, and although an insecure redaction technique may seem a less common one, the result could be just as devastating as more familiar types of breach. The lesson for businesses from this research is that software could be used to uncover redacted text, and that relying on quick methods such as black text on a black background is ineffective and very risky. The researchers advise that businesses can best protect themselves from this threat by editing the text as an image and using only black bars that cover the whole text.