
The Leading Provider of Data Leakage Prevention, Hidden Data Removal, and Privacy Software Solutions.


http://www.digitalconfidence.com

BatchPurifier 7.6
User Guide
Table of Contents

Introduction
Using BatchPurifier
Appendix A. Hidden Data Types
Appendix B. Frequently Asked Questions

© 2020 Digital Confidence Ltd. All rights reserved.


Introduction
BatchPurifier™ is a batch hidden data removal tool. It is powered by the Digital Confidence
DataDistiller™ Engine, which can remove the following types of hidden data:
| Supported File Formats | Extensions | Supported Hidden Data Types |
| --- | --- | --- |
| Microsoft® Word Document | .docx; .docm | Document Properties; Comments; Comment Authors; Tracked Changes; Hidden Text; Custom XML; Printer Settings |
| Microsoft Excel® Workbook | .xlsx; .xlsm | Document Properties; Comments; Comment Authors; Tracked Changes; Hidden Worksheets; Hidden Rows and Columns; Custom XML; Printer Settings |
| Microsoft PowerPoint® Presentation | .pptx; .pptm; .ppsx | Document Properties; Comments; Comment Authors; Slide Notes; Hidden Slides; Off-Slide Content; Custom XML |
| OpenDocument Text | .odt | Document Properties; Comments; Tracked Changes; Versions |
| OpenDocument Spreadsheet | .ods | Document Properties; Comments; Tracked Changes; Versions |
| OpenDocument Presentation | .odp | Document Properties; Slide Notes; Hidden Slides; Versions |
| OpenDocument Graphics | .odg | Document Properties; Versions |
| PDF Document (partial support*) | .pdf | Metadata (including XMP) |
| JPEG Image | .jpeg; .jpg | EXIF (including thumbnail); Photoshop image resources (including IPTC); XMP; Comments; ICC Profile; Adobe APP14 tag; JFIF header; Other Hidden Data |
| JPEG 2000 Image | .jp2 | Metadata (including native metadata, XMP, and other hidden data) |
| PNG Image | .png | Metadata (including XMP) |
| SVG Image | .svg; .svgz | Metadata |
| AVI Video | .avi | INFO; XMP; Other Hidden Data |
| MP3 Audio | .mp3 | ID3v1 tag and Lyrics3v2; ID3v2 tag (including XMP); APE tag |
| MP4 File | .mp4; .m4a; .m4v; .m4b | Metadata (including native metadata, XMP, and other hidden data) |
| 3GP File | .3gp | Metadata |
| WAVE / BWF Audio | .wav | INFO; EXIF; Broadcast Audio Extension; AXML; iXML; XMP; ID3; Cart Chunk; Other Hidden Data |
| F4V File | .f4v | Metadata (including native metadata, XMP, and other hidden data) |
| AIFF Audio | .aif; .aiff | Native Metadata; ID3; Other Hidden Data |
| Monkey's Audio | .ape | APE tag |
| Musepack | .mpc; .mpp; .mp+ | APE tag |
| OptimFROG | .ofr; .ofs | APE tag |
| WavPack | .wv | ID3v1 tag; APE tag |
| Tom's Audio Kompressor Audio | .tak | APE tag |
| XML/XSD/XSL File | .xml; .xsd; .xsl | Comments |

* Metadata can be removed only from some of the PDF versions, which are the most commonly used. See Appendix A.
Hidden Data Types for more details.

For a detailed explanation of the hidden data and metadata types that the Digital Confidence
DataDistiller™ Engine, which powers BatchPurifier™, can remove, see Appendix A. Hidden Data Types.
Using BatchPurifier
To remove hidden data from files using BatchPurifier™, follow four simple steps:
1. Select the files to be purified
2. Select the hidden data types to be removed
3. Select metadata to be preserved
4. Specify output options for the purified files
BatchPurifier™ will then inspect the files for the hidden data types that the user chose to remove,
remove them while keeping the rest of the data intact, and save the purified files according to the
output options.
If the user chose to save the purified files in an output folder instead of overwriting the
scanned files, and a file with the same name as a scanned file already exists in that folder, a
number is appended to the purified file's name.
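The collision rule above can be sketched in Python as follows. The exact numbering format BatchPurifier uses is not documented here, so the `_1`, `_2` suffix scheme and the `unique_output_path` helper are hypothetical; the sketch only shows the idea of appending an increasing number until the name is free.

```python
from pathlib import Path


def unique_output_path(output_dir: Path, file_name: str) -> Path:
    """Return a path in output_dir that does not collide with an existing file.

    Hypothetical helper and numbering scheme (not BatchPurifier's own):
    report.docx, then report_1.docx, report_2.docx, ...
    """
    candidate = output_dir / file_name
    stem, suffix = candidate.stem, candidate.suffix
    n = 1
    while candidate.exists():
        candidate = output_dir / f"{stem}_{n}{suffix}"
        n += 1
    return candidate
```

Checking for a name and then writing the file is not atomic, so a robust tool would also guard against a file appearing in between, for example by opening the new file with Python's exclusive-create mode (`open(path, "xb")`).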

The Hidden Data Filters Selection screen of BatchPurifier

In the Hidden Data Filters Selection screen, you can select the hidden data types to be removed
from the chosen files. You can save a selection as a Preset, for easy future re-selection.
Purification Report
When purification is finished, a report is presented to the user with a list of Files Successfully
Purified in one tab. If some files couldn't be purified, they appear, along with the reason, in the
Files Couldn't be Purified list in a second tab. Possible reasons are:
• Unexpected format – e.g. the file is corrupted, or the file extension doesn't match its true
type
• Unauthorized access – e.g. the file cannot be written to a particular folder due to lack of
security privileges
• Inaccessible file – e.g. the file is open by another application
• Read-only file – the file is marked as Read-only and you chose to overwrite the input files
with the output files
• Hidden file – the file is marked as Hidden and you chose to overwrite the input files with
the output files
• Encrypted file – the file is encrypted
• Unsupported PDF version – there are several versions of PDF files. BatchPurifier™ can
remove metadata only from some of the PDF versions, which are the most commonly used.
These include PDF files generated by Microsoft Office 2007-2019 and OpenOffice. PDF
files generated by some other PDF writing software on the market cannot be cleaned with
BatchPurifier™. In particular, PDF files generated by the latest Adobe software, such as
Adobe Acrobat Pro, cannot be cleaned with BatchPurifier™.
• Unsupported AVI version – BatchPurifier™ does not support AVI files that use a feature
called Multipart OpenDML AVI, which is required for AVI files larger than 2 GB (but may also
be used for smaller files).
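To illustrate the first reason, a file's true type can often be recognized from its first few bytes (its signature) and compared with its extension. The following Python sketch is hypothetical and does not describe how BatchPurifier performs its checks; the `SIGNATURES` table covers only a few formats from the supported list.

```python
from pathlib import Path
from typing import Optional

# Hypothetical, illustrative table of well-known file signatures ("magic numbers")
# and the extensions that normally go with them. Far from complete.
SIGNATURES = {
    b"%PDF-": {".pdf"},
    b"\xff\xd8\xff": {".jpg", ".jpeg"},
    b"\x89PNG\r\n\x1a\n": {".png"},
    # Open XML and OpenDocument files are both ZIP packages.
    b"PK\x03\x04": {".docx", ".docm", ".xlsx", ".xlsm", ".pptx", ".pptm",
                    ".ppsx", ".odt", ".ods", ".odp", ".odg"},
}


def extension_matches_content(path: Path) -> Optional[bool]:
    """True/False if the file's signature is in the table, None if it is unknown."""
    with open(path, "rb") as f:
        head = f.read(16)
    for magic, extensions in SIGNATURES.items():
        if head.startswith(magic):
            return path.suffix.lower() in extensions
    return None
```

Because Open XML and OpenDocument files share the ZIP signature, this check alone cannot tell a renamed .odt from a genuine .docx; that would require looking inside the package.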
Configuring BatchPurifier
You can configure BatchPurifier by clicking the Options button in the lower-left corner. There are
two Options tabs: General and Advanced.
The General tab lets you configure BatchPurifier to use certain purification options by default.

The General Options tab

The Advanced tab lets you include three hidden data filters for JPEG files. These hidden data
types generally do not include private information, and removing them may affect the appearance
of the image.

The Advanced Options tab


Using BatchPurifier from Shell Menu
Files and folders can also be sent to BatchPurifier™ for purification from the “Send To” menu. To
do so, select the files or folders that you want to purify, right-click them, and select
“BatchPurifier” from the “Send To” submenu.

Sending files to BatchPurifier for purification


Appendix A. Hidden Data Types
Digital Confidence DataDistiller™ Engine, which powers BatchPurifier™, is able to remove
metadata & hidden data from the following file types:
• Microsoft® Word Documents
• Microsoft Excel® Workbooks
• Microsoft PowerPoint® Presentations
• OpenDocument Text Documents
• OpenDocument Spreadsheets
• OpenDocument Presentations
• OpenDocument Graphics
• PDF Documents
• JPEG Images
• JPEG 2000 Images
• PNG Images
• SVG Images
• AVI Video Files
• MP3 Audio Files
• MP4 Files
• 3GP Files
• WAVE / BWF Audio
• F4V Files
• AIFF Audio Files
• Monkey's Audio
• Musepack
• OptimFROG
• WavPack
• Tom's Audio Kompressor Audio
• XML/XSD/XSL Files

Digital Confidence DataDistiller™ Engine is able to remove metadata & hidden data of the
following types:

Document Properties
Applicable to Microsoft® Word, Microsoft Excel®, Microsoft PowerPoint®, OpenDocument Text, OpenDocument
Spreadsheet, OpenDocument Presentation, and OpenDocument Graphics.
Metadata that includes details such as author name, title, subject, keywords, category, status,
comments, revision number, and total editing time. Document properties may also include
user-defined custom properties, which are non-standard metadata that can be added to a document.
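For the Open XML formats (.docx, .xlsx, .pptx and their macro-enabled variants), standard document properties are stored as XML inside the ZIP package, mainly in docProps/core.xml (creator, title, keywords, revision, last modified by), with application statistics such as total editing time in docProps/app.xml and custom properties in docProps/custom.xml. The sketch below reads docProps/core.xml using only the Python standard library; `read_core_properties` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_core_properties(package) -> dict:
    """Hypothetical helper: return the core properties of an Open XML package.

    `package` may be a path or a binary file object. Keys are element names
    without their namespace, e.g. "creator", "lastModifiedBy", "revision".
    """
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    # Tags look like "{http://purl.org/dc/elements/1.1/}creator"; keep the local name.
    return {child.tag.split("}", 1)[-1]: (child.text or "") for child in root}
```

Called on a typical file, for example `read_core_properties("report.docx")`, it returns keys such as `creator`, `lastModifiedBy`, `revision`, `created`, and `modified`.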
Comments
Applicable to Microsoft® Word, Microsoft Excel®, Microsoft PowerPoint®, OpenDocument Text, and OpenDocument
Spreadsheet.
Comments that were added to the document. The name of the user who added each comment, and the
date and time at which it was added, are saved along with it.

Tracked Changes
Applicable to Microsoft® Word, Microsoft Excel®, OpenDocument Text, and OpenDocument Spreadsheet.
Tracked changes are changes made to the document while the Track Changes option was enabled.
This includes inserted, deleted, modified, and moved text. Every change is saved with the name of
the user who made the change, as well as the date and time in which the change occurred. If the
tracked changes are not removed from the document, previous versions of the document can still be
viewed.
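In a .docx file, both comments and tracked changes carry the author's name and a date as XML attributes: comments are `w:comment` elements in word/comments.xml, and tracked insertions and deletions are `w:ins` and `w:del` elements in word/document.xml. The following Python sketch collects those author names. It is a hypothetical, illustrative helper that inspects only the main document part; tracked changes can also appear in headers, footers, and footnotes, and as other revision types such as moves and formatting changes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's "{uri}" notation.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def revision_authors(docx) -> dict:
    """Hypothetical helper: author names on comments and tracked insertions/deletions."""
    found = {"comments": set(), "insertions": set(), "deletions": set()}
    with zipfile.ZipFile(docx) as zf:
        if "word/comments.xml" in zf.namelist():
            comments = ET.fromstring(zf.read("word/comments.xml"))
            for comment in comments.iter(W + "comment"):
                found["comments"].add(comment.get(W + "author", ""))
        body = ET.fromstring(zf.read("word/document.xml"))
    # Only the main document part is scanned for tracked changes here.
    for tag, key in (("ins", "insertions"), ("del", "deletions")):
        for change in body.iter(W + tag):
            found[key].add(change.get(W + "author", ""))
    return found
```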

Hidden Text
Applicable to Microsoft® Word.
Text can be formatted as hidden so that it is not printed. Hidden text also does not appear on
screen unless the application is specifically set to show it.
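In a .docx file, directly formatted hidden text is a run whose run properties (`w:rPr`) contain a `w:vanish` element. The sketch below lists such runs from a word/document.xml part. The `hidden_text` helper is hypothetical, and it does not resolve formatting inherited from styles, so it can miss text that is hidden through a style.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def hidden_text(document_xml: bytes) -> list:
    """Hypothetical helper: text of runs directly formatted as hidden (<w:vanish/>)."""
    root = ET.fromstring(document_xml)
    hidden = []
    for run in root.iter(W + "r"):
        rpr = run.find(W + "rPr")
        vanish = rpr.find(W + "vanish") if rpr is not None else None
        # <w:vanish/> means hidden; w:val="0", "false", or "off" explicitly turns it off.
        if vanish is not None and vanish.get(W + "val", "true") not in ("0", "false", "off"):
            hidden.append("".join(t.text or "" for t in run.iter(W + "t")))
    return hidden
```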

Slide Notes
Applicable to Microsoft PowerPoint® and OpenDocument Presentation.
Slide notes are notes that were added to the slides for oral presentation and are not visible in the
slides themselves.

Hidden Slides
Applicable to Microsoft PowerPoint®.
Hidden slides are slides that were marked as hidden; they are not presented in the slide show.
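In a .pptx file, each slide is a separate part such as ppt/slides/slide1.xml; a hidden slide has `show="0"` on its root `p:sld` element, and speaker notes live in separate ppt/notesSlides/notesSlideN.xml parts. The hypothetical helper below lists both. Note that the part numbering does not necessarily match the order of slides in the presentation, which is defined in ppt/presentation.xml.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

SLIDE = re.compile(r"ppt/slides/slide(\d+)\.xml")
NOTES = re.compile(r"ppt/notesSlides/notesSlide(\d+)\.xml")


def hidden_slides_and_notes(pptx) -> dict:
    """Hypothetical helper: slide parts marked hidden, and notes-slide parts, in a .pptx."""
    with zipfile.ZipFile(pptx) as zf:
        names = zf.namelist()
        # Sort numerically so slide10 comes after slide2.
        slides = sorted((n for n in names if SLIDE.fullmatch(n)),
                        key=lambda n: int(SLIDE.fullmatch(n).group(1)))
        hidden = [n for n in slides
                  if ET.fromstring(zf.read(n)).get("show") in ("0", "false")]
    notes = sorted((n for n in names if NOTES.fullmatch(n)),
                   key=lambda n: int(NOTES.fullmatch(n).group(1)))
    return {"hidden": hidden, "notes": notes}
```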

Off-Slide Content
Applicable to Microsoft PowerPoint®.
Off-slide content is content that has been placed outside the slide area and is not presented in
the slide show.

Hidden Worksheets
Applicable to Microsoft Excel®.
Hidden worksheets are worksheets that were marked as hidden. Hidden worksheets will not appear
on screen.

Hidden Rows and Columns


Applicable to Microsoft Excel®.
Hidden rows and columns are rows and columns that were marked as hidden. Hidden rows and
columns will not appear on screen.
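In an .xlsx file, worksheet visibility is recorded in xl/workbook.xml through the `state` attribute of each `sheet` element (`hidden`, or `veryHidden` for sheets that Excel's Unhide dialog does not list), while hidden rows and columns are marked with `hidden="1"` in each worksheet part. The following Python sketch, with hypothetical helper names, extracts both.

```python
import xml.etree.ElementTree as ET

# SpreadsheetML namespace, in ElementTree's "{uri}" notation.
S = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
TRUE = ("1", "true")


def hidden_sheets(workbook_xml: bytes) -> list:
    """Hypothetical helper: names of hidden or veryHidden sheets (xl/workbook.xml)."""
    root = ET.fromstring(workbook_xml)
    return [s.get("name") for s in root.iter(S + "sheet")
            if s.get("state") in ("hidden", "veryHidden")]


def hidden_rows_and_columns(sheet_xml: bytes) -> tuple:
    """Hypothetical helper: hidden row numbers and (min, max) column ranges."""
    root = ET.fromstring(sheet_xml)
    rows = [int(r.get("r")) for r in root.iter(S + "row")
            if r.get("hidden") in TRUE and r.get("r")]  # the r attribute is optional
    cols = [(int(c.get("min")), int(c.get("max"))) for c in root.iter(S + "col")
            if c.get("hidden") in TRUE]
    return rows, cols
```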
Custom XML
Applicable to Microsoft® Word, Microsoft Excel®, and Microsoft PowerPoint®.
Custom XML data can be added to a document by another application like a document management
system. Custom XML can include Custom XML properties.

Printer Settings
Applicable to Microsoft® Word and Microsoft Excel®.
Contains information about a printer or a display device, including its name.

Versions
Applicable to OpenDocument Text, OpenDocument Spreadsheet, OpenDocument Presentation, and OpenDocument
Graphics.
Several versions of the document can be saved by the user in a single file.

PDF Metadata
Applicable to PDF documents.
PDF documents typically contain document information and XMP metadata. In addition, due to
performance considerations, deleted objects are sometimes left in the file and only marked as
deleted. Although this makes them invisible when viewed in a standard PDF reader, it is still
possible to retrieve them from the file.
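The following Python sketch gives a rough, heuristic view of these structures in an uncompressed PDF: the document information dictionary (/Info), an XMP /Metadata stream, the number of %%EOF markers (more than one usually indicates incremental updates appended to the file), and free entries in classic cross-reference tables, which mark objects as deleted while their older bytes can remain in the file. It is a hypothetical helper, not a parser: files using PDF 1.5 or later features may keep these structures inside compressed object streams and cross-reference streams, where plain byte searches will not find them.

```python
import re


def pdf_revision_hints(data: bytes) -> dict:
    """Hypothetical helper: heuristic, byte-level hints about metadata and revisions."""
    return {
        # More than one %%EOF usually means incremental updates were appended.
        "eof_markers": data.count(b"%%EOF"),
        # Trailer reference to a document information dictionary, e.g. "/Info 1 0 R".
        "has_info_dict": re.search(rb"/Info\s+\d+\s+\d+\s+R", data) is not None,
        # Reference to an XMP metadata stream.
        "has_xmp": b"/Metadata" in data,
        # Classic xref entries marked "f" (free). Object 0 is always free, so a table
        # starting at 0 contributes one such entry; others usually mark deleted objects.
        "free_xref_entries": len(re.findall(rb"\d{10} \d{5} f", data)),
    }
```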
There are several versions of PDF files. Currently, DataDistiller™ Engine can remove hidden data
only from some of the PDF versions, which are the most commonly used. This includes PDF files
generated by Microsoft Office, OpenOffice, and PDFCreator. PDF files generated by a minority of
the PDF writing software on the market today cannot be cleaned with the current version of
DataDistiller™ Engine. In particular, PDF files generated by the latest Adobe software, such as
Adobe Acrobat Pro, cannot be cleaned with the current version of DataDistiller™ Engine.

JPEG Metadata
Applicable to JPEG images.
JPEG images may contain the following types of hidden data: EXIF (Exchangeable image file
format), IPTC Information Interchange Model, XMP (Extensible Metadata Platform), comments,
and ICC Profile. JPEG may contain additional non-standard proprietary hidden data.
JPEG metadata is added automatically by digital cameras, scanners, and image processing
software. This metadata often contains information such as the exact date and time the photograph
was taken, the digital camera manufacturer, model, and unique serial number, the camera settings,
and the location (if a GPS-enabled camera was used). Furthermore, a thumbnail of the image often
exists in the JPEG file, and many image editing programs fail to update this thumbnail when the
original image is modified. So even if the image was cropped, or otherwise modified to hide
certain parts of it, the removed parts may still be visible in the thumbnail.
DataDistiller™ Engine can remove metadata from JPEG files without degrading the image quality.
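JPEG metadata lives in marker segments placed before the compressed image data: JFIF in APP0, EXIF (which can embed the thumbnail) and XMP in APP1, the ICC profile in APP2, Photoshop image resources (including IPTC) in APP13, the Adobe tag in APP14, and comments in COM segments. This is why metadata can, in principle, be removed without re-encoding the image. The Python sketch below lists those segments without decoding the image; the helper name is hypothetical, and it stops at the first SOS (start of scan) marker.

```python
import struct

# Metadata-bearing JPEG segments, keyed by the byte that follows 0xFF.
MARKERS = {0xE0: "APP0", 0xE1: "APP1", 0xE2: "APP2", 0xED: "APP13", 0xEE: "APP14", 0xFE: "COM"}


def jpeg_metadata_segments(data: bytes) -> list:
    """Hypothetical helper: (segment name, identifier, length) for header segments."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos, found = 2, []
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:  # SOS: entropy-coded image data follows, stop scanning
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes the 2 length bytes
        payload = data[pos + 4:pos + 2 + length]
        if marker in MARKERS:
            # Identifiers such as b"Exif", b"JFIF", b"ICC_PROFILE" are null-terminated.
            ident = payload.split(b"\x00", 1)[0][:32]
            found.append((MARKERS[marker], ident, length))
        pos += 2 + length
    return found
```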

PNG Metadata
Applicable to PNG images.
PNG metadata can contain various details about the image, such as the author, the editing software,
and the time and date in which it was created. The metadata can also be structured within XMP
(Extensible Metadata Platform).
DataDistiller™ Engine can remove metadata from PNG files without degrading the image quality.
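A PNG file is a signature followed by chunks, each with a length, a four-letter type, data, and a CRC-32. Textual metadata is stored in tEXt, zTXt, and iTXt chunks (XMP conventionally uses an iTXt chunk with the keyword XML:com.adobe.xmp), the last-modification time in tIME, and EXIF in eXIf, all separate from the IDAT image data. The hypothetical helper below lists these chunks and verifies each CRC.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METADATA_CHUNKS = {b"tEXt", b"zTXt", b"iTXt", b"tIME", b"eXIf"}


def png_metadata_chunks(data: bytes) -> list:
    """Hypothetical helper: (chunk type, keyword) for metadata chunks in a PNG."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos, found = len(PNG_SIGNATURE), []
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(ctype + body) != crc:  # CRC covers the type and the data
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        if ctype in METADATA_CHUNKS:
            # Text chunks start with a null-terminated keyword such as b"Author".
            is_text = ctype in (b"tEXt", b"zTXt", b"iTXt")
            keyword = body.split(b"\x00", 1)[0] if is_text else b""
            found.append((ctype.decode("ascii"), keyword.decode("latin-1")))
        if ctype == b"IEND":
            break
        pos += 12 + length
    return found
```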

SVG Metadata
Applicable to SVG images.
SVG metadata can contain various details about the image, such as the author, the editing software,
and the time and date in which it was created.
DataDistiller™ Engine can remove metadata from SVG files without degrading the image quality.

AVI Metadata
Applicable to AVI video files.
AVI metadata can contain various details about the video, such as the author, the camera and
software used, and the time and date in which it was created.
DataDistiller™ Engine can remove metadata from AVI files without degrading the video quality.
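This also relates to the Unsupported AVI version reason in the Purification Report. An AVI file is a RIFF file whose first top-level chunk has the form type `AVI `; in a multipart OpenDML file, the data continues in additional top-level RIFF chunks of form type `AVIX`. The hypothetical Python sketch below walks only the top-level chunks of an open file, seeking past their contents instead of reading them, which matters for files larger than 2 GB.

```python
import struct


def top_level_riff_forms(f) -> list:
    """Hypothetical helper: form types of consecutive top-level RIFF chunks."""
    forms = []
    while True:
        header = f.read(12)  # "RIFF", 32-bit little-endian size, 4-byte form type
        if len(header) < 12 or header[:4] != b"RIFF":
            break
        size, form = struct.unpack("<I4s", header[4:])
        forms.append(form.decode("latin-1"))
        # The size counts the form type we already read; chunks are padded to even length.
        f.seek(size - 4 + (size & 1), 1)
    return forms


def is_multipart_opendml(f) -> bool:
    """True if an 'AVI ' RIFF chunk is followed by one or more 'AVIX' chunks."""
    forms = top_level_riff_forms(f)
    return bool(forms) and forms[0] == "AVI " and "AVIX" in forms[1:]
```

Usage: `with open("clip.avi", "rb") as f: print(is_multipart_opendml(f))`.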

ID3v1 Tag
Applicable to MP3 and WavPack.
ID3v1 tag typically contains information such as title, artist, album, genre, and track number.
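An ID3v1 tag occupies a fixed 128-byte block at the very end of the file, starting with the letters TAG; the ID3v1.1 variant reuses the last two bytes of the comment field for a track number. The following Python sketch decodes that layout; `parse_id3v1` is a hypothetical helper name.

```python
def parse_id3v1(data: bytes):
    """Hypothetical helper: parse an ID3v1/ID3v1.1 tag from the last 128 bytes, or None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None

    def text(field: bytes) -> str:
        # Fields are fixed-width and padded with null bytes (or spaces).
        return field.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    result = {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "genre": tag[127],  # index into the standard genre list
    }
    # ID3v1.1: a zero byte at offset 125 followed by a non-zero byte at 126 = track number.
    if tag[125] == 0 and tag[126] != 0:
        result["comment"] = text(tag[97:125])
        result["track"] = tag[126]
    else:
        result["comment"] = text(tag[97:127])
    return result
```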

ID3v2 Tag
Applicable to MP3, WAVE, and AIFF.
ID3v2 tag typically contains information such as title, artist, album, genre, and track number.
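Unlike ID3v1, an ID3v2 tag in an MP3 file is usually placed at the start of the file and has a variable size. Its 10-byte header stores the tag size as a "synchsafe" integer: 4 bytes of which only the lower 7 bits are used, so the size bytes can never look like an MPEG frame sync. The sketch below decodes the header only; `parse_id3v2_header` is a hypothetical name.

```python
def parse_id3v2_header(data: bytes):
    """Hypothetical helper: (major version, revision, flags, tag size) or None."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    size_bytes = data[6:10]
    if any(b & 0x80 for b in size_bytes):
        raise ValueError("invalid synchsafe size")
    # Synchsafe integer: 4 bytes, 7 significant bits each, most significant first.
    size = 0
    for b in size_bytes:
        size = (size << 7) | b
    return major, revision, flags, size
```

The returned size excludes the 10-byte header; in ID3v2.4, if flag bit 0x10 is set, a 10-byte footer follows the tag as well.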

WAVE / BWF Metadata


Applicable to WAVE / Broadcast Wave Format audio files.
The metadata of WAVE / Broadcast Wave Format (BWF) files can contain various details about the
audio files, such as the author, the software used, and the time and date in which they were created.
The metadata can be structured in INFO, EXIF, XMP, iXML, axml, Broadcast Audio Extension
(Bext), ID3, and Cart chunk formats.
DataDistiller™ Engine can remove metadata from WAVE / BWF files without degrading the sound
quality.
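A WAVE / BWF file is a RIFF container, and each metadata format listed above lives in its own chunk alongside the fmt and data chunks, for example LIST (INFO), bext (Broadcast Audio Extension), iXML, axml, cart, and _PMX (XMP). The hypothetical Python sketch below lists the chunk IDs and sizes of an open file; it does not handle the RF64 variant used for files larger than 4 GB.

```python
import struct


def wave_chunks(f) -> list:
    """Hypothetical helper: (chunk ID, size) for the sub-chunks of a RIFF/WAVE file."""
    header = f.read(12)
    if len(header) < 12 or header[:4] != b"RIFF" or header[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    chunks = []
    while True:
        head = f.read(8)  # 4-byte chunk ID + 32-bit little-endian size
        if len(head) < 8:
            break
        cid, size = struct.unpack("<4sI", head)
        chunks.append((cid.decode("latin-1"), size))
        f.seek(size + (size & 1), 1)  # skip chunk data plus its pad byte
    return chunks
```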

AIFF Metadata
Applicable to AIFF audio files.
AIFF metadata can contain various details about the audio file, such as the author, the software
used, and the time and date in which it was created. The metadata can also be structured in ID3
format.
DataDistiller™ Engine can remove metadata from AIFF files without degrading the sound quality.

MP4 Metadata
Applicable to MP4 files.
MP4 metadata can contain various details about the file author, the software used in its creation, and
the time and date in which it was created. The metadata can also be structured in XMP format.
Removing metadata from MP4 files will not degrade the video or audio quality.

F4V Metadata
Applicable to F4V files.
F4V metadata can contain various details about the file author, the software used in its creation, and
the time and date in which it was created. The metadata can also be structured in XMP format.
Removing metadata from F4V files will not degrade the media quality.

APE Tag
Applicable to Monkey's Audio, MP3, Musepack, OptimFROG, WavPack, and Tom's Audio Kompressor Audio.
APE tag typically contains information such as title, artist, album, genre, and track number.
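An APEv2 tag is normally stored at the end of the file (before an ID3v1 tag, if one is present) and ends with a 32-byte footer that begins with APETAGEX and records the tag size and the number of items; each item consists of a value size, flags, a null-terminated key, and the value. The following Python sketch reads such a tag. `parse_ape_tag` is a hypothetical name; it ignores the optional header and returns binary and locator items as raw bytes.

```python
import struct


def parse_ape_tag(data: bytes):
    """Hypothetical helper: {key: value} from an APEv2 tag at the end of data, or None."""
    end = len(data)
    if end >= 128 and data[end - 128:end - 125] == b"TAG":
        end -= 128  # skip a trailing ID3v1 tag
    footer = data[end - 32:end]
    if len(footer) < 32 or footer[:8] != b"APETAGEX":
        return None
    _version, tag_size, item_count, _flags = struct.unpack("<4I", footer[8:24])
    pos = end - tag_size  # tag_size counts the items and the footer, not the header
    items = {}
    for _ in range(item_count):
        value_size, item_flags = struct.unpack("<2I", data[pos:pos + 8])
        key_end = data.index(b"\x00", pos + 8)
        key = data[pos + 8:key_end].decode("ascii")
        value = data[key_end + 1:key_end + 1 + value_size]
        # Item flag bits 1-2: 0 = UTF-8 text, 1 = binary, 2 = external locator.
        items[key] = value.decode("utf-8") if (item_flags >> 1) & 3 == 0 else value
        pos = key_end + 1 + value_size
    return items
```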

XML/XSD/XSL Comments
Applicable to XML/XSD/XSL files.
Comments that were added to the file.
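A minimal Python illustration uses the standard library's ElementTree, whose default parser discards comments, so re-serializing the parsed tree leaves them out. The `strip_xml_comments` helper is hypothetical and is not how BatchPurifier works: re-serialization also drops the XML declaration, DOCTYPE, and processing instructions, and may rename namespace prefixes, which a dedicated tool would need to avoid.

```python
import xml.etree.ElementTree as ET


def strip_xml_comments(xml_text: str) -> str:
    """Hypothetical helper: drop <!-- comments --> by re-serializing the parsed tree.

    ElementTree's default TreeBuilder does not insert comments into the tree,
    so they are simply not carried into the output.
    """
    root = ET.fromstring(xml_text)
    return ET.tostring(root, encoding="unicode")
```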
Appendix B. Frequently Asked Questions

What Are Metadata And Hidden Data?


The word metadata means "data about other data". Metadata is commonly embedded in electronic
files and contains various types of information, such as the creator's name and the organization
they belong to, the creation and modification times, and the device or software used to create
and process the file. This metadata is usually generated automatically by the software or device
used to create the file, often without the user even being aware of it.
Hidden data in a file refers to any data that is not visible in a standard viewer, or that is
hidden under certain viewer settings, even though it resides in the file and can be viewed by
changing the viewer settings or by using special software. Common hidden data types include
comments, document revision history, and presentation notes. Many applications also embed
various application-specific hidden data.
Strictly speaking, in most applications metadata is one type of hidden data; however, the two
terms are often used interchangeably.

Where Can Metadata And Hidden Data Be Found?


Virtually every popular file format contains hidden data and metadata, including Microsoft Word,
Excel®, and PowerPoint® documents, OpenOffice documents, and PDF documents. Metadata can
also be found in various image and media file types such as JPEG, JPEG 2000, PNG, SVG, AVI,
WAV, AIFF, MP3, and MP4.

What Risks Do Metadata And Hidden Data Pose?


While hidden data and metadata are useful for finding files and reviewing documents, they pose
privacy and confidentiality risks when the files are shared. The hidden data often contains private
and sensitive information that, if unintentionally exposed, can embarrass the document's creator
and their organization, with possible financial and legal implications.

Why Does Document Inspector Indicate That Document Properties Are Present In a Purified Document?
The Document Inspector that is included with Microsoft Office® 2007 will indicate that “Document
properties and Personal Information” are present in a document, even if that document was cleaned
with BatchPurifier. But this is a false alarm. Document Inspector and BatchPurifier clean
documents in a different manner, so when Document Inspector is used to inspect a document that
was cleaned with BatchPurifier, the document will not meet its “expectation” even if the Document
Properties are completely gone. In fact, BatchPurifier removes Document Properties more
thoroughly than Document Inspector.
Why Does The Windows Properties Viewer Show Properties Like “Owner” And “Computer” For a Purified File?
The details under the “File” section in the “Details” tab of Windows' file properties viewer,
which can be accessed by right-clicking a file and selecting Properties, such as “Date created”,
“Date modified”, “Owner”, and “Computer”, are not part of the file itself. If the file is copied
to another computer, the details shown will be different. Thus, they cannot be removed from the
file, as they were never part of the file to begin with. These properties are like labels on a
bookshelf: if a book is taken out, the labels do not go along with it.

Properties under the “File” section are not part of the file itself

How Is BatchPurifier Different From Windows' Built-in Remove Properties and Personal Information Feature?
BatchPurifier supports many more file types and hidden data types, and cleans files much more
thoroughly, than the Remove Properties and Personal Information feature that was introduced in
Windows Vista. BatchPurifier also makes it more convenient to clean multiple files of different
types from different locations at once.

Can BatchPurifier Remove Hidden Data from the Old Office® XP/2003 (e.g. .doc, .xls, .ppt) Files?
BatchPurifier is able to remove hidden data & metadata from the newer Open XML format, which
was introduced as a native format in Microsoft Office® 2007; its files are recognized by their
four-letter extensions, which end with 'x' or 'm' (e.g. .docx, .xlsx, .pptx). The older binary
formats that were used in Office® XP/2003 (e.g. .doc, .xls, .ppt) are not supported.
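Besides the extension convention, the two families can be told apart by their first bytes: Open XML files are ZIP packages, while the legacy .doc, .xls, and .ppt formats use the Compound File Binary container. The following Python sketch is a hypothetical classifier, not part of BatchPurifier. A ZIP signature alone does not prove a file is Open XML (OpenDocument files are ZIP packages too), and password-encrypted Open XML files are also stored in the Compound File Binary container.

```python
# Leading bytes of the two container families (hypothetical classifier below).
ZIP_SIGNATURE = b"PK\x03\x04"                        # Open XML (.docx, .xlsx, .pptx) packages
CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")  # Compound File Binary (.doc, .xls, .ppt)


def office_container(head: bytes) -> str:
    """Classify a file by its leading bytes: 'zip', 'compound-binary', or 'unknown'."""
    if head.startswith(ZIP_SIGNATURE):
        return "zip"
    if head.startswith(CFB_SIGNATURE):
        return "compound-binary"
    return "unknown"
```

Usage: `with open("report.doc", "rb") as f: print(office_container(f.read(8)))`.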
