Electronic Document Files

Many electronic file formats contain extra information about the document itself, independent of its visible content. This metadata often gives clues to the physical hardware, and possibly the user account details, of the creator or editors of a document, together with time and date stamps, any of which may compromise the anonymity of a whistleblower.

On the other hand, this metadata embedded within a leaked document may provide the strongest clues as to its authenticity.

  1. Adobe .pdf documents have been published online in which personal details, e.g. email addresses, were "blacked out" using Adobe .pdf software that simply placed an extra layer on top of the supposedly censored words. Copying and pasting the text into, say, Windows Notepad, WordPad or Word has revealed the hidden data.

    Anybody publishing such material online needs to be aware of this in order to protect their sources, whether in the Home Office or elsewhere.

  2. See this Adobe Technical Note: Technical Redaction of Confidential Information in Electronic Documents - How to safely remove sensitive information from Microsoft Word documents and PDF Documents Using Adobe Acrobat (.pdf) (or from our local mirror copy here at ht4w)

  3. Similarly, Adobe .pdf documents, Microsoft Word documents, Excel spreadsheets etc. may well contain metadata (see the Document Properties) showing the author of the leaked document, which may in turn lead back to the "leak source".

  4. Microsoft Word documents, especially drafts worked on by several people, often have the Versions feature enabled. Examining the changes made to a document, and by whom, sometimes gives extra clues about policies or cover-ups.

    The same feature on a whistleblower's own computer could, of course, betray their identity by adding their default user name to any document they edit or view before passing it on.

  5. Older versions of Microsoft Word (and other Office products such as Excel or PowerPoint) can also betray the MAC address of the Ethernet card of the computer on which a document was created or edited, as part of the Globally Unique Identifier (GUID) data embedded in the document. Most people will not have changed the MAC addresses of their computers (often possible through software), and there are likely to be inventory records or network log files which will pinpoint which MAC address belongs to which computer, either at work or at home.

  6. Microsoft now makes available some tools to remove such GUID and other hidden metadata, versions, comments etc. from final published Microsoft Office documents, e.g. the Microsoft Office 2003/XP Remove Hidden Data Add-in, which removes most, but not quite all, of the hidden data in Microsoft Word, Excel and PowerPoint files. N.B. This add-in does not work on Office 2007 files. Office 2007 instead appears to have a built-in Document Inspector which does much the same job, but it has to be run deliberately and is not applied by default.

    Types of data this add-in can remove

    The following types of data are removed automatically.

    * Comments.
    * Previous authors and editors.
    * User name.
    * Personal summary information.
    * Revision marks. The tool accepts all revisions specified in the document. As a result, the contents of the document will correspond to the Final Showing Markup view on the Reviewing toolbar.
    * Deleted text. This data is removed automatically.
    * Versions.
    * VB Macros. Descriptions and comments are removed from the modules.
    * The ID number used to identify your document for the purpose of merging changes back into the original document.
    * Routing slips.
    * E-mail headers.
    * Scenario comments.
    * Unique identifiers (Office 97 documents only).

    Note: The Remove Hidden Data tool also turns on the Remove Personal Information feature. For more information on this feature, please search for "Remove Personal Information" in the application Help.

  7. The US National Security Agency has published a technical report: Redacting with Confidence: How to Safely Publish Sanitized Reports Converted From Word to PDF (.pdf) (or from our local ht4w copy, I733-028R-2008.pdf)

  8. See also Microsoft's Knowledge Base article KB223396 pointing to other articles about meta data in various Microsoft Office products: How to minimize metadata in Office documents
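The MAC address leak described in item 5 comes from time-based (version 1) GUIDs, whose final 48-bit "node" field was normally filled with the MAC address of the network card. Below is a minimal Python sketch that uses only the standard uuid module. It assumes the GUID has already been extracted from the document as a text string; locating it inside a binary .doc file is not shown. The function name mac_from_guid is hypothetical, and the sample GUID is built from an invented node value.

```python
import uuid
from typing import Optional


def mac_from_guid(guid: str) -> Optional[str]:
    """Return the MAC-style node of a version-1 GUID, or None.

    Hypothetical helper: version-1 (time-based) GUIDs store a 48-bit
    "node" value, which older software usually filled with the MAC address.
    """
    u = uuid.UUID(guid)  # accepts "{...}" and plain hyphenated forms
    if u.version != 1:  # only time-based GUIDs carry a node ID
        return None
    node = u.node
    # RFC 4122: a randomly generated node has the multicast bit set
    # (lowest bit of the first octet), so it is not a real MAC address.
    if (node >> 40) & 0x01:
        return None
    # Split the 48-bit node into six octets, most significant first.
    return ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


# Invented example: a version-1 GUID built from a made-up node value.
sample = uuid.uuid1(node=0x00112233AABB, clock_seq=0)
print(mac_from_guid("{" + str(sample) + "}"))  # 00:11:22:33:aa:bb
print(mac_from_guid(str(uuid.uuid4())))        # None (random GUID, no MAC)
```

Randomly generated (version 4) GUIDs, and version 1 GUIDs with the multicast bit set, carry no hardware address, so the sketch returns None for them.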

Obviously any journalist or blogger should double check that what they make available online does not contain identifiable clues to their anonymous sources, not just on the face of the published document, but also within any "track changes" history, previous versions or document template.
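Office 2007 format files (.docx, .xlsx, .pptx) are ZIP archives of XML parts, so the author and "last modified by" names can be checked before publication with a few lines of Python. The sketch below is minimal and assumes the standard docProps/core.xml and docProps/app.xml parts. Older binary .doc/.xls files are not handled. The function name ooxml_properties and the names in the sample package are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the Office Open XML property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# (part name, XPath) for fields that can identify a person or organisation.
FIELDS = {
    "creator": ("docProps/core.xml", "dc:creator"),
    "last_modified_by": ("docProps/core.xml", "cp:lastModifiedBy"),
    "created": ("docProps/core.xml", "dcterms:created"),
    "modified": ("docProps/core.xml", "dcterms:modified"),
    "company": ("docProps/app.xml", "ep:Company"),
    "template": ("docProps/app.xml", "ep:Template"),
}


def ooxml_properties(path_or_file) -> dict:
    """Return identifying properties found in a .docx/.xlsx/.pptx package."""
    found = {}
    roots = {}  # parsed XML parts, cached so each part is parsed only once
    with zipfile.ZipFile(path_or_file) as zf:
        for key, (part, xpath) in FIELDS.items():
            if part not in roots:
                try:
                    roots[part] = ET.fromstring(zf.read(part))
                except KeyError:  # this package has no such part
                    roots[part] = None
            root = roots[part]
            if root is None:
                continue
            elem = root.find(xpath, NS)
            if elem is not None and elem.text:
                found[key] = elem.text
    return found


# Invented sample package containing only a core.xml part.
core = (
    f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
    "<dc:creator>J. Smith</dc:creator>"
    "<cp:lastModifiedBy>A. N. Other</cp:lastModifiedBy>"
    "</cp:coreProperties>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", core)
print(ooxml_properties(buf))
# {'creator': 'J. Smith', 'last_modified_by': 'A. N. Other'}
```

On a real file you would pass a path instead, e.g. ooxml_properties("draft.docx").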

Track Changes and Versions

  1. Remember that Microsoft Word has a "track changes" facility, which is useful when a document is written, edited or approved by more than one person. Several politically embarrassing Government leaks have happened because earlier wording of sentences or paragraphs was revealed when members of the public simply turned on the "show changes" option while reading the document in Microsoft Word.

    The Liberal Democrat blog Home Office Watch reports on how the extremely controversial secret policy document regarding plans for "Big Brother" surveillance of millions of innocent people was revealed because someone forgot to turn off "track changes".

    As more journalists and political activists become familiar with this feature (or vulnerability), it may sometimes serve as a useful covert channel for leaking information to the media and the public, with a certain amount of "plausible deniability" for insider whistleblowers: to a casual observer, one document is effectively hidden within another.

  2. More recent versions of Microsoft Word, i.e. 2003 and 2007, have several Security / Privacy options worth reviewing under the Tools / Options / Security menu.

    • Remove personal information from file properties on save (off by default)

    • Warn before printing, saving or sending a file that contains tracked changes or comments (off by default)

    • Store a random number to improve merge accuracy (on by default) - supposedly a harmless random number, but worth switching off if you are not merging documents with anything else.

    • Make hidden markup visible when opening or saving (on by default) - worth keeping on to let you check that you have successfully erased identifying personal data if necessary.

    N.B. "Authoring references not entered by the application are not removed automatically. For instance, those references entered through the use of field codes are not removed or changed. Or, if hidden text was used to tag a line, and the author of the hidden text embedded his or her initials or name in the hidden text, this reference is not removed because it is not an identified author reference."
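You can list tracked changes and their authors without opening Word, at least for Office 2007 format .docx files. In those files, insertions and deletions are stored in word/document.xml as w:ins and w:del elements, which carry w:author and w:date attributes. The minimal sketch below assumes a .docx file; older binary .doc files are not handled. The function names and the sample document are invented for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(name: str) -> str:
    """Qualify a WordprocessingML tag or attribute name with its namespace."""
    return f"{{{W}}}{name}"


def tracked_changes(path_or_file) -> list:
    """Return (kind, author, date, text) for each tracked insertion or deletion."""
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    changes = []
    for elem in root.iter():  # walks the tree in document order
        if elem.tag == q("ins"):
            kind, text_tag = "ins", q("t")  # inserted text is ordinary w:t
        elif elem.tag == q("del"):
            kind, text_tag = "del", q("delText")  # deleted text survives as w:delText
        else:
            continue
        text = "".join(t.text or "" for t in elem.iter(text_tag))
        changes.append((kind, elem.get(q("author")), elem.get(q("date")), text))
    return changes


# Invented sample: "not " deleted by one editor, "fully " inserted by another.
doc = (
    f'<w:document xmlns:w="{W}"><w:body><w:p>'
    "<w:r><w:t>The report was </w:t></w:r>"
    '<w:del w:id="1" w:author="Editor A" w:date="2008-01-01T09:00:00Z">'
    "<w:r><w:delText>not </w:delText></w:r></w:del>"
    '<w:ins w:id="2" w:author="Editor B" w:date="2008-01-02T09:00:00Z">'
    "<w:r><w:t>fully </w:t></w:r></w:ins>"
    "<w:r><w:t>approved.</w:t></w:r>"
    "</w:p></w:body></w:document>"
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", doc)
for change in tracked_changes(buf):
    print(change)
# ('del', 'Editor A', '2008-01-01T09:00:00Z', 'not ')
# ('ins', 'Editor B', '2008-01-02T09:00:00Z', 'fully ')
```

An empty result is not proof that a file is clean. Comments, field codes and hidden text, as the Microsoft note above warns, live elsewhere.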

Examples of Inept "Redaction" or Censorship

  1. Sometimes digital files simply copy and magnify the errors made by people working under deadline pressure. See the inept redaction / censorship, with a marker pen, of a legal Exhibit document in the Bank Julius Baer versus Wikileaks court case in February 2008. The plaintiff's lawyers took a digital screendump of a web page and printed it out. They then tried to hide the name of one of their client's former customers with a black marker pen, digitally scanned the result and submitted it electronically to the Court as an Adobe .pdf document. Apart from failing to redact the postal address of the customer and the name of the customer in the heading of a page (printed in the largest typeface used in the document), they also failed to cover all the descending tails of the lower case letters in the name, which could have allowed some intelligent guesswork. By digitally zooming in on the .pdf image scan, the name could be read through the fading marker pen ink overlay.

    See Lavely & Singer demonstrate how not to protect the confidentiality of customers of Bank Julius Baer

  2. Sometimes (.pdf) files have been "redacted" or censored by using the drawing facility within the software to "paint" thick black lines over the text as an overlay. This has led to several "whistleblower leaks" of the hidden data through the simple technique of copying and pasting the text out of the (.pdf) viewer software into another application such as a text editor or word processor, which then reveals the underlying words that were supposedly hidden, e.g. the failed attempt to hide the IP addresses of military and government computers in a (.pdf) copy of a US Grand Jury indictment against the alleged UK computer hacker Gary McKinnon.

  3. Sometimes the encryption and "protection" features used to hide information in an Adobe (.pdf) file can be overcome through password guessing etc., e.g. the Wikileaks.org publication of an unredacted version of the South African Competition Commission's final Report on Banking, 12 Dec 2008.
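A page's content stream shows why the overlay trick in item 2 fails. The black box is just another drawing instruction painted after the text, and the text-showing operators (such as Tj) and their strings remain in the file. The crude Python sketch below assumes you already have a content stream from the file, either uncompressed or FlateDecode (zlib) compressed, and pulls out the strings drawn with Tj. It ignores hex strings, TJ arrays and font encodings, so real documents need a proper PDF library. The function name and the sample stream (using a documentation-only IP address) are invented for illustration.

```python
import re
import zlib

# "(literal string) Tj": the PDF operator that shows a string of text.
# Crude: ignores hex strings, TJ arrays, nested brackets and font encodings.
TJ_RE = re.compile(rb"\(((?:\\.|[^\\)])*)\)\s*Tj")
# "x y w h re f": append a rectangle to the path and fill it (the "black box").
RECT_FILL_RE = re.compile(rb"\bre\s+f\b")


def shown_text(content: bytes, compressed: bool = False) -> list:
    """Return the literal strings drawn with Tj in a page content stream."""
    if compressed:  # /FlateDecode streams are zlib-compressed
        content = zlib.decompress(content)
    return [m.group(1).decode("latin-1") for m in TJ_RE.finditer(content)]


# Invented sample page: text is drawn, then a black rectangle is painted over it.
page = (
    b"BT /F1 12 Tf 72 700 Td (Target host: 192.0.2.10) Tj ET\n"
    b"0 0 0 rg 150 695 90 14 re f\n"
)
print(RECT_FILL_RE.search(page) is not None)             # True: a filled box is drawn
print(shown_text(page))                                  # ['Target host: 192.0.2.10']
print(shown_text(zlib.compress(page), compressed=True))  # same list
```

Proper redaction has to delete the text operators themselves, not just cover them.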

Document File MetaData

  1. ExifTool, available as a Perl script or a Windows executable, reads the metadata of image files and also displays it for Microsoft Word .doc, Excel .xls, PowerPoint .pps and Adobe .pdf files etc. (see the Photo Image Files section).

  2. You can examine (but not change or delete) such photo or document metadata via Jeffrey's Exif Viewer, a website powered by the ExifTool Perl software.
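ExifTool can also emit its findings as JSON via its -json option, which makes it easy to screen a batch of files for names before publication. The minimal Python sketch below assumes ExifTool is installed and on the PATH. The list of identifying tag names is an assumption, since tag names vary between file formats and ExifTool versions. The function names and the sample output are hypothetical.

```python
import json
import shutil
import subprocess

# Tags that often name a person, organisation or computer. This list is an
# assumption: real tag names vary with file format and ExifTool version.
IDENTIFYING_TAGS = ("Author", "Creator", "LastModifiedBy", "Company", "Manager", "Template")


def run_exiftool(*paths: str) -> str:
    """Run ExifTool (assumed installed and on PATH) and return its JSON output."""
    if shutil.which("exiftool") is None:
        raise FileNotFoundError("exiftool not found on PATH")
    result = subprocess.run(
        ["exiftool", "-json", *paths], capture_output=True, text=True, check=True
    )
    return result.stdout


def identifying_fields(exiftool_json: str) -> dict:
    """Map each SourceFile in `exiftool -json` output to its identifying tags."""
    report = {}
    for record in json.loads(exiftool_json):
        hits = {tag: record[tag] for tag in IDENTIFYING_TAGS if tag in record}
        report[record.get("SourceFile", "?")] = hits
    return report


# Invented sample of `exiftool -json` output, for illustration only.
sample = '[{"SourceFile": "leak.doc", "Author": "jsmith", "LastModifiedBy": "PC-0423", "Title": "Draft"}]'
print(identifying_fields(sample))
# {'leak.doc': {'Author': 'jsmith', 'LastModifiedBy': 'PC-0423'}}
```

On a real system you would combine the two, e.g. identifying_fields(run_exiftool("leak.doc", "leak.pdf")), and then read the full ExifTool output yourself rather than relying on this short tag list.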

Remember that sometimes a whistleblower, journalist or blogger needs to read and understand this sort of hidden metadata or document change history, to help determine whether the leaked document is genuine.

If the leaked document has not been edited on a computer which is linked in any way to the whistleblower, then the hidden metadata and "track changes" edits are sometimes in fact the main point of the leak, perhaps showing evidence of a last minute reversal of Government or Corporate policy, the censorship of independent expert advice, or even the outright fabrication of "facts" by political spin doctors in the final version of a document.

About this blog

We know that there are decent, honest, trustworthy individual politicians, civil servants, law enforcement, intelligence agency personnel and broadcast, print and internet journalists etc., who often feel powerless or trapped in the system. They need the assistance of external, detailed, informed, public scrutiny to help them to resist deliberate or unthinking policies, which erode our freedoms and liberties.

Some of these people will, in the public interest, act as whistleblowers, and may try to leak documents or information to the mainstream media, or to political blog websites etc.

Here are some Spy Blog "Hints and Tips", giving some basic precautions and some more obscure technical tips which whistleblowers, journalists and bloggers all need to be aware of, in order to help preserve the anonymity of whistleblowing or other journalistic sources, especially in the United Kingdom, but applicable in other countries as well.

Whistleblower anonymity may not always be possible, or even necessary, forever into the future, but it is usually crucial during at least the early stages of a "leak", whilst it is being evaluated by others to see if it merits wider publication and publicity.

Email & PGP Contact

Please feel free to email your views about this blog, or news about the issues it tries to comment on.


Our PGP public encryption key is available for those correspondents who wish to send us news or information in confidence, and also for those of you who value your privacy, even if you have got nothing to hide.

You can download a free copy of the PGP encryption software from www.pgpi.org
(available for most of the common computer operating systems, and also in various Open Source versions like GPG).

We look forward to the day when UK Government Legislation, Press Releases and Emails etc. are Digitally Signed so that we can be assured that they are not fakes. Trusting that the digitally signed content makes any sense, is another matter entirely.


CryptoParty London

Most months there is a CryptoParty London event, where some of these Hints and Tips and other techniques are demonstrated and taught.

Usually at:

Juju's Bar and Stage 15 Hanbury St, E1 6QR, London

Follow on Twitter: @CryptoPartyLDN


Campaign Button Links

Watching Them, Watching Us, UK Public CCTV Surveillance Regulation Campaign

NO2ID Campaign - cross party opposition to the NuLabour Compulsory Biometric ID Card and National Identity Register centralised database.

Gary McKinnon is facing extradition to the USA under the controversial Extradition Act 2003, without any prima facie evidence or charges brought against him in a UK court. Try him here in the UK, under UK law.

FreeFarid.com - Kafkaesque extradition of Farid Hilali under the European Arrest Warrant to Spain

Peaceful resistance to the curtailment of our rights to Free Assembly and Free Speech in the SOCPA Designated Area around Parliament Square and beyond

Parliament Protest blog - resistance to the Designated Area restricting peaceful demonstrations or lobbying in the vicinity of Parliament.

Data Retention is No Solution Petition to the European Commission and European Parliament against their vague Data Retention plans.

Open Rights Group

Renew For Freedom - renew your Passport in the Summer or Autumn of 2006.

The Big Opt Out Campaign - opt out of having your NHS Care Record medical records and personal details stored insecurely on a massive national centralised database.

Tor - the onion routing network - "Tor aims to defend against traffic analysis, a form of network surveillance that threatens personal anonymity and privacy, confidential business activities and relationships, and state security. Communications are bounced around a distributed network of servers called onion routers, protecting you from websites that build profiles of your interests, local eavesdroppers that read your data or learn what sites you visit, and even the onion routers themselves."

Anonymous Blogging with Wordpress and Tor - useful Guide published by Global Voices Advocacy with step by step software configuration screenshots (updated March 10th 2009).

Amnesty International's irrepressible.info campaign

BlogSafer - wiki with multilingual guides to anonymous blogging

NGO in a box - Security Edition privacy and security software tools

Home Office Watch blog, "a single repository of all the shambolic errors and mistakes made by the British Home Office compiled from Parliamentary Questions, news reports, and tip-offs by the Liberal Democrat Home Affairs team."

Reporters Without Borders - Reporters Sans Frontières - campaign for journalists' and bloggers' freedom in repressive countries and war zones.

Committee to Protect Bloggers - "devoted to the protection of bloggers worldwide with a focus on highlighting the plight of bloggers threatened and imprisoned by their government."

Wikileaks.org - the controversial "uncensorable, anonymous whistleblowing" website, currently based in Sweden.

Public Concern at Work - "(PCaW) is the independent authority on public interest whistleblowing. Established as a charity in 1993 following a series of scandals and disasters, PCaW has played a leading role in putting whistleblowing on the governance agenda and in developing legislation in the UK and abroad. All our work is informed by the free advice we offer to people with whistleblowing dilemmas and the professional support we provide to enlightened organisations."