The Symbolic Maze!
October 7, 2010
In this post we’ll exercise the symbolic execution engine KLEE over a funny ASCII Maze (yet another toy example)!
The match is between a tiny maze-like game coded in C versus the full-fledged LLVM based symbolic execution engine, KLEE.
How many solutions do you think it has?
The Maze
The thing is coded in C and the impatient can download it from here. This simple ASCII game asks you first to feed it with directions. You should enter them as a batch list of actions. As “usual”; a is Left, d is Right, w is Up and s is Down. It has this looks …
Player pos: 1x4 Iteration no. 2. Action: s. +-+---+---+ |X| |#| |X| --+ | | |X| | | | |X+-- | | | | | | +-----+---+
It’s really small I know! But the code hides a nasty trick, and at the end, you’ll see, it has more than one way to solve it.
The KLEE
KLEE is a symbolic interpreter of LLVM bitcode. It runs code compiled/assembled into LLVM symbolically. That’s running a program considering its input(or some other variables) to be symbols instead of concrete values like 100 or “cacho”. In very few words, a symbolic execution runs through the code propagating symbols and conditions; forking execution at symbol dependant branches and asking the companion SMT solver for path feasibility or counter-examples. For more info on this check out this, this or even this.
Find it interesting? Keep reading!
Read the rest of this entry »
PDF stats
August 26, 2010
This is an example use of the opaflib. The script described here use opaflib to get some statistics about the different PDF objects that appear in you file stash. This 2 charts show the appearing frequencies of Filters and Object types in a 10Mbyte small database of a random pdf selection.
So it is better for your fuzzing base that this numbers seem even, otherwise you’ll be testing the same thing over and over.
Keep reading for more exciting details!!! WEEEEEEEEEEEE!
Read the rest of this entry »
Opaf!
August 23, 2010
It’s an Open PDF Analysis Framework!
Its written in python using PLY parser generator. The project page is here and you can get the code from here:
Keep reading for a test run…
Read the rest of this entry »
PDF sequential parsing
August 22, 2010
As discussed in earlier posts the problem with PDF is that we can not apply an out-of-the-box scanner/parser design pattern. It won’t let you scan it properly. The size of a PDF stream is hard to be decided at scanner/lexer time. I’ve suggested the solution of escaping the “endstream” keyword. Also other patches emerged like, forcing the /Length keyword to be direct. Or calculate every object size using XREFs pointers (assuming not garbage between the objs (which in fact is what the spec says)).
Well in any case if you manage to run a lexer and tokenize it here you have the parsing grammar … weeee!!
object : NAME | STRING | HEXSTRING | NUMBER | TRUE | FALSE | NULL | R | dictionary | array dictionary : DOUBLE_LESS_THAN_SIGN dictionary_entry_list DOUBLE_GREATER_THAN_SIGN dictionary_entry_list : NAME object dictionary_entry_list | empty array : LEFT_SQUARE_BRACKET object_list RIGHT_SQUARE_BRACKET object_list : object object_list | empty indirect : indirect_object_stream | indirect_object indirect_object : OBJ object ENDOBJ indirect_object_stream : OBJ dictionary STREAM_DATA ENDOBJ xref : indirect_object_stream | XREF TRAILER dictionary pdf : HEADER pdf_update_list pdf_update_list : pdf_update_list body xref pdf_end | body xref pdf_end body : body indirect_object | body indirect_object_stream | empty pdf_end : STARTXREF EOF
f/
PDF, A broken Spec!
August 14, 2010
(Or why I can’t parse a PDF)
This post is about the difficulties I ran into when trying to write a PDF parser. It’s my opinion that
PDF specification is broken because it permits the token “endstream” inside a stream! |
![]() |
Summary:
There are 4 ways of deciding the size of a PDF stream:
[+] Scanning for the “endstream” token
[1] Scanning for the endstream token
[2] Get the size from the direct \Length entry
[3] Get the indirect \Length using the normal xref
[4] Calculate the size from the starting marks pointed from the Normal cross-reference
What happens in actual PDF implementations if:
[+] Cross-reference is broken?
[+] Cross-reference point to overlapped objects
[+] Streams contains the endstream token
[+] Streams contains some evil endstream/endobj token combination
[+] If all the 4(or more) ways of parsing a PDF stream are present, should they be all consistent?
And finally, is this file PDF compliant? I bet someone may construct an obfuscation method based in this “issues”.
If you still think this is worth reading check out the following details and please comment if you find bug if you have a solution for the problems I stated here.
Lexing PDF, just for the not-fun of it.
August 6, 2010
In an attempt to irrevocably declare my insanity I went into the details of making a PDF lexer the most strict to the specification I can. This post is about making a Portable File Format lexer in python using the PLY parser generator. This lexer is based on the ISO 32000-1 standard. Yes! PDF is an ISO standard, see.
In a PDF we have hexstrings and strings, numbers, names, arrays, references and null, booleans, dictionaries, streams and the file structure entities (the header, the trailer dictionary, the eof mark, the startxref mark and the crossreference). We are going to describe in detail all the tokens needed to define the named entities. You’ll probably want to take a look on how a parser is written in PLY at this simple example.
QUICK DEMO
Before we go into the really really really boring stuff, let’s do a quick demonstration of it’s value…
Let’s pick a random PDF out there… hmm.. for example jailbrakeme.pdf. Then grab the already done lexer here and run it like this…
it should output something like this…
iPhone3,1_4.0.pdf LexToken(HEADER,'1.3',1,0) LexToken(OBJ,('4', '0'),1,22) LexToken(DOUBLE_LESS_THAN_SIGN,'<>',1,45) LexToken(STREAM_DATA,'q Q q 18 750 576 24 re W n /C ... ( ) Tj ET Q Q',1,48) LexToken(ENDOBJ,'endobj',1,696) LexToken(OBJ,('2', '0'),1,703) LexToken(DOUBLE_LESS_THAN_SIGN,'<>',1,797) LexToken(ENDOBJ,'endobj',1,800) LexToken(OBJ,('6', '0'),1,807) LexToken(DOUBLE_LESS_THAN_SIGN,'<<',1,815) LexToken(NAME,'ProcSet',1,818) LexToken(LEFT_SQUARE_BRACKET,'[',1,827) LexToken(NAME,'PDF',1,829) LexToken(NAME,'Text',1,834) LexToken(RIGHT_SQUARE_BRACKET,']',1,840) LexToken(NAME,'ColorSpace',1,842) LexToken(DOUBLE_LESS_THAN_SIGN,'<>',1,868) LexToken(NAME,'Font',1,871) LexToken(DOUBLE_LESS_THAN_SIGN,'<>',1,892) LexToken(DOUBLE_GREATER_THAN_SIGN,'>>',1,895) LexToken(ENDOBJ,'endobj',1,898) LexToken(OBJ,('3', '0'),1,905) LexToken(DOUBLE_LESS_THAN_SIGN,'<>',1,978) LexToken(ENDOBJ,'endobj',1,981) LexToken(OBJ,('12', '0'),1,988) LexToken(DOUBLE_LESS_THAN_SIGN,'<>',1,1028) LexToken(ENDOBJ,'endobj',1,1031) LexToken(OBJ,('13', '0'),1,1038) LexToken(DOUBLE_LESS_THAN_SIGN,'<>',1,1102) LexToken(STREAM_DATA,'x\x9c\xed}\rXT\xd7\xd5\xee\x1e...",1,1105) LexToken(ENDOBJ,'endobj',1,11834) LexToken(OBJ,('15', '0'),1,11841) LexToken(DOUBLE_LESS_THAN_SIGN,'<>',1,12058) LexToken(ENDOBJ,'endobj',1,12061) LexToken(OBJ,('16', '0'),1,12068) LexToken(LEFT_SQUARE_BRACKET,'[',1,12077) LexToken(NUMBER,'556',1,12079) LexToken(RIGHT_SQUARE_BRACKET,']',1,12083) LexToken(ENDOBJ,'endobj',1,12085) LexToken(OBJ,('9', '0'),1,12092) LexToken(DOUBLE_LESS_THAN_SIGN,'<>',1,12254) LexToken(ENDOBJ,'endobj',1,12257) LexToken(OBJ,('18', '0'),1,12264) LexToken(NUMBER,'9332',1,12273) LexToken(ENDOBJ,'endobj',1,12278) LexToken(OBJ,('20', '0'),1,12285) LexToken(LEFT_SQUARE_BRACKET,'[',1,12294) LexToken(NUMBER,'316',1,12296) LexToken(NUMBER,'0',1,12300) . LexToken(NUMBER,'613',1,12516) LexToken(RIGHT_SQUARE_BRACKET,']',1,12520) LexToken(ENDOBJ,'endobj',1,12522) LexToken(OBJ,('1', '0'),1,12529) LexToken(DOUBLE_LESS_THAN_SIGN,'<>',1,12540) LexToken(ENDOBJ,'endobj',1,12543) LexToken(XREF,[((0, 29), [(0, 65535, 'f'),...(17744, 0, 'n')])],1,12550) LexToken(TRAILER,'trailer',1,13140) LexToken(DOUBLE_LESS_THAN_SIGN,'<>',1,13263) LexToken(STARTXREF,17942,1,13266) LexToken(EOF,'%%EOF\n',1,13282)
It marks the position of every object!!! WOW!!!!!!
See you in the Adobe playground…
June 16, 2010
Wacharooga!!
Let’s see how to run an external Adobe Reader process from a pdf file that’s being displayed in a web browser.
This *technique* is a derivate of the pdf-into-pdf embedding post. It also uses the GotoE action to jump away to an embed pdf. I just discovered that doing this from a browser viewed pdf it runs a different process of the Adobe Reader. The ability of running a new, fresh and separated process has some interesting exploitability implications.
In older Reader version (previous to 9.2.3?) doing this also served as a way to bypass DEP optIn, but by now we have to settle with just this two facts:
[+] Whatever happens in the separate Reader will not crash the browser, potentially enabling other chances to exploit it.
[+] It makes it possible to develop exploits for highly predictable memory layouts.
Launch PDF Action Mega Abuse !PATCH!
March 31, 2010
@DidierStevens has released a way to partially “control” the message showed by Adobe Reader when it launches an application from inside a pdf file with the PDFAction “/Launch”. Check it out here
I think it’s about time to start calling the application Launching capability of Adobe (and friends) a VULNERABILITY.
Here you have a python script for PATCHING the affected dll and cripple the Launch Action.
#Megapatch for Didier Launch action abuse #http://blog.didierstevens.com/2010/03/29/escape-from-pdf/ version="9.0" path = "C:\\Program Files\\Adobe\\Adobe Reader %s\\Reader\\"%version #path = "./" data = file(path+"AcroRd32.dll","rb").read() file(path+"AcroRd32.dll.bak","wb").write(data) while data.find("Launch")!=-1: data = data.replace("Launch","Felipe") file(path+"AcroRd32.dll","wb").write(data)
I tested it in W7 / Adobe Reader 9.3 but it should work for every version/OS/Arch mixture. In some OS you may experience some trouble replacing the dll.
(((( An untested improvement… s/Felipe/######/g ))))
Felipe/
Filling Adobe’s heap …
February 15, 2010
Flash on a PDF with miniPDF.py…
February 11, 2010
Due to the recent advances in exploitation techniques it became really important to put flash every were we can.
Flash AHAHHHHHHHHHHHHHHH!!!! In this post we are going to show how to add a swf(Flash) file to a PDF file using our miniPDF.py lib. |
![]() |
Flash support is relatively new in PDF and come into the scene primary for doing the PDF portable collection thing and such. We’ll follow the steps described in Adobe® Supplement to the ISO 32000 , so you probably need to grab it and keep it close to you. In the case you’ve missed the previous posts here you have a copy of the miniPDF.py so you can take a quick look. We are going to use that lib mainly as we did in earlier posts and start adding PDF objects until… –FLASH!– we end up with a one paged PDF with a running embedded SWF. OK, so lets start…
Read the rest of this entry »