Discussion:
[zathura] Zathura can not open valid PDF file
XVilka Haos of System
2016-07-03 09:11:13 UTC
Hello!
If you have heard of the PoC||GTFO e-zine, you know that they play
with the PDF format of each issue a lot while still keeping it valid.
Since zathura is based on poppler, I also checked those files with evince.
Evince seems to parse them properly.

Take, for example, this file: https://www.alchemistowl.org/pocorgtfo/pocorgtfo11.pdf
Zathura shows nothing, while evince shows all pages properly.

You can also check the other files on this page:
https://www.alchemistowl.org/pocorgtfo/

Kind regards,
XVilka.
Leonardo Taccari
2016-07-03 10:17:55 UTC
Hello XVilka!
The problem doesn't seem to be the PDF plugin used (I can reproduce the
same with zathura-pdf-mupdf) but the fact that zathura uses libmagic(3)
(or similar methods) to select the proper plugin, e.g.:

$ zathura pocorgtfo11.pdf
error: Unknown file type: 'application/octet-stream'

...indeed:

$ file --mime-type pocorgtfo11.pdf
pocorgtfo11.pdf: application/octet-stream

Taking a look at its first 112 bytes, we can see:

$ hexdump -C -n 112 pocorgtfo11.pdf
00000000  72 65 71 75 69 72 65 20  27 6a 73 6f 6e 27 0a 72  |require 'json'.r|
00000010  65 71 75 69 72 65 20 27  73 6f 63 6b 65 74 27 0a  |equire 'socket'.|
00000020  72 65 71 75 69 72 65 20  27 75 72 69 27 0a 3d 62  |require 'uri'.=b|
00000030  65 67 69 6e 0a 25 50 44  46 2d 31 2e 35 0a 25 d0  |egin.%PDF-1.5.%.|
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^        ^^^^^^^^^^^
                         ...PDF starts here!..............
00000040  d4 c5 d8 0a 39 39 39 39  20 30 20 6f 62 6a 0a 3c  |....9999 0 obj.<|
00000050  3c 0a 2f 4c 65 6e 67 74  68 20 31 30 39 34 34 0a  |<./Length 10944.|
00000060  3e 3e 0a 73 74 72 65 61  6d 0a 3d 65 6e 64 0a 70  |>>.stream.=end.p|
00000070

Getting rid of the first 53 bytes:

$ tail -c $(expr $(wc -c < pocorgtfo11.pdf) - 53) pocorgtfo11.pdf > pocorgtfo11-pdf.pdf
$ file --mime-type pocorgtfo11-pdf.pdf
pocorgtfo11-pdf.pdf: application/pdf

...we can correctly view it:

$ zathura pocorgtfo11-pdf.pdf
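
The 53-byte offset above was read off the hexdump by hand. As a rough
sketch, the same offset could be located automatically by searching for
the "%PDF-" marker. The helper names pdf_offset and strip_pdf_prefix are
hypothetical, and the -a, -b and -o options are assumed to behave as in
GNU grep:

```shell
# Hypothetical helper: print the 0-based byte offset of the first "%PDF-".
pdf_offset() {
    # -a: treat binary data as text, -b: print byte offsets,
    # -o: report the offset of the match itself, not of its line.
    LC_ALL=C grep -a -b -o '%PDF-' "$1" | head -n 1 | cut -d: -f1
}

# Hypothetical helper: copy everything from the marker onward to "$2".
strip_pdf_prefix() {
    offset=$(pdf_offset "$1")
    # Give up if no marker was found at all.
    [ -n "$offset" ] || return 1
    # tail -c +N counts from 1, hence offset + 1.
    tail -c +"$((offset + 1))" "$1" > "$2"
}
```

For the file above, "pdf_offset pocorgtfo11.pdf" should print 53, which
matches the hexdump, and "strip_pdf_prefix pocorgtfo11.pdf pocorgtfo11-pdf.pdf"
would then stand in for the tail(1) command.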

I'm not sure whether an option to force a particular plugin (and
bypass libmagic(3) and any similar methods) would be generally useful.
In cases like this one it would be!
XVilka Haos of System
2016-07-27 11:05:47 UTC
Hello!
Any decision on this?
I'd suggest falling back to searching for the PDF signature at any
offset if libmagic fails, since the standard allows that.
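
To illustrate the idea, here is a minimal shell sketch of such a
fallback. It is not zathura's code: the function name looks_like_pdf and
the default 1024-byte scan window are assumptions made for this example,
and file(1) stands in for a direct libmagic call:

```shell
# Hypothetical check: succeed if file(1) reports a PDF, or failing that,
# if the "%PDF-" signature appears near the start of the file.
looks_like_pdf() {
    path=$1
    window=${2:-1024}   # assumed scan window in bytes, not from the thread

    # First ask file(1), which is built on libmagic, if it is installed.
    if command -v file >/dev/null 2>&1; then
        mime=$(file --brief --mime-type "$path")
        [ "$mime" = "application/pdf" ] && return 0
    fi

    # Fallback: search for the signature in the first $window bytes.
    head -c "$window" "$path" | LC_ALL=C grep -a -q '%PDF-'
}
```

With this, "looks_like_pdf pocorgtfo11.pdf && echo pdf" would be expected
to take the fallback path for the file discussed here. A larger window can
be passed as the second argument to search further into the file.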
Kind regards,
XVilka.
Sebastian Ramacher
2016-11-10 09:54:45 UTC
If those are valid PDF files, then libmagic should detect them correctly.

Cheers
(Sorry for the delay.)
--
Sebastian Ramacher