Saving documents with broken zip streams (Re: minutes of ESC call ...)

classic Classic list List threaded Threaded
7 messages Options
Luboš Luňák Luboš Luňák
Reply | Threaded
Open this post in threaded view
|

Saving documents with broken zip streams (Re: minutes of ESC call ...)

On Monday 03 of June 2019, Caolán McNamara wrote:

> On Mon, 2019-06-03 at 12:23 +0200, Luboš Luňák wrote:
> > On Thursday 30 of May 2019, Michael Meeks wrote:
> > >       + some in the zip area - assuming they are threading related.
> >
> >  Is this about those documents such
> > as /srv/crashtestdata/files/caolan/opendocument_stack_overflow_2.odt
> > ? How
> > can I reproduce that problem? If I try to fetch
> > [hidden email]:
> > an/opendocument_stack_overflow_2.odt ,
> > it doesn't exist.
>
> I attach two of the examples here. The input name was foo.sample, the
> output to odt name appears higher up in the bt during the export.
>
> ./instdir/program/soffice.bin --headless --convert-to odt
> opendocument_stack_overflow.sample


 Ok, so it's not a problem with my code, my changes just happened to show the
problem, and the problem is that those documents are broken. If you try to
unzip the documents, it will complain about incorrect CRC (although it still
will uncompress them). And what happens is that when we try to save the file,
apparently only by that point we'll read those zip streams, there will be a
ZipException about that, and the code in package/ is not exception-safe. So
ZipOutputStream::writeLOC() gets called but not the matching
ZipOutputStream::rawCloseEntry().

 But this is actually broken on several levels. If I make the code to catch
the exception better, I'll need to make it somehow handle the fact that
writeLOC() prepared for writing en entry, but then there's nothing to write.
But that's actually not important, since ZipPackageStream::saveChild() will
still return failure, so ZipPackageFolder::saveContents() will throw an
exception, making the whole document saving fail. Which in turn means this
whole save business is irrelevant, as there's just no way to save the
document, even though we can load it and we can edit it. Which seems rather
lame.

 Any idea what to do about that? Is it really ok that we just refuse to save
it? Or should we save it even though the contents may be broken?

--
 Luboš Luňák
 [hidden email]
_______________________________________________
LibreOffice mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/libreoffice
Jan-Marek Glogowski Jan-Marek Glogowski
Reply | Threaded
Open this post in threaded view
|

Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

Am 03.06.19 um 21:52 schrieb Luboš Luňák:

> On Monday 03 of June 2019, Caolán McNamara wrote:
>> On Mon, 2019-06-03 at 12:23 +0200, Luboš Luňák wrote:
>>> On Thursday 30 of May 2019, Michael Meeks wrote:
>>>>       + some in the zip area - assuming they are threading related.
>>>
>>>  Is this about those documents such
>>> as /srv/crashtestdata/files/caolan/opendocument_stack_overflow_2.odt
>>> ? How
>>> can I reproduce that problem? If I try to fetch
>>> [hidden email]:
>>> an/opendocument_stack_overflow_2.odt ,
>>> it doesn't exist.
>>
>> I attach two of the examples here. The input name was foo.sample, the
>> output to odt name appears higher up in the bt during the export.
>>
>> ./instdir/program/soffice.bin --headless --convert-to odt
>> opendocument_stack_overflow.sample
>
>
>  Ok, so it's not a problem with my code, my changes just happened to show the
> problem, and the problem is that those documents are broken. If you try to
> unzip the documents, it will complain about incorrect CRC (although it still
> will uncompress them). And what happens is that when we try to save the file,
> apparently only by that point we'll read those zip streams, there will be a
> ZipException about that, and the code in package/ is not exception-safe. So
> ZipOutputStream::writeLOC() gets called but not the matching
> ZipOutputStream::rawCloseEntry().
>
>  But this is actually broken on several levels. If I make the code to catch
> the exception better, I'll need to make it somehow handle the fact that
> writeLOC() prepared for writing en entry, but then there's nothing to write.
> But that's actually not important, since ZipPackageStream::saveChild() will
> still return failure, so ZipPackageFolder::saveContents() will throw an
> exception, making the whole document saving fail. Which in turn means this
> whole save business is irrelevant, as there's just no way to save the
> document, even though we can load it and we can edit it. Which seems rather
> lame.
>
>  Any idea what to do about that? Is it really ok that we just refuse to save
> it? Or should we save it even though the contents may be broken?

IMHO the only sane solution would be to detect the broken CRCs on read and
report a broken file to the user. Eventually we could offer some recovery
option: mark the broken CRCs to be recalculated and keep the stuff or drop the
broken ZIP entries. I guess most users can't make this decision, so I would opt
for optional recovery of the entries, ignoring the CRCs.

Easier solution: we just deny / abort loading the file and tell the user babout
the broken file.

It really strange that the broken CRCs are just detected on write and the
document loads without a problem. Or do we ignore all ZIP entries, which we
don't know, which would be strange too?
_______________________________________________
LibreOffice mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/libreoffice
Luboš Luňák Luboš Luňák
Reply | Threaded
Open this post in threaded view
|

Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

On Monday 03 of June 2019, Jan-Marek Glogowski wrote:
> Am 03.06.19 um 21:52 schrieb Luboš Luňák:
> >  Ok, so it's not a problem with my code, my changes just happened to show
> > the problem, and the problem is that those documents are broken. If you
> > try to unzip the documents, it will complain about incorrect CRC
...
> >  Any idea what to do about that? Is it really ok that we just refuse to
> > save it? Or should we save it even though the contents may be broken?
>
> IMHO the only sane solution would be to detect the broken CRCs on read and
> report a broken file to the user.

 That's not so easy. We do not detect broken CRCs on load, because we load on
demand. And removing that seems like a bad trade-off. Finding that a CRC
stream has a broken CRCs means uncompressing everything and checking, even
things we otherwise do not care about. I think we do not want to make loading
of everything possibly slower just to detect that virtually all documents are
correct.

--
 Luboš Luňák
 [hidden email]
_______________________________________________
LibreOffice mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/libreoffice
Caolán McNamara Caolán McNamara
Reply | Threaded
Open this post in threaded view
|

Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

In reply to this post by Luboš Luňák
On Mon, 2019-06-03 at 21:52 +0200, Luboš Luňák wrote:
> Any idea what to do about that? Is it really ok that we just refuse
> to save it? Or should we save it even though the contents may be
> broken?

For what its worth those sample documents are not "realworld" user
documents, but the output of fuzzing engines so any non-catastrophic
outcome is acceptable IMO

_______________________________________________
LibreOffice mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/libreoffice
Luboš Luňák Luboš Luňák
Reply | Threaded
Open this post in threaded view
|

Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

On Thursday 06 of June 2019, Caolán McNamara wrote:
> On Mon, 2019-06-03 at 21:52 +0200, Luboš Luňák wrote:
> > Any idea what to do about that? Is it really ok that we just refuse
> > to save it? Or should we save it even though the contents may be
> > broken?
>
> For what its worth those sample documents are not "realworld" user
> documents, but the output of fuzzing engines so any non-catastrophic
> outcome is acceptable IMO

 I have avoided the assert with https://gerrit.libreoffice.org/#/c/73646/ .
Given that it's (hopefully) very unlikely to find real documents with broken
zip internals, I find that good enough.

--
 Luboš Luňák
 [hidden email]
_______________________________________________
LibreOffice mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/libreoffice
Wols Lists Wols Lists
Reply | Threaded
Open this post in threaded view
|

Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

On 07/06/19 10:07, Luboš Luňák wrote:

> On Thursday 06 of June 2019, Caolán McNamara wrote:
>> On Mon, 2019-06-03 at 21:52 +0200, Luboš Luňák wrote:
>>> Any idea what to do about that? Is it really ok that we just refuse
>>> to save it? Or should we save it even though the contents may be
>>> broken?
>>
>> For what its worth those sample documents are not "realworld" user
>> documents, but the output of fuzzing engines so any non-catastrophic
>> outcome is acceptable IMO
>
>  I have avoided the assert with https://gerrit.libreoffice.org/#/c/73646/ .
> Given that it's (hopefully) very unlikely to find real documents with broken
> zip internals, I find that good enough.
>
Bear in mind I don't know the background to this ...

My immediate reaction was "we can't refuse to let the user save their
document, so could we disable 'save' and do a 'save as'?".

As for unlikely to find broken documents, it's too long ago for me to
remember the details, but I remember salvaging a broken calc document by
unzipping it and recovering the data portion. So real-world broken
documents do happen (although I think in this case it was broken such
that LO refused to open it ...)

Cheers,
Wol
_______________________________________________
LibreOffice mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/libreoffice
Luboš Luňák Luboš Luňák
Reply | Threaded
Open this post in threaded view
|

Re: Saving documents with broken zip streams (Re: minutes of ESC call ...)

On Friday 07 of June 2019, Wols Lists wrote:

> On 07/06/19 10:07, Luboš Luňák wrote:
> >> For what its worth those sample documents are not "realworld" user
> >> documents, but the output of fuzzing engines so any non-catastrophic
> >> outcome is acceptable IMO
> >
> >  I have avoided the assert with https://gerrit.libreoffice.org/#/c/73646/
> > . Given that it's (hopefully) very unlikely to find real documents with
> > broken zip internals, I find that good enough.
>
> Bear in mind I don't know the background to this ...
>
> My immediate reaction was "we can't refuse to let the user save their
> document, so could we disable 'save' and do a 'save as'?".

 That doesn't make a difference here. The code can't save such a broken
document, period. Regardless of where it is being saved to.

> As for unlikely to find broken documents, it's too long ago for me to
> remember the details, but I remember salvaging a broken calc document by
> unzipping it and recovering the data portion. So real-world broken
> documents do happen (although I think in this case it was broken such
> that LO refused to open it ...)

 Manually unzipping and zipping back properly would work here too. Or you can
improve the saving code to cope with such problems somehow, feel free to.

--
 Luboš Luňák
 [hidden email]
_______________________________________________
LibreOffice mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/libreoffice