a few more comments ...

Michael Meeks-5

a few more comments ...

Hi Mohammed,

        Just a few more aesthetic comments =) on a nice patch.

+    size_t mnPos;
+    size_t mnStreamSize;
+
+    Buffer maInUseBuffer;
+    int mnOffset;

        I'd love to see some doxygen comments:

        size_t mnPos; /// position in stream
        int mnOffset; /// position in maInUseBuffer

        etc. added to this header =) makes it easier to read.

        Also - since we wrote all this code ourselves - we don't need to use
the ALv2 variant header - let's just use the MPLv2 header from
TEMPLATE.SOURCECODE.HEADER in the top-level for the two new files.

        Now that I've read it again, I'm rather skeptical that:

+        if( !maUsedBuffers.empty() )
+        {
+            pProducedBuffer = maUsedBuffers.front();
+            maUsedBuffers.pop();
+        }

        is helpful. I'm not sure that we can re-use these buffers efficiently
here - perhaps it's better just to allocate new ones in the stream read
method; can you profile with and without that? (would love to get rid of
the extra complexity if possible).

+class UnzippingThread: public salhelper::Thread
+{
+    XBufferedThreadedStream *mxStream;

        I'd use a reference - to clarify lifecycle: so "&mrStream" - but of
course, with std::thread we could use a lambda for this thing I guess
and capture it ...

        Gotcha - I think I found the bad bit which is this:

commit 4ae705d02df0ddf75b97d0e94add6994626f487e
Author: Kohei Yoshida <[hidden email]>
Date:   Fri Jan 13 20:47:46 2017 -0500

    tdf#97597: Ensure that each parsing thread has its own buffer.

        The package/ code is (I think) thread-safe, but is not safe wrt. the
file pointer underlying it. I -think- it assumes that only one
thread reads at a time.

        Kohei's change here - I think is to make each stream fully de-compress
itself into memory before parsing it - which fixed some horrible calc
problem.

        Hmm - so we're going to need to fix the root cause here I think, not
least because de-compressing the whole stream can be really very
expensive: ODF e.g. is -incredibly- verbose, with the XML taking up far
more memory than the equivalent calc data structures.

        Sooo ... I think your code is golden; but we're going to need to fix
the package/ code to avoid the bug.

https://bugs.documentfoundation.org/show_bug.cgi?id=97597#c28

        Suggests (input from Kohei appreciated) - that we may need to do a new
'seek' each time we come to reading some bytes from the underlying
package/ stream - to ensure that no other thread is messing with our
underlying stream position while we are reading compressed data from it =)

        I guess we should also write a unit test for this - that is heavily
threaded - 4-8 threads doing back-to-back reads for quite some time
would probably catch this.

        So - could you look into the package/ impl ?

        Thanks !

                Michael.

--
[hidden email] <><, Pseudo Engineer, itinerant idiot
_______________________________________________
LibreOffice mailing list
[hidden email]
https://lists.freedesktop.org/mailman/listinfo/libreoffice
Noel Grandin-2

Re: a few more comments ...



On 5 June 2017 at 20:40, Michael Meeks <[hidden email]> wrote:

        Sooo ... I think your code is golden; but we're going to need to fix
the package/ code to avoid the bug.

https://bugs.documentfoundation.org/show_bug.cgi?id=97597#c28

        Suggests (input from Kohei appreciated) - that we may need to do a new
'seek' each time we come to reading some bytes from the underlying
package/ stream - to ensure that no other thread is messing with our
underlying stream position while we are reading compressed data from it =)


Perhaps we need a new file abstraction which deliberately has no seek position in its API, forcing client code to always specify a position when doing read()/write()?
Then when we pass such a handle/object between threads, its use will be correct by construction.
 

Kohei Yoshida-7

Re: a few more comments ...

On Mon, 2017-06-05 at 19:40 +0100, Michael Meeks wrote:
>         Suggests (input from Kohei appreciated) - that we may need to
> do a new
> 'seek' each time we come to reading some bytes from the underlying
> package/ stream - to ensure that no other thread is messing with our
> underlying stream position while we are reading compressed data from
> it =)

Sorry, but can someone give me a brief summary of what you guys are
discussing?  What exactly is the problem, and what type of input are
you expecting from me?

Thanks,

Kohei
Michael Meeks-5

Re: a few more comments ...

Hi Kohei,

On 05/06/17 20:54, Kohei Yoshida wrote:

> On Mon, 2017-06-05 at 19:40 +0100, Michael Meeks wrote:
>>         Suggests (input from Kohei appreciated) - that we may need to
>> do a new
>> 'seek' each time we come to reading some bytes from the underlying
>> package/ stream - to ensure that no other thread is messing with our
>> underlying stream position while we are reading compressed data from
>> it =)
>
> Sorry, but can someone give me a brief summary of what you guys are
> discussing?  What exactly is the problem, and what type of input are
> you expecting from me?

        The patch is gerrit:

        https://gerrit.libreoffice.org/#/c/38135/

        And the basic conflict is that your fix solved the bug by doing all the
unzipping ahead of time, and that's rather in conflict with threading
and parallelizing the unzipping - which is the goal of this work =)

        So - advice on re-doing the fix to improve package to cope with
multiple threads doing seeking etc. would be great if possible =)

        Thanks !

                Michael.

Kohei Yoshida-7

Re: a few more comments ...

On Mon, 2017-06-05 at 21:14 +0100, Michael Meeks wrote:

> Hi Kohei,
>
> On 05/06/17 20:54, Kohei Yoshida wrote:
> >
> > On Mon, 2017-06-05 at 19:40 +0100, Michael Meeks wrote:
> > >
> > >         Suggests (input from Kohei appreciated) - that we may
> > > need to
> > > do a new
> > > 'seek' each time we come to reading some bytes from the
> > > underlying
> > > package/ stream - to ensure that no other thread is messing with
> > > our
> > > underlying stream position while we are reading compressed data
> > > from
> > > it =)
> > Sorry, but can someone give me a brief summary of what you guys are
> > discussing?  What exactly is the problem, and what type of input
> > are
> > you expecting from me?
> The patch is gerrit:
>
> https://gerrit.libreoffice.org/#/c/38135/
>
> And the basic conflict is that your fix solved the bug by doing
> all the
> unzipping ahead of time, and that's rather in conflict with threading
> and parallelizing the unzipping - which is the goal of this work =)
>
> So - advice on re-doing the fix to improve package to cope with
> multiple threads doing seeking etc. would be great if possible =)

I'm afraid I don't have any advice here that I can give off the top of
my head.  I have no clue what work would involve threading and
parallelizing the unzip to begin with, much less advice to fix what I
fixed while achieving your goal...

I would probably have to spend some considerable amount of time
researching before I could even come up with something tangible. :-(

Kohei

Michael Meeks-5

Re: a few more comments ...


On 05/06/17 21:20, Kohei Yoshida wrote:
> I'm afraid I don't have any advice here that I can give off the top of
> my head.

        Fair enough =)

> I would probably have to spend some considerable amount of time
> researching before I could even come up with something tangible. :-(

        No problem; we'll read the code & defeat the dragon ourselves :-)

        Thanks,

                Michael.

Mohammed Abdul Azeem

Re: a few more comments ...

Hi Michael,

On Tue, Jun 6, 2017 at 12:10 AM, Michael Meeks <[hidden email]> wrote:
Hi Mohammed,

        Now that I've read it again, I'm rather skeptical that:

+        if( !maUsedBuffers.empty() )
+        {
+            pProducedBuffer = maUsedBuffers.front();
+            maUsedBuffers.pop();
+        }

        is helpful. I'm not sure that we can re-use these buffers efficiently
here - perhaps it's better just to allocate new ones in the stream read
method; can you profile with and without that? (would love to get rid of
the extra complexity if possible).

Here you go.


Regards,
Azeem


Michael Meeks-5

Re: a few more comments ...

Hi Mohammed,

On 06/06/17 13:07, Mohammed Abdul Azeem wrote:
>             Now that I've read it again, I'm rather skeptical that:
>
>     +        if( !maUsedBuffers.empty() )
...

>             is helpful. I'm not sure that we can re-use these buffers
>     efficiently
>     here - perhaps it's better just to allocate new ones in the stream read
>     method; can you profile with and without that (would love to get rid of
>     the extra complexity if possible).
>
> Here you go.
>
> with UsedBuffers:
> https://demo.collaboracloudsuite.com/tdf/index.php/s/Hbsp5rI8FE7CHmF
> without
> UsedBuffers: https://demo.collaboracloudsuite.com/tdf/index.php/s/PIcWD261zVcwpWj

        Interesting =) well it seems to save a small amount of time - which is
surprising; the uno_type_sequence_reference2One seems to take rather a
long time - I imagine we could be more careful there.

        Anyhow - thanks for poking at it,

        ATB,

                Michael.
