diff of 2 docx files

berenger.morel

diff of 2 docx files

Hello.

Is it possible to have a diff between 2 .docx files in libreoffice? Or
with any other tool, I do not really mind...
I am not fond of office suites, but at my work they sent me an outdated
document describing what I have to do, and when I asked about some
"details" ( or errors, in fact ) in it, they sent me a different ( but
still with errors... ) version of the same document.
With a simple text file, I could have made a diff ( lot of tools for
that ) but I have no idea about how to do it with docx files. I have
obviously tried to unzip them, and use diff and meld on resulting files,
but those files are so messy that I thought that, maybe, they are
encrypted or whatever!
Now that I think about it, I could do a diff of files made through a
copy/paste of the text... but that would be quite dirty.
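
For reference, that copy/paste idea can be automated. The following is only a minimal Python sketch: it assumes the body text lives in word/document.xml inside the .docx zip archive, it ignores headers, footnotes and formatting, and the function names are made up for illustration.

```python
import difflib
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx):
    """Return the text of each paragraph (w:p) in a .docx file.

    `docx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # The visible text of a paragraph is split across w:t elements.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs


def diff_docx(old, new):
    """Return a unified diff (list of lines) of two .docx files' text."""
    return list(difflib.unified_diff(
        docx_paragraphs(old), docx_paragraphs(new),
        fromfile="old", tofile="new", lineterm=""))
```

With two hypothetical files, print("\n".join(diff_docx("spec_v2.docx", "spec_v3.docx"))) would print the changed paragraphs in the usual diff style.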

Thanks for any suggestions.

PS: I did not register for this list, so please add me in CC.

--
To unsubscribe e-mail to: [hidden email]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted
pbw

Re: diff of 2 docx files

If you write the files out as .fodt (Flat XML) files from within LO, you
will have straight XML files to compare.

Beyond that, you could get the tika-app.jar from the Apache Tika
project, which will let you extract plain text from the .fodt files and
directly from the .odt files.
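
As a rough illustration of the flat XML route, here is a small Python sketch that pulls the heading and paragraph text out of two .fodt documents and diffs it. It assumes the standard OpenDocument text namespace; the function names are hypothetical, and text nested inside frames may show up twice.

```python
import difflib
import xml.etree.ElementTree as ET

# OpenDocument "text" namespace, used for headings (text:h) and paragraphs (text:p).
TEXT_NS = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"


def fodt_lines(xml_source):
    """Return one string per heading or paragraph of a flat ODF document."""
    root = ET.fromstring(xml_source)
    lines = []
    for elem in root.iter():
        if elem.tag in (TEXT_NS + "h", TEXT_NS + "p"):
            # itertext() also collects text held in nested spans.
            lines.append("".join(elem.itertext()))
    return lines


def diff_fodt(old_xml, new_xml):
    """Return a unified diff (list of lines) of the text of two .fodt documents."""
    return list(difflib.unified_diff(
        fodt_lines(old_xml), fodt_lines(new_xml),
        fromfile="old.fodt", tofile="new.fodt", lineterm=""))
```

The arguments are the file contents, for example open("old.fodt", "rb").read().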
On 4/02/2014 6:36 pm, [hidden email] wrote:

> Hello.
>
> Is it possible to have a diff between 2 .docx files in libreoffice? Or
> with any other tool, I do not really mind...
> I am not fond of office suites, but at my work they sent me an outdated
> document describing what I have to do, and when I asked about some
> "details" ( or errors, in fact ) in it, they sent me a different ( but
> still with errors... ) version of the same document.
> With a simple text file, I could have made a diff ( lot of tools for
> that ) but I have no idea about how to do it with docx files. I have
> obviously tried to unzip them, and use diff and meld on resulting files,
> but those files are so messy that I thought that, maybe, they are
> encrypted or whatever!
> Now that I think about it, I could do a diff of files made through a
> copy/paste of the text... but that would be quite dirty.
>
> Thanks for any suggestions.
>
> PS: I did not register for this list, so please add me in CC.
>

--
Peter West
"...and a sword will pierce through your own soul also..."

berenger.morel

Re: diff of 2 docx files



Le 04.02.2014 11:08, Peter West a écrit :
> If you write the files out as .fodt (Flat XML) files from within LO,
> you will have straight XML files to compare.
>
> Beyond that, you could get the tika-app.jar from the Apache Tika
> project, which will let you extract plain text from the .fodt files
> and directly from the .odt files.

The problem is that I am not the author of those files. But you are
right, I could simply copy the text into a file in a better format... I
should have thought of that myself, by the way... so obvious.

Thanks.

som

Re: diff of 2 docx files

In reply to this post by berenger.morel


> On Tuesday, 4 February 2014 3:28 PM, "[hidden email]" <[hidden email]> wrote:
> > Hello.
>
> Is it possible to have a diff between 2 .docx files in libreoffice?

I am not sure whether it works with docx, but did you try the "Compare Document" option under "Edit" (Edit -> Compare Document)?

regards,

som

Brian Barker

Re: diff of 2 docx files

In reply to this post by berenger.morel
At 09:36 04/02/2014 +0100, Morel Bérenger wrote:
>Is it possible to have a diff between 2 .docx
>files in libreoffice? I am not fond of office
>suites, but at my work they sent me an outdated
>document describing what I have to do, and when
>I asked about some "details" ( or errors, in
>fact ) in it, they sent me a different ( but
>still with errors... ) version of the same document.

As has been suggested, try the Compare Document facility:
o Open the newer document.
o Go to Edit | Compare Document... .
o Browse to and insert the older document.

You can accept or reject the identified changes,
or cancel the dialogue if you wish merely to view them.

I trust this helps.

Brian Barker


nabbler

Re: diff of 2 docx files

In reply to this post by pbw
On 04/02/2014, Peter West <[hidden email]> wrote:

> If you write the files out as .fodt (Flat XML) files from within LO, you
> will have straight XML files to compare.
>
> Beyond that, you could get the tika-app.jar from the Apache Tika
> project, which will let you extract plain text from the .fodt files and
> directly from the .odt files.
> On 4/02/2014 6:36 pm, [hidden email] wrote:
>> Hello.
>>
>> Is it possible to have a diff between 2 .docx files in libreoffice? Or
>> with any other tool, I do not really mind...

Similarly, you could use the command terminal:

libreoffice --convert-to fodt /path/to/file.docx

Then apply your diff tool. In addition, you could then use xslt to
extract the elements/text of the fodt files.
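
If you want to script that step, a minimal Python sketch could look like the following. It assumes the launcher is on the PATH under the name given in `binary`, and it relies on the --headless, --convert-to and --outdir options; the function names are only illustrative.

```python
import subprocess
from pathlib import Path


def convert_command(source, outdir, binary="libreoffice"):
    """Build the command line that converts `source` to flat ODF XML."""
    return [binary, "--headless", "--convert-to", "fodt",
            "--outdir", str(outdir), str(source)]


def convert_to_fodt(source, outdir, binary="libreoffice"):
    """Run the conversion and return the expected path of the .fodt file."""
    # check=True raises CalledProcessError if the converter exits with an error.
    subprocess.run(convert_command(source, outdir, binary), check=True)
    return Path(outdir) / (Path(source).stem + ".fodt")
```

Calling convert_to_fodt() once per document gives two .fodt files that can then be handed to diff or meld.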

pbw

Re: diff of 2 docx files

In reply to this post by pbw
On 4/02/2014 8:08 pm, Peter West wrote:
> If you write the files out as .fodt (Flat XML) files from within LO, you
> will have straight XML files to compare.
>
> Beyond that, you could get the tika-app.jar from the Apache Tika
> project, which will let you extract plain text from the .fodt files and
> directly from the .odt files.

Make that ".docx files."

> On 4/02/2014 6:36 pm, [hidden email] wrote:
>> Hello.
>>
>> Is it possible to have a diff between 2 .docx files in libreoffice? Or
>> with any other tool, I do not really mind...
>> I am not fond of office suites, but at my work they sent me an outdated
>> document describing what I have to do, and when I asked about some
>> "details" ( or errors, in fact ) in it, they sent me a different ( but
>> still with errors... ) version of the same document.
>> With a simple text file, I could have made a diff ( lot of tools for
>> that ) but I have no idea about how to do it with docx files. I have
>> obviously tried to unzip them, and use diff and meld on resulting files,
>> but those files are so messy that I thought that, maybe, they are
>> encrypted or whatever!
>> Now that I think about it, I could do a diff of files made through a
>> copy/paste of the text... but that would be quite dirty.
>>
>> Thanks for any suggestions.
>>
>> PS: I did not register for this list, so please add me in CC.
>>
>

null

Re: diff of 2 docx files

In reply to this post by nabbler
Hello,
     For some reason, on my Windows PC, the "libreoffice" command is not
present.  So, if you choose to use the command terminal, and
"libreoffice" is not recognized, the following command should work:
soffice --convert-to fodt path\to\file.docx
path\to\file.docx would be the path to the docx file.  The path may use
/ instead of \ depending on the operating system.
The command "libreoffice" might work for you, though.  I guess it
depends on the installation, operating system, PATH variable
configuration, or some other factor.
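
A script can sidestep the naming question by checking the PATH for both names. This is just a sketch in Python; the candidate list and the function name are assumptions, and an installation that is not on the PATH at all will still not be found.

```python
import shutil


def find_office_binary(candidates=("libreoffice", "soffice")):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

If it returns None, the full path to soffice has to be given by hand.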

Regards,
[hidden email]

On 2/4/2014 2:22 PM, e-letter wrote:

> On 04/02/2014, Peter West <[hidden email]> wrote:
>> If you write the files out as .fodt (Flat XML) files from within LO, you
>> will have straight XML files to compare.
>>
>> Beyond that, you could get the tika-app.jar from the Apache Tika
>> project, which will let you extract plain text from the .fodt files and
>> directly from the .odt files.
>> On 4/02/2014 6:36 pm, [hidden email] wrote:
>>> Hello.
>>>
>>> Is it possible to have a diff between 2 .docx files in libreoffice? Or
>>> with any other tool, I do not really mind...
> Similarly, you could use the command terminal:
>
> libreoffice --convert-to fodt /path/to/file.docx
>
> Then apply your diff tool. In addition, you could then use xslt to
> extract the elements/text of the fodt files.
>


berenger.morel

Re: diff of 2 docx files

In reply to this post by som


Le 04.02.2014 13:49, som a écrit :

>> On Tuesday, 4 February 2014 3:28 PM, "[hidden email]"
>> <[hidden email]> wrote:
>> > Hello.
>>
>> Is it possible to have a diff between 2 .docx files in libreoffice?
>
> I am not sure whether it works with docx, but did you try the "Compare
> Document" option under "Edit" (Edit -> Compare Document)?
>
> regards,
>
> som

I did try it. But when I speak about a comparison, I also mean a
way to notice the differences immediately. Take, for example, the
command-line tool diff, or the graphical tools meld and WinMerge (
useful when I have to use Windows... ).
There, even a one-character change is obvious.
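
To illustrate what "obvious" means here, a tiny Python sketch using the standard difflib module lists only the characters that differ between two strings. The helper name is made up for the example.

```python
import difflib


def char_changes(old, new):
    """List (operation, old_text, new_text) for every differing span."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    return [(op, old[i1:i2], new[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes()
            if op != "equal"]


print(char_changes("Version 2.0", "Version 3.0"))
# [('replace', '2', '3')]
```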

I just retried it, to be sure. The changes between the 2 documents
were... say, almost nonexistent, and so minor ( and they named the
documents v2 and v3... those people just changed the version number...
pfff... or maybe "yet another error", I guess. No comment. ) which is
probably why I did not notice them when selecting the changes the first
time. But now at least I know how LibreOffice shows comparisons.
It may be the most effective way for that kind of work ( working on
formatted text ), I do not know, but I think that this dialog box is not
very explicit*. But maybe it is only a question of habit, coming from a
programmer's point of view, and from one who likes old tools like
terminals and ncurses applications a lot.
For example, a short text explicitly explaining what the lines are, or
a help button, or tooltips, I do not know what. But something that even
dumb people would be able to notice. After this failure, I searched on
the web and in the documentation but was not able to find anything new
( probably a bad choice of keywords ). I ended up decompressing the
files to compare those damned xml things by hand...

But thanks for the help anyway. Problem solved.

*: in my situation, there was a very minor difference ( "Version 2.0"
became "Version 3.0" ) between the documents, on the 1st page - so there
was no automatic jump to the change, which would have given me a hint
about where to look - and since I was using it for the 1st time, I did
not understand at all what those 2 lines, "insertion" and "removal", were.

berenger.morel

Re: diff of 2 docx files

In reply to this post by nabbler


Le 04.02.2014 20:22, e-letter a écrit :

> On 04/02/2014, Peter West <[hidden email]> wrote:
>> If you write the files out as .fodt (Flat XML) files from within LO,
>> you
>> will have straight XML files to compare.
>>
>> Beyond that, you could get the tika-app.jar from the Apache Tika
>> project, which will let you extract plain text from the .fodt files
>> and
>> directly from the .odt files.
>> On 4/02/2014 6:36 pm, [hidden email] wrote:
>>> Hello.
>>>
>>> Is it possible to have a diff between 2 .docx files in libreoffice?
>>> Or
>>> with any other tool, I do not really mind...
>
> Similarly, you could use the command terminal:
>
> libreoffice --convert-to fodt /path/to/file.docx
>
> Then apply your diff tool. In addition, you could then use xslt to
> extract the elements/text of the fodt files.

Sounds like an interesting solution, I'll think about it next time
those guys send me their junk.

berenger.morel

Re: diff of 2 docx files

In reply to this post by Brian Barker


Le 04.02.2014 16:55, Brian Barker a écrit :

> At 09:36 04/02/2014 +0100, Morel Bérenger wrote:
>>Is it possible to have a diff between 2 .docx files in libreoffice? I
>> am not fond of office suites, but at my work they sent me an outdated
>> document describing what I have to do, and when I asked about some
>> "details" ( or errors, in fact ) in it, they sent me a different ( but
>> still with errors... ) version of the same document.
>
> As has been suggested, try the Compare Document facility:
> o Open the newer document.
> o Go to Edit | Compare Document... .
> o Browse to and insert the older document.
>
> You can accept or reject the identified changes, or cancel the
> dialogue if you wish merely to view them.
>
> I trust this helps.
>
> Brian Barker

To be honest, I did find that feature without asking here. But the
changes were so minor* that I did not understand that this tool was
showing them, since the results were not obvious at all and the visual
changes almost imperceptible to a user who is not used to GUIs.
Maybe one could make the hint more obvious, for example by replacing
the selection background color with one that stands out more, or by
drawing a circle or something bigger around the change when it is
too small? The dialog box itself could be improved by being more
explicit ( there are no tooltips, no help button and no
description text for now ).
Those are just some ideas to improve your tool. I do not mind much
about this, because I have to admit that I really do not like using
office suites, and my work rarely requires me to edit such documents.

Problem solved anyway, thanks for taking the time to help.

*: at the bottom of the first page ( so the cursor did not move, and
the highlight was in an area that does not catch the eye ) the author
changed a '2' into a '3' ( version change... yes... sounds a bit like
the new Firefox version numbering :) ). Note that the quantity of
changes in the xml files was quite impressive, the kind of dirty result
that makes me more and more convinced that xml is over-used. And that it
becomes even worse when Microsoft tries to use it. Can you believe that
with such a minor change, the compressed files differ in size by 66
bytes? One char => 66 bytes compressed! So inefficient... but not your
fault, MS is responsible for that.
