# diff of 2 docx files


## diff of 2 docx files

Hello.

Is it possible to produce a diff between two .docx files in LibreOffice? Or with any other tool; I do not really mind.

I am not fond of office suites, but at work they sent me an outdated document describing what I have to do, and when I asked about some "details" (errors, in fact) in it, they sent me a different version of the same document (still with errors).

With a plain text file I could simply have run a diff (there are plenty of tools for that), but I have no idea how to do it with .docx files. I have obviously tried unzipping them and running diff and meld on the resulting files, but those files are so messy that I thought they might be encrypted or whatever!

Now that I think about it, I could diff files made by copying and pasting the text, but that would be quite dirty.

Thanks for any suggestions.

PS: I am not subscribed to this list, so please add me in CC.

--
To unsubscribe e-mail to: [hidden email]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted

## Re: diff of 2 docx files

If you write the files out as .fodt (Flat XML) files from within LO, you will have straight XML files to compare.

Beyond that, you could get the tika-app.jar from the Apache Tika project, which will let you extract plain text from the .fodt files and directly from the .odt files.

--
Peter West
"...and a sword will pierce through your own soul also..."
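For readers who want the plain-text route without installing anything, here is a minimal sketch in Python using only the standard library. It is not Tika and not what was proposed above; it is a swapped-in approach that assumes the body text lives in `word/document.xml` inside the .docx archive. The function names `docx_paragraphs` and `diff_docx` are made up for illustration, and headers, footers, footnotes and formatting are ignored.

```python
import difflib
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Return the text of each paragraph in a .docx file, one string per paragraph."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # The visible text of a paragraph is split across many <w:t> runs; join them.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return paragraphs


def diff_docx(old_path, new_path):
    """Return a unified diff (list of lines) of the paragraph text of two .docx files."""
    return list(difflib.unified_diff(
        docx_paragraphs(old_path),
        docx_paragraphs(new_path),
        fromfile=str(old_path),
        tofile=str(new_path),
        lineterm="",
    ))
```

Printing `"\n".join(diff_docx("v2.docx", "v3.docx"))` (hypothetical file names) gives ordinary unified-diff output, so a single changed paragraph shows up as one `-` line and one `+` line instead of pages of XML noise.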

## Re: diff of 2 docx files

On 04.02.2014 11:08, Peter West wrote:
> If you write the files out as .fodt (Flat XML) files from within LO,
> you will have straight XML files to compare.
>
> Beyond that, you could get the tika-app.jar from the Apache Tika
> project, which will let you extract plain text from the .fodt files
> and directly from the .odt files.

The problem is that I am not the author of those files. But you are right, I could simply copy the text into a file in a better format... I should have thought of that myself, by the way; it is so obvious.

Thanks.

## Re: diff of 2 docx files

On Tuesday, 4 February 2014 3:28 PM, "[hidden email]" <[hidden email]> wrote:
> Hello.
>
> Is it possible to have a diff between 2 .docx files in libreoffice?

I am not sure whether it works with .docx, but did you try the "Compare Document" option under "Edit" (Edit -> Compare Document)?

Regards,

som

## Re: diff of 2 docx files

At 09:36 04/02/2014 +0100, Morel Bérenger wrote:
> Is it possible to have a diff between 2 .docx files in libreoffice?
> I am not fond of office suites, but at my work they sent me an
> outdated document describing what I have to do, and when I asked
> about some "details" ( or errors, in fact ) in it, they sent me a
> different ( but still with errors... ) version of the same document.

As has been suggested, try the Compare Document facility:
o Open the newer document.
o Go to Edit | Compare Document... .
o Browse to and insert the older document.

You can accept or reject the identified changes, or cancel the dialogue if you wish merely to view them.

I trust this helps.

Brian Barker

## Re: diff of 2 docx files

On 04/02/2014, Peter West <[hidden email]> wrote:
> If you write the files out as .fodt (Flat XML) files from within LO, you
> will have straight XML files to compare.
>
> Beyond that, you could get the tika-app.jar from the Apache Tika
> project, which will let you extract plain text from the .fodt files and
> directly from the .odt files.

Similarly, you could use the command terminal:

    libreoffice -convert-to fodt /path/to/m$file

Then apply your diff tool. In addition, you could then use XSLT to extract the elements/text of the .fodt files.

## Re: diff of 2 docx files

On 4/02/2014 8:08 pm, Peter West wrote:
> If you write the files out as .fodt (Flat XML) files from within LO, you
> will have straight XML files to compare.
>
> Beyond that, you could get the tika-app.jar from the Apache Tika
> project, which will let you extract plain text from the .fodt files and
> directly from the .odt files.

Make that ".docx files."

## Re: diff of 2 docx files

Hello,

For some reason, on my Windows PC, the "libreoffice" command is not present. So, if you choose to use the command terminal and "libreoffice" is not recognized, the following command should work:

    soffice --convert-to fodt path\to\file.docx

Here path\to\file.docx is the path to the .docx file. The path may use / instead of \ depending on the operating system.

The command "libreoffice" might work for you, though. I guess it depends on the installation, operating system, PATH variable configuration, or some other factor.

Regards,
[hidden email]

On 2/4/2014 2:22 PM, e-letter wrote:
> Similarly, you could use the command terminal:
>
> libreoffice -convert-to fodt /path/to/m$file
>
> Then apply your diff tool. In addition, you could then use xslt to
> extract the elements/text of the fodt files.
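The two command lines above can be wrapped so that a script copes with either launcher name. The following is only a sketch in Python (standard library): `convert_command` and `convert` are hypothetical helper names, and besides the `--convert-to` option shown in the thread it assumes the `--headless` and `--outdir` options, which are not mentioned in the posts.

```python
import shutil
import subprocess


def convert_command(source, outdir=".", target_format="fodt", binary=None):
    """Build the argument list for a headless LibreOffice conversion."""
    # Installations expose the launcher as "soffice" or "libreoffice";
    # use whichever is found on PATH, falling back to "soffice".
    if binary is None:
        binary = shutil.which("soffice") or shutil.which("libreoffice") or "soffice"
    return [binary, "--headless", "--convert-to", target_format,
            "--outdir", str(outdir), str(source)]


def convert(source, outdir=".", target_format="fodt"):
    """Run the conversion; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(convert_command(source, outdir, target_format), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces or characters such as `$`. Calling `convert("v2.docx")` and `convert("v3.docx")` (hypothetical file names) should leave two .fodt files in the current directory, ready for diff or meld.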

## Re: diff of 2 docx files

On 04.02.2014 13:49, som wrote:
> I am not sure whether it works with docx but did you try the "compare
> document" option under "edit" (Edit -> Compare Document).

I did. But when I speak of a comparison, I also mean a good way to notice the differences immediately. Take, for example, the command-line tool diff, or the graphical tools meld and WinMerge (useful when I have to use Windows): even a one-character change is obvious.

I just retried it, to be sure. The changes between the two documents were almost nonexistent and very minor (they named the documents v2 and v3, yet those people just changed the version number... or maybe it is "yet another error", I guess. No comment.), which is probably why I did not notice them when selecting the changes the first time.

But now at least I know how LibreOffice shows comparisons. It may be the most effective approach for that kind of work (working on formatted text), I do not know, but I think this dialog box is not very explicit*. Maybe that is only a question of habit, seen from the point of view of a programmer who is very fond of old tools like terminals and ncurses applications. It could use, for example, a short text explicitly explaining what the lines are, or a help button, or tooltips, I do not know what; something that even dumb people would notice.

After this failure, I searched the web and the documentation but was not able to find anything new (probably a bad choice of keywords). I ended up decompressing the files to compare those damned XML things by hand...

But thanks for the help anyway. Problem solved.

*: In my situation, there was a very minor difference ("Version 2.0" became "Version 3.0") on the first page of the documents, so there was no automatic jump to the change, which would have given me a hint about where to look. Using the feature for the first time, I did not understand at all what those two lines, "insertion" and "removal", were.
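As an illustration of the kind of output that makes a one-character change hard to miss, here is a small sketch using Python's standard `difflib` module. It has nothing to do with how LibreOffice renders its comparison; the helper name `inline_diff` and the `[-...-]` / `{+...+}` markers are arbitrary choices made for this example.

```python
import difflib


def inline_diff(old, new):
    """Mark character-level changes: [-removed-] and {+added+}."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(old[i1:i2])
            continue
        # "replace" is reported as a removal followed by an addition.
        if tag in ("delete", "replace"):
            parts.append("[-" + old[i1:i2] + "-]")
        if tag in ("insert", "replace"):
            parts.append("{+" + new[j1:j2] + "+}")
    return "".join(parts)


# inline_diff("Version 2.0", "Version 3.0") returns "Version [-2-]{+3+}.0"
```

Applied line by line to text extracted from the two documents, this points straight at the changed character rather than at a whole changed paragraph.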

## Re: diff of 2 docx files

On 04.02.2014 20:22, e-letter wrote:
> Similarly, you could use the command terminal:
>
> libreoffice -convert-to fodt /path/to/m$file
>
> Then apply your diff tool. In addition, you could then use xslt to
> extract the elements/text of the fodt files.

Sounds like an interesting solution. I'll think about it next time those guys send me their junk.

## Re: diff of 2 docx files

On 04.02.2014 16:55, Brian Barker wrote:
> As has been suggested, try the Compare Document facility:
> o Open the newer document.
> o Go to Edit | Compare Document... .
> o Browse to and insert the older document.
>
> You can accept or reject the identified changes, or cancel the
> dialogue if you wish merely to view them.
>
> I trust this helps.

To be honest, I had found that feature without asking here. But the changes were so minor* that I did not understand that the tool was showing them: the results were not obvious at all, and the visual changes are almost imperceptible to a user who is not used to GUIs.

Maybe the hint could be made more obvious, for example by replacing the selection background color with one that stands out more, or by drawing a circle or something bigger around the change when it is too small? The dialog box itself could also be improved by being more explicit (for now there are no tooltips, no help button and no descriptive text).

Those are just some ideas for improving your tool. I do not mind much, because I have to admit that I really do not like using office suites, and my work rarely requires me to edit office documents.

Problem solved anyway; thanks for taking the time.

*: At the bottom of the first page (so the cursor does not move, and the highlight is in an area that does not catch the eye) the author changed a '2' into a '3' (a version change... yes, it sounds a bit like the new Firefox version system :) ).

Note that the quantity of changes in the XML files was quite impressive: the kind of dirty result that makes me more and more convinced that XML is overused, and that it gets even worse when Microsoft tries to use it. Can you believe that with such a minor change, the compressed files differ in size by 66 bytes? One character => 66 bytes compressed! So inefficient... but that is not your fault; MS is responsible for that.