Better PDF import?

Mateusz Zasuwik

Better PDF import?

Hello TDF members!

I noticed you guys work hard on support for many dead formats. But has
anyone thought about better PDF import? I lost some data recently and had
to open some old PDF files, which turned out to be normal PDFs (not hybrid
PDF/ODT). I opened them in LO, but the documents were rendered in Draw, not
in Writer or Impress as they should be. You should also take into account
that MS Office opens every PDF file and easily converts it to its native
format, which makes that software more useful for normal users. Hybrid PDF
is a cool feature, but it probably won't become a major file format for a
long time. The market is dominated by normal PDF, and in this area
LibreOffice just sucks. So maybe whoever sets the goals for the future
(TDF Board?) should rethink the priorities and concentrate on features
needed on a daily basis?

Have a nice day, hey
M.Z.

--
To unsubscribe e-mail to: [hidden email]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.documentfoundation.org/www/discuss/
All messages sent to this list will be publicly archived and cannot be deleted
italovignoli

Re: Better PDF import?

On 21/06/14 11:07, Mateusz Zasuwik wrote:

> I noticed you guys work hard on support for many dead formats. But has
> anyone thought about better PDF import? I lost some data recently and had
> to open some old PDF files, which turned out to be normal PDFs (not hybrid
> PDF/ODT). I opened them in LO, but the documents were rendered in Draw, not
> in Writer or Impress as they should be. You should also take into account
> that MS Office opens every PDF file and easily converts it to its native
> format, which makes that software more useful for normal users. Hybrid PDF
> is a cool feature, but it probably won't become a major file format for a
> long time. The market is dominated by normal PDF, and in this area
> LibreOffice just sucks. So maybe whoever sets the goals for the future
> (TDF Board?) should rethink the priorities and concentrate on features
> needed on a daily basis?

The right behaviour is to open PDFs in Draw.

Have you ever used MS Office 2013 to open a PDF? It sucks at least as
much as any other software.

PDF is based on PostScript, which is a page description language, and as
such is extremely difficult to convert into an editable document.
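
To illustrate that difficulty, here is a minimal sketch in Python. It is
not how LibreOffice's importer works; the `Fragment` type, the function
names, and the tolerance values are all assumptions made up for this
example. A page description only says "draw this text at these
coordinates", so lines and paragraphs have to be guessed back from
positions:

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One positioned run of text, as a page description might store it."""
    x: float   # left edge, in points
    y: float   # baseline, measured from the top of the page, in points
    text: str


def group_into_lines(fragments, y_tolerance=2.0):
    """Group fragments with (nearly) the same baseline into lines."""
    lines = []
    for frag in sorted(fragments, key=lambda f: (f.y, f.x)):
        if lines and abs(lines[-1][0].y - frag.y) <= y_tolerance:
            lines[-1].append(frag)
        else:
            lines.append([frag])
    # Within each line, read left to right.
    return [sorted(line, key=lambda f: f.x) for line in lines]


def guess_paragraphs(lines, line_height=12.0, gap_factor=1.5):
    """Start a new paragraph whenever the vertical gap looks too large."""
    paragraphs = []
    previous_y = None
    for line in lines:
        text = " ".join(f.text for f in line)
        y = line[0].y
        if previous_y is None or (y - previous_y) > line_height * gap_factor:
            paragraphs.append(text)
        else:
            paragraphs[-1] += " " + text
        previous_y = y
    return paragraphs


# Fragments may appear in any order in the file.
fragments = [
    Fragment(72, 150, "A new paragraph?"),
    Fragment(140, 100, "positioned text,"),
    Fragment(72, 114, "not paragraphs."),
    Fragment(72, 100, "PDF stores"),
]
paragraphs = guess_paragraphs(group_into_lines(fragments))
print(paragraphs)
```

Every threshold here is a guess. Multi-column layouts, tables, headers,
and footers all break heuristics like these, which is why the result of
such a conversion is rarely a cleanly editable document.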

--
Italo Vignoli - [hidden email]
mob IT +39.348.5653829 - mob EU +39.392.7481795
sip [hidden email] - skype italovignoli
GPG Key ID - 0xAAB8D5C0
DB75 1534 3FD0 EA5F 56B5 FDA6 DE82 934C AAB8 D5C0

Italo Vignoli
Director - The Document Foundation
Terrence Enger

Re: Better PDF import?

On Sat, 2014-06-21 at 11:22 +0200, Italo Vignoli wrote:
> PDF is based on PostScript, which is a page description language, and as
> such is extremely difficult to convert into an editable document.

I have heard about OCR, but I have never been desperate enough to try
it.

Likely there is some OCR package with a licence which would let us
incorporate it into LibreOffice. I do not have an opinion about
whether it would be worth the increase in the size of our project and
the ongoing maintenance burden.

Terry.



jonathon-6

Re: Better PDF import?

On 6/21/2014 11:30 AM, Terrence Enger wrote:

 > I have heard about OCR,

OCR usually manages to remove all presentation markup. It can also
remove sentence and paragraph differentiators.

From my POV, if the content is in PDF format, and the person needs
another format, they can OCR it.

jonathon

Mateusz Zasuwik

Re: Better PDF import?

In reply to this post by italovignoli
2014-06-21 11:22 GMT+02:00 Italo Vignoli <[hidden email]>:

> On 21/06/14 11:07, Mateusz Zasuwik wrote:
>
> > I noticed you guys work hard on support for many dead formats. But has
> > anyone thought about better PDF import? I lost some data recently and had
> > to open some old PDF files, which turned out to be normal PDFs (not hybrid
> > PDF/ODT). I opened them in LO, but the documents were rendered in Draw, not
> > in Writer or Impress as they should be. You should also take into account
> > that MS Office opens every PDF file and easily converts it to its native
> > format, which makes that software more useful for normal users. Hybrid PDF
> > is a cool feature, but it probably won't become a major file format for a
> > long time. The market is dominated by normal PDF, and in this area
> > LibreOffice just sucks. So maybe whoever sets the goals for the future
> > (TDF Board?) should rethink the priorities and concentrate on features
> > needed on a daily basis?
>
> The right behaviour is to open PDFs in Draw.
>

Why is that the right behavior? Most of the documents floating around the
Internet were created in a text editor. Next in line come presentations
and spreadsheets. The likelihood that a user will need to open a "PDF-ed
text document" is incomparably greater than for anything else, and in
spite of that, these documents are opened in the "picture module".


> Have you ever used MS Office 2013 to open a PDF? It sucks at least as
> much as any other software.
>

I did and it worked much better than LibreOffice does. There is no
faultless software but in this case, LibreOffice just sucks more than MS
Office.

Robinson Tryon

Re: Better PDF import?

On Sun, Jun 22, 2014 at 6:42 AM, Mateusz Zasuwik <[hidden email]> wrote:
> 2014-06-21 11:22 GMT+02:00 Italo Vignoli <[hidden email]>:
>
> Why is that the right behavior? Most of the documents floating around the
> Internet were created in a text editor.

Most documents on the Internet (or that I've been sent) have been
created in a word processor (usually MS-Word). Frustratingly, most of
these docs don't need any markup, and *should've* been created in a
text editor, but that's a user-education topic for another time... :-)

> Next in line come presentations and spreadsheets. The likelihood that a
> user will need to open a "PDF-ed text document" is incomparably greater
> than for anything else, and in spite of that, these documents are opened
> in the "picture module".

I'm certainly not the prototypical office worker, but I can only
remember a handful of times I've had to open a PDF for editing
purposes. Documents that should be edited should be sent in editable
formats.

Using LibreOffice as a PDF editor is a neat trick -- in fact, it's
something I highlighted in my talk at Texas Linux Fest a week ago[1]
-- but while I encourage the improvement of our PDF import
capabilities, I'm reluctant to go so far as to describe PDF as a
first-class editable format on the same order as ODT, DOC, DOCX, etc.

>> Have you ever used MS Office 2013 to open a PDF? It sucks at least as
>> much as any other software.
>
> I did and it worked much better than LibreOffice does. There is no
> faultless software but in this case, LibreOffice just sucks more than MS
> Office.

Perhaps we should look into the use cases for editing PDFs and try a
multi-pronged approach:
1) Above all else, promote education so that people choose the right
format for the job (including editable formats such as TXT, ODT,
DOC(X), and Hybrid-PDF).
2) Collect a set of (permissively-licensed) PDFs that demonstrate
reduced fidelity when opened in LibreOffice (so that we can improve
our import support)
3) Determine what aspects of PDF markup present the most problems for
LibreOffice (C* fonts? Tables?) and make those into useful targets.
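
As a rough sketch of how points 2 and 3 could be approached, the Python
below ranks sample documents by how much of their text survives import.
Everything in it is an assumption for illustration: the file names, the
sample strings, and the idea of scoring by word overlap are made up, and
it presumes the expected text and the imported text have already been
extracted by some other means.

```python
from difflib import SequenceMatcher


def fidelity_score(expected_text, imported_text):
    """Return a 0.0-1.0 similarity between two texts, compared word by word."""
    expected_words = expected_text.split()
    imported_words = imported_text.split()
    return SequenceMatcher(None, expected_words, imported_words).ratio()


def rank_worst_first(samples):
    """Order sample names from lowest to highest fidelity score.

    `samples` maps a (hypothetical) file name to a pair of
    (expected text, text recovered after import).
    """
    scored = [
        (fidelity_score(expected, imported), name)
        for name, (expected, imported) in samples.items()
    ]
    return [name for _score, name in sorted(scored)]


# Made-up samples: one perfect import, one with a dropped word,
# one where no text was recovered at all.
samples = {
    "plain-letter.pdf": ("Dear Sir, see the report.", "Dear Sir, see the report."),
    "dropped-word.pdf": ("alpha beta gamma delta", "alpha beta delta"),
    "scanned-table.pdf": ("Name Qty Price", ""),
}

for name in rank_worst_first(samples):
    expected, imported = samples[name]
    print(f"{fidelity_score(expected, imported):.2f}  {name}")
```

A word-level score like this only measures lost or reordered text. It
says nothing about layout, fonts, or tables, so it could help pick out
candidate documents for the collection but not replace looking at them.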


Cheers,
--R


[1] http://texaslinuxfest.org/content/libreoffice-tools-and-tricks-making-your-work-easier

--
Robinson Tryon
LibreOffice Community Outreach Herald
Senior QA Bug Wrangler
The Document Foundation
[hidden email]
