anyone knows how to find/replace not-printable glyphs?

krackedpress

I have a problem with an old book text file.  There are 3 or more lines
between paragraphs.  I want to reduce them down to only one.

When I turned on the option to view non-printing characters, I got the
"PI-ish" looking symbol - ¶ - that is at the end of every paragraph.
Each extra line has that symbol as its first character.  The character
listed in the last line of this text looks like the non-printing
character, but must be different from it, since searching for it gives
a "no match found" error.

So, is there a way to use that end of paragraph symbol/character to look
for three in a row and replace them with only two?  Since some of the
paragraphs have more than 3 extra lines between them, I could run
that find/replace several times.  All I want to show is the symbol at
the end of the paragraph and the one between paragraphs.  Since there
are at least 300 pages to the book - in text format - it would not be
practicable to do this manually.

Actually, I am trying to fix a bad .epub book.  I found that if I
convert it to a text file and then convert it back to an e-book format,
most of the formatting issues go away when viewing it with my tablet's
.epub readers [Nook and Pocketbook].

So has anyone tried to do something like this, removing blank
paragraphs, i.e. blank lines between paragraphs, as an automatic
find/replace option?

--
To unsubscribe e-mail to: [hidden email]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted
Regina Henschel

Re: anyone knows how to find/replace not-printable glyphs?

Hi,

if this is about removing all empty paragraphs, there is a special
feature for it besides Search and Replace.

Go to Tools > AutoCorrect > AutoCorrect Options. On the 'Options' tab, remove
all checks in column [M], but check 'Remove blank paragraphs'. Go back
to the text and select the whole area where you want to remove empty
paragraphs. Go to Tools > AutoCorrect > Apply. That's all :)

Kind regards
Regina


Piet van Oostrum-2

Re: anyone knows how to find/replace not-printable glyphs?

In reply to this post by krackedpress

As you mention in your last paragraph, this is about blank paragraphs, not blank lines between paragraphs. They come from the bad practice of inserting so-called "blank lines" as paragraph separators, but as you correctly observe, they are additional blank paragraphs. The proper way to get spacing between paragraphs is to set additional space after the paragraph with Format > Paragraph.

To get rid of the blank paragraphs, use Edit > Find and Replace, click Other options, and check Regular expressions. In the search field enter ^$

Now you can find/replace the empty paragraphs.
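
For a plain-text export of the book, the same ^$ idea can be tried outside Writer. What follows is only a sketch: it assumes one paragraph per line and uses Python's `re` module, not Writer's own regex engine, and the function name `drop_empty_lines` is made up for illustration.

```python
import re


def drop_empty_lines(text: str) -> str:
    """Remove every completely empty line from a plain-text string."""
    # With re.MULTILINE, ^ and $ match at the start and end of each line.
    # The trailing \n removes the line break that belongs to the empty line.
    return re.sub(r"^$\n", "", text, flags=re.MULTILINE)


sample = "First paragraph.\n\n\n\nSecond paragraph.\n"
print(drop_empty_lines(sample))
```

Lines that contain only spaces or tabs are not matched by ^$ and would survive this pass.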
--
Piet van Oostrum <[hidden email]>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]


libreoffice-ml.mbourne

Re: anyone knows how to find/replace not-printable glyphs?

In reply to this post by krackedpress
Tim---Kracked_P_P---webmaster wrote:
> I have a problem with an old book text file.  There are 3 or more lines
> between paragraphs.  I want to reduce them down to only one.
>
> When I turned the option to view non-printed characters, I get the
> "PI-ish" looking symbol - ¶ - that is at the end of every paragraph.
> Each extra line has that symbol as its first character.  The character
> listed in the last line of this text looks like the non-printing
> character, but must be different from it, since it gives a "no match
> found" error when using it.

It's just represented like that when showing non-printing characters;
it's not actually inserted as that character (which is itself a
printable character).

> So, is there a way to use that end of paragraph symbol/character to look
> for three in a row and replace it with only two?  Since some of the
> paragraphs have more than 3 extra lines between paragraphs, I could run
> that find/replace several times.  All I want to show is the symbol at
> the end of the paragraph and the one between paragraphs.  Since there
> are at least 300 pages to the book - in text format - it would not be
> practicable to do this manually.

I haven't been able to easily find a way to search for consecutive
paragraph breaks. However, you can find empty paragraphs by searching
for "^$" (without the quotes), and ticking "Regular expressions" under
"Other options". Leave "Replace With" blank and click "Replace All", and
all the empty paragraphs will be removed.

I don't know if that helps, since it will remove all empty paragraphs,
not just those where there are 3 or more together. Unfortunately
searching for "$^$^" doesn't seem to work to find 2 consecutive empty
paragraphs...
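
On an exported text file, consecutive paragraph breaks can be matched directly, which is one way to keep a single blank line instead of removing them all. A minimal sketch in Python, assuming one paragraph per line and "\n" line endings; `collapse_blank_runs` and its `keep` parameter are hypothetical names, not anything Writer provides.

```python
import re


def collapse_blank_runs(text: str, keep: int = 1) -> str:
    """Reduce every run of blank lines to at most `keep` blank lines."""
    # k blank lines between two paragraphs show up as k + 1 consecutive "\n",
    # so anything longer than keep + 1 line breaks is too long.
    pattern = r"\n{%d,}" % (keep + 2)
    return re.sub(pattern, "\n" * (keep + 1), text)


sample = "One.\n\n\n\nTwo.\n\nThree.\nFour."
print(collapse_blank_runs(sample))          # keeps one blank line per gap
print(collapse_blank_runs(sample, keep=0))  # removes blank lines entirely
```

Because the whole run is matched at once, a single pass is enough regardless of whether a gap holds 3 or 4 breaks.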

Why do you want to leave 2 consecutive paragraph breaks anyway? If it's
to get the spacing, you should remove the extra blank paragraphs and add
the spacing by setting the spacing above/below paragraph in the
paragraph formatting (or even better, in a paragraph style which is
applied to the appropriate paragraphs).

> Actually, I am trying to fix a bad .epub book.  I found that if I
> convert it to a text file and then convert it back to an e-book format,
> most of the formatting issues go away when viewing it with my tablet's
> .epub readers [Nook and Pocketbook].

Before going too far with the spacing above/below paragraph, you might
want to check that it actually gets applied when converted into .epub
format. I'd have thought it should, but you never know.

--
Mark.


remygauthier

Re: anyone knows how to find/replace not-printable glyphs?

On Tuesday 15 March 2016 at 21:32 +0000, libreoffice-[hidden email] wrote:

> I haven't been able to easily find a way to search for consecutive
> paragraph breaks. However, you can find empty paragraphs by searching
> for "^$" (without the quotes), and ticking "Regular expressions" under
> "Other options". Leave "Replace With" blank and click "Replace All", and
> all the empty paragraphs will be removed.
If you need to search and replace paragraph marks and the like, you can
use AltSearch
(http://extensions.libreoffice.org/extension-center/alternative-dialog-find-replace-for-writer).
I have it installed on LO 5.1 and it works fine. Look for two
end-of-paragraph characters (select from the drop-down or type in \p\p)
and replace them with just one; repeat until there are no more replacements.
Cheers!
Rémy Gauthier.
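
The "repeat until there are no more replacements" step can be sketched as a loop. This is an illustration on plain text only, assuming "\n" stands in for AltSearch's \p; it does not call AltSearch or Writer, and `replace_until_stable` is a hypothetical helper name.

```python
from typing import Tuple


def replace_until_stable(text: str, old: str, new: str) -> Tuple[str, int]:
    """Apply str.replace(old, new) repeatedly until `old` no longer occurs.

    Returns the final text and the number of passes that were needed.
    """
    # Guard against a loop that could never end (this also rejects old == "").
    if old in new:
        raise ValueError("replacement must not contain the search string")
    passes = 0
    while old in text:
        text = text.replace(old, new)
        passes += 1
    return text, passes


result, passes = replace_until_stable("One.\n\n\n\n\nTwo.", "\n\n", "\n")
print(repr(result), passes)
```

Each pass roughly halves a run of breaks, so even long runs settle after a few passes.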

krackedpress

Re: anyone knows how to find/replace not-printable glyphs?

In reply to this post by libreoffice-ml.mbourne

Here is a link to part of a screen clip showing a typical problem.
http://libreoffice-na.us/holding/paragraph-break.jpg

This shows the problem with the text file.  The .epub had even more
problems: it seemed to have a carriage return and/or line feed after
every sentence, and it also seemed to be double-spaced.

That ¶ symbol is shown in blue, and some of the text has one breaking up
the middle of a sentence between paragraphs, even though it was not
needed for a page break.  Sometimes there are 3 of these symbols between
the paragraphs, and other times there are 4.

I can read the book as a text file, but it would be better as an .epub
file.  The conversion is done by Calibre, first to the text file, then
back from there.  To be honest, I really started using that package to
convert PDF book files to e-books so I can read them in a larger font
than when viewing the original PDF file on my 10 inch tablet.  Then I
started to use it for my free book collection and other conversion needs.





krackedpress

Re: anyone knows how to find/replace not-printable glyphs?

In reply to this post by Regina Henschel

This worked for a lot of the issues when I took the TXT file and
converted it to an EPUB file.

I still have some more issues, but it is better than it originally was.




Felmon Davis

Re: anyone knows how to find/replace not-printable glyphs?

In reply to this post by remygauthier
On Tue, 15 Mar 2016, Remy Gauthier wrote:

> If you need to search and replace paragraph marks and others, you can
> use AltSearch
> (http://extensions.libreoffice.org/extension-center/alternative-dialog-find-replace-for-writer).
> I have it installed on LO5.1 and it works fine. You look for two end of
> paragraph characters (select from the drop-down or type in \p\p)
> and replace with just one; repeat until there are no more replacements.
> Cheers!
> Rémy Gauthier.

I wonder about simply using search/replace to replace all end of
paragraph marks ('\n') with a mark of one's own like '#' (as long as
it doesn't occur elsewhere in the text), and then replacing all runs of
consecutive '##' with '\r' or '\r\r'?
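
As a rough illustration of that placeholder idea, here is a Python sketch that works on an exported plain-text file, not in Writer's Find & Replace dialog. It assumes "\n" line endings and writes "\n" back where the post mentions '\r'; the function name `squeeze_via_placeholder` is hypothetical.

```python
import re


def squeeze_via_placeholder(text: str, marker: str = "#") -> str:
    """Collapse runs of line breaks by way of a temporary marker."""
    # The marker must be unused, otherwise real text would become line breaks.
    if marker in text:
        raise ValueError("marker already occurs in the text")
    marked = text.replace("\n", marker)
    # Any run of two or more markers becomes exactly two (one blank line).
    squeezed = re.sub("(?:%s){2,}" % re.escape(marker), marker * 2, marked)
    return squeezed.replace(marker, "\n")


print(repr(squeeze_via_placeholder("One.\n\n\n\nTwo.\nThree.")))
```

The check for an existing marker is the "as long as it doesn't occur elsewhere" condition made explicit.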

f.

--
Felmon Davis

He knows not how to know who knows not also how to unknow.
                 -- Sir Richard Burton

jomali

Re: anyone knows how to find/replace not-printable glyphs?

On Wed, Mar 16, 2016 at 1:46 PM, Felmon Davis <[hidden email]> wrote:

> I wonder about simply using search/replace and remove all end of paragraph
> marks ('\n'), replacing them with a mark of one's own like '#' (as long as
> it doesn't occur elsewhere in the text); then remove all consecutive '##'
> and replace with '\r' or '\r\r'?
>
> f.

This won't work because there are no end-of-paragraph marks ("\n"). The
Open Document format is a form of XML, in which paragraphs are denoted by
<p>[Content]</p>.
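
To make the XML point concrete, here is a small Python sketch that removes empty paragraph elements from a hand-written fragment. In ODF the paragraph element is namespaced (`text:p`); the `SAMPLE` string below is a simplified, made-up stand-in for the `content.xml` stored inside an .odt file, and `drop_empty_paragraphs` is a hypothetical helper, not a LibreOffice API.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"

# Simplified fragment for illustration; a real content.xml has more structure.
SAMPLE = (
    f'<office:text xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    "<text:p>First paragraph.</text:p>"
    "<text:p/>"
    "<text:p/>"
    "<text:p>Second paragraph.</text:p>"
    "</office:text>"
)


def drop_empty_paragraphs(root: ET.Element) -> int:
    """Remove direct <text:p> children with no text and no child elements."""
    p_tag = f"{{{TEXT_NS}}}p"
    removed = 0
    for child in list(root):  # copy the list so removal is safe while iterating
        is_empty = len(child) == 0 and not (child.text or "").strip()
        if child.tag == p_tag and is_empty:
            root.remove(child)
            removed += 1
    return removed


root = ET.fromstring(SAMPLE)
removed = drop_empty_paragraphs(root)
remaining = [p.text for p in root.findall(f"{{{TEXT_NS}}}p")]
print(removed, remaining)
```

An empty paragraph here is an element with nothing inside it, which is why there is no break character to search for.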
