[Writer] Regular expressions

gordom

[Writer] Regular expressions

Hello everyone.
I would appreciate your help with regular expressions. I have a
document consisting of hundreds of lines. A small sample is here:

Set:   01SA34509
0109SA
011017B
01020207B
010902B
01090002
011007B
01090001
090110
Set:   0134501
011101
01110102
01110103
080908
Set:   0111679SE
0111SE

I need to delete all text except the lines that start with the word "Set".
If I use the regular expression "set:.+", all the lines that should be kept
are selected, but I can't find a way to invert this selection. I tried
"[^set:.+].+" and "[^(set:.+)].+", but they don't work. Could you please
give me any clues? Thanks in advance. Regards,
gordom


--
For unsubscribe instructions e-mail to: [hidden email]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted

Mirosław Zalewski

Re: [Writer] Regular expressions

Since the LibreOffice regex engine is crippled and doesn't support lookaheads,
the short answer is:
no, you can't do that.

BUT do the lines you want to delete happen to fall into some common pattern? In
your sample (which may or may not be representative of the entire text) they
do. In fact, you want to delete all lines that start with a number, possibly
followed by letters. You can use this regexp to match these lines:

^[0-9]+.*

^ (caret) is for "beginning of line". So this will match every line that starts
with at least one digit.
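
As a rough illustration outside Writer, the same pattern can be tried with
Python's re module. This is only a sketch: the text in `sample` is a few lines
copied from the question, and the names `sample` and `digit_lines` are made up
for the example.

```python
import re

# A few lines of the sample from the original question.
sample = """Set:   01SA34509
0109SA
011017B
Set:   0134501
011101
"""

# re.MULTILINE makes ^ match at the start of every line, not only at the
# start of the whole text.
digit_lines = re.findall(r"^[0-9]+.*", sample, flags=re.MULTILINE)
print(digit_lines)  # ['0109SA', '011017B', '011101']
```

The "Set:" lines are left alone because they do not start with a digit.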

If you don't care about formatting, you may also export your file to TXT and
use Perl, which has superior regex capabilities.
--
Best regards
Mirosław Zalewski

Mirosław Zalewski

Re: [Writer] Regular expressions

In reply to this post by gordom
On 08/01/2013 at 19:58, gordom <[hidden email]> wrote:

> I use "set:.+" regular expression, all these lines, that should be kept,
> are selected.

Another idea:
since you can select the lines you want to preserve, why not copy them and
paste them into a new document? This should be the easiest and most
error-proof solution.
--
Best regards
Mirosław Zalewski

gordom

Re: [Writer] Regular expressions

In reply to this post by Mirosław Zalewski
On 2013-01-08 20:20, Mirosław Zalewski wrote:
> Since LibreOffice regex engine is crippled and don't support lookaheads, the
> short answer is:
> no, you can't do that.

That's a pity.

>
> BUT do lines you want to delete happen to fall into some common pattern? In
> your sample (which may or may not be representative for entire text) they
> does. In fact you want to delete all lines that start with number followed by
> letter. You can use this regexp to match these lines:
>
> ^[0-9]+.*
>

Unfortunately my sample wasn't very accurate and can't be regarded as
fully representative. The pattern is more complex actually.

regards,
gordom


gordom

Re: [Writer] Regular expressions

In reply to this post by Mirosław Zalewski
On 2013-01-08 20:37, Mirosław Zalewski wrote:
> On 08/01/2013 at 19:58, gordom <[hidden email]> wrote:
>
>> I use "set:.+" regular expression, all these lines, that should be kept,
>> are selected.
>
> Another idea:
> since you can select these lines you want to preserve, why don't copy them and
> paste into new document? This should be easiest and most error-proof solution.
>

I can't simply copy and paste them because I would lose the line order.
But I did another thing: I used Calc to sort the data, delete the unwanted
text and then return to the previous order.
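
The Calc round trip can be pictured roughly as follows. This is a sketch under
assumptions: the post does not say how the original order was restored, so a
helper column with row numbers is assumed here, and the names `lines`, `rows`
and `kept` are invented for the example.

```python
# Hypothetical stand-in for the Calc sheet: one row per line of the document.
lines = ["Set:   01SA34509", "0109SA", "S01020207B", "Set:   0134501", "011101"]

# 1. Add a helper column holding the original row number (assumption).
rows = list(enumerate(lines))

# 2. Sort by text so that all "Set:" rows end up next to each other.
rows.sort(key=lambda row: row[1])

# 3. Delete the unwanted rows (everything that does not start with "Set:").
rows = [row for row in rows if row[1].startswith("Set:")]

# 4. Sort by the helper column to get back to the original order.
rows.sort(key=lambda row: row[0])

kept = [text for _, text in rows]
print(kept)  # ['Set:   01SA34509', 'Set:   0134501']
```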

Regards / Pozdrawiam :-),
gordom

Nino

Re: [Writer] Regular expressions

In reply to this post by gordom
On 08.01.2013 19:58, gordom wrote:

> Set:   01SA34509
> 0109SA
> 011017B
> 01020207B
> 010902B
> 01090002
> 011007B
> 01090001
> 090110
> Set:   0134501
> 011101
> 01110102
> 01110103
> 080908
> Set:   0111679SE
> 0111SE

Search for: ^[^Ss]
Replace with: (leave empty)
[x] Regular expression

works fine for me (LibreOffice 3.5.4.2 on win-32)

Nino

Nino

Re: [Writer] Regular expressions

In reply to this post by gordom
On 08.01.2013 21:11, gordom wrote:
> On 2013-01-08 20:20, Mirosław Zalewski wrote:
>> Since LibreOffice regex engine is crippled and don't support
>> lookaheads, the
>> short answer is:
>> no, you can't do that.
>
> That's a pity.

but only if it's really true ;-)


>> BUT do lines you want to delete happen to fall into some common
>> pattern? In
>> your sample (which may or may not be representative for entire text) they
>> does. In fact you want to delete all lines that start with number
>> followed by
>> letter. You can use this regexp to match these lines:
>>
>> ^[0-9]+.*
>>
>
> Unfortunately my sample wasn't very accurate and can't be regarded as
> fully representative. The pattern is more complex actually.

Then you should try to show a better example: regexes are all about pattern
matching, and without knowing the pattern we cannot guess how to match it ;-)

Nino

gordom

Re: [Writer] Regular expressions

On 2013-01-08 21:17, Nino Novak wrote:

> Then you should try to show a better example as Regex are pattern
> matching and without knowing the pattern we cannot guess how to match it
> ;-)
>
> Nino
>

In "real" life :-) there are also lines starting with letters. Only
those with "Set:" at the beginning should be kept; the rest is going to be
deleted.

Set:   01SA34509
0109SA
011017B
S01020207B
010902B
01090002
011007B
01090001
090110
Set:   0134501
011101
HB01110102
01110103
080908
Set:   0111679SE
0111SE

I'm surprised that there is no simple way to find everything except
"Set:.+$"
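
For comparison, in engines that do support lookaheads, "everything except
Set:" can be written with a negative lookahead. The sketch below uses Python's
re module on part of the sample above; it says nothing about what Writer's own
engine accepts, and the names `not_set` and `cleaned` are only illustrative.

```python
import re

sample = """Set:   01SA34509
0109SA
S01020207B
Set:   0134501
HB01110102
080908
"""

# (?!Set:) is a negative lookahead: it succeeds only where "Set:" does NOT
# follow. .+ needs at least one character, so empty lines never match.
not_set = re.compile(r"^(?!Set:).+$", flags=re.MULTILINE)
print(not_set.findall(sample))
# ['0109SA', 'S01020207B', 'HB01110102', '080908']

# Delete the unwanted lines together with their trailing newline.
cleaned = re.sub(r"^(?!Set:).*\n", "", sample, flags=re.MULTILINE)
print(cleaned)
```

After the substitution only the two "Set:" lines remain, in their original
order.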

gordom

Mirosław Zalewski

Re: [Writer] Regular expressions

Although you have already solved your problem by other means, let's check
other possibilities.

On 08/01/2013 at 21:45, gordom <[hidden email]> wrote:

> In "real" life :-) there are lines starting with letters also. Only
> these with "Set:" at the beginning should be left, rest is going to be
> deleted.

Looking at this sample, I see three more patterns that could get the work done.
1. It seems that lower-case letters occur only in "Set", which is in the lines
you want to preserve. So maybe looking for lines containing only digits and
upper-case letters will do the trick.
2. In this sample, spaces occur only in the lines you want to preserve. What
about matching lines without whitespace characters?
3. It looks like a colon occurs only in the lines you want to preserve. Match
lines without colons, maybe?
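
The three ideas above can be sketched as follows. This is an illustration in
Python rather than in Writer; the concrete patterns and the names `patterns`
and `results` are assumptions made for the example, and each pattern describes
a line to delete.

```python
import re

sample = """Set:   01SA34509
0109SA
S01020207B
Set:   0134501
HB01110102
080908
""".splitlines()

# One candidate "delete this line" pattern per idea (illustrative names).
patterns = {
    "digits_and_uppercase": re.compile(r"^[0-9A-Z]+$"),
    "no_whitespace": re.compile(r"^\S+$"),
    "no_colon": re.compile(r"^[^:]+$"),
}

# For each idea, keep only the lines that the pattern does NOT match.
results = {
    name: [line for line in sample if not pattern.match(line)]
    for name, pattern in patterns.items()
}

for name, kept in results.items():
    print(name, kept)
```

On this sample all three variants keep exactly the two "Set:" lines; whether
that holds for the full document depends on the real data.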

> I'm surprised that there is no simple way to find everything except
> "Set:.+$"

Well, regexps are most often used in programming languages or in tools like grep,
where you can easily get a "reverse match" mode (print everything that does NOT
match). But LO is not a programming language, and its simple regexp engine is
simply not sophisticated enough in many cases.

I have read somewhere on the TDF wiki about incorporating a mature regexp library
into LO, but the idea was rejected due to portability issues: LO must run on
Windows and OS X as well as on Linux, whereas the library in question runs only
on Linux.
--
Best regards
Mirosław Zalewski

Mirosław Zalewski

Re: [Writer] Regular expressions

In reply to this post by Nino
On 08/01/2013 at 21:17, Nino Novak <[hidden email]> wrote:

> but only if it's really true ;-)

Unless you are 100% sure that my statement is not true, please don't question
it.

And you are NOT 100% sure until you post an LO Writer regexp that uses
the negation pattern the OP is looking for.
--
Best regards
Mirosław Zalewski

Tom

Re: [Writer] Regular expressions

Hi :)
I like the quote "Never believe anything until it's been denied by a government minister", but now I can't find where the quote comes from, so I'm beginning to doubt it was ever really said by anyone.
Regards from
Tom :)





Winston C. Yang

Re: [Writer] Regular expressions

In reply to this post by Mirosław Zalewski
Gordom,

Below, Miroslaw mentioned grep. In Linux, you can achieve your goal by
typing something like the following:

     grep --extended-regexp "^Set" input.txt > output.txt

This gets the lines in input.txt that start with "Set" and writes these
lines to output.txt.

You might not even have to type the "--extended-regexp" part.
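
For readers without grep at hand, the same filtering can be sketched in
Python. The function name `grep_lines` and its `invert` parameter are
hypothetical; `invert=True` is meant to mimic grep's -v (inverted match) mode,
which prints the lines that do NOT match.

```python
import re

def grep_lines(lines, pattern, invert=False):
    """Return the lines that match `pattern` (or that do not, if invert=True)."""
    regex = re.compile(pattern)
    return [line for line in lines if bool(regex.search(line)) != invert]

lines = ["Set:   01SA34509", "0109SA", "S01020207B", "Set:   0134501"]

print(grep_lines(lines, r"^Set"))               # like: grep "^Set"
print(grep_lines(lines, r"^Set", invert=True))  # like: grep -v "^Set"
```

To work on real files, the list could be read from input.txt and the result
written to output.txt, as in the grep command above.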

Winston

On 01/08/2013 05:38 PM, Mirosław Zalewski wrote:

> Although you have solved your problem already by other means, but let's check
> other possibilities.
>
> On 08/01/2013 at 21:45, gordom <[hidden email]> wrote:
>
>> In "real" life :-) there are lines starting with letters also. Only
>> these with "Set:" at the beginning should be left, rest is going to be
>> deleted.
> By looking at this sample, I see three more patterns that could get work done.
> 1. It seems that only lower-case letters are in "Set", which is in lines you
> want preserve. So maybe looking for lines containing only numbers and upper-
> case letters will do the trick.
> 2. In this sample, spaces are only in lines you want to preserve. What about
> matching lines without whitespace characters?
> 3. It looks like colon happens only in lines you want to delete. Match lines
> without colons, maybe?
>
>> I'm surprised that there is no simple way to find everything except
>> "Set:.+$"
> Well, regexps are most often used in programming languages or tools like grep,
> where you can easily get "reverse match" mode (print everything that does NOT
> match). But LO is not programming language and it's simple regexp engine is
> simply not sophisticated enough in many cases.
>
> I have read somewhere on TDF wiki about incorporating mature regexp library in
> LO, but this idea was rejected due to portability issues. LO must run on
> Windows and OS X as well as on Linux, whereas library in question runs only on
> Linux.


Brian Barker

Re: [Writer] Regular expressions

In reply to this post by gordom
At 19:58 08/01/2013 +0100, Gordom Noname wrote:

>I would appreciate your help with the regular expressions. I have a
>document consisting of hundreds of lines. A small sample is here:
>
>Set:   01SA34509
>0109SA
>011017B
>01020207B
>010902B
>01090002
>011007B
>01090001
>090110
>Set:   0134501
>011101
>01110102
>01110103
>080908
>Set:   0111679SE
>0111SE
>
>I need to delete all text except these lines started with word
>"Set". If I use "set:.+" regular expression, all these lines, that
>should be kept, are selected. I cant find a way to reverse this
>selection. I tried "[^set:.+].+" and "[^(set:.+)].+" but they don't
>work. Could you please give me any clues?

I think this is fairly simple.  I'm assuming that your "lines" are
actually separate paragraphs, in fact: that they are separated by
paragraph breaks, not line breaks, that is.

o Using Find & Replace with "Regular expressions" ticked, search for
^Set and click Find All. This will select just those words, where
they occur at the start of a line, not the whole lines.
o Click the down-arrow at the right of the Apply Style window in the
Formatting toolbar, and select some (paragraph) style different from
the style of your text (perhaps Heading?).  Since this is a paragraph
style, it will apply to the whole of each relevant line (paragraph),
not just the selected occurrences of the word "Set".
o Back in the Find & Replace dialogue, click "Search for Styles",
choose your original style (perhaps Default?) in the "Search for"
box, and click Find All.
o Press Delete to remove all the unwanted lines.
o Tick "Regular expressions" again, and search for ^$ - replacing
with nothing.  Click Replace All.  This removes the empty paragraphs
left by the previous process.
o Go to Edit | Select All (or press Ctrl+A) and use the Apply Style
window again to reset your paragraph style appropriately (to Default?).
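
The logic of the steps above (mark what you want to keep, delete whatever is
left unmarked, then clean up) can be modelled in a few lines of Python. This
is only a toy model: the list of dictionaries stands in for Writer paragraphs
and does not use any LibreOffice API, and the style names are the ones guessed
at in the steps.

```python
import re

# Toy model of the document: each paragraph has text and a style name.
texts = ["Set:   01SA34509", "0109SA", "S01020207B", "Set:   0134501", "011101"]
paragraphs = [{"text": t, "style": "Default"} for t in texts]

# Find ^Set and restyle the whole paragraph that contains the match.
for p in paragraphs:
    if re.search(r"^Set", p["text"]):
        p["style"] = "Heading"

# Select everything still in the original style and delete its text.
for p in paragraphs:
    if p["style"] == "Default":
        p["text"] = ""

# Remove the empty paragraphs left behind (the ^$ replacement).
paragraphs = [p for p in paragraphs if p["text"] != ""]

# Reset the style of the remaining paragraphs.
for p in paragraphs:
    p["style"] = "Default"

remaining = [p["text"] for p in paragraphs]
print(remaining)  # ['Set:   01SA34509', 'Set:   0134501']
```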

I trust this helps.

Brian Barker



Brian Barker

Re: [Writer] Regular expressions

In reply to this post by Mirosław Zalewski
At 23:39 08/01/2013 +0100, Miroslaw Zalewski wrote:
>On 08/01/2013 at 21:17, Nino Novak wrote:
>>but only if it's really true ;-)
>
>Unless you are 100% sure that my statement is not true, please don't
>question it.
>
>And you are NOT 100% sure until you post LO Writer regexp that will
>use negation pattern that OP is looking for.

This poses a nice paradox.  According to your own stricture, you
shouldn't be criticising Mr Novak unless you can be sure that he
cannot be correct in even questioning your claim.  And again,
according to your same requirement, you cannot be sure that he cannot
be correct to question your claim until you have posted a proof of
your belief - in other words a proof of your original claim.

;^)

Brian Barker



gordom

Re: [Writer] Regular expressions

In reply to this post by Brian Barker
On 2013-01-09 07:17, Brian Barker wrote:

> I trust this helps.
>
> Brian Barker
>
>
>

It seems to work indeed. Thank you very much :-). Regards,

gordom

Johnny Rosenberg

Re: [Writer] Regular expressions

2013/1/9 gordom <[hidden email]>:

> It seems to work indeed. Thank you very much :-). Regards,
>
> gordom

This worked for me with your example lines a minute ago:

Ctrl+h (or whatever method you prefer for opening the Search and
Replace dialogue).
☒ Regular expressions
Search for: ^[^S][^e][^t].*$
Replace with: (leave empty)
Click Replace All

Search for: ^$
Leave everything else as is
Click Replace All.

Done.

The funny thing is that the last part didn't work for me maybe ten
minutes ago, but I must have done something slightly different that
time…


So, in short:
1. Replace all ^[^S][^e][^t].*$ with nothing (regular expressions on).
2. Replace all ^$ with nothing (regular expressions still on).
Done.

Step 1 would also erase lines starting with "set" and "SET", so if you
want to keep all possible capitalisations of the word "set", you should
try this instead: ^[^Ss][^Ee][^Tt].*$
I didn't try that myself, but it should work. There is always Undo if
it doesn't…


Kind regards

Johnny Rosenberg
ジョニー・ローゼンバーグ

Johnny Rosenberg

Re: [Writer] Regular expressions

2013/1/9 Johnny Rosenberg <[hidden email]>:

> This worked for me with your example lines a minute ago:
>
> Ctrl+h (or whatever method you prefer for opening the Search and
> Replace dialogue).
> ☒ Regular expressions
> Search for: ^[^S][^e][^t].*$
> Replace with: (leave empty)
> Click Replace All
>
> Search for: ^$
> Leave everything else as is
> Click Replace All.
>
> Done.
>
> The funny thing is that the last part didn't work for me maybe ten
> minutes ago, but I must have done something slightly different that
> time…
>
>
> So, in short terms:
> 1. Replace all ^[^S][^e][^t].*$ with nothing (regular expressions on).
> 2. Replace all ^$ with nothing (regular expressions still on).
> Done.
>
> Step 1 would also erase lines starting with ”set” and ”SET”, so if you
> want to keep all possible combinations for the word ”set”, you should
> rather try: ^[^Ss][^Ee][^Tt].*$
> I didn't try that myself, but it should work. There is always Undo if
> it doesn't…
>
>
> Kind regards
>
> Johnny Rosenberg
> ジョニー・ローゼンバーグ

Okay, forget it. This keeps everything that starts with an S, not only
the Set lines…
If that's not a problem, this is a fast way; otherwise it could
require quite some manual work or further Search and Replace
operations.
I'll give it a few more thoughts…
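
A small Python check shows why the pattern keeps too much. The last two test
lines ("1e5000" and "01t999") are made up for the illustration and do not come
from the original sample.

```python
import re

pattern = re.compile(r"^[^S][^e][^t].*$")

lines = ["Set:   01SA34509", "0109SA", "S01020207B", "HB01110102",
         "1e5000", "01t999"]  # the last two lines are invented examples

# A line is matched (and so deleted) only if its first character is not S
# AND its second is not e AND its third is not t. Failing any one of the
# three tests is enough for the line to survive.
survivors = [line for line in lines if not pattern.match(line)]
print(survivors)
# ['Set:   01SA34509', 'S01020207B', '1e5000', '01t999']
```

So three negated character classes in a row are not the same thing as "does
not start with Set".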


Kind regards

Johnny Rosenberg
ジョニー・ローゼンバーグ

Johnny Rosenberg

Re: [Writer] Regular expressions

2013/1/9 Johnny Rosenberg <[hidden email]>:

> 2013/1/9 Johnny Rosenberg <[hidden email]>:
> Okay, forget it. This keeps everything that starts with an S, not only
> the Set lines…
> If that's not a problem, this is a fast way; otherwise it could
> require quite some manual work or further Search and Replace
> operations.
> I'll give it a few more thoughts…
>
>
> Kind regards
>
> Johnny Rosenberg
> ジョニー・ローゼンバーグ
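
To show why the character-class pattern quoted above is not a real "does not start with Set" test, here is a minimal sketch in Python. It runs outside LibreOffice and uses Python's `re` module rather than LibreOffice's engine, so it assumes case-sensitive matching. The extra lines "Sample", "0e1234", "12t456" and "ab" are made up for illustration and are not from the original document.

```python
import re

# The pattern from the earlier attempt: three independent negated classes.
# It only matches a line when the 1st char is not S AND the 2nd is not e
# AND the 3rd is not t, which is not the same as "does not start with Set".
pattern = re.compile(r"^[^S][^e][^t].*$")

# First two lines come from the sample; the rest are hypothetical.
lines = ["Set:   01SA34509", "0109SA", "Sample", "0e1234", "12t456", "ab"]

# Lines the pattern does NOT match survive the "replace with nothing" step.
kept = [line for line in lines if not pattern.match(line)]

for line in kept:
    print(line)
```

Only "0109SA" is removed here. Every other line survives because one of its first three characters collides with S, e or t in the same position, or because the line is shorter than three characters.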

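Another method:
1. Ctrl+h, Search for: ^Set.*$
2. ☒ Regular expressions, click Find All (this selects every Set line). Close the dialogue.
3. Ctrl+x (cut the selected lines), Ctrl+a (select everything that is left), Ctrl+v (paste the Set lines over it), Ctrl+h (reopen the dialogue).
4. Search for: Set
5. Replace with: \nSet (\n in the Replace box inserts a paragraph break)
6. ☒ Regular expressions, click Replace All. Close the dialogue.
7. Manually remove the first line, which is now empty.

Done.
It looks like many steps, but it is actually quick.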
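
If formatting does not matter, the same result can be had outside Writer, as suggested earlier in the thread (export to TXT and use a script). Below is a minimal sketch in Python rather than perl. The function name `keep_set_lines` and the default prefix "Set:" are my own choices for illustration, and it assumes each "line" of the document ends up as one line of the exported text.

```python
def keep_set_lines(text: str, prefix: str = "Set:") -> str:
    """Return only the lines of `text` that start with `prefix`."""
    kept = [line for line in text.splitlines() if line.startswith(prefix)]
    return "\n".join(kept)


# Part of the sample from the original post.
sample = "\n".join([
    "Set:   01SA34509",
    "0109SA",
    "011017B",
    "Set:   0134501",
    "011101",
    "Set:   0111679SE",
    "0111SE",
])

print(keep_set_lines(sample))
```

A plain `startswith` test is enough here because the lines to keep share a fixed prefix. No regular expression, and so no lookahead, is needed.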


Kind regards

 Johnny Rosenberg
 ジョニー・ローゼンバーグ

Johnny Rosenberg

Re: [Writer] Regular expressions

Maybe we should do a feature request.
Somewhere in the dialogue, there could be something like this:
☐ Inverse Search/Replace

With the box left unticked, everything should work as usual.

☒ Inverse Search/Replace
Now Search should find the first non-match and Search all should find
all non-matches. Replace should work as usual – that is, replace the
currently highlighted text – and Replace all should replace all
non-matches.

Just an idea.
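
To make the idea concrete, here is a rough sketch in Python of what such an option could do. It is only an illustration, not anything LibreOffice offers: the function `find_all` and its `inverse` flag are hypothetical, and it assumes that "non-match" means a whole paragraph in which the pattern is not found.

```python
import re
from typing import List


def find_all(pattern: str, paragraphs: List[str], inverse: bool = False) -> List[str]:
    """Return paragraphs that match `pattern`, or those that do not if `inverse`."""
    regex = re.compile(pattern)
    # A paragraph is selected when "pattern found" differs from `inverse`:
    # inverse=False keeps matches, inverse=True keeps non-matches.
    return [p for p in paragraphs if bool(regex.search(p)) != inverse]


paragraphs = ["Set:   01SA34509", "0109SA", "011017B", "Set:   0134501", "011101"]

print(find_all(r"^Set:", paragraphs))                # the Set lines
print(find_all(r"^Set:", paragraphs, inverse=True))  # everything else
```

With something like this, the original problem becomes a single step: search for ^Set: with the inverse option on, then delete whatever gets selected.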



Johnny Rosenberg

Tom

Re: [Writer] Regular expressions

Hi :)
That does sound fairly insane and unlikely to ever be useful.  I like it!!  It would be great to have, even if it's just to show off that LO is ahead of the game in yet another way.
Regards from
Tom :) 





>________________________________
> From: Johnny Rosenberg <[hidden email]>
>To: LibreOffice Användare <[hidden email]>
>Sent: Wednesday, 9 January 2013, 13:03
>Subject: Re: [libreoffice-users] [Writer] Regular expressions
>
>Maybe we should do a feature request.
>Somewhere in the dialogue, there could be something like this:
>☐ Inverse Search/Replace
>
>Leaving it like this, everything should work like usual.
>
>☒ Inverse Search/Replace
>Now, Search should find first non-match, Search all should find all
>non-matches, Replace should work as usual – that is replace the
>currently highlighted text and Replace all should replace all
>non-matches.
>
>Just an idea.
>
>
>
>Johnny Rosenberg
>