Match cell against any of range of regexes?

doubi

Hi all,

Is the operation in the subject line possible?

More info (maybe I'm asking the wrong question):

I have a (long) list of URLs, and want to categorise them. The
categories themselves could be pretty broad: news sites, search engines,
music sites, etc. Similar URLs (different pages on the same site) come
up many times. So what I'd like to do is define regexes to say "If the
URL has 'guardian.co.uk' in it, make it a 'News' site".

Sheet 1 (list to process; I want to insert the categorising function in
col B)
====
A | B
----
google.com
cnn.com/foo
guardian.co.uk/foo
duckduckgo.com
cnn.com/bar/baz
bing.com

Sheet 2 (columns of site name regexes; col A is news sites, B is search
engines, etc)
====
A | B
----
cnn | google
guardian | duckduckgo
  | bing

The SEARCH function doesn't seem to take a range or array as its findtext.

Any ideas on how to go about this?

Many thanks!

--Ryan


--
Unsubscribe instructions: E-mail to [hidden email]
In case of problems unsubscribing, write to [hidden email]
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted

Jack

Ryan Jendoubi wrote:
> Is the operation in the subject line possible?
>
> More info (maybe I'm asking the wrong question):
>
> I have a (long) list of URLs, and want to categorise them. The categories
> themselves could be pretty broad: news sites, search engines, music sites,
> etc. Similar URLs (different pages on the same site) come up many times. So
> what I'd like to do is define regexes to say "If the URL has
> 'guardian.co.uk' in it, make it a 'News' site".

Hi Ryan,
could you possibly upload a sample file to look at?  I have a possible
idea of what to do, but I think the best way for me to check would be
by using an example.

As this mailing list does not support attachments, you would need to
upload it to the internet (something like www.2shared.com or similar).
Please remove all personal or sensitive data first.

Regards
Stephan

doubi

Hi Stephan,

On 30/06/11 15:22, Stephan Zietsman wrote:
> could you possibly upload a sample file to look at?  I have a possible
> idea of what to do, but I think the best way for me to check would be
> by using an example.

Thank you very much for the offer (and the upload site tip).

http://www.2shared.com/file/406vLMbP/URL_categoryMatch_example.html

This is set up pretty much how I'd like it to work (without the looong
list of input on the first sheet).

I wasted half my day Googling this and trying various things, and the
IRC channel has declared it impossible without a macro.

If you've thought of a winning formula I'll be hugely grateful!

Bests,

--Ryan

jorge-2

Hi:

I would suggest the following (assuming your data is arranged as you
describe in your e-mail):

1) Sheet 1:

        a) Put 1 (without quotes) in column B next to each search engine.
        b) Put 2 (without quotes) in column B next to each entry that is an
information site rather than a search engine.

Something like this:

Data | Classification
----
google | 1
cnn | 2
bin | 1
hotmail | 2
Sport | 2
mamma | 1

        c) Use AutoFilter (Data menu) to show only the rows with
classification 1. Copy them and paste them into Sheet 2, column A, like this:

Search Engine
----
google
bin
mamma

        d) Repeat the process for classification 2 and paste the result into
Sheet 2, column B. The final result could look like this:

Search Engine | Information
----
google | cnn
bin | hotmail
mamma | Sport
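
The filter-and-copy steps above amount to grouping rows that have already been labelled by hand. As a rough illustration only, here is a minimal Python sketch of that grouping. The `rows` data, the `HEADERS` mapping and the `group_by_label` helper are assumptions made up for this example; they are not part of any LibreOffice feature.

```python
from collections import defaultdict
from itertools import zip_longest

# Hypothetical labelled rows copied from the example: (entry, classification).
rows = [("google", 1), ("cnn", 2), ("bin", 1),
        ("hotmail", 2), ("Sport", 2), ("mamma", 1)]

# Assumed meaning of the labels, taken from the column headers in the example.
HEADERS = {1: "Search Engine", 2: "Information"}


def group_by_label(rows):
    """Collect entries per classification, preserving their original order."""
    groups = defaultdict(list)
    for entry, label in rows:
        groups[label].append(entry)
    return groups


groups = group_by_label(rows)
columns = [[HEADERS[label]] + groups[label] for label in sorted(groups)]

# Print the columns side by side; shorter columns are padded with blanks.
for line in zip_longest(*columns, fillvalue=""):
    print(" | ".join(line))
```

Note that this only rearranges rows that are already classified; the labelling in column B still has to be done by hand.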


I hope this is clear and helps you solve your problem.

Regards,

Jorge Rodríguez

jorge-2

Hi again:

The e-mail garbled the formatting. The final result looks like this:

Search Engine | Information
----
Google | CNN
Bin | Hotmail
Mamma | Sports

Regards,

Jorge Rodríguez

doubi

Hi Jorge,

On 30/06/11 17:16, jorge wrote:
> El jue, 30-06-2011 a las 10:09 -0600, jorge escribió:
>>> I would suggest to do as this (If you have order the information as you
>>> explain and show in your e-mail):
>>>
>>> 1) Sheet 1:
>>>
>>> [...]

Thanks for your explanation. I get what you were saying, but what I'm
looking for really is a way to avoid having to look at each and every
URL in column A and in column B say which category they're in, whether
that's indicated with a number or with the word 'news', 'search engine' etc.

I don't think I explained the problem very well to begin with,
especially with my poor attempt at representing sheets with text (I
forgot I could switch Thunderbird's mail editor to not use rich text :s)

You might get a better idea of what I'm after by looking at the example
file (
http://www.2shared.com/file/406vLMbP/URL_categoryMatch_example.html )
but of course you've already spent time writing two emails, for which
I'm very grateful :-)

Bests,

--Ryan

Jack

Ryan Jendoubi wrote:
> This is set up pretty much how I'd like it to work (without the looong list
> of input on the first sheet).
>
> I wasted half my day Googling this and trying various things, and the IRC
> channel has declared it impossible without a macro.
>
> If you've thought of a winning formula I'll be hugely grateful!

I've had a look at the spreadsheet, and tried a few formulas.  It
seems that to do this with formulas is quite tricky and unfortunately
I couldn't get it done.  I tried a few different approaches and I
think it is technically possible.  However, I also think that it would
take quite some time to write such a formula (if it is indeed
possible).

As was mentioned before, it can be done with macros.  Would you
consider using a macro instead?

Regards
Stephan

Ryan Jendoubi-2

Hi Stephan,

On 01/07/11 10:00, Stephan Zietsman wrote:
> I've had a look at the spreadsheet, and tried a few formulas.  It
> seems that to do this with formulas is quite tricky and unfortunately
> I couldn't get it done.  I tried a few different approaches and I
> think it is technically possible.  However, I also think that it would
> take quite some time to write such a formula (if it is indeed
> possible).
>
> As was mentioned before, it can be done with macros.  Would you
> consider using a macro instead?

Continued thanks for all your efforts. If I could whip up a macro like
buttered toast I'd do so, but unfortunately for me that would first
involve giving myself a grounding in writing bespoke LO macros. I can
however write a bit of Perl, and I've started trying to tackle the
problem that way :-) Although I think I'll have to finish the current
batch by hand for now, because unlike writing code at least that has a
definite end point...
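
For anyone taking the same route outside the spreadsheet, here is a minimal sketch of the categorisation in Python (not the Perl mentioned above). The category names, the pattern lists and the `categorise` helper are assumptions for illustration, modelled on the Sheet 2 layout from the first message. Each category's patterns are joined into one alternation, and the first category that matches wins.

```python
import re

# Hypothetical category table mirroring Sheet 2: one list of regexes per category.
CATEGORIES = {
    "News": [r"cnn", r"guardian\.co\.uk"],
    "Search engine": [r"google", r"duckduckgo", r"bing"],
}

# Join each category's patterns into a single alternation,
# so one search per category is enough.
COMPILED = {
    name: re.compile("|".join(f"(?:{p})" for p in patterns), re.IGNORECASE)
    for name, patterns in CATEGORIES.items()
}


def categorise(url, default="Uncategorised"):
    """Return the first category whose combined regex matches anywhere in url."""
    for name, pattern in COMPILED.items():
        if pattern.search(url):
            return name
    return default


urls = ["google.com", "cnn.com/foo", "guardian.co.uk/foo",
        "duckduckgo.com", "cnn.com/bar/baz", "bing.com"]
for url in urls:
    print(f"{url} | {categorise(url)}")
```

Short patterns such as `cnn` match anywhere in the URL, so a real pattern list would probably need to be more specific (for example `cnn\.com`) to avoid accidental hits.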

LO macros would definitely be a good skill to learn eventually, even if
it's not at the top of my todo list at the moment. I haven't yet
searched for any tutorials or the like, but I'd be interested to hear if
you have any personal favourites which you think would suit someone who
knows general programming and just needs an intro to the syntax,
conventions and frameworks / objects involved in LO scripting.

That said, I've noticed there are at least four different languages one
can use. I don't know if any of them would allow writing scripts which
are broadly cross-compatible with other office suites (even if a little
tweaking is needed) but my preference would be for that, so I could help
non-LO-using friends and such.

Thanks again and best regards,

--Ryan

Tom

Hi :)
The documentation team have access to a lot of good guides.  Jean sent me this
about macros

" ...
good resources for people who want serious info on macros. (Shameless plug for
Andrew)

http://www.pitonyak.org/oo.php
http://www.pitonyak.org/book/
"

ODFauthors are considering how best to approach writing documentation that is
relevant to both OOo and LO and identical for both.  Obviously some
guides need a lot of modifications (such as just deleting chunks where OOo is
lagging behind and has not added the functionality developed by LO devs) but
apparently macros are likely to develop along the same path for both
projects.  Andrew Pitonyak's guide is apparently the best for macros so it's
worth bookmarking or downloading.


Regards from
Tom :)



Jack

Ryan Jendoubi wrote:

> LO macros would definitely be a good skill to learn eventually, even if it's
> not at the top of my todo list at the moment. I haven't yet searched for any
> tutorials or the like, but I'd be interested to hear if you have any
> personal favourites which you think would suit someone who knows general
> programming and just needs an intro to the syntax, conventions and
> frameworks / objects involved in LO scripting.
>
> That said, I've noticed there are at least four different languages one can
> use. I don't know if any of them would allow writing scripts which are
> broadly cross-compatible with other office suites (even if a little tweaking
> is needed) but my preference would be for that, so I could help non-LO-using
> friends and such.

The "standard" scripting language used in LO is StarBasic (as far as I
know), which is very much like VBA (Visual Basic for Applications).  I
actually learned the language while writing VBA macros in MS Excel.
Unfortunately I don't know of a good manual/tutorial for either.

I personally dislike the Basic language, but it is useful sometimes.
I'm not enthusiastic about it, but I think sticking to Basic would be
the most "cross-compatible" option.  I've imported a few Excel files into
LO, and the VBA macros imported just fine (minor modifications are made to
the code on import).  I have no idea about the other direction
though (LO file to Excel).

Regards
Stephan
