Calc: Identifying duplicate cell values with conditional formatting?

zr1hpc

Calc: Identifying duplicate cell values with conditional formatting?

Hi,

I am using LibreOffice 3.3.1 and am in the process of editing a 68000 row,
10 column file.

There are two main columns that contain the data to be cleaned up, with
multiple instances of duplication: either the text is the same and only
the case differs between two rows, or the text in column A differs between
row 1 and row 2 while the text in column B is identical in both rows, i.e.

Col A  Col B    OR    Col A  Col B
a      hx             a      hx
A      hx             a      hx

OR

a      hx             a      hx
a      hy             A      hy

etc for the other combinations

I am doing the alphabetical sort via Col A.

I can use Find to search for the duplicate record row once I know what I
am looking for; however, determining what text is different when the
values in Col A are the same, and vice versa, is the hard part.

On 136000 cells this is a FAIR mission!

I would like to know if there is a conditional formula I could use that
could highlight the differences in one column when cells in the other
column are the same. I am thinking of a formula that says if the cell
contents are the same as any other cell in a range, apply the
conditional format. Of course this conditional would need to be added
onto all 136000 cells. :(

That way I can highlight the 'error' cells and find them easily and
correct them or add a new row of data.

Any pointers would be appreciated for doing this in Calc as an external
database is not available. What elements of the formula can I investigate?

Many thanks
Hylton
--
========================================================================
Hylton is a Lions Club member of Lions Club of Fish Hoek (District 410A)
http://www.fishhoeklionsclub.org.za being part of the worlds largest NGO
========================================================================

--
For unsubscribe instructions e-mail to: [hidden email]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted

"Ing. Jiří Hladůvka" "Ing. Jiří Hladůvka"

Re: Calc: Identifying duplicate cell values with conditional formatting?

1) Put the formula =LOWER(A1) into C1 and copy it down to all rows in column C.
2) Sort the table on column C.
3) Put =0 into D1.
4) Put =IF(C2=C1;1;0) into D2 and copy the formula to all rows in column D.
5) Select column D, copy it (Ctrl+C) and use Paste Special to paste it back
   as unformatted text, so that the flags become fixed values.
6) Rows with a value >0 in column D have a duplicate value with the row above.

Step 5 is important if you want to re-sort the table to compare column B.
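
For readers who want to check the logic outside Calc, the steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name flag_adjacent_duplicates and the two-column row layout are hypothetical, not part of Calc or of the original spreadsheet.

```python
def flag_adjacent_duplicates(rows):
    """Mimic the helper-column approach for (col_a, col_b) rows.

    Returns tuples of (col_a, col_b, key, flag) sorted by key, where key is
    the lower-cased col_a (the column C helper) and flag is 1 when the key
    equals the key of the row above (the column D helper), otherwise 0.
    """
    # Step 1 and 2: build the lower-cased key and sort on it.
    keyed = sorted(((a, b, a.lower()) for a, b in rows), key=lambda r: r[2])

    flagged = []
    previous_key = None  # Step 3: the first row can never be a duplicate.
    for a, b, key in keyed:
        # Step 4: compare each key with the key of the row above.
        flag = 1 if key == previous_key else 0
        flagged.append((a, b, key, flag))
        previous_key = key
    return flagged


if __name__ == "__main__":
    sample = [("b", "hy"), ("A", "hx"), ("a", "hx"), ("c", "hz")]
    for row in flag_adjacent_duplicates(sample):
        print(row)
```

Because the result is a plain list of values rather than live formulas, it can be re-sorted freely, which is the same reason step 5 freezes the flags in the spreadsheet.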

regards,
Jiri

On 12.12.2011 15:39, Hylton Conacher (ZR1HPC) wrote:


m.a.riosv

Re: Calc: Identifying duplicate cell values with conditional formatting?

In reply to this post by zr1hpc
On 12/12/11 15:39, Hylton Conacher (ZR1HPC) wrote:


See if this topic can help:
http://user.services.openoffice.org/en/forum/viewtopic.php?f=9&t=45507&p=210325&hilit=duplicate#p210325

Miguel Ángel

David S. Crampton

Re: Calc: Identifying duplicate cell values with conditional formatting?

In reply to this post by zr1hpc
Hylton,

I can offer a few general considerations and thoughts.  Your problem  
statement appears to contain two objectives:

a) to make consistent the case of all the elements; all Upper Case; all  
Lower Case; whatever.  It seems the case makes no difference to the  
meaning of the record. The case is just an artifact of the typing style of  
the data input person(s).  In this situation, a mass application of =UPPER  
or =LOWER and placing the result in a new sheet would suffice.

I think no amount of special formatting will reliably catch your eye to  
eliminate manually all the instances of differing case.

b) the deduplication. Once the case issue is resolved you will have rows  
with all elements exactly equal.  This deduplication should be done by  
exporting the 68000 x 10 sheet to a database and running a deduplication  
query.  Again any of the visual tricks to identify duplicate rows will be  
unreliable and you will be guaranteed to miss at least a few.

At this point my advice bogs down. I have past experience with the M$
Access product; recent versions of Access have a pre-written deduplication  
query. It is not available to me right now because I have left my  
workplace and don't have an installation of that recent version of M$  
Suite.

Use caution in the deduplication process; make lots of backups. It is a  
delete-type query and data will be lost! Hopefully only the duplicates but  
you never know.
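
As a rough illustration of points a) and b) without a database, here is a minimal Python sketch. It assumes the sheet has been read into a list of rows of text cells; the function name normalise_and_deduplicate is hypothetical. It builds a new list and leaves the input untouched, in keeping with the caution about backups.

```python
def normalise_and_deduplicate(rows):
    """Lower-case every cell, then keep only the first copy of each row.

    rows is a list of tuples of strings. The input list is not modified;
    a new list of normalised, unique rows is returned in original order.
    """
    seen = set()
    unique_rows = []
    for row in rows:
        # a) make the case consistent (lower case chosen arbitrarily here)
        normalised = tuple(cell.lower() for cell in row)
        # b) drop rows that are now exactly equal to an earlier row
        if normalised not in seen:
            seen.add(normalised)
            unique_rows.append(normalised)
    return unique_rows


if __name__ == "__main__":
    sample = [("a", "hx"), ("A", "hx"), ("a", "hy"), ("A", "HY")]
    print(normalise_and_deduplicate(sample))
```

Rows that differ in anything other than case, such as stray spaces, would still count as different in this sketch.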

I hope this vague hand waving is of some help to you,

--
David S. Crampton

On Mon, 12 Dec 2011 06:39:31 -0800, Hylton Conacher (ZR1HPC)  
<[hidden email]> wrote:


Jay Lozier

Re: Calc: Identifying duplicate cell values with conditional formatting?

On 12/12/2011 10:33 AM, David S. Crampton wrote:

If the data is already in a database, you do not need to remove the
duplicates first; you can use a query of the following form (the
bracketed parts are placeholders):

SELECT DISTINCT (fields to be used)
FROM (table name)
WHERE (any search criteria; this clause can be omitted)

Depending on the database you can export the results as csv or txt or
possibly directly into a spreadsheet. I tend to use MySQL/MariaDB so I
export as csv.
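
To make the idea concrete without installing a database server, the following sketch uses Python's built-in sqlite3 and csv modules. The table name records, the column names col_a and col_b, and the function name distinct_rows_as_csv are assumptions made for the example; LOWER() is added so that rows differing only in case collapse into one.

```python
import csv
import io
import sqlite3


def distinct_rows_as_csv(rows):
    """Load (col_a, col_b) rows into an in-memory SQLite table and return
    the case-insensitively distinct rows as CSV text."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE records (col_a TEXT, col_b TEXT)")
        conn.executemany("INSERT INTO records VALUES (?, ?)", rows)

        # SELECT DISTINCT removes repeated result rows. SQLite's LOWER()
        # only folds ASCII letters unless an extension is loaded.
        cursor = conn.execute(
            "SELECT DISTINCT LOWER(col_a), LOWER(col_b) "
            "FROM records ORDER BY 1, 2"
        )

        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["col_a", "col_b"])
        writer.writerows(cursor)
        return buffer.getvalue()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("a", "hx"), ("A", "hx"), ("a", "hy"), ("A", "hy")]
    print(distinct_rows_as_csv(sample), end="")
```

The source table is only read, never changed, so the original data stays available if the result turns out to be wrong.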

--
Jay Lozier
[hidden email]


TomW

Re: Calc: Identifying duplicate cell values with conditional formatting?

In reply to this post by zr1hpc
On 2011-12-12 09:39, Hylton Conacher (ZR1HPC) wrote:


Hylton:

If your columns have labels (or you can add some), try using the
'Filter' option in the 'Data' menu.  Using the 'AutoFilter' gives the
option of filtering the data set by columns.  From your example, you can
filter Col A for 'a' or 'A', then filter Col B and filter the data for
either 'hx' or 'hy'.
This will only show the rows of data that match your filter criteria.

If you use the 'Standard Filter', you can set whether it is case-sensitive.
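
The effect of filtering on two columns, with or without case sensitivity, can be sketched in Python as below. This only illustrates the matching logic and is not how Calc implements its filters; the function name filter_rows and its parameters are hypothetical.

```python
def filter_rows(rows, col_a, col_b, case_sensitive=False):
    """Return the (col_a, col_b) rows whose two columns match the criteria.

    With case_sensitive=False, 'a' and 'A' are treated as the same value.
    """
    def same(value, wanted):
        if case_sensitive:
            return value == wanted
        return value.lower() == wanted.lower()

    return [row for row in rows if same(row[0], col_a) and same(row[1], col_b)]


if __name__ == "__main__":
    sample = [("a", "hx"), ("A", "hx"), ("a", "hy"), ("A", "hy")]
    print(filter_rows(sample, "a", "hx"))
    print(filter_rows(sample, "a", "hx", case_sensitive=True))
```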

TomW
