finding duplicates in a column in Calc

classic Classic list List threaded Threaded
4 messages Options
Philip Jackson-2 Philip Jackson-2
Reply | Threaded
Open this post in threaded view
|

finding duplicates in a column in Calc

I sometimes import a load of data into a column and am looking for a way to detect duplicates. The data is mostly alpha but could contain some digits.

At present, I sort the data in the column alphabetically and then manually inspect for duplicates. But with some data, I don't want to do an alpha sort.

I don't find any way in Calc's menus to do this and the online help gives nothing for 'duplicates'. Maybe someone has an idea of a regex that could be used ?

Philip


--
To unsubscribe e-mail to: [hidden email]
Problems? https://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: https://wiki.documentfoundation.org/Netiquette
List archive: https://listarchives.libreoffice.org/global/users/
Privacy Policy: https://www.documentfoundation.org/privacy

remygauthier remygauthier
Reply | Threaded
Open this post in threaded view
|

Re: finding duplicates in a column in Calc

Hello.
You can highlight the column and select Format > Conditional >
Condition > [Cell Value] [is duplicate] (from the drop downs) and
select a style to apply (Warning is probably a good one). This will set
all duplicate entries in that style (if you selected Warning, the text
will be red).

You can also remove duplicates with Data > More Filters > Standard
Filters; if your data is in column A, for example, then you go [Column
A] [=] [Not Empty] (from the drop downs), and then in the options: No
duplications, Copy results to [A cell address].

I hope this helps.
Rémy.

Le jeudi 12 novembre 2020 à 14:55 +0100, Philip Jackson a écrit :
> I sometimes import a load of data into a column and am looking for a way to detect duplicates. The data is mostly alpha but could contain some digits.
>
> At present, I sort the data in the column alphabetically and then manually inspect for duplicates. But with some data, I don't want to do an alpha sort.
>
> I don't find any way in Calc's menus to do this and the online help gives nothing for 'duplicates'. Maybe someone has an idea of a regex that could be used ?
>
> Philip
>
>

--
To unsubscribe e-mail to: [hidden email]
Problems? https://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: https://wiki.documentfoundation.org/Netiquette
List archive: https://listarchives.libreoffice.org/global/users/
Privacy Policy: https://www.documentfoundation.org/privacy
Philip Jackson-2 Philip Jackson-2
Reply | Threaded
Open this post in threaded view
|

Re: finding duplicates in a column in Calc

Thank you, Rémy.
This is the solution I was hoping for. It works without requiring a prior data sort.

I have previously used conditional formatting for other reasons but hadn't noticed the 'duplicate' option therein. Strange that it wasn't mentioned in the Calc online help.

Philip

On 12/11/2020 17:06, Remy Gauthier wrote:

> Hello.
> You can highlight the column and select Format > Conditional >
> Condition > [Cell Value] [is duplicate] (from the drop downs) and
> select a style to apply (Warning is probably a good one). This will set
> all duplicate entries in that style (if you selected Warning, the text
> will be red).
>
> You can also remove duplicates with Data > More Filters > Standard
> Filters; if your data is in column A, for example, then you go [Column
> A] [=] [Not Empty] (from the drop downs), and then in the options: No
> duplications, Copy results to [A cell address].
>
> I hope this helps.
> Rémy.
>
> Le jeudi 12 novembre 2020 à 14:55 +0100, Philip Jackson a écrit :
>> I sometimes import a load of data into a column and am looking for a way to detect duplicates. The data is mostly alpha but could contain some digits.
>>
>> At present, I sort the data in the column alphabetically and then manually inspect for duplicates. But with some data, I don't want to do an alpha sort.
>>
>> I don't find any way in Calc's menus to do this and the online help gives nothing for 'duplicates'. Maybe someone has an idea of a regex that could be used ?

--
To unsubscribe e-mail to: [hidden email]
Problems? https://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: https://wiki.documentfoundation.org/Netiquette
List archive: https://listarchives.libreoffice.org/global/users/
Privacy Policy: https://www.documentfoundation.org/privacy
John Kaufmann John Kaufmann
Reply | Threaded
Open this post in threaded view
|

Re: finding duplicates in a column in Calc

> Strange that it wasn't mentioned in the Calc online help.

Though it /is/ in Calc Guide CG62. (I have not checked for Calc 7, but presume it will be there.) There are other examples of things missing from both the Guides and Help documentation. It is a reminder to us that FOSS documentation, like other common-good endeavors, is written by people - in this case, volunteers - with limited time, and that those volunteers could use some help. Some of that help is provided operationally by good people contributing to good resources like this list and askbot, but it would be nice if every incidence of missing info were noted and led to ameleorative action.

Bugzilla tries to serve that function for the software. Those who do documentation may have something akin to an informal Doczilla or Helpzilla for this purpose but, if so, formalizing it (if doing so would not add to the burden on volunteers) might widen and deepen its impact.

John


On 2020-11-13 09:20, Philip Jackson wrote:

> Thank you, Rémy.
> This is the solution I was hoping for. It works without requiring a prior data sort.
>
> I have previously used conditional formatting for other reasons but hadn't noticed the 'duplicate' option therein. Strange that it wasn't mentioned in the Calc online help.
>
> Philip
>
> On 12/11/2020 17:06, Remy Gauthier wrote:
>> Hello.
>> You can highlight the column and select Format > Conditional >
>> Condition > [Cell Value] [is duplicate] (from the drop downs) and
>> select a style to apply (Warning is probably a good one). This will set
>> all duplicate entries in that style (if you selected Warning, the text
>> will be red).
>>
>> You can also remove duplicates with Data > More Filters > Standard
>> Filters; if your data is in column A, for example, then you go [Column
>> A] [=] [Not Empty] (from the drop downs), and then in the options: No
>> duplications, Copy results to [A cell address].
>>
>> I hope this helps.
>> Rémy.
>>
>> Le jeudi 12 novembre 2020 à 14:55 +0100, Philip Jackson a écrit :
>>> I sometimes import a load of data into a column and am looking for a way to detect duplicates. The data is mostly alpha but could contain some digits.
>>>
>>> At present, I sort the data in the column alphabetically and then manually inspect for duplicates. But with some data, I don't want to do an alpha sort.
>>>
>>> I don't find any way in Calc's menus to do this and the online help gives nothing for 'duplicates'. Maybe someone has an idea of a regex that could be used ?


--
To unsubscribe e-mail to: [hidden email]
Problems? https://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: https://wiki.documentfoundation.org/Netiquette
List archive: https://listarchives.libreoffice.org/global/users/
Privacy Policy: https://www.documentfoundation.org/privacy