[libreoffice-l10n] [PROPOSAL] New project for dictionaries

Dennis Roczek

[libreoffice-l10n] [PROPOSAL] New project for dictionaries

Hello community,

I have been discussing things with dictionary maintainers and came up
with an idea for a new project!

The TL;DR version: Provide a central place for dictionary maintainers,
including useful tools plus a possibility for easier collaboration.

The long version: LibreOffice uses dictionaries based on Hunspell.
Hunspell is a free, open source project, and many applications support
Hunspell-based dictionaries. The following (non-exhaustive) list of
applications supporting Hunspell-based dictionaries is shamelessly
copied from the German Wikipedia:
* AOO
* LibO
* Mozilla products
** Thunderbird
** Seamonkey
** Firefox
* LaTeX IDEs: WinShell, TeXworks, LyX, Texmaker, TeXstudio, etc.
* Google Chrome
* The Bat
* Emacs
* Opera Web Browser
* Apple Mac OS X 10.6+
* Adobe InDesign
* Adobe FrameMaker
* SoftMaker Office
* Scribus
and many, many more applications.

Some applications additionally provide their own dictionary extension
page for downloading dictionaries. Moreover, depending on the
application, the dictionary has to be packaged differently (the simplest
example is a different file extension).

Some projects maintain their extensions separately (independently).
Also, a user who wants to propose a change or an addition usually has no
easy way to find out whom to contact to get another word added.

The basic idea is to provide a portal with tools that pack the word
lists into dictionaries automatically by scripts and - if possible -
update and upload the new version of the dictionary to the download
pages (e.g. the extension centre of AOO / LO or Mozilla's page). The
portal should provide a way for users to help the maintainers by
proposing new words, or to contact the maintainer if there are errors in
the word list.

Packaging the extension manually is very time-consuming and can also
lead to several separately maintained versions for the same language.
With tools that build a new version of the dictionary, maintainers can
invest their time in improving the dictionary and release new versions
more easily and more often.
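
As a rough illustration of the packaging step, the sketch below zips one
.aff/.dic pair into archives that differ only in their file extension. It
rests on stated assumptions: the function name `pack_dictionary` and the
file names are hypothetical, and the extra metadata files that real .oxt
or .xpi packages need are left out.

```python
import zipfile
from pathlib import Path


def pack_dictionary(aff_path, dic_path, out_dir, extensions=(".zip", ".oxt", ".xpi")):
    """Zip one Hunspell .aff/.dic pair once per target extension.

    Only the archive container is produced; the metadata files that real
    .oxt or .xpi packages require are deliberately left out of this sketch.
    """
    aff_path, dic_path, out_dir = Path(aff_path), Path(dic_path), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for ext in extensions:
        # Same content every time, only the container name changes.
        target = out_dir / (dic_path.stem + ext)
        with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
            archive.write(aff_path, arcname=aff_path.name)
            archive.write(dic_path, arcname=dic_path.name)
        created.append(target)
    return created
```

A maintainer would call something like `pack_dictionary("xx_XX.aff",
"xx_XX.dic", "dist")` after every word list update instead of building
each archive by hand.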

I'm eager to hear what you think about this idea.

Regards,

Dennis Roczek

--
To unsubscribe e-mail to: [hidden email]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/l10n/
All messages sent to this list will be publicly archived and cannot be deleted
Michael Bauer

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

I think the different licenses across products might be a problem. But
I'm not an expert.

On the whole, I'm not sure the packaging is actually the biggest
stumbling block for the end user, rather than the actual
installation/implementation. Some are very easy, like Mozilla or LO: you
just get the file and it either auto-installs or you click it. With
others, like Trados, you have to find the folder for the dic file, you
have to manually add (if it's not bundled) the locale to an XML file ...
and pray it works. With others, like Chrome, you can't actually add a
spellchecker for a language that Google hates.

Though perhaps if there was a central place on the web for all these, a
single place where devs could point their software for grabbing Hunspell
dictionaries, that might make things easier. I'd be all up for that, but
it would probably end up an exercise in herding cats. Maybe forking
existing dictionaries might be worth considering, if the owners are
either non-communicative or not interested?

Michael

Dennis Roczek wrote the following on 25/04/2016 at 11:13:
> Packaging the extension manually is very time-consuming and can also
> lead to several separately maintained versions for the same language.
> With tools that build a new version of the dictionary, maintainers can
> invest their time in improving the dictionary and release new versions
> more easily and more often.
>
> I'm eager to hear what you think about this idea.


Dennis Roczek

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

Hi Michael,

On 25.04.2016 at 14:23, Michael Bauer wrote:
> I think the different licenses across products might be a problem.
> But not an expert.
Unlike the Apache Foundation, The Document Foundation has no
particular open source license mentioned in its statutes. The Document
Liberation Project, for example, uses many different licenses across its
different libraries.

> On the whole, I'm not sure if the packaging is actually the
> biggest stumbling block for the end user
Well, the ideal case is that somebody provides a pre-baked "binary" /
extension.

> but the actual installation/implementation. Some are very easy,
> like Mozilla or LO, you just get the file and it either
> auto-installs or you click it. Others, like Trados, you have to
> find the folder for the dic file, you have to manually add (if it's
> not bundled) the locale to an XML file ... and pray it works.
> Others, like Chrome, you can't actually add a spellchecker for a
> language that Google hates.
We actually cannot do anything about the different implementations.
But we could provide packages that are as easy to use as possible.

> Though perhaps if there was a central place on the web for all
> these so there's a single place where devs could point their
> software for grabbing Hunspell dictionaries that might make things
> easier. I'd be all up for that but it would probably end up an
> exercise in herding cats. Maybe forking existing dictionaries might
> be worth considering, if the owners are either non-communicative or
> not interested?
If the maintainers are not responsive (or if the dictionary is
abandoned), we actually cannot do anything as long as we have no new
maintainer for a fork. If there is somebody willing to take over the
development, then forking that particular dictionary is indeed a
possibility.

> Michael
</snip>

Krunose

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

In reply to this post by Dennis Roczek
On 25.04.2016 at 12:13, Dennis Roczek wrote:
> The basic idea is to provide a portal with tools that pack the word
> lists into dictionaries automatically by scripts and - if possible -
> update and upload the new version of the dictionary to the download
> pages (e.g. the extension centre of AOO / LO or Mozilla's page). The
> portal should provide a way for users to help the maintainers by
> proposing new words, or to contact the maintainer if there are errors in
> the word list.

I think that's a great idea and I look forward to it (although I can do
very little to help).

The first time I looked for a Hunspell dictionary, I looked in Hunspell's
own source code. How stupid of me (or was it?).

People often complain about abandoned dictionaries, but having a central
repository could make maintaining them easier. There are languages whose
dictionaries haven't been updated for fifteen years. Maintaining
dictionaries the way we do translations could be beneficial for smaller
languages. It would be very helpful to hear about missing or wrong word
forms in a dictionary.

You can't abandon a Wikipedia article, and in the same way you could not
abandon a dictionary; there will always be somebody to continue when
someone loses interest. It's very hard to take over maintaining an
unmaintained dictionary.

But I can see the big ones complaining about doing things this way.

Building plugins for other projects also sounds good; it's hard to beg
'Hey, I've updated the word list, could you please update that Firefox
plugin you maintain, you haven't done it for two years'.

But again, what will make that Firefox person actually include the
updated dictionary?

It could be a great repository for developing dictionaries and updating
the ones in LO, but I doubt that cooperation with other projects will go
the way it was described in the initial mail.

Thanks,
Kruno

Stanislav Horáček

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

In reply to this post by Dennis Roczek
Hi Dennis,

I strongly support your idea. For our Czech language, the dictionary has
not been updated for many years and no one maintains it. So we were
thinking about a webpage where users would give their feedback and
propose new words. If there was an infrastructure for that provided by
TDF, it would be extremely helpful.

Thanks,
Stanislav



Rimas Kudelis

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

In reply to this post by Dennis Roczek
Hello Dennis,

2016-04-25 13:13, Dennis Roczek wrote:
> <...>
>
> The TL;DR version: Provide a central place for dictionary maintainers,
> including useful tools plus a possibility for easier collaboration.
>
> <...>

I think the idea is awesome. One of the programs I localize currently
maintains its own list of dictionary URLs in XML format, and these point
to OOo mirrors, which I suppose are slowly going into oblivion...

Since Hunspell (with a few exceptions, I know) is pretty much the
de-facto spell checker in today's open-source applications (and not just
them), I think it may be beneficial to have a central repository to host
these dictionaries. Perhaps it would even make sense to adopt one of the
package formats as proposed/official, and then begin getting in touch
with application developers, suggesting that they adopt support for it.
The possibilities here are endless. For example, the repository could
(should) provide a generated listing of these dictionaries in some
pre-agreed format, so that application developers could parse it
automatically and allow users to download desired dictionaries and
install them without ever opening their browser. TDF might indeed be a
good candidate to host such a repository.
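
To make the "generated listing" idea a bit more concrete, here is a
minimal sketch that scans a directory of packages and emits a JSON index.
Everything about the format is an assumption for illustration: JSON as
the container, the field names, taking the locale from the file name, and
the function name `build_index` are all hypothetical, since no format has
been agreed on.

```python
import hashlib
import json
from pathlib import Path


def build_index(package_dir, base_url):
    """Return a JSON listing of every dictionary package in package_dir.

    The layout (locale taken from the file name, sha256 checksum, download
    URL) is a hypothetical example, not an agreed format.
    """
    entries = []
    for package in sorted(Path(package_dir).glob("*.oxt")):
        entries.append({
            "locale": package.stem,
            "file": package.name,
            "url": f"{base_url.rstrip('/')}/{package.name}",
            # The checksum lets a client verify the download.
            "sha256": hashlib.sha256(package.read_bytes()).hexdigest(),
        })
    return json.dumps({"dictionaries": entries}, indent=2, sort_keys=True)
```

An application could fetch such an index, show the available locales to
the user, and download the chosen file from its `url` entry.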

Rimas



Olivier Hallot

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

In reply to this post by Dennis Roczek
Hi Dennis

On 25/04/2016 07:13, Dennis Roczek wrote:
> The TL;DR version: Provide a central place for dictionary maintainers,
> including useful tools plus a possibility for easier collaboration.

+1
Everything I can say is summarized as: it is nice, hard work among a
"herd of cats".

The easiest part is the technical part. Beware of the service user.

The success of such an initiative will depend on the strategy and
approach to get the support of the dictionary maintainers and the l10n
and documentation leaders, and to make them use and spread the word
about the tool: mutate cats into wolves. It involves communications and
marketing of the service and will require engagement of resources.

The strategy is also about a migration from existing processes to the
new one. We know a lot about migration issues: a solid/old installed
base, user resistance to change, rebellions and even sabotage (derailing
the project). So, IMHO, start with TDF Hunspell for LibreOffice in as
many l10n communities as possible. Once the community is engaged,
invite/open it to other communities (Mozilla, LaTeX, etc.). We did it
for Pootle, and it did not happen in a snap.

Regards

PS> A thesaurus service would be welcome in some l10n communities.
--
Olivier Hallot
Comunidade LibreOffice
Rio de Janeiro - Brazil - Local time: UTC-03
http://ask.libreoffice.org/pt-br

jonathon

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

In reply to this post by Rimas Kudelis
On 26/04/2016 06:08, Rimas Kudelis wrote:

> I think it may be beneficial to have a central repository to host these dictionaries.

If Hunspell does not offer a repository for all languages, and The
Document Foundation has the resources to do so, then this is something
that would be extremely useful.

I don't know what resources would be required, but my guess is:
* Git, or something similar, that hosts word lists;
* A BuildBot system that creates the dictionary extensions/packages/etc.
on a weekly/fortnightly/monthly/quarterly schedule;
* Automatic uploading of spelling dictionaries to the Dictionary
Extensions host used by LibreOffice.

Somebody has to write _extensive_ documentation on:
* How to create a word list;
* How to modify a word list;
* How to upload the word list to the repository;
* How third parties can download content from the repository;
And maybe also:
* How to transform the data in the repository for use with other software.
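
As a small taste of what the "how to create a word list" documentation
could show, here is a sketch that turns a plain list of words into the
content of a Hunspell .dic file, whose first line is an entry count
followed by one entry per line. It is a simplification under stated
assumptions: the function name `build_dic` is hypothetical, no affix
flags are attached to the words, and the matching .aff file is reduced to
a bare encoding declaration.

```python
# Smallest useful affix file for a flag-free word list: just the encoding.
MINIMAL_AFF = "SET UTF-8\n"


def build_dic(words):
    """Return .dic file content: an entry count line, then one word per line."""
    # Strip whitespace, drop empty entries and duplicates, sort for stable diffs.
    unique = sorted({w.strip() for w in words if w.strip()})
    return "\n".join([str(len(unique)), *unique]) + "\n"
```

Sorting and de-duplicating keeps the file stable between runs, which
makes changes easy to review in whatever versioning system ends up
hosting the word lists.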

I know that storing the word list in Git is well within the realms of
being doable.  However, I have no idea how that is done, and my reading
of books on Git hasn't provided me with any pointers in that direction.

For people who aren't used to versioning systems, Git is a confusing
maze that is utterly incomprehensible. If it weren't for the fact that
there are over 100 books on using Git, in print, I'd suggest that the
first solvable problem is writing a manual for Git that mortals can
understand.  On second thoughts, even with 100 books in print, this type
of manual could be an extremely useful addition.

jonathon

Michael Bauer

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

In reply to this post by Dennis Roczek

Dennis Roczek wrote the following on 25/04/2016 at 15:33:
> If the maintainers are not responsive (or if the dictionary is
> abandoned), we actually cannot do anything as long as we have no new
> maintainer for a fork. If there is somebody willing to take over the
> development, then forking that particular dictionary is indeed a
> possibility.
A couple of extra thoughts from an active maintainer of a dictionary and
the associated extensions.

In general, I'm interested in anything that brings together scattered
resources and makes stuff more user friendly, be that end users or devs
using it.

- there would have to be some sort of a locale-specific admin system for
active maintainers joining. Crowdsourcing stuff works in big languages
but if folk start submitting random stuff to a locale like Scottish
Gaelic or playing with the affix file, that's a recipe for disaster.

- there are different ways in which people maintain their dictionaries.
Some edit the text file directly, others build from corpora; in our
case, we use scripts to build from a dictionary database, which
automatically generate the .zip/.oxt/.xpi files. One would probably
have to consider different entry points if one were to successfully
attract people from different projects, because I would NOT want to have
to start manually maintaining our file.

- one thing that hasn't been mooted yet would be making some provision
for easily creating new dictionaries for new locales, something that has
a loc-tech threshold.

Michael

Dennis Roczek

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

Hi Michael,
Hi toki,

But that's the whole point: making the current system easier and
including more contributors.

I have no clear understanding of which languages use which kind of
system and scripts. But as far as I know, many of those who use scripts
to get the extensions packed simply use homegrown / self-made scripts in
any language, or simply pack the extension manually.

To get the system running: provide a system (hosted / maintained by TDF)
and have everybody use the same system to create new dictionaries - it
will become superior over time. Additional systems (e.g. Mozilla-based
products, as the easiest example) can be added later.

The point about the affix file: I imagined the system on a much lower
level: Joe Average is computer-savvy, realizes that the system is based
on volunteer work, and sends his "custom"/unknown words to the
maintainer (e.g. via a web page, directly integrated within LibreOffice,
or whatever), and the maintainer (similar to the Language Administrator
in Pootle) decides whether they go in the dictionary or not.
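
The suggest-then-review flow could be sketched roughly as below: anybody
may propose a word, but only a maintainer of that locale decides whether
it is accepted. This is only an illustration under assumptions: the class
name `LocaleQueue`, its fields and its methods are hypothetical, and a
real system would need persistent storage and real user accounts.

```python
from dataclasses import dataclass, field


@dataclass
class LocaleQueue:
    """Pending word suggestions for one locale, reviewed by its maintainers."""
    locale: str
    maintainers: set
    accepted: set = field(default_factory=set)
    pending: set = field(default_factory=set)

    def suggest(self, word):
        """Anyone may suggest a word; known words are not queued again."""
        word = word.strip()
        if word and word not in self.accepted:
            self.pending.add(word)

    def review(self, reviewer, word, accept):
        """Only a maintainer of this locale may accept or reject a suggestion."""
        if reviewer not in self.maintainers:
            raise PermissionError(f"{reviewer} does not maintain {self.locale}")
        self.pending.remove(word)  # raises KeyError if the word was never suggested
        if accept:
            self.accepted.add(word)
```

Keeping one queue per locale also covers the concern about random
submissions to small languages: nothing reaches the dictionary without a
decision by that locale's maintainer.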

@toki I really hope that the maintainers do *NOT* have to learn Git.
It should be more of an intelligent Pootle system. (Dunno how that
looks with the affix files, but we will find somebody who can do it - if
we want.)

Moreover, that kind of system won't need many resources (neither human
resources nor server resources) to maintain. OTOH we might get many
additional easy hacks for new developers who want to develop new
"scripts" to create extensions for other plugins, or conversions to
other dictionary systems, or the like.

At the moment this is only a discussion about whether "we" find that
useful. Whether "TDF" and volunteers are able to implement that
technically will be a completely different story.

Dennis Roczek


jonathon

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

On 27/04/2016 23:30, Dennis Roczek wrote:

> To get the system running: provide a system (hosted / maintained by TDF)
> and have everybody use the same system to create new dictionaries

Make it easy enough, and every group that is working on reducing a
language to writing, will be uploading their word lists here.
(If that occurs, then LibO will definitely need to redesign the language
selection component.)

> libreoffice or whatever) and the maintainer (similar to the Language
> Administrator in Pootle) decides if it goes in the dictionary or not.

This is where automated tools are great.
(I'll ignore issues such as the one with the Afrikaans dictionary, which
didn't contain the word "die" for several years.  This is the definite
article in Afrikaans, so it was a pretty annoying omission.)
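
An automated check for that kind of omission could look roughly like the
sketch below, which reports core words that have no entry in a .dic file.
The function name `missing_core_words` and the idea of a per-language
core word list are assumptions for illustration. The check is also a
simplification: it only looks at entries listed directly in the .dic
content and ignores word forms that are generated by affix rules.

```python
def missing_core_words(dic_text, core_words):
    """Return the core words that have no entry in the given .dic content."""
    lines = dic_text.splitlines()[1:]  # first line is the entry count
    # Drop affix flags ("word/FLAGS") and any tab-separated morphological data.
    entries = {line.split("/")[0].split("\t")[0].strip() for line in lines}
    return sorted(set(core_words) - entries)
```

Run on every build, a non-empty result would block the release until a
maintainer has looked at it.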

> @toki I really hope that the maintainers do *NOT* have to learn Git.
> It should be more of an intelligent Pootle system.

The reason I mentioned Git is that there was (operative word _was_)
a group of people working on an extension for AOo/LibO to save documents
to Git. The projected end result would be that all the individual would
have to know about Git was to click on this extension to either save
or retrieve the document from Git.

> looks with the affix files, but we will find somebody who can do it -

My impression is that there is a Python library that takes word lists
and creates affix files from them.

> Whether "TDF" and volunteers are able to implement that technically
> will be a completely different story.

Listing the technical requirements for the must-haves and nice-to-haves.

jonathon


Dennis Roczek

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

Hi toki,

On 28.04.2016 at 13:01, toki wrote:
> On 27/04/2016 23:30, Dennis Roczek wrote:
>> @toki I really hope that the maintainers do *NOT* have to learn Git.
>> It should be more of an intelligent Pootle system.
>
> The reason I mentioned Git is that there was (operative word _was_)
> a group of people working on an extension for AOo/LibO to save documents
> to Git. The projected end result would be that all the individual would
> have to know about Git was to click on this extension to either save
> or retrieve the document from Git.
There are still some projects, like the en_US or the en_GB dictionary.

>> looks with the affix files, but we will find somebody who can do it -
>
> My impression is that there is a Python library that takes word lists
> and creates affix files from them.
:-)

>> Whether "TDF" and volunteers are able to implement that technically
>> will be a completely different story.
>
> Listing the technical requirements for the must-haves and nice-to-haves.
Yeah, I think the easiest would be to start by creating a wiki page listing
* pros and cons
* must-have
* nice-to-have
* could be added later
stuff (mostly features). The technical requirements mostly depend on the
implementation (e.g. Python as mentioned above).

> jonathon
Dennis

Michael Bauer

Re: [libreoffice-l10n] [PROPOSAL] New project for dictionaries

In reply to this post by Dennis Roczek
Hi Dennis,

Dennis Roczek wrote the following on 28/04/2016 at 00:30:
> I have no clear understanding of which languages use which kind of
> system and scripts. But as far as I know, many of those who use scripts
> to get the extensions packed simply use homegrown / self-made scripts in
> any language, or simply pack the extension manually.
Yes, we use a homegrown script because it has to parcel a very specific
file which gets exported from the dictionary database.
> To get the system running: provide a system (hosted / maintained by TDF)
> and have everybody use the same system to create new dictionaries - it
> will become superior over time. Additional systems (e.g. Mozilla-based
> products, as the easiest example) can be added later.
There will always be special cases, and if the new centralised system is
to draw in as many people as possible, it must allow committing of ready
dic/aff/xpi etc. files by people who create their Hunspell stuff in other
ways. There's no way I would ever start maintaining our files on another
platform manually, and I would imagine that not many people who have a
dynamic setup like ours would either. We have grown the dic this way
from 500k to 1.5m words in 4 years; that would simply not be
feasible in another way (for us).
> The point about the affix file: I imagined the system on a much lower
> level: Joe Average is computer-savvy, realizes that the system is based
> on volunteer work, and sends his "custom"/unknown words to the
> maintainer (e.g. via a web page, directly integrated within LibreOffice,
> or whatever), and the maintainer (similar to the Language Administrator
> in Pootle) decides whether they go in the dictionary or not.
That might work quite well for a very mature dictionary file or new
locales where there is no existing data that one can draw on.
> Moreover, that kind of system won't need many resources (neither human
> resources nor server resources) to maintain. OTOH we might have
Hah. Nothing that involves spelling is ever easy :) Even if just because
many languages have competing orthographies.

Michael
