Struggling with Hebrew in LO

Jonathan Allen

Struggling with Hebrew in LO

Dear List,

I'm struggling with using mixed English and Hebrew text in LO.  This is
a fully up-to-date LO 5.1.4.2 in a new installation of Linux Mint.

When I type English text, the letters come out in the right order, but the
punctuation goes at the beginning of the line, until the next letter is typed
which is strange but sort-of-OK in mid-sentence but no good at end of the
paragraph.

Attempting to combine Hebrew and English text in the same sentence, as it
were to say 'shalom' in flight, assembles the language blocks the wrong
way round.  Using Alt-Ctrl-8 and Alt-Ctrl-9 doesn't seem to fix this and
the Shift-Ctrl-D and Alt-Shift combinations are also dysfunctional.  The
font-name (selected as SBL Hebrew) switches once characters are typed to
DejaVu Sans.

This is probably all very familiar to someone (if not all), so can anyone
help me to get this working correctly, please?

Jonathan

--
To unsubscribe e-mail to: [hidden email]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/users/
All messages sent to this list will be publicly archived and cannot be deleted

Trever L. Adams-2

Re: Struggling with Hebrew in LO

On 10/25/2016 08:43 AM, Jonathan Allen wrote:

> Dear List,
>
> I'm struggling with using mixed English and Hebrew text in LO.  This is
> a fully up-to-date LO 5.1.4.2 in a new installation of Linux Mint.
>
> When I type English text, the letters come out in the right order, but the
> punctuation goes at the beginning of the line, until the next letter is typed
> which is strange but sort-of-OK in mid-sentence but no good at end of the
> paragraph.
>
> Attempting to combine Hebrew and English text in the same sentence, as it
> were to say 'shalom' in flight, assembles the language blocks the wrong
> way round.  Using Alt-Ctrl-8 and Alt-Ctrl-9 doesn't seem to fix this and
> the Shift-Ctrl-D and Alt-Shift combinations are also dysfunctional.  The
> font-name (selected as SBL Hebrew) switches once characters are typed to
> DejaVu Sans.
>
> This is probably all very familiar to someone (if not all), so can anyone
> help me to get this working correctly, please?
>
> Jonathan
>
I have never been able to get RTL and LTR languages to mix in the same
sentence properly with any application in Windows or Linux when you
start adding punctuation. In LO you can set the language for a paragraph
and, if I remember correctly, that fixes the punctuation placement
problem. I am sorry for not being certain as it has been a few years
since I wrote any significant quantity of Hebrew.

I have never had any problems with the order of actual words between the
two languages.

In English, I say hello, in Hebrew I say שלום.

שלום, Jonathan.

I am not sure what mailer you are using or how it will render, but LO
and Thunderbird in Linux render the above the same. Period may be
considered out of order in the first. The comma in the second definitely
is. This is with leaving the language at default (en_US-UTF-8 for
myself). Try selecting your paragraphs and changing the language to
Hebrew and see if that helps any.

I cannot comment on the keyboard shortcuts you mentioned as I don't know
what they are supposed to do or where they do work.

Trever


Trever L. Adams-2

Re: Struggling with Hebrew in LO

On 10/25/2016 05:17 PM, Trever L. Adams wrote:

> I have never been able to get RTL and LTR languages to mix in the same
> sentence properly with any application in Windows or Linux when you
> start adding punctuation. In LO you can set the language for a
> paragraph and, if I remember correctly, that fixes the punctuation
> placement problem. I am sorry for not being certain as it has been a
> few years since I wrote any significant quantity of Hebrew.
>
Sorry, the option is Format/Character/Font/Language or right click and
character/font/language. It isn't paragraph. As I said, it has been a
few years.

Trever


jonathon-6

Re: Struggling with Hebrew in LO

In reply to this post by Jonathan Allen
On 25/10/2016 14:43, Jonathan Allen wrote:

> Attempting to combine Hebrew and English text in the same sentence, as it
> were to say 'shalom' in flight, assembles the language blocks the wrong way round.

Use language specific character styles, and language specific paragraph
styles. Hebrew only, English only. You can't mix English and Hebrew in
the same style. Things will get messed up.

I usually use a different colour for each paragraph style and each
character style. Then, when the document is completed, proof-read,
copy-edited, grammar checked, etc., change the style colours to black.

To avoid the misplaced punctuation issue, don't use the non-dominant
language at either the beginning, or end of the sentence. Ideally, it
won't be the first, or last word, in the correctly punctuated phrase.

jonathon


CVAlkan

Re: Struggling with Hebrew in LO

Hi Jonathan:

Welcome to the multi-language war veterans club! Your purple heart is in the mail.

There IS actually a logic as to how characters are laid out when typing mixed L-T-R and R-T-L text in a line or sentence, but you need to understand that most text rendering mechanisms continue to make some unwarranted assumptions. First off, some punctuation in "foreign" (i.e. non-Latin) scripts is taken from what used to be called the lower ASCII characters. This makes some sense, as there is no point in duplicating identical characters that are used for identical purposes. BUT: when the text rendering engines encounter such characters while typing in another script, they decide that the typist is back to using Latin script which, as you've seen, can be disastrous.

With Hebrew Script (used in Hebrew, Yiddish and perhaps others I'm not aware of), there is an additional problem:

Like most languages, Hebrew uses the same set of parentheses as English, and treats "opening" and "closing" as meaning the curve is towards the innards of the set. Since Hebrew is written from right-to-left, however, what is an "opening" paren in Latin scripts is a "closing" paren in Hebrew and vice-versa. This is the reason that the opening paren is above the 9 key for both keyboard layouts, but they face the opposite direction (the "(" is above the "9" key on English keyboards and above the "0" key on many Hebrew keyboards). Because of the flaw in the way rendering engines recognize these characters (both of which are in the "Latin" set) as indicating a return to English, you lose! Well, you get the idea.
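To see how Unicode itself classifies the characters discussed here, a small sketch along the following lines may help. It only uses Python's standard unicodedata module; the helper name describe is made up for this illustration. It prints the bidirectional class of a few characters and whether they are flagged as "mirrored", which is the property that lets the same parenthesis code point face the other way in right-to-left text.

```python
import unicodedata


def describe(ch):
    """Return (bidi class, mirrored flag) for a single character.

    'describe' is a hypothetical helper name used only in this sketch.
    """
    return unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch))


if __name__ == "__main__":
    # A Latin letter, a Hebrew letter, both parentheses, and some
    # punctuation shared between the two scripts.
    for ch in ["a", "\u05e9", "(", ")", ".", ",", " "]:
        bidi_class, mirrored = describe(ch)
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
              f"class={bidi_class}, mirrored={mirrored}")
```

Letters come back with a strong class (L or R), while the shared punctuation and the space come back with weak or neutral classes, so their placement depends on the text around them.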

Arabic - another RTL script used in far more languages than Hebrew - leaves the "(" and ")" characters in the same positions they are on English keyboards.

If you go to LibreOffice Bug #92655 (https://bugs.documentfoundation.org/show_bug.cgi?id=92655) I attached a pdf document there titled "General Discussion of Complex Text Attributes" which you can download; this (and particularly page 14 &ff) describes some of the details of the issue you're running into, and Hebrew is one of the specific scripts used as an example.

Jonathon(toki)'s advice given above is spot on! If you understand what's actually going on, and understand the characters to avoid having in certain places and why (that's where I think my document may help you), you can intermingle multiple directions within single lines successfully almost all the time. Like him, I've done this for long enough to be able to get it done, but I also have my own "tricks" to keep my head screwed on straight when composing.

LibreOffice, which like most apps relies on an external rendering engine (I believe it's HarfBuzz now, but am not certain), is affected by this rendering assumption, as you have seen. You may also run into LibreOffice sometimes substituting fonts unnecessarily when you switch, even if you have specifically selected a font containing both scripts/languages you are using. This results from some fonts not properly reporting which scripts and languages they support. So the rendering engines dutifully find a substitute font to use. It's messy. The other document attached to the same bug report discusses many other side effects you'll need to become aware of.

You might also take a look at bug 32357 which deals with auto-completion quirks when using multiple languages.

But it can be done: Best of Luck, and if you have other questions, or discover new tricks, please post them.

Frank

jonathon-6

Re: Struggling with Hebrew in LO

On 26/10/2016 18:36, CVAlkan wrote:

> substituting fonts unnecessarily when you switch, even if you have specifically selected a font containing both scripts/languages you are using.

Does the bad font substitution go away, if you manually add foundry data
to the font?

jonathon


CVAlkan

Re: Struggling with Hebrew in LO

A very good question but, depending on what you mean specifically by "foundry data" I'm afraid I don't know that it will. My best guess is that it will **improve the chances that it will**, but there are many other factors involved. My understanding of all this is murky at best given the unclear and often conflicting information I could find on the web.

In Linux at least, there is a "thing" (utility? service?) called fc-match that seems to actually decide which fonts are the closest match to the one that doesn't meet the immediate needs of the calling app - whether fc-match is called directly by an app or indirectly through HarfBuzz or another rendering mechanism (it isn't clear to me if, or to what extent, Windows uses this, although GIMP under Windows certainly does).

fc-match is part of Behdad Esfahbod's fontconfig package (see https://en.wikipedia.org/wiki/Behdad_Esfahbod), and it determines the matches (and ranking) according to a multitude of factors exposed/reported by the fonts themselves. Since a number of perfectly lovely fonts are either missing some things, or define them incorrectly or inconsistently, the answer to your question might be: if you were to fix all of those things, you probably wouldn't encounter the unwarranted and unexpected font substitutions.

There are of course two significant "gotchas" in expecting good results from fc-match: The first is that, while the font itself is actually suitable, it doesn't report some of its capabilities correctly, causing an unnecessary quest for a substitute. The second is that, while looking for a good substitute, the fonts being examined don't correctly or consistently report their capabilities. The acronym we used to use in the early days was GIGO (Garbage In = Garbage Out).

I've long been annoyed that LibreOffice (among other apps, but this is an LO list) doesn't report that such substitutions were made, but I've since discovered the possibility that it might not even have been given that information as feedback from the rendering engine in the first place. The way I confirm these stealth substitutions by the way is to either generate a .pdf or save the file as an .fodt; in either case the actual font being used can be determined from those files.
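As a rough illustration of the .fodt check, the sketch below lists the font faces that a flat ODF file declares. It assumes the standard OpenDocument style and svg-compatible namespaces and looks only at style:font-face elements; the function name declared_fonts is invented for this example, and whether a substituted font actually shows up in these declarations is something to verify against your own files rather than take from this sketch.

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespaces used by font-face declarations.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
SVG_NS = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"


def declared_fonts(fodt_xml):
    """Return the sorted font families declared in flat-ODF XML.

    Accepts the file contents as str or bytes. 'declared_fonts' is a
    hypothetical helper name, not part of LibreOffice.
    """
    root = ET.fromstring(fodt_xml)
    fonts = set()
    for face in root.iter(f"{{{STYLE_NS}}}font-face"):
        family = (face.get(f"{{{SVG_NS}}}font-family")
                  or face.get(f"{{{STYLE_NS}}}name"))
        if family:
            # Family names are often wrapped in quotes inside the attribute.
            fonts.add(family.strip("'\""))
    return sorted(fonts)
```

For a real document one would pass in the file contents, for example declared_fonts(open("letter.fodt", "rb").read()), where letter.fodt is a placeholder file name.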

So: Trying to answer the question you pose has been part of my goal for a while, but I first wanted to come up with a list of fonts that would be suitable for experimentation. There are a number of ways of "looking into" a font to see what's in it, but they are all fairly tedious if you want to compare some arbitrary set of fonts, and involve looking at one font at a time.

I've recently been playing with a shell script I wrote; if I'd known how deep the water was that I was stepping into, the choice of bash would likely have been different, but that's water (pun intended) under the bridge. I give the script multiple command line arguments for the particular scripts I'm interested in combining in a single document and it gives me back a list of all the fonts that are "potentials." Along with each font name, there is a short list of some of the things it has to say about itself. The results so far are rather fascinating, and tend to confirm your suspicion/hope/guess that fixing the fonts may fix the problem.

In no particular order, here are some representative tidbits I've discovered:

1) When looking at fonts containing both Greek and Armenian characters, there are 31 of them installed on my machine, and all of those 31 (all from the FreeFont and DejaVu families) include the appropriate language codes ('el' and 'hy') for this example. BUT: DejaVuSans-ExtraLight.ttf is missing the 0x0559 character from the available character bit map. Not knowing Armenian, I don't know what to make of that, but it's interesting. FreeSerifItalic doesn't report the ISO 15924 script tag 'grek' and FreeMono fails to report the code 'armn'; DejaVuSansMono-Oblique and FreeMonoOblique don't report either 'el' or 'hy'. You can see that answering your question would require some serious experimentation: are there valid reasons some members of these families are slightly different from others, etc? There are more examples like this.

2) I have found two versions of Garamond on my system using this script (I don't believe I would have done that, so I suspect that different apps may have added them not realizing the other was there). Appearance-wise, their glyphs seem at least superficially identical, but what they report is quite different. The base font for one reports it's in a 'Normal' style, while the other says it's 'Regular.' One of these families is clearly superior in what it is reporting as capabilities, so I'll soon be purging the other, but the questions remain: how the heck would I ever have stumbled across this? and what effect(s) might this have had on unexpected font substitution?

3) For coverage of "upper" Unicode planes (i.e. scripts that begin beyond 0xffff and containing such things as ligatures, box drawing characters, complete musical symbols and so forth), none of the utilities I've used seems to report anything. An examination (using FontForge) of some fonts that provide these suggests that they are constructed correctly, leading me to believe that the underlying utilities may never have been updated to handle extended values, but that's just a guess.

4) Despite the fact that the Thai and Laotian character sets, while different and assigned to different Unicode blocks, are similar enough that they can be read (though possibly not understood) on either side of the border, I have found no fonts whatever that contain both of these. Since my collection of Thai fonts is rather extensive, I find this odd. If I were ever to mix Thai and Laotian in the same document - which I haven't - my guess is that substitution problems will pop up immediately.

5) The Droid family of fonts, by the way, does NOT contain any Thai characters. Fair enough, as they provide supplemental fonts for other Scripts. For Thai, I have DroidSerifThai-Regular, DroidSerifThai-Bold, and DroidSansThai installed. You would think therefore, that when using DroidSerif-Regular as the font, DroidSerifThai-Regular would be the perfect substitute for text passages containing Thai. For reasons I've yet to track down, however, it isn't even on the list of fonts considered as substitutes; I suspect that since none of these Thai variants report support for the ISO 639-1 Language Code 'th' that's probably a good clue but, since I don't actually use Droid, I haven't pursued that further. (To the OP's original question, there are equivalent Droid Hebrew fonts as well.)

Finally, I have some hesitation in modifying any particular font, since as far as I know they could be overwritten at any time by a helpful app or OS. It seems preferable that if any font errors are found, that they be vetted, confirmed and corrected by the original creating entity. Unfortunately, I'm not sure how all the existing "faulty" versions could ever be rounded up and destroyed.

But - that's another problem. If enough definitive examples are found, perhaps there will be some recognition that there is still work to be done. I know from past postings that you're familiar with this situation, so if there is a way for me to pass my bash script along - assuming you have access to a Linux machine (or the Windows 10 bash shell experiment???), let me know; I'd be happy to hear any comments or corrections you might have ...

Time to stop here: I have a tendency to run on when goaded.

Regards - Frank

jonathon-6

Re: Struggling with Hebrew in LO

On 27/10/2016 14:55, CVAlkan wrote:

> answer to your question might be: if you were to fix all of those things,
> you probably wouldn't encounter the unwarranted and unexpected font substitutions.

If it wasn't so big, Unifont would make a good test case.
Maybe I can figure out how to do bulk editing of the meta-data.

> .pdf or save the file as an .fodt; in either case the actual font being used

I'll add that (.fodt) to the list of formats to save documents in.

For one project I'm working on, getting the characters to behave
themselves is a huge problem. So much so, that I've wished for a macro
that could change the character style of a glyph, according to its
Unicode Code Point.
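The classification half of such a macro could be sketched as below. This is not a LibreOffice macro and does not touch any document; it only splits a string into runs by a rough script guess taken from each character's Unicode name, which is an assumption that works for simple Hebrew/Latin text but not for every script. The function names script_of and split_runs are invented for this illustration; applying a character style to each run would still have to be done separately.

```python
import unicodedata


def script_of(ch):
    """Rough script guess: first word of the Unicode character name.

    Returns None for characters with no script of their own
    (spaces, digits, punctuation, combining marks).
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else None


def split_runs(text):
    """Split text into (script, run) pairs.

    Script-less characters are attached to the run that precedes them.
    """
    runs = []
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or script == runs[-1][0]):
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


if __name__ == "__main__":
    for script, run in split_runs("Say \u05e9\u05dc\u05d5\u05dd now"):
        print(script, repr(run))
```

Each run could then be mapped to a language-specific character style of one's own choosing, in line with the "Hebrew only, English only" style advice earlier in the thread.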

> font to see what's in it, but they are all fairly tedious if you want to
> compare some arbitrary set of fonts, and involve looking at one font at a time.

Running a script, or set of scripts that export data to a text file,
then looking at the results, is probably the simplest/fastest way to
examine them.

> deep the water was that I was stepping into, the choice of bash would likely

Donald Knuth said that the only way to write software, was to write the
program three times, changing programming languages at least once.

> a short list of some of the things it has to say about itself. The results
> so far are rather fascinating, and tend to confirm your suspicion/hope/guess
> that fixing the fonts may fix the problem.

> ('el' and 'hy') for this example. BUT: DejaVuSans-ExtraLight.ttf is missing
> the 0x0559 character from the available character bit map. Not knowing
> Armenian, I don't know what to make of that, but it's interesting.

The most probable reason for an otherwise complete weight, with other
weights in the same typeface including the missing glyphs, is designer
oversight.

> FreeSerifItalic doesn't report the ISO 15924 script tag 'grek' and FreeMono
> fails to report the code 'armn'; DejaVuSansMono-Oblique and FreeMonoOblique
> don't report either 'el' or 'hy'. You can see that answering your question
> would require some serious experimentation: are there valid reasons some
> members of these families are slightly different from others, etc?

The Unicode Standard allows for some designer leeway, in both how the
glyphs are constructed, and how they interact with each other. That
might be what is happening here.

> 2) I have found two versions of Garamond on my system using this script (I
> don't believe I would have done that, so I suspect that different apps may
> have added them not realizing the other was there).

That gets into font metadata, and a program might look for something
specific, and failing to find it, automatically downloads and installs
the font that contains what it was specifically looking for.

> The base font for one reports it's in a 'Normal' style, while the other says it's 'Regular.'

Those are two different weights. Without looking at them, my guess is
that "Regular" is the inferior looking one.

> questions remain: how the heck would I ever have stumbled across this?

This is where one uses a utility that generates a chart of all installed
fonts in one's system, and then looks at which fonts it claims are
installed, and what they look like.

I've forgotten which font management utility for Linux includes that
functionality.

> that the underlying utilities may never have been updated to handle extended
> values, but that's just a guess.

Look at how they handle the multi-coloured emoji of Unicode 9.0.

OTOH, that is perhaps unfair, because most font utilities for *Nix are
fixated on Unicode 5.0, or earlier.

> If I were ever to mix Thai and Laotian in the same document

They require different IMEs, so most people that put fonts together,
won't combine them in the same typeface.

> my guess is that substitution problems will pop up immediately.

No doubt.

> For reasons I've yet to track down, however, it isn't even on the list of fonts considered as substitutes;

One probably needs to update a database somewhere.

> Finally, I have some hesitation in modifying any particular font, since as
> far as I know they could be overwritten at any time by a helpful app or OS.

This is where running
# Destructive: deletes every system-wide font before copying in the wanted set.
sudo rm -R /usr/share/fonts/*
sudo cp -R /media/theme/fonts/default/* /usr/share/fonts/
ensures that only the fonts that one wants are installed.

> It seems preferable that if any font errors are found, that they be vetted,
> confirmed and corrected by the original creating entity.

This depends upon who/what created the font in question.

For some of those US$5K fonts I've seen, even looking for "errors" in
the font is a breach of the license.

> Unfortunately, I'm not sure how all the existing "faulty" versions
> could ever be rounded up and destroyed.

You give the corrected font a new, higher version number, and hope that
users will start using the updated font. Trying to destroy faulty
versions, is playing whack a mole.  There are better things to do with
one's time and energy.

> there is a way for me to pass my bash script along - assuming you have
> access to a linux machine (or the Windows 10 bash shell experiment???), let

Is it on GitHub? If so, that might be the easiest.
Otherwise send it as an email attachment.
Box, or dropbox might be easier than email.

I use Linux.  Windows is way too frustrating for me to use.

jonathon


CVAlkan

Re: Struggling with Hebrew in LO

I've attempted to upload my shell script using the More button; it is very heavily commented (as much for my own benefit as anything). As a quick trial you can make it executable and then type, for instance:

FindFont thai greek
or
FindFont hindi persian hebrew

The first will list all the fonts that contain characters for both Thai and Greek, and list enough information to see which of those fonts correctly report the needed support. That helps when choosing a font that Writer (hopefully) won't mysteriously replace. Even with the ability to set a CTL language and font, it will be ignored and replaced if the font isn't reporting what it has correctly. With trying to intermingle more than one language/script in a document, CTL hurts far more than it helps.

The second command - giving three arguments - probably won't list anything unless you have some really beefy fonts, although both FreeFontSerif and its bold counterpart support all three.

The bad news is that the list of languages/scripts is not comprehensive (just ones I've happened to look into); the good news is that they're defined in a "case" statement, so adding some others should be relatively simple.
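For readers without the attached script, the general idea can be approximated as follows. This is not a reconstruction of FindFont: it is a minimal sketch that asks fontconfig's fc-list for the families claiming each language (one query per language, using the documented ":lang=xx" pattern and the "family" element) and intersects the results. The function names are invented here, and the answers are only as good as what each font reports about itself, which is the very weakness discussed in this thread.

```python
import subprocess


def parse_families(fc_list_output):
    """Collect family names from 'fc-list ... family' output.

    fc-list may print several comma-separated names on one line.
    """
    families = set()
    for line in fc_list_output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                families.add(name)
    return families


def families_for_lang(lang):
    """Ask fontconfig which families claim support for one language code."""
    result = subprocess.run(
        ["fc-list", f":lang={lang}", "family"],
        capture_output=True, text=True, check=True,
    )
    return parse_families(result.stdout)


def families_covering(langs, lookup=families_for_lang):
    """Return the families that claim support for every language in langs."""
    sets = [lookup(lang) for lang in langs]
    return sorted(set.intersection(*sets)) if sets else []


if __name__ == "__main__":
    # Language codes here are examples: Thai and Greek.
    print(families_covering(["th", "el"]))
```

The lookup parameter exists only so the intersection logic can be exercised without fontconfig installed.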

Frank

jonathon-6

Re: Struggling with Hebrew in LO

On 28/10/2016 19:25, CVAlkan wrote:

> I've attempted to upload my shell script using the More button;

Thanks.

> heavily commented (as much for my own benefit as anything).

If a script is not commented, then nobody knows what it is supposed to do.

> That helps when choosing a font that Writer (hopefully) won't mysteriously replace. Even with the ability to set a CTL
> language and font, it will be ignored and replaced if the font isn't reporting what it has correctly.

If the font does correctly report what it has, then it shouldn't be
replaced.

> The bad news is that the list of languages/scripts is not comprehensive
> (just ones I've happened to look into); the good news is that they're
> defined in a "case" statement, so adding some others should be relatively
> simple.

Give me a week or so to play with it.
I'll probably expand it to include most, if not all writing systems
included in Unicode 9.0:

#####

On second thoughts, it might be easier to rewrite this script in Python,
and add a procedure that inspects each glyph.

* ISO 15924 for writing system codes: Both alpha and numeric codes;
* ISO 639-3 for language codes;
* ISO 3166 for country codes;

Create, and use an SQLite database, with one table for each of the
following:
* Writing systems that are officially implemented in Unicode;
* Writing systems that are unofficially implemented in the Private Plane
Area;
* Language or languages associated with each writing system;
* Glyphs that are associated with each writing system;
* Glyphs that are associated with each language;

By using a database, modifications due to Unicode updates are trivial ---
add the new information to the database.
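A minimal sketch of what such a database could look like, using Python's built-in sqlite3 module, is shown below. It covers only a subset of the tables listed above (writing systems, languages, and the link between them) and leaves out the glyph tables; the table and column names are assumptions made for this example, and the handful of rows are sample data only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE writing_system (
    code   TEXT PRIMARY KEY,   -- ISO 15924 alpha code, e.g. 'Hebr'
    number INTEGER UNIQUE,     -- ISO 15924 numeric code
    name   TEXT NOT NULL
);
CREATE TABLE language (
    code TEXT PRIMARY KEY,     -- ISO 639-3 code, e.g. 'heb'
    name TEXT NOT NULL
);
CREATE TABLE language_script (
    language_code TEXT REFERENCES language(code),
    script_code   TEXT REFERENCES writing_system(code),
    PRIMARY KEY (language_code, script_code)
);
"""


def build_database():
    """Create an in-memory database with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO writing_system VALUES (?, ?, ?)",
        [("Hebr", 125, "Hebrew"), ("Latn", 215, "Latin"), ("Grek", 200, "Greek")],
    )
    conn.executemany(
        "INSERT INTO language VALUES (?, ?)",
        [("heb", "Hebrew"), ("yid", "Yiddish"),
         ("eng", "English"), ("ell", "Modern Greek")],
    )
    conn.executemany(
        "INSERT INTO language_script VALUES (?, ?)",
        [("heb", "Hebr"), ("yid", "Hebr"), ("eng", "Latn"), ("ell", "Grek")],
    )
    conn.commit()
    return conn


def languages_for_script(conn, script_code):
    """Return the names of languages recorded as using a writing system."""
    rows = conn.execute(
        "SELECT l.name FROM language AS l "
        "JOIN language_script AS ls ON ls.language_code = l.code "
        "WHERE ls.script_code = ? ORDER BY l.name",
        (script_code,),
    )
    return [row[0] for row in rows]
```

With this layout, a Unicode update that adds a writing system becomes one new row in writing_system plus rows linking it to its languages.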

jonathon


CVAlkan

Re: Struggling with Hebrew in LO

Would love to see it when you have a prototype ...

As I said in the comments, I started with a quick and dirty shell script, having no idea how big it would get, but by the time I realized I should have begun with a "real" language it was a little late.

If I can do any testing and can find the time, I'd be happy to ...

Good Luck.

Frank

CVAlkan

Re: Struggling with Hebrew in LO

Jonathon (and Jonathan):

Here is a pdf containing a better explanation of how to HELP avoid unnecessary font substitutions in LibreOffice Writer and other applications. The primary cause of this appears to be using fonts that either don't have or don't correctly report their coverage and other capabilities. This is particularly annoying when intermingling multiple scripts/languages within a single sentence, paragraph, or document.

The pdf also contains an updated bash shell script for helping to select fonts that are appropriate for a given combination of scripts/languages, as well as identifying fonts that should perhaps be retired to that proverbial foundry in the sky and replaced with better ones.

Frank

Attachment: Evaluating-fonts-for-multilingual-use.pdf

Dotan Cohen

Re: Struggling with Hebrew in LO

In reply to this post by Jonathan Allen
I apologize for not having seen this thread sooner. Here is a document
which explains why you see what you do, and how to work with it rather
than against it. The concepts are really quite simple, but not
intuitive:
http://dotancohen.com/howto/rtl_right_to_left.html

You are invited to contact me at any time with questions.

Some examples of proper mixed Hebrew and English:

Hello, יהונתן, how are you?‎
‫שלום, Jonathan, מה שלומך?

English at the beginning, עברית בסוף.‎
‫עברית בהתחלה, English at the end.

Because plain-text email does not even have a concept of alignment,
the alignment of the Hebrew sentences depends on your renderer (email
client or web browser). Most likely, they will all be left-aligned.
Note however that alignment and directionality are different concepts.
In all cases, the punctuation should be at the proper end of the
sentence. In order to have Hebrew texts right-aligned in email, I
would have to have sent an HTML email. In LibreOffice you shouldn't
have such an issue. LibreOffice, unlike email, has a concept of
alignment.
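The example sentences above appear to carry invisible Unicode directional characters (that is an inference from the archived text, not something stated in the message). As a hedged sketch of the general technique, the code below wraps a string in a right-to-left embedding and reports where such control characters sit in a string. The helper names wrap_rtl and find_controls are invented for this illustration, and only the older mark and embedding characters are covered.

```python
# Directional control characters from the Unicode Bidirectional Algorithm.
CONTROLS = {
    "\u200e": "LRM",  # LEFT-TO-RIGHT MARK
    "\u200f": "RLM",  # RIGHT-TO-LEFT MARK
    "\u202a": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c": "PDF",  # POP DIRECTIONAL FORMATTING
}

RLE = "\u202b"
PDF = "\u202c"


def wrap_rtl(text):
    """Mark text as a right-to-left embedded run, closed with PDF."""
    return RLE + text + PDF


def find_controls(text):
    """Return (index, short name) for each directional control in text."""
    return [(i, CONTROLS[ch]) for i, ch in enumerate(text) if ch in CONTROLS]


if __name__ == "__main__":
    sentence = wrap_rtl("\u05e9\u05dc\u05d5\u05dd, Jonathan.")
    print(find_controls(sentence))
```

Running find_controls over text pasted from an email or a document is a quick way to check whether such invisible characters are present before blaming the renderer.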




On Tue, Oct 25, 2016 at 5:43 PM, Jonathan Allen
<[hidden email]> wrote:

> Dear List,
>
> I'm struggling with using mixed English and Hebrew text in LO.  This is
> a fully up-to-date LO 5.1.4.2 in a new installation of Linux Mint.
>
> When I type English text, the letters come out in the right order, but the
> punctuation goes at the beginning of the line, until the next letter is typed
> which is strange but sort-of-OK in mid-sentence but no good at end of the
> paragraph.
>
> Attempting to combine Hebrew and English text in the same sentence, as it
> were to say 'shalom' in flight, assembles the language blocks the wrong
> way round.  Using Alt-Ctrl-8 and Alt-Ctrl-9 doesn't seem to fix this and
> the Shift-Ctrl-D and Alt-Shift combinations are also dysfunctional.  The
> font-name (selected as SBL Hebrew) switches once characters are typed to
> DejaVu Sans.
>
> This is probably all very familiar to someone (if not all), so anyone
> help me to get this working correctly, please?
>
> Jonathan



--
Dotan Cohen

http://gibberish.co.il
http://what-is-what.com

CVAlkan

Re: Struggling with Hebrew in LO

Dotan:

I'm sorry I never stumbled across your essay before, but thanks for an excellent explanation of how the Unicode® Standard Annex #9/Unicode Bidirectional Algorithm works *in actual practice*!

So far as I can see, your description is still valid for even the recently updated version of that algorithm (http://www.unicode.org/reports/tr9/tr9-35.html). As published, the Unicode Consortium's algorithm really doesn't explain what's happening in a way that would help an average user - one who is just trying to "type" while mixing multiple scripts with opposing directionality - it's more intended for developers.

Unfortunately (in my view anyway), the algorithm itself makes some assumptions that I find unjustifiable. A primary example is the categorization of certain "shared characters" (spaces, punctuation and so forth) as neutral, and accompanying that with the idea that they should therefore take on the directionality of the paragraph unless and until surrounded by characters that clearly define them as one directionality or another.
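
To see how Unicode classifies these shared characters, one can query the character database directly. Below is a minimal Python sketch using the standard library's `unicodedata.bidirectional`; the helper name `bidi_classes` is made up for illustration. One caveat: the full stop and comma are formally "weak" separators (class CS) rather than true neutrals (WS, ON), although, as I read rules W4 and W6 of Annex #9, they end up treated as neutrals unless they sit between digits.

```python
import unicodedata

def bidi_classes(text):
    """Return (character, Bidi_Class) pairs for each character in text."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Latin letter, Hebrew alef (escaped to avoid display reordering),
# space, question mark, full stop, digit.
sample = "a\u05d0 ?.5"
for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

Only the classes L, R and AL are "strong"; everything else has to borrow a direction from its context, which is where the behavior discussed here comes from.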

This seems to be why, for instance, the cursor jumps around mysteriously when entering a multi-word segment of Hebrew or Arabic scripts (regardless of the actual language they are used for) each time a space is encountered (you said "In LibreOffice you shouldn't have such an issue" - true enough, but several remain). It would seem to me that - from a user-interface perspective at least - such characters should keep the directionality of the most recently typed character, leaving the cursor where it was before the space (most common example, but occurs with other such characters) was entered. If the next character is indeed one of the opposite directionality, then make the correction accordingly.

As a matter of principle, assumptions in algorithms always seem risky and/or dangerous. In this case, the whole idea that one needs to set the directionality of characters or phrases ahead of time seems particularly problematic. The obvious counter-argument to this is when beginning a paragraph with a character that isn't in the direction the writer intended, that would need to be treated as a special circumstance.

The ultimate objective would seem to be completely removing any barriers to freely typing in whatever language or script desired without needing to know a lot of special tricks; both Unicode (and UTF-8) and OpenType font technology are huge steps towards this goal - but we're not quite there yet.

Again, thanks for pointing out your essay!
jonathon-6

Re: Struggling with Hebrew in LO

In reply to this post by CVAlkan
On 28/10/16 22:26, CVAlkan wrote:

> Would love to see it when you have a prototype ...

Instead of creating a program from scratch, I'm going to modify an
existing one. The downside is that I have to learn another programming
language. Along the way, I'll also have to fix some existing bugs in
that program.

The reason for modifying the existing program is that I then only have
to write one routine (^1), to display the metadata that it currently reads.

Once I've finished that modification, I'll modify a different tool, to
write valid metadata to each glyph.

I don't have a time frame by which this will be done.

I'm in the midst of two major projects, and one minor project.  However,
one of those projects requires LibO to recognize, and correctly display
different writing systems in the same phrase.
(So this, in effect, becomes a sub-project of one of those projects.)

^1: That is the theory. In practice, I expect to have to fix half a
dozen bugs, and, maybe implement one or two other functions.

jonathon

CVAlkan

Re: Struggling with Hebrew in LO

A couple questions ...

What's the program and language you're looking at? Even though I doubt I could assist, I'd still like to be able to read through what already exists in a never-ending quest to get a handle on this subject (which is far more complex than I originally envisioned, although I'm very impressed with what has been done since the days of 8 bits).

But consider this my encouragement for whatever that's worth; I'm also willing to experiment with whatever you come up with if you need another pair of eyes.

Regards ...

Frank
jonathon-4

Re: Struggling with Hebrew in LO

On 24/11/16 10:35, CVAlkan wrote:

> What's the program and language you're looking at?

FontMatrix, and C++, respectively.

> But consider this my encouragement for whatever that's worth; I'm also
> willing to experiment with whatever you come up with if you need another

I cloned it to my GitHub space, and then cloned it on my system here.
I'll merge my changes onto the version on my GitHub space.  That way,
the original stays "pure", but people that want to risk their system
with my programming skills, can try it out.

jonathon

Khaled Hosny

Re: Struggling with Hebrew in LO

In reply to this post by CVAlkan
CONTENTS DELETED
The author has deleted this message.
CVAlkan

Re: Struggling with Hebrew in LO

Khaled:

Re: "paragraph direction and alignment are two different things." Certainly true (and, from many discussions here and elsewhere, it's obvious to me that anyone who cares about such things already knows this as well) but, I think, misses the primary point, which is to eliminate a bizarre and confusing user interface.

So: Two responses to your comment: "This usually means you didn’t set the paragraph direction and just aligned the paragraph to the right while leaving its direction LTR. No jumping would happen if the paragraph has RTL direction."

First off, I'm basing my own opinions on the idea that following the Unicode Standard in this regard *should be* the objective, since a) it is well thought out and b) results in an interface that is both more intuitive and far easier to use in practice. The reference for that, by the way, is http://www.unicode.org/reports/tr9/tr9-35.html for the official “Unicode® Standard Annex #9: UNICODE BIDIRECTIONAL ALGORITHM.”

So here goes.

To your point about setting the paragraph direction, you are correct. But why should a user need to do so if it is unnecessary? Annex 9 clearly recommends that the *default* paragraph direction should be set to the directionality of the first strongly directional character entered into that paragraph. This is just my interpretation, of course, but it's bolstered by the fact that it makes life easier. More on the *default* in a bit ...
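
As a rough illustration of that "first strong character" default (rules P2 and P3 in Annex 9, as I understand them), here is a small Python sketch. The function name `paragraph_direction` and the `default` parameter are my own, and the sketch deliberately ignores the isolate formatting characters that the full rule skips over, so treat it as a simplification rather than a faithful implementation.

```python
import unicodedata

def paragraph_direction(text, default="ltr"):
    """Guess the paragraph direction from the first strong character.

    Simplified: characters inside isolates are NOT skipped, unlike
    the full rule in Annex 9.
    """
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):  # Hebrew is R, Arabic letters are AL
            return "rtl"
    # No strong character yet (empty, digits, punctuation only).
    return default

# Hebrew is written with escapes so the source displays predictably.
print(paragraph_direction("Hello, \u05e9\u05dc\u05d5\u05dd"))
print(paragraph_direction("\u05e9\u05dc\u05d5\u05dd, Jonathan"))
```

The point is that nothing needs to be set ahead of time: the direction can be derived the moment the first letter is typed, with a fallback only for paragraphs that contain no letters at all.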

The Calligra Words and FocusWriter word processors, as well as the gEdit and Kate text editors, all act in this manner, so it's not unheard of. Of course, neither word processor has the feature set of Writer, but that's not the scope of this discussion - I will say that, for some complex or extensive entry where intermingling of bidirectional text is required, I will switch to one of those to do the actual typing, and then copy the block to Writer to make use of its other features. In such situations, Writer's behavior is actually annoying.

Secondly, you seem to be assuming that paragraphs run in just one direction or another. For certain use cases, that's reasonable, of course, but as a general rule, that is entirely too limiting. (Think of translators, literary, morphological, and etymological analyses, and so forth).

Far and away the most annoying aspect of this is when initially entering an RTL phrase of more than one word in an otherwise LTR paragraph. Having the cursor jump to the right as each space (a non-directional or "neutral" as Annex 9 calls it) between each RTL word is entered is fun to watch, but certainly not what a typical user would expect.

Annex 9 does not specify this (although I've read some postings suggesting it does). The relevant section says “Generally, NIs [i.e. neutral and isolate formatting characters] take on the direction of the surrounding text. In case of a conflict, they take on the embedding direction.” But, if the user hasn't yet entered any character beyond the space, there is no SURROUNDING text - there is only PRECEDING text. The cursor should stay just where it is unless and until the user enters another LTR character. Of course this doesn't take into account very unusual needs (where the isolate formatting characters are needed), but for typical text entry, this is the most common use case for mixing bidirectional text in a single paragraph.
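
To make the "surrounding versus preceding" point concrete, here is a heavily simplified Python sketch of how neutrals get a direction. It is not the real algorithm: it treats everything that is not a strong letter (including digits and all weak types) as neutral, ignores embeddings and isolates, and assumes that the start and end of the text count as having the paragraph direction. The function name `resolve_directions` is invented for this example.

```python
import unicodedata

def resolve_directions(text, paragraph_dir="L"):
    """Return one 'L' or 'R' per character of text (simplified)."""
    def strong(ch):
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "L"
        if cls in ("R", "AL"):
            return "R"
        return None  # assumption: everything else is neutral

    kinds = [strong(ch) for ch in text]
    resolved = []
    i, n = 0, len(kinds)
    while i < n:
        if kinds[i] is not None:
            resolved.append(kinds[i])
            i += 1
            continue
        # Find the end of this run of neutrals.
        j = i
        while j < n and kinds[j] is None:
            j += 1
        # Text boundaries stand in for the paragraph direction.
        before = resolved[-1] if resolved else paragraph_dir
        after = kinds[j] if j < n else paragraph_dir
        # Agreeing neighbours win; otherwise use the paragraph direction.
        fill = before if before == after else paragraph_dir
        resolved.extend(fill * (j - i))
        i = j
    return "".join(resolved)

# LTR paragraph: "abc", space, two Hebrew letters, then a trailing space.
print(resolve_directions("abc \u05d0\u05d1 "))        # LLLLRRL
# Same text after one more Hebrew letter has been typed.
print(resolve_directions("abc \u05d0\u05d1 \u05d2"))  # LLLLRRRR
```

Under these assumptions the trailing space first resolves to L and then flips to R once the next Hebrew letter arrives, which matches the jumping cursor described above. Keeping the cursor still would amount to treating a trailing neutral as if it inherited the direction of the preceding character until more text is typed.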

As a further comment on "No jumping would happen if the paragraph has RTL direction": the same distracting behavior will occur in the opposite direction if an LTR segment is entered into an RTL paragraph (except when numeric digits are entered, which are mostly LTR even with RTL languages; they seem to be handled independently of other characters in most implementations - but again, that's a distraction from this particular thread).

The original poster also mentioned his struggle with placing the period at the end of a sentence. In a normally LTR paragraph containing bidirectional text that ENDS WITH the non-default directionality, I could almost hear him screaming, as it took me ages to figure out how to overcome that in Writer, but it seems that interpreting "surrounding" is the culprit here as well. I'm not sure if you've ever explained to a non-technical translator how to insert a zero-width character, but it can turn into a fascinating conversation - I'd like to see Writer (and, to be fair, many other apps) be a bit more intuitive to use in such cases.
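
For anyone wondering what that zero-width character actually is: the usual candidates are LEFT-TO-RIGHT MARK (U+200E) and RIGHT-TO-LEFT MARK (U+200F), invisible characters that count as strong L and strong R respectively. The Python sketch below only shows the idea of appending one after the final punctuation so that the punctuation has a strong neighbor on both sides; the function name `pin_trailing_punctuation` is hypothetical, and how a mark is typed in a particular application is a separate question.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK, invisible, strong L
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, invisible, strong R

def pin_trailing_punctuation(sentence, direction="L"):
    """Append a directional mark so trailing punctuation follows `direction`.

    direction is 'L' or 'R': the direction of the sentence that the
    final punctuation belongs to.
    """
    mark = LRM if direction == "L" else RLM
    if sentence.endswith(mark):
        return sentence  # already pinned, do not add a second mark
    return sentence + mark

# A sentence ending in Hebrew letters plus a full stop, pinned as RTL.
pinned = pin_trailing_punctuation("abc \u05d0\u05d1.", direction="R")
print([f"U+{ord(ch):04X}" for ch in pinned[-2:]])
print(unicodedata.bidirectional(LRM), unicodedata.bidirectional(RLM))
```

With a Hebrew letter before the full stop and an RLM after it, the punctuation sits between two R characters and so should stay attached to the Hebrew run instead of falling back to the paragraph direction.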

Editing text in bidirectional paragraphs is a bit different, of course, since the settled layout needs to be disrupted, but the issues there are a bit more involved than entry, so I'll leave well enough alone for the moment.

But thanks for responding; it's good to know that some attention is being bestowed on those (apparently very) few of us who type such things.

Are you, by the way, the "HarfBuzz" Khaled Hosny, or is that a different person?

-Frank
Khaled Hosny

Re: Struggling with Hebrew in LO

CONTENTS DELETED
The author has deleted this message.