[libreoffice-l10n] Help text for MIDB

Jesper Hertel

[libreoffice-l10n] Help text for MIDB

The help text for MIDB (
https://help.libreoffice.org/Calc/Text_Functions#MIDB) says this:

"

MIDB

Returns a text string of a DBCS text. The parameters specify the starting
position and the number of characters.

Syntax

MIDB("Text"; Start; Number_bytes)

Text is the text containing the characters to extract.

Start is the position of the first character in the text to extract.

Number_bytes specifies the number of characters MIDB will return from text,
in bytes.

Example

=MIDB("office";2;2) returns ff.

"

But "office" is not a string written in a double byte character set (DBCS,
https://en.wikipedia.org/wiki/DBCS), so the example is not helpful for the
main use of the function.

There should primarily be an example with a string in a language that
actually uses a double byte character set (DBCS), such as Chinese. That
example would show that only 1 character is returned when asking for 2
bytes, i.e. the number of characters returned is *half* the number of
bytes requested. It should also be noted that asking for 3 bytes still
returns only 1 character, and so on.

The given example only shows the rather special case when you are *not*
giving the function a DBCS string; in this case the number of characters
returned is the *same* as the number of bytes.


Secondly, the sentence "Number_bytes specifies the number of characters
MIDB will return from text, in bytes" is not very clear, because the number
does *not* specify the number of characters. If you feed the function a
string in a single byte character set (SBCS,
https://en.wikipedia.org/wiki/SBCS), such as "office", the number of bytes
is the *same* as the number of characters. If you feed the function a
string in a double byte character set (DBCS), such as a string of Chinese
characters, the number of bytes is *double* the number of characters.
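The byte counting described above can be modelled in a few lines. This is a minimal sketch, not LibreOffice's implementation: it assumes that a character counts as 2 bytes when its Unicode East Asian Width is "W" (wide) or "F" (fullwidth) and as 1 byte otherwise, which only approximates whatever test Calc uses to decide that a character is DBCS. The names `is_dbcs` and `byte_length` are hypothetical.

```python
import unicodedata


def is_dbcs(ch: str) -> bool:
    """Approximate DBCS test (assumption): wide or fullwidth characters count as 2 bytes."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def byte_length(text: str) -> int:
    """Count 1 byte per SBCS character and 2 bytes per DBCS character."""
    return sum(2 if is_dbcs(ch) else 1 for ch in text)


if __name__ == "__main__":
    for s in ("office", "中国"):
        # Prints the string, its number of characters, then its number of bytes.
        print(s, len(s), byte_length(s))
```

For "office" the two counts are equal (6 and 6); for "中国" the byte count is double the character count (2 and 4).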


I don't know if this is the right place to report this problem.

The problem is probably also known already, and I kind of expect to get the
reply "oh yeah, we know, the help is a big mess and really needs
improvement.". ;-) But maybe I am wrong, so I am reporting it anyway.


Jesper

--
To unsubscribe e-mail to: [hidden email]
Problems? http://www.libreoffice.org/get-help/mailing-lists/how-to-unsubscribe/
Posting guidelines + more: http://wiki.documentfoundation.org/Netiquette
List archive: http://listarchives.libreoffice.org/global/l10n/
All messages sent to this list will be publicly archived and cannot be deleted
Kevin Suo

Re: [libreoffice-l10n] Help text for MIDB

A1 = "中国"
B1 = MIDB(A1,1,1) returns ""
B1 = MIDB(A1,1,2) returns "中"
B1 = MIDB(A1,1,3) returns "中"
B1 = MIDB(A1,1,4) returns "中国"

I think it is better left to the localizers to adapt this help text to their needs; for example, the Japanese team could show how this works with Japanese characters.

Kevin Suo

On 19 January 2015 at 9:16:23 PM GMT+08:00, Jesper Hertel <[hidden email]> wrote:

Jesper Hertel

Fwd: [libreoffice-l10n] Help text for MIDB

2015-01-19 15:32 GMT+01:00 Kevin Suo <[hidden email]>:

> A1 = "中国"
> B1 = MIDB(A1,1,1) returns ""
> B1 = MIDB(A1,1,2) returns "中"
> B1 = MIDB(A1,1,3) returns "中"
> B1 = MIDB(A1,1,4) returns "中国"
>

Thanks for the examples, Kevin! I was afraid they wouldn't make it through
the mailing list system, which is why I didn't supply any. But yours are
even better than the ones I would have thought of providing.


> I think it is better up to the localizer to translate this help text
> according to their needs, for example Japanese team may show how this works
> with Japanese chars.
>

I agree that the specific translation is up to the localizers. But even
people using a non-DBCS user interface language, such as English or Danish,
may want to use that function and to understand what it does and how to
use it; they may work with Japanese or another DBCS language without using
that language for the user interface. So I still believe the English text
could be improved, both regarding the sentence mentioned earlier and by
adding several actual DBCS examples similar to the good ones you provided.
They could perhaps be worded and expanded like this, to show that the
position argument is also counted in bytes and not in character positions:

MIDB("中国",1,1) returns "" (1 byte is only half a character and it is
therefore discarded).
MIDB("中国",1,2) returns "中" (2 bytes are one complete character).
MIDB("中国",1,3) returns "中" (3 bytes are one character and a half; the last
byte is discarded).
MIDB("中国",1,4) returns "中国" (4 bytes are two complete characters).
MIDB("中国",2,1) returns "" (byte position 2 is not at the beginning of a
character).
MIDB("中国",2,2) returns "" (byte position 2 is not at the beginning of a
character).
MIDB("中国",3,1) returns "" (byte position 3 is at the beginning of a
character, but 1 byte is only half a character and is therefore discarded).
MIDB("中国",3,2) returns "国".
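The point that the position argument counts bytes rather than characters can be checked with a small helper that lists the byte position at which each character starts; only those positions are character boundaries in a DBCS string. This is a sketch under an assumption: a character is treated as DBCS (2 bytes) when its Unicode East Asian Width is wide or fullwidth, which approximates but is not necessarily Calc's own test. `char_start_positions` is a hypothetical name.

```python
import unicodedata


def is_dbcs(ch: str) -> bool:
    # Assumption: wide ("W") or fullwidth ("F") characters stand in for Calc's DBCS test.
    return unicodedata.east_asian_width(ch) in ("W", "F")


def char_start_positions(text: str) -> list[int]:
    """Return the 1-based byte position at which each character of text starts."""
    positions = []
    pos = 1
    for ch in text:
        positions.append(pos)
        pos += 2 if is_dbcs(ch) else 1
    return positions


if __name__ == "__main__":
    print(char_start_positions("中国"))    # [1, 3]
    print(char_start_positions("office"))  # [1, 2, 3, 4, 5, 6]
```

Byte positions 1 and 3 start a character in "中国", while position 2 falls in the middle of "中", which is why the examples starting at position 2 behave differently.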


And yes, I do believe that this rather large number of examples is
necessary to make it completely clear how this rather technical function
works, and that the Help is the place for such an explanation.

I don't know whether my explanations in parentheses are understandable or
relevant. They are an attempt to explain what is happening to
not-so-technical users, but also to technical users who want to be sure
they understood it correctly.


Jesper

Stanislav Horáček

Re: Fwd: [libreoffice-l10n] Help text for MIDB

Hi,

I agree that these examples are really useful. Could you also provide
some examples for the other functions dealing with DBCS (LEFTB, RIGHTB,
LENB)?
If so, I will add them to the Help text.

Thanks!
Stanislav


On 19.1.2015 at 16:11, Jesper Hertel wrote:

Jesper Hertel

Re: Fwd: [libreoffice-l10n] Help text for MIDB

2015-01-19 21:30 GMT+01:00 Stanislav Horáček <[hidden email]>:

> Hi,
>
> I agree that these examples are really useful. Could you provide also some
> examples for the other functions dealing with DBCS (LEFTB, RIGHTB, LENB)?
> If so, I will add them to the Help text.
>

Hi Stanislav and others,

Here are my suggestions for examples for MIDB, LEFTB, RIGHTB and LENB.

I actually made a spreadsheet in LibreOffice Calc and tested each
expression to be absolutely sure of the results. The spreadsheet I made can
be found at [1]. I made it using the English (US) user interface and locale.

[1]: http://www49.zippyshare.com/v/YbkWBbkZ/file.html

It turned out that invalid requests (half DBCS characters) do *not* actually
result in empty strings but in a *space character*.

Hence the following suggested examples and explanations.

The return values shown are the *actual* return values of the listed
expressions and were therefore *not* typed by hand (check the spreadsheet
if you want to see how). Note the rather subtle spaces returned.

MIDB("中国",1,0) returns "" (0 bytes is always an empty string).
MIDB("中国",1,1) returns " " (1 byte is only half a DBCS character, so the result is a space character).
MIDB("中国",1,2) returns "中" (2 bytes constitute one complete DBCS character).
MIDB("中国",1,3) returns "中 " (3 bytes constitute one and a half DBCS characters; the last byte results in a space character).
MIDB("中国",1,4) returns "中国" (4 bytes constitute two complete DBCS characters).
MIDB("中国",2,1) returns " " (byte position 2 is not at the beginning of a character in a DBCS string; 1 space character is returned).
MIDB("中国",2,2) returns "  " (byte position 2 points to the second half of the first character in the DBCS string, so the 2 bytes requested constitute the second half of the first character and the first half of the second character; 2 space characters are therefore returned).
MIDB("中国",2,3) returns " 国" (byte position 2 is not at the beginning of a character in a DBCS string; a space character is returned for byte position 2).
MIDB("中国",3,1) returns " " (byte position 3 is at the beginning of a character in a DBCS string, but 1 byte is only half a DBCS character, so a space character is returned instead).
MIDB("中国",3,2) returns "国" (byte position 3 is at the beginning of a character in a DBCS string, and 2 bytes constitute one DBCS character).
MIDB("office",2,3) returns "ffi" (byte position 2 is at the beginning of a character in a non-DBCS string, and 3 bytes of a non-DBCS string constitute 3 characters).

LEFTB("中国",1) returns " " (1 byte is only half a DBCS character, so a space character is returned instead).
LEFTB("中国",2) returns "中" (2 bytes constitute one complete DBCS character).
LEFTB("中国",3) returns "中 " (3 bytes constitute one and a half DBCS characters; the last character returned is therefore a space character).
LEFTB("中国",4) returns "中国" (4 bytes constitute two complete DBCS characters).
LEFTB("office",3) returns "off" (3 non-DBCS characters, each consisting of 1 byte).

RIGHTB("中国",1) returns " " (1 byte is only half a DBCS character, so a space character is returned instead).
RIGHTB("中国",2) returns "国" (2 bytes constitute one complete DBCS character).
RIGHTB("中国",3) returns " 国" (3 bytes constitute half a DBCS character and one whole DBCS character; a space is returned for the first half).
RIGHTB("中国",4) returns "中国" (4 bytes constitute two complete DBCS characters).
RIGHTB("office",3) returns "ice" (3 non-DBCS characters, each consisting of 1 byte).

LENB("中") returns 2 (1 DBCS character consisting of 2 bytes).
LENB("中国") returns 4 (2 DBCS characters, each consisting of 2 bytes).
LENB("office") returns 6 (6 non-DBCS characters, each consisting of 1 byte).
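All of the measured results above are consistent with one rule: split the text into bytes, take the requested byte range, and turn every half of a DBCS character into a space. The Python sketch below reproduces each listed result under two assumptions: a character is DBCS when its Unicode East Asian Width is wide or fullwidth (an approximation, not necessarily Calc's own test), and the half-character rule holds in general, not just for these examples. The functions `midb`, `leftb`, `rightb`, `lenb` and their helpers are hypothetical models, not the Calc functions themselves.

```python
import unicodedata


def is_dbcs(ch: str) -> bool:
    # Assumption: wide ("W") or fullwidth ("F") characters stand in for Calc's DBCS test.
    return unicodedata.east_asian_width(ch) in ("W", "F")


def to_byte_slots(text: str) -> list[tuple[str, str]]:
    """One (char, kind) slot per byte; kind is "single", "lead" or "trail"."""
    slots: list[tuple[str, str]] = []
    for ch in text:
        if is_dbcs(ch):
            slots += [(ch, "lead"), (ch, "trail")]
        else:
            slots.append((ch, "single"))
    return slots


def from_byte_slots(slots: list[tuple[str, str]]) -> str:
    """Rebuild text from a contiguous run of byte slots; each half DBCS character becomes a space."""
    out = []
    i = 0
    while i < len(slots):
        ch, kind = slots[i]
        if kind == "single":
            out.append(ch)
            i += 1
        elif kind == "lead" and i + 1 < len(slots) and slots[i + 1][1] == "trail":
            out.append(ch)  # both bytes of the DBCS character are inside the range
            i += 2
        else:
            out.append(" ")  # lead byte without its trail, or trail byte without its lead
            i += 1
    return "".join(out)


def lenb(text: str) -> int:
    return len(to_byte_slots(text))


def midb(text: str, start: int, number_bytes: int) -> str:
    """start (1-based) and number_bytes are both counted in bytes."""
    # Hypothetical error handling: Calc shows an error value instead of raising.
    if start < 1 or number_bytes < 0:
        raise ValueError("start must be >= 1 and number_bytes >= 0")
    slots = to_byte_slots(text)
    return from_byte_slots(slots[start - 1 : start - 1 + number_bytes])


def leftb(text: str, number_bytes: int) -> str:
    return midb(text, 1, number_bytes)


def rightb(text: str, number_bytes: int) -> str:
    if number_bytes < 0:
        raise ValueError("number_bytes must be >= 0")
    slots = to_byte_slots(text)
    return from_byte_slots(slots[max(len(slots) - number_bytes, 0):])


if __name__ == "__main__":
    print(repr(midb("中国", 2, 2)))  # '  ' (two spaces)
    print(repr(leftb("中国", 3)))    # '中 '
    print(repr(rightb("中国", 3)))   # ' 国'
    print(lenb("中国"))              # 4
```

Modelling the text as a list of byte slots keeps the half-character rule in one place (`from_byte_slots`), so the MIDB, LEFTB and RIGHTB models differ only in which slice of bytes they take.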


If anyone else is curious, "中国" means China in Chinese – according to
Google Translate :-).


Jesper

Kevin Suo

Re:Re: Fwd: [libreoffice-l10n] Help text for MIDB

On 2015-01-20 at 09:25, "Jesper Hertel" <[hidden email]> wrote:
> Here are my suggestions for examples for MIDB, LEFTB, RIGHTB and LENB.

Good job!

> If anyone else is curious, "中国" means China in Chinese – according to
> Google Translate :-).

Google Translate is 100% right.





Kevin Suo



naruoga

Re: Re: Fwd: [libreoffice-l10n] Help text for MIDB

Hi, Kevin, Jesper, *

Sorry, I couldn't follow this discussion; just a short comment.

Basically, Japanese characters can be expressed in two bytes, as Chinese
characters can, but some Japanese characters use 4 bytes rather than two
(a UTF-16 "surrogate pair"), such as "𠀋" (U+2000B).

I know it's a trivial example:

A1 = "𠀋𠀋"
B1 = MIDB(A1,1,1) returns ""
B1 = MIDB(A1,1,2) returns "(*)"
B1 = MIDB(A1,1,3) returns "(*)"
B1 = MIDB(A1,1,4) returns "𠀋"
B1 = MIDB(A1,1,5) returns "𠀋(*)"
B1 = MIDB(A1,1,6) returns "𠀋(*)"
B1 = MIDB(A1,1,7) returns "𠀋(*)"
B1 = MIDB(A1,1,8) returns "𠀋𠀋"

(*) is a special character meaning that the font has no glyph for that code point.

I wonder whether the Help should describe such a detail, though.
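The 4-byte case follows from how the text is stored: LibreOffice strings are UTF-16, so a character outside the Basic Multilingual Plane, such as U+2000B, occupies two 16-bit code units (a surrogate pair). If each code unit is counted as a 2-byte DBCS unit, the character counts as 4 bytes, and cutting after 2 bytes leaves a lone surrogate, which has no glyph. The sketch below only shows the UTF-16 arithmetic; treating each code unit as 2 bytes is an assumption inferred from the results in the table above, not a statement about Calc's source. `utf16_code_units` and `surrogate_pair` are hypothetical names.

```python
def utf16_code_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text; characters above U+FFFF yield two units."""
    data = text.encode("utf-16-le")  # little-endian, no byte-order mark
    return [int.from_bytes(data[i : i + 2], "little") for i in range(0, len(data), 2)]


def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Compute the high and low surrogates for a code point above U+FFFF."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


if __name__ == "__main__":
    ch = "\U0002000B"  # 𠀋
    print([hex(u) for u in utf16_code_units(ch)])     # ['0xd840', '0xdc0b']
    print([hex(u) for u in surrogate_pair(0x2000B)])  # ['0xd840', '0xdc0b']
    # Counting 2 bytes per code unit gives 4 bytes for this single character.
    print(2 * len(utf16_code_units(ch)))  # 4
```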


Regards,
--
Naruhiko Ogasawara ([hidden email])

Yury

Re: Fwd: [libreoffice-l10n] Help text for MIDB

I won't pretend I understood the Chinese and
Japanese cases; however, it seems to me that ALL of this,
or at least the most representative parts,
should go into the Help for all languages, possibly not
under the specific Basic function but in a
separate subsection ("Handling multi-byte
encodings"?).
This shouldn't be considered a "duplicate" of
the relevant standards, but an explanation of
what is actually implemented in LO.

On 01/20/2015 04:32 PM, Naruhiko Ogasawara wrote:
...
> I wonder if HELP should describe such a detail, though.

-Yury

Jesper Hertel

Re: Re: Fwd: [libreoffice-l10n] Help text for MIDB

In reply to this post by Kevin Suo
2015-01-20 13:18 GMT+01:00 Kevin Suo <[hidden email]>:

> 在2015年01月20 09时25分, "Jesper Hertel"<[hidden email]>写道:
> > Here are my suggestions for examples for MIDB, LEFTB, RIGHTB and LENB.
> Good job!
>

Thanks!


>
>
> > If anyone else is curious, "中国" means China in Chinese – according to
> > Google Translate :-).
>
> Google Translate is 100% right.
>

:-)

Stanislav Horáček

Re: Fwd: [libreoffice-l10n] Help text for MIDB

In reply to this post by Jesper Hertel
Thanks a lot!

I've submitted the suggestions to Gerrit, where anyone is welcome to
comment on them:
https://gerrit.libreoffice.org/#/c/14092/

Stanislav

On 20.1.2015 at 02:25, Jesper Hertel wrote:
