Can't track flow of characters in from Input Method Editor

Richard Wordingham

Can't track flow of characters in from Input Method Editor

On Sunday I raised bug report 94753 about the apparent generation of
lone surrogates in response to the use of Keyman for Linux under ibus
as the input method editor. I have compiled Version 4.4.4.3.0+ with
debug to facilitate my investigation; I think my compiler (gcc Version
4.6.3) is too old to compile Version 5.0, which is where I noticed the
problem.

I use emacs as an IDE for debugging, but Emacs Version 24 does not seem
able to cope with Version 4.4.4.3.0+.  The debugger gdb run from the
terminal appears to be able to cope.  I have been trying to narrow down
the source of the error by inserting fprintf() calls.  However, I cannot
find where characters enter the program from the IME.  I am running
Ubuntu 12.04 with the default desktop.  The IME is KMfL running under
ibus.

I set up fprintf() and abort() calls to monitor the apparent sole call
of XmbLookupString (there are no visible calls of XwcLookupString) and
also within the call of SalKDEDisplay::checkdirectInputEvent().
However, entering text from the Supplementary Multilingual Plane through
the IME generates neither output from the fprintf() calls nor a core
dump from abort().  Have I overlooked another route by which characters
are reaching the program?

My current suspicion is that Qt is not handling KMfL's replacement of
one supplementary character by another properly, but I cannot
demonstrate that.  My test input text sequence is the three characters
dYH, which when applied to an instrumented program using X generates
the characters U+1148F, U+114C0, U+0008 (also as symbol), U+114BF.  I
suspect that U+0008 is only cancelling the low surrogate of U+114C0,
and that this is happening in Qt code. I have seen similar behaviour
with Konsole, which I believe is a Qt application.  Claws mail,
Gnome-terminal, Emacs Version 24, gedit, Abiword and even LibreOffice
Calc all exhibit receipt of the correct sequence of characters, namely
<U+1148F, U+114BF>.  (Some of these do not display it properly, but
that is another issue.)
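
To make the suspected failure concrete, here is a minimal sketch in
plain C++.  It is not LibreOffice or Qt code, and the helper names
(appendUtf16, naiveBackspace, replayTestSequence) are made up for
illustration.  It assumes the text is held as UTF-16 and that the
backspace is applied as "remove one code unit".

```cpp
#include <string>

// Append one Unicode code point to a UTF-16 string, using a surrogate
// pair for anything above U+FFFF.  (Hypothetical helper.)
void appendUtf16(std::u16string& text, char32_t cp)
{
    if (cp < 0x10000) {
        text.push_back(static_cast<char16_t>(cp));
    } else {
        const char32_t v = cp - 0x10000;
        text.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
        text.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
    }
}

// Assumed faulty behaviour: a backspace that removes one UTF-16 code
// unit rather than one Unicode character.
void naiveBackspace(std::u16string& text)
{
    if (!text.empty())
        text.pop_back();
}

// Replay the sequence U+1148F, U+114C0, backspace, U+114BF.
std::u16string replayTestSequence()
{
    std::u16string text;
    appendUtf16(text, 0x1148F);
    appendUtf16(text, 0x114C0);
    naiveBackspace(text);        // strips only the low surrogate of U+114C0
    appendUtf16(text, 0x114BF);
    return text;
}
```

Under those assumptions the result is five code units, D805 DC8F D805
D805 DCBF, in which the third unit is a lone high surrogate.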

Richard.
_______________________________________________
LibreOffice mailing list
[hidden email]
http://lists.freedesktop.org/mailman/listinfo/libreoffice
Miklos Vajna

Re: Can't track flow of characters in from Input Method Editor

Hi Richard,

Writer "main text" gets all keyboard input in SwEditWin::KeyInput(),
sw/source/uibase/docvw/edtwin.cxx. It's VCL that calls that member
function, and in your case it's probably the VCL KDE backend in
particular.

Regards,

Miklos

_______________________________________________
Jan-Marek Glogowski

Re: Can't track flow of characters in from Input Method Editor

On 06.10.2015 at 23:51, Richard Wordingham wrote:
> On Sunday I raised bug report 94753 about the apparent generation of
> lone surrogates in response to the use of Keyman for Linux under ibus
> as the input method editor. I have compiled Version 4.4.4.3.0+ with
> debug to facilitate my investigation; I think my compiler (gcc Version
> 4.6.3) is too old to compile Version 5.0, which is where I noticed the
> problem.

...

> I am running Ubuntu 12.04 with the default desktop.

I guess you're running Kubuntu 12.04, as you talk about KDE in this post.

LO 5.0 builds just fine in Precise / 12.04. See
https://launchpad.net/~libreoffice/+archive/ubuntu/ppa?field.series_filter=precise
for newer packages.

There is also a 12.04 / precise Tinderbox building the libreoffice-5-0
branch with kde4:
http://tinderbox.libreoffice.org/libreoffice-5-0/status.html

> The IME is KMfL running under ibus.

We also had problems with Qt4 / all KDE applications and ibus. In the
end we backported the 14.04 / trusty version of fcitx and currently
use that :-(

Jan-Marek
_______________________________________________
Mark Hung

Re: Can't track flow of characters in from Input Method Editor

Hi,

Since you mentioned that Writer exhibits the problem but Calc doesn't, you
might want to take a look at sw/source/core/doc/extinput.cxx.

If you're looking for IME code in vcl, I suggest that you grep for InputContext.



--
Mark Hung

_______________________________________________
Richard Wordingham

Re: Can't track flow of characters in from Input Method Editor

Thank you all for your inputs.

On Wed, 7 Oct 2015 09:57:14 +0200
Miklos Vajna <[hidden email]> wrote:

> Writer "main text" gets all keyboard input in SwEditWin::KeyInput(),
> sw/source/uibase/docvw/edtwin.cxx. It's VCL that calls that member
> function, and in your case it's probably the VCL KDE backend in
> particular.

On Wed, 7 Oct 2015 22:20:01 +0800
Hung Mark <[hidden email]> wrote:

> Since you mentioned that Writer exhibits the problem but Calc
> doesn't, you might want to take a look at
> sw/source/core/doc/extinput.cxx.

SwEditWin::KeyInput() is receiving the input not generated by the IME,
e.g. Latin and Thai as I have my keyboards set up, but the normal
character input generated by the IME (BMP Tai Tham and SMP Tirhuta) is
going to SwExtTextInput::SetInputData instead!  Backspaces generated by
hitting the 'rubout' key (labelled with a right-to-left arrow) follow
the non-IME route.  I do not yet know what happens to backspaces
generated by the IME.

On Wed, 07 Oct 2015 11:10:08 +0200
Jan-Marek Glogowski <[hidden email]> wrote:

> I guess you're running Kubuntu 12.04, as you talk about KDE in this
> post.

The KDE code was a red herring.  The characters are coming in from the
basic X system via GtkSalFrame::signalKey, as one would expect for a
primarily Gnome system, despite the graphical shell being Unity.  So,
it's basically Ubuntu.

> LO 5.0 builds just fine in Precise / 12.04. See
> https://launchpad.net/~libreoffice/+archive/ubuntu/ppa?field.series_filter=precise
> for newer packages.

I'll give it another try.  Pre-release versions obtained via Git
wouldn't compile.

> We also had problems with Qt4 / all KDE applications and ibus. In the
> end we backported the 14.04 / trusty version of fcitx and currently
> use that :-(

I hope we haven't got a race condition.  I don't understand the order
of my monitoring outputs.  I was able to run LibreOffice under gdb
running from Emacs Version 23, whereas the combination failed under
Emacs 24.  (The two Emacsen use different interfaces to gdb,
which may be the reason for the difference.)  However, not only was I
not able to set a break point where I wanted (probably my lack of
competence), I could not reproduce the error.  I got no lone
surrogate!  This better behaviour has not been reproduced.

Richard.
_______________________________________________
Richard Wordingham

Re: Can't track flow of characters in from Input Method Editor

On Thu, 8 Oct 2015 01:17:14 +0100
Richard Wordingham <[hidden email]> wrote:

> Thank you all for your inputs.

I've finally found where the problem materialises.  There is a callback
of GtkSalFrame::IMHandler::signalIMDeleteSurrounding() to delete one
'character'.  I now need to work out where the interfacing is in
error.  The intent of the call is to delete one Unicode character; it
is now a question of where the conversion from Unicode characters to
code units should be made.  It might be anywhere from KMfL to
signalIMDeleteSurrounding().  For hacking, there is the good news that
when KMfL decides to delete two Unicode characters, there are two calls
of the function, so I could fix *my* problem straightforwardly.

Does this appear to relate to any other known problems in interfacing
with ibus?

Richard.
_______________________________________________
Caolán McNamara

Re: Can't track flow of characters in from Input Method Editor

On Thu, 2015-10-08 at 08:52 +0100, Richard Wordingham wrote:
> The intent of the call is to delete one Unicode character;

I imagine you need to change signalIMDeleteSurrounding where we have
nDeletePos = nPosition + offset and
nDeleteEnd = nDeletePos + nchars
and instead of adding "offset" and adding "nchars" you need to call
getText on xText to get the string, then use
OUString::iterateCodePoints to count forward from nPosition by "offset"
IM codepoints to get the utf-16 offset for LibreOffice, and similarly
iterateCodePoints by IM nchars to get the LibreOffice utf-16 nchars to
delete.

might suck rocks for performance.
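
A rough sketch of that conversion, written against std::u16string so
that it stands alone; inside LibreOffice one would presumably use
OUString::iterateCodePoints as described above.  The function names
(moveByCodePoints, deleteRange) and the clamping at the ends of the
text are assumptions for illustration, not the existing
signalIMDeleteSurrounding code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Move a UTF-16 index by `count` code points (negative means backwards),
// stopping early at either end of the text.
std::size_t moveByCodePoints(const std::u16string& text, std::size_t index, int count)
{
    assert(index <= text.size());
    for (; count > 0 && index < text.size(); --count) {
        const bool pair = isHighSurrogate(text[index]) && index + 1 < text.size()
                          && isLowSurrogate(text[index + 1]);
        index += pair ? 2 : 1;
    }
    for (; count < 0 && index > 0; ++count) {
        const bool pair = index >= 2 && isLowSurrogate(text[index - 1])
                          && isHighSurrogate(text[index - 2]);
        index -= pair ? 2 : 1;
    }
    return index;
}

// Translate the IM's (offset, nchars), both counted in code points
// relative to the cursor, into a half-open range of UTF-16 indices.
std::pair<std::size_t, std::size_t> deleteRange(const std::u16string& text,
                                                std::size_t cursor,
                                                int offset, int nchars)
{
    const std::size_t begin = moveByCodePoints(text, cursor, offset);
    const std::size_t end = moveByCodePoints(text, begin, nchars);
    return std::make_pair(begin, end);
}
```

With the text U+1148F U+114C0 and the cursor at the end (code unit
index 4), offset -1 and nchars 1 map to the range [2, 4), i.e. both
halves of the surrogate pair.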

C.
_______________________________________________
Richard Wordingham

Re: Can't track flow of characters in from Input Method Editor

On Thu, 08 Oct 2015 10:18:15 +0100
Caolán McNamara <[hidden email]> wrote:

> On Thu, 2015-10-08 at 08:52 +0100, Richard Wordingham wrote:
> > The intent of the call is to delete one Unicode character;

On reading the GTK documentation, it is clear that the arguments are
in terms of Unicode characters, and not UTF-16 code units.

> I imagine you need to change signalIMDeleteSurrounding where we have
> nDeletePos = nPosition + offset and
> nDeleteEnd = nDeletePos + nchars
> and instead of adding "offset" and adding "nchars" you need to call
> getText on xText to get the string, then use
> OUString::iterateCodePoints to count forward from nPosition by
> "offset" IM codepoints to get the utf-16 offset for LibreOffice, and
> similarly iterateCodePoints by IM nchars to get the LibreOffice
> utf-16 nchars to delete.
>
> might suck rocks for performance.

I can't fathom how getText() works - obfuscation by abstraction!
However, as using OUString::iterateCodePoints would appear to involve,
at the very least, copying a long string, I have coded up a similar
function that works directly with the 'editable accessible' string
(and associated data).  I have added a patch to the bug report
https://bugs.documentfoundation.org/show_bug.cgi?id=94753 .

Richard

_______________________________________________
Richard Wordingham

Version of gcc for LibreOffice

On Wed, 07 Oct 2015 11:10:08 +0200
Jan-Marek Glogowski <[hidden email]> wrote:
(when topic was 'Can't track flow of characters in from Input Method
Editor')

> On 06.10.2015 at 23:51, Richard Wordingham wrote:
> > I think my compiler (gcc
> > Version 4.6.3) is too old to compile Version 5.0, which is where I
> > noticed the problem.
>
> ...
>
> > I am running Ubuntu 12.04 with the default desktop.

> LO 5.0 builds just fine in Precise / 12.04. See
> https://launchpad.net/~libreoffice/+archive/ubuntu/ppa?field.series_filter=precise
> for newer packages.

OK. I found a tar ball for 5.0.2.2 which *does* build on Ubuntu 12.04.

However, when I try building from 'trunk' (or whatever it's called)
pulling in the source via git, compilation still fails, just as (well,
one line number's changed) happened just over three months ago
(https://ask.libreoffice.org/en/question/52435/what-version-of-gcc-do-i-need-to-build-libreoffice/
).  I did not get a usable answer then.

In response to my example patch at
https://bugs.documentfoundation.org/show_bug.cgi?id=94753 , I've been
told to use gerrit to discuss patch proposals.  Presumably I should at
least confirm that my patches compile in the developing form of
LibreOffice.  So, what version of gcc do I need to build LibreOffice?
Or is there a bug in include/rtl/ustring.hxx?  I don't know C++ well
enough to understand the problem.

Richard.
_______________________________________________
Markus Mohrhard

Re: Version of gcc for LibreOffice

Hey Richard,

You need at least gcc 4.7; we are even moving to gcc 4.8, but I'm not sure
whether we already require any feature that is only in 4.8.

Regards,
Markus

_______________________________________________
dk122

Re: Version of gcc for LibreOffice

IIUC, the discussion was about compiling LibreOffice master when ...

Quoting "Markus Mohrhard" <[hidden email]>:
> You need at least gcc 4.7, we are even moving to gcc 4.8 but I'm not sure
> if we already require any feature that is only in 4.8.

It happens that as a user of debian-wheezy, which supplies gcc 4.7.2,
I can sadly report that several .cxx files within LibreOffice provoke
the compiler to complain ...

     internal compiler error: Segmentation fault.  Please submit a full
     bug report ...

At least some of the failing source lines had an OUStringLiteral
parameterized by a static const sal_Unicode.

My solution was to build gcc 5.2.0 from source and to use this and its
libraries to build LibreOffice.  To run the program in wheezy, I need
to set LD_LIBRARY_PATH to find the 5.2.0 libraries.

Additionally, include/canvas/propertysethelper.hxx:52 provoked
gcc 4.7.2 to say ...

     error: '<::' cannot begin a template-argument list
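
For context on that message: '<:' is a digraph, an alternative spelling
of '[', so under the pre-C++11 lexing rules a template argument that
starts with '::' directly after '<' is read as '[' followed by ':'.
The usual workaround is a space after the '<'.  A minimal sketch with
made-up names (demo::Item, Holder, makeHolder), not the actual code at
propertysethelper.hxx:52:

```cpp
namespace demo {
struct Item { int value; };
}

template <typename T>
struct Holder { T item; };

// Writing Holder<::demo::Item> puts '<' directly before '::', which the
// older rules read as the digraph '<:' (i.e. '[') followed by ':'.
// The space after '<' avoids that reading under every standard version.
Holder< ::demo::Item > makeHolder(int value)
{
    Holder< ::demo::Item > holder;
    holder.item.value = value;
    return holder;
}
```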

HTH,
Terry.

_______________________________________________
Michael Stahl

Re: Version of gcc for LibreOffice

On 10.10.2015 08:52, Markus Mohrhard wrote:

> On Sat, Oct 10, 2015 at 3:45 AM, Richard Wordingham
> <[hidden email]> wrote:
>
>     OK. I found a tar ball for 5.0.2.2 which *does* build on Ubuntu 12.04.
>
>     However, when I try building from 'trunk' (or whatever it's called)
>     pulling in the source via git, compilation still fails, just as (well,
>     one line number's changed) happened just over three months ago
>     (https://ask.libreoffice.org/en/question/52435/what-version-of-gcc-do-i-need-to-build-libreoffice/
>     ).  I did not get a usable answer then.
>
>
> You need at least gcc 4.7, we are even moving to gcc 4.8 but I'm not
> sure if we already require any feature that is only in 4.8.

GCC 4.6 has not worked on master for several months now; unfortunately,
nobody updated the configure check for the gcc version until today
(commit e4b24c7399352642aeaeb412394d6230562d04d2).
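
The real check lives in the configure script, which is not reproduced
here.  Purely as an illustration, the same "at least 4.7" requirement
can be expressed at compile time with GCC's predefined __GNUC__ and
__GNUC_MINOR__ macros; the helper name versionAtLeast is made up for
this sketch.

```cpp
// True when version (major, minor) is at least (reqMajor, reqMinor).
// Hypothetical helper, written as a single return so it is valid C++11.
constexpr bool versionAtLeast(int major, int minor, int reqMajor, int reqMinor)
{
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

#if defined(__GNUC__) && !defined(__clang__)
// Clang also defines __GNUC__, reporting an old GCC version for
// compatibility, so it is excluded from this check.
static_assert(versionAtLeast(__GNUC__, __GNUC_MINOR__, 4, 7),
              "this sketch assumes GCC 4.7 or newer");
#endif
```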

