Kerning and Ligatures using Layout Engine

Kerning and Ligatures using Layout Engine

Keith Stribley
I am interested in getting clig,liga,mark,mkmk,kern OpenType tables to
be processed by the OpenJDK layout engine for the Myanmar code block.
Currently Unicode 5.1 Myanmar fonts cannot be used with Java AWT/Swing.

I noticed that the layout engine code in OpenJDK is essentially an old
version of the ICU layout engine and ICU is capable of rendering Myanmar
Unicode 5.1 compliant fonts such as Myanmar3 and Padauk correctly.

The first step was to make sun.font.FontManager.isComplexCharCode()
return true for the Myanmar range. However, I then needed to modify
sun.font.GlyphLayout.EngineRecord. This has an eflags field which is
passed to ICU.
I'm not quite sure why 0x4 is used as the value when there are marks. I
believe it corresponds to "no canonical processing", though I don't know
why that is needed. More seriously, this does not trigger ICU kerning or
ligatures.
this.eflags needs to be set to 0x3 for this: 1=kerning, 2=ligatures (see
http://www.icu-project.org/apiref/icu4c/classLayoutEngine.html#cee4ea27f3211be215ea9b9bd3a91c32)

My question is therefore: why aren't kerning and ligatures turned on, at
least for complex scripts? I've noticed that with Latin text, if you
set TextAttribute.KERNING and TextAttribute.LIGATURES, ligatures work for
non-complex text, e.g. ffi with DoulosSIL, but if you have a mark in the
text, ligatures stop working, though the mark attaches correctly. I
would therefore have thought that there is little to be lost from using
eflags = 0x3 in all the cases where eflags is set. I guess there might
be a slight speed drop, but is it still significant these days? Is there
a specific reason why kerning and ligatures haven't been enabled in ICU
when used in the JDK? Does it have some unexpected side effect?

Currently EngineRecord only sets eflags for NON_SPACING_MARK,
ENCLOSING_MARK, COMBINING_SPACING_MARK. At the moment, this isn't
sufficient for Burmese since the character properties in the jdk haven't
been updated to Unicode 5.1, hence I enabled it for the whole code block
in my test build.
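
As a rough illustration of the "whole code block" test, here is a minimal Java sketch. The class and method names are hypothetical and this is not the JDK's isComplexCharCode(); it only shows the explicit range comparison next to the equivalent lookup through the public Character.UnicodeBlock API.

```java
public class MyanmarRangeCheck {
    // Hypothetical helper (not JDK code): explicit range test for the
    // Myanmar block, U+1000 through U+109F inclusive.
    static boolean isMyanmarByRange(int code) {
        return code >= 0x1000 && code < 0x10A0;
    }

    // The same question answered through the public Unicode block API.
    static boolean isMyanmarByBlock(int code) {
        return Character.UnicodeBlock.of(code) == Character.UnicodeBlock.MYANMAR;
    }

    public static void main(String[] args) {
        // Last Thai code point, first and last Myanmar, first Georgian.
        int[] samples = { 0x0E7F, 0x1000, 0x109F, 0x10A0 };
        for (int cp : samples) {
            System.out.printf("U+%04X range=%b block=%b%n",
                    cp, isMyanmarByRange(cp), isMyanmarByBlock(cp));
        }
    }
}
```

The block lookup does not depend on whether individual code points are assigned in the runtime's Unicode version, which is why a block-level test sidesteps the outdated character properties mentioned above.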

For reference, Myanmar fonts are available at:
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=Padauk
http://myanmarnlpteam.blogspot.com/2007/08/download-links.html
http://www.mymyanmar.net/2g/

(Another Myanmar font, Parabaik, uses OpenType rlig, which ICU doesn't
process for this code block without further code changes).

There is a possible patch below, which displays Unicode 5.1 Myanmar
correctly with Padauk, MyMyanmar Unicode and Myanmar3 fonts when used
with the methods TextLayout.draw, drawString and drawChars in
Font2DTest. Some attached marks get lost with Padauk using
TextLayout.getOutline+draw.

I would appreciate feedback on whether to submit this as a patch purely
for the Myanmar script or whether eflags should be changed more generally.

Regards,
Keith Stribley

--- ./jdk/src/share/classes/sun/font/GlyphLayout.java.orig    2008-05-29 15:01:33.000000000 +0100
+++ ./jdk/src/share/classes/sun/font/GlyphLayout.java    2008-05-29 23:13:26.000000000 +0100
@@ -644,11 +644,15 @@
                     ch = toCodePoint((char)ch,_textRecord.text[++i]); // inc
                 }
                 int gc = getType(ch);
+                if (script == 28) { // Myanmar - see LEScripts.h
+                    this.eflags = 0x3;// 1=kerning, 2=ligatures
+                    break;
+                }
                 if (gc == NON_SPACING_MARK ||
                     gc == ENCLOSING_MARK ||
                     gc == COMBINING_SPACING_MARK) { // could do range test also
 
-                    this.eflags = 0x4;
+                    this.eflags = 0x4; // 4 = no canonical processing, but would 0x3 be better?
                     break;
                 }
             }
--- ./jdk/src/share/classes/sun/font/FontManager.java.orig    2008-05-28 12:46:03.000000000 +0100
+++ ./jdk/src/share/classes/sun/font/FontManager.java    2008-05-29 21:33:31.000000000 +0100
@@ -3594,6 +3594,12 @@
             // 0E00 - 0E7F if Thai, assume shaping for vowel, tone marks
             return true;
         }
+        else if (code < 0x1000) {
+            return false;
+        }
+        else if (code < 0x10A0) { // 1000-109F Myanmar
+            return true;
+        }
         else if (code < 0x1780) {
             return false;
         }


Re: Kerning and Ligatures using Layout Engine

phil.race
I can take a stab at this but it probably needs the ICU developer to confirm
what's happening there.

Keith Stribley wrote:
> I am interested in getting clig,liga,mark,mkmk,kern OpenType tables to

Those are not tables. Those are features in the OpenType GSUB and GPOS tables.

> be processed by the OpenJDK layout engine for the Myanmar code block.
> Currently Unicode 5.1 Myanmar fonts cannot be used with Java AWT/Swing.
>
> I noticed that the layout  engine code in OpenJDK is essentially an old
> version of the ICU layout engine and ICU is capable of rendering Myanmar
> Unicode 5.1 compliant fonts such as Myanmar3 and Padauk correctly.

FYI: it was "current" at the point in JDK 6 development when it was integrated.
JDK 7 will get an updated version in due course.

>
> The first step was to make sun.font.FontManager.isComplexCharCode()
> return true for the Myanmar range. However, I then needed to modify the
> sun.font.GlyphLayout.EngineRecord. This has an eflags field which is
> passed to ICU.
> I'm not quite sure why 0x4 is used as the value when there are marks, I
> believe it corresponds to "no canonical processing", though I don't know
> why that is needed.

I think you have this backwards. 0x4 means do canonical processing,
and it's there for performance, i.e. if it's not set then we can skip
a lot of work. I don't recall (at all) how much that was but I
suspect it was significant.

> More seriously, this does not trigger ICU kerning or
> ligatures.
> this.eflags needs to be set to 0x3 for this. 1=kerning, 2=ligatures (see
> http://www.icu-project.org/apiref/icu4c/classLayoutEngine.html#cee4ea27f3211be215ea9b9bd3a91c32)
>

No, I believe that comes from _typo_flags.

> My question is therefore, why aren't kerning and ligatures turned on, at
> least for complex scripts. I've noticed that with Latin text that if you
> set TextAttribute.KERNING and TextAttribute.LIGATURES ligatures work for
> non-complex text e.g. ffi with DoulosSIL, but if you have a mark in the
> text, ligatures stop working, though the mark attaches correctly. I
> would therefore have thought that there is little to be lost from using
> eflags = 0x3 in all the cases where eflags is set. I guess there might
> be a slight speed drop, but is it still significant these days? Is there
> a specific reason why kerning and ligatures haven't been enabled in ICU
> when used in the JDK? Does it have some unexpected side effect?

I think the basic reason is compatibility of text advance.
Text that is rendered through drawString() and text that is rendered
via TextLayout() should be the same.
So optional ligatures and kerning need to be requested by those
who know they want them.

You might then ask: why not at least do this for complex
scripts, where text has to go through layout and mandatory ligatures
are performed? I would have to dig to be sure what actually happens
in ICU, but one scenario is mixed-script text, e.g. some Latin followed
by some complex script. If the optional ligatures were performed by
layout and you are in, say, a text editor and delete the complex
text, leaving only the Latin text, it would look odd if the optional
ligatures no longer formed and if kerning stopped being applied.

However, if you are pointing out that even when specifying
TextAttribute.KERNING and TextAttribute.LIGATURES they do not
get applied, then that would seem like a bug. But my reading of
the code is that the request for kerning and ligatures is
not held in "eflags" but in "_typo_flags", and the value
passed down to layout is "_typo_flags | eflags".

As far as I can see your patch is equivalent to always
adding the TextAttribute.KERNING and TextAttribute.LIGATURES
as attributes on these two fonts (no JDK source code changes
needed). Is that what you see?
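
For readers unfamiliar with the attributes in question, a minimal sketch of requesting them on a font follows. The class and helper names are hypothetical, and the font name "Padauk" is only an example; if that font is not installed, Java substitutes a default. TextAttribute.KERNING_ON, TextAttribute.LIGATURES_ON and Font.deriveFont(Map) are standard java.awt API.

```java
import java.awt.Font;
import java.awt.font.TextAttribute;
import java.util.HashMap;
import java.util.Map;

public class KerningLigatureFont {
    // Hypothetical helper: derive a copy of 'base' that asks the layout
    // code for optional kerning and optional ligatures.
    static Font withKerningAndLigatures(Font base) {
        Map<TextAttribute, Object> attrs = new HashMap<>();
        attrs.put(TextAttribute.KERNING, TextAttribute.KERNING_ON);
        attrs.put(TextAttribute.LIGATURES, TextAttribute.LIGATURES_ON);
        return base.deriveFont(attrs);
    }

    public static void main(String[] args) {
        // Example font name only; any installed OpenType font can be used.
        Font base = new Font("Padauk", Font.PLAIN, 18);
        Font shaped = withKerningAndLigatures(base);
        // hasLayoutAttributes() reports whether such attributes are set.
        System.out.println("base:   " + base.hasLayoutAttributes());
        System.out.println("shaped: " + shaped.hasLayoutAttributes());
    }
}
```

Whether the requested features actually reach the layout engine is exactly what is in question in this thread; the sketch only shows how the request is made.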

>
> Currently EngineRecord only sets eflags for NON_SPACING_MARK,
> ENCLOSING_MARK, COMBINING_SPACING_MARK.

That is I believe for performance.

> At the moment, this isn't
> sufficient for Burmese since the character properties in the jdk haven't
> been updated to Unicode 5.1, hence I enabled it for the whole code block
> in my test build.
>
> For reference, Myanmar fonts are available at:
> http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=Padauk
> http://myanmarnlpteam.blogspot.com/2007/08/download-links.html
> http://www.mymyanmar.net/2g/
>
> (Another Myanmar font, Parabaik uses OpenType rlig, which ICU doesn't
> process for this code block without further code changes).
>
> There is a possible patch below, which displays Unicode 5.1 Myanmar
> correctly with Padauk, MyMyanmar Unicode and Myanmar3 fonts when used
> with the methods TextLayout.draw, drawString and drawChars in
> Font2DTest. Some attached marks get lost with Padauk using
> TextLayout.getOutline+draw.
>
> I would appreciate feedback on whether to submit this as a patch purely
> for the Myanmar script or whether eflags should be changed more generally.

Before we can accept any patch you will need to sign and submit
the Sun Contributor Agreement. See http://openjdk.java.net/contribute/

>
> Regards,
> Keith Stribley
>
> --- ./jdk/src/share/classes/sun/font/GlyphLayout.java.orig    2008-05-29
> 15:01:33.000000000 +0100
> +++ ./jdk/src/share/classes/sun/font/GlyphLayout.java    2008-05-29
> 23:13:26.000000000 +0100
> @@ -644,11 +644,15 @@
>                      ch = toCodePoint((char)ch,_textRecord.text[++i]);
> // inc
>                  }
>                  int gc = getType(ch);
> +                if (script == 28) { // Myanmar - see LEScripts.h
> +                    this.eflags = 0x3;// 1=kerning, 2=ligatures
> +                    break;
> +                }
>                  if (gc == NON_SPACING_MARK ||
>                      gc == ENCLOSING_MARK ||
>                      gc == COMBINING_SPACING_MARK) { // could do range
> test also
>  
> -                    this.eflags = 0x4;
> +                    this.eflags = 0x4; // 4 = no canonical processing,
> but would 0x3 be better?

I think you have this backwards. 0x4 means DO canonical processing.

>                      break;
>                  }
>              }
> --- ./jdk/src/share/classes/sun/font/FontManager.java.orig    2008-05-28
> 12:46:03.000000000 +0100
> +++ ./jdk/src/share/classes/sun/font/FontManager.java    2008-05-29
> 21:33:31.000000000 +0100
> @@ -3594,6 +3594,12 @@
>              // 0E00 - 0E7F if Thai, assume shaping for vowel, tone marks
>              return true;
>          }
> +        else if (code < 0x1000) {
> +            return false;
> +        }
> +        else if (code < 0x10A0) { // 1000-109F Myanmar
> +            return true;
> +        }
>          else if (code < 0x1780) {
>              return false;
>          }
>


-phil.

Re: Kerning and Ligatures using Layout Engine

phil.race


Phil Race wrote:
> I can take a stab at this but it probably needs the ICU developer to
> confirm
> what's happening there.

Here's what the opentype layout engine developer over at the ICU project said.

 > I'm surprised to read that ICU handles Myanmar correctly. I suppose that
 > the default processing must work with the fonts he cites. My quick reading
 > of the description of the Myanmar script in the Unicode 5.0 book suggests
 > that it would need more processing. His comment about 'rlig' seems to bear
 > this out.
 >
 > If the presence of marks is inhibiting ligature formation, that could be a
 > bug. I'd need to see some specific test cases to know what's going on. In
 > general, I would expect it to work. (I did recently fix some bugs having to
 > do w/ canonical processing that might be relevant here...)

So taking the up-to-date ICU layout code into JDK would probably tell us
if that's one of these recently fixed bugs. Steven is planning to do that
but I'm not sure when it'll happen.
So specific test cases (the text and a reference to the font, and
an "actual" vs "expected" description) would help. But this
may in part come down to ICU not actually having all the code it needs
to render Myanmar script.

-Phil.

Re: Kerning and Ligatures using Layout Engine

Keith Stribley
In reply to this post by phil.race
Hi Phil,

thanks for your response, it has helped clarify my understanding and I
think I've found the real bug - see below.

Phil Race wrote:
> Keith Stribley wrote:
>> I am interested in getting clig,liga,mark,mkmk,kern OpenType tables to
>
> Those are not tables. Those are features in the OpenType GSUB and GPOS
> tables.
Yes, sorry for the confusion.
>
> FYI: it was "current" at the point in JDK 6 development when it was
> integrated.
> JDK 7 will get an updated version in due course.
Great, though I'm not sure it will affect the Myanmar rendering in this
case. ICU 4 in general has support for Unicode 5.1, which is needed for
Myanmar.
>> I'm not quite sure why 0x4 is used as the value when there are marks, I
>> believe it corresponds to "no canonical processing", though I don't know
>> why that is needed.
>
> I think you have this backwards.
Sorry, I read the comment in LayoutEngine::characterProcessing(...) too
quickly and was perhaps confused by the actual behaviour.

Is the "canonical processing" flag an OpenJDK specific optimisation? I
couldn't see it in ICU svn history, but perhaps I was looking at the
wrong revisions.
> So optional ligatures and kerning need to be requested by those
> who know they want them.
This is fine for Latin text, but not for Burmese, which is heavily
dependent on ligatures to make it readable. Most Burmese Unicode fonts
also use kerning.
>
> I would have to dig to be sure what actually happens
> in ICU, but one scenario is mixed script text. Eg some latin followed
> by some complex script. If the optional ligatures were performed by
> layout and you are in say a text editor and delete the complex
> text leaving only the latin text it would look odd if the optional
> ligatures no longer formed and if kerning stopped being applied.
Currently the opposite happens. As soon as the complex script is
present, the kerning and ligatures stop if they were requested in the
TextAttributes.

Consider the Latin case with say DoulosSIL
http://scripts.sil.org/DoulosSIL_download :
fi ffi
fi ffi â̬

Both lines should show the ffi ligatures when set in the TextAttributes,
but only the first one does, because the marks on the second line
trigger eflags.
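
To make the difference between the two lines concrete, here is a simplified sketch of a mark scan in the spirit of the one in GlyphLayout.EngineRecord. It is not the JDK code: the class and method names are hypothetical, and the exact combining marks in the second test string (U+0302 and U+032C) are an assumption made for illustration.

```java
public class MarkScan {
    // Hypothetical stand-in for the EngineRecord scan: returns true if any
    // code point is a non-spacing, enclosing or combining spacing mark.
    static boolean containsMark(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            int gc = Character.getType(cp);
            if (gc == Character.NON_SPACING_MARK
                    || gc == Character.ENCLOSING_MARK
                    || gc == Character.COMBINING_SPACING_MARK) {
                return true;
            }
            i += Character.charCount(cp); // step over surrogate pairs
        }
        return false;
    }

    public static void main(String[] args) {
        String plain = "fi ffi";
        // Assumed marks: combining circumflex and combining caron below.
        String marked = "fi ffi a\u0302\u032C";
        System.out.println("line 1 has marks: " + containsMark(plain));
        System.out.println("line 2 has marks: " + containsMark(marked));
    }
}
```

Only the second string reports a mark, which matches the observation that the second line is the one that takes the eflags path.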
>
> However if you are pointing out that even when specifying
> TextAttribute.KERNING and TextAttribute.LIGATURES that they do not
> get applied, then that would seem like a bug.
Yes.
> But my reading of
> the code is that that the request for kerning and ligatures is
> not held in "eflags" but in "_typo_flags" and the value
> passed down to layout is  "_typo_flags | eflags"
I've looked some more into this, and I believe there is a bug in the
OpenTypeLayoutEngine constructor
(jdk/src/share/native/sun/font/layout/OpenTypeLayoutEngine.cpp). This
currently does not expect a value greater than 3 for typoFlags, so if 4
is applied then kerning and ligatures are disabled. That is fine if
only the 0x4 bit is set, but not if you have 7 from the OR that you describe.
This wouldn't trigger in the normal ICU as I don't think they use the
0x4 flag.

I've now added & 0x3, so that 0x4 bit will be ignored when setting the
feature mask:

    switch (typoFlags & 0x3) {
    case 0: break; // default
    case 1: fFeatureMask = kernFeatures; break;
    case 2: fFeatureMask = ligaFeatures; break;
    case 3: fFeatureMask = kernAndLigaFeatures; break;
    default: break;
    }
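
The effect of the mask can be shown with a small Java sketch. This is a model of the behaviour described above, not the ICU or OpenJDK source: the class, method names and returned strings are hypothetical, and the flag values follow the thread (1=kerning, 2=ligatures, 4=canonical processing).

```java
public class TypoFlagsMask {
    static final int KERNING = 0x1;
    static final int LIGATURES = 0x2;
    static final int CANONICAL = 0x4; // per the thread: do canonical processing

    // Models a switch on the raw flags: any value above 3 hits the default.
    static String featuresUnmasked(int typoFlags) {
        switch (typoFlags) {
            case 1: return "kern";
            case 2: return "liga";
            case 3: return "kern+liga";
            default: return "none";
        }
    }

    // Models the proposed fix: ignore the 0x4 bit when choosing features.
    static String featuresMasked(int typoFlags) {
        return featuresUnmasked(typoFlags & 0x3);
    }

    public static void main(String[] args) {
        int requested = KERNING | LIGATURES | CANONICAL; // 7
        System.out.println("unmasked: " + featuresUnmasked(requested));
        System.out.println("masked:   " + featuresMasked(requested));
    }
}
```

With the raw value 7 the unmasked switch selects no features, while the masked version selects both kerning and ligatures; a value of 4 alone selects none either way.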

>
> As far as I can see your patch is equivalent to always
> adding the TextAttribute.KERNING and TextAttribute.LIGATURES
> as attributes on these two fonts (no JDK source code changes
> needed). Is that what you see?
Now that I've found the bug in OpenTypeLayoutEngine, if that were fixed,
that might be the case.

But I don't think it should be regarded as just applying for 2 specific
fonts. Burmese relies on ligatures in many words (as do most if not all
of the other languages using the Myanmar code block). If ligatures and
kerning aren't enabled Burmese Unicode text is unreadable. I think they
should be enabled by default for the Myanmar code block, at least when
marks are present.
Most people will not realise that these attributes need to be set to get
correct Burmese (and with the current bug, they won't take effect anyway
because they involve marks).
>
> Before we can accept any patch you will need to sign and submit
> the Sun Contributor Agreement. See http://openjdk.java.net/contribute/
I've just signed and emailed it.
 
Regards,
Keith

Re: Kerning and Ligatures using Layout Engine

Keith Stribley
In reply to this post by phil.race
Phil Race wrote:
>
> Here's what the opentype layout engine developer over at the ICU
> project said.
>
> > I'm surprised to read that ICU handles Myanmar correctly. I suppose that
> > the default processing must work with the fonts he cites.
Yes, the fonts have been designed to use the default processing which is
enabled by default in Uniscribe if Complex language support is turned on
in Windows XP SP2 or Vista.
> > My quick reading of the description of the Myanmar script in the
> > Unicode 5.0 book suggests that it would need more processing. His
> > comment about 'rlig' seems to bear this out.
Currently, there is no standard set of OpenType features which have been
defined for Myanmar. It is possible to do the reordering with a sequence
of contextual ligatures and this is what has been done in the existing
Myanmar Unicode fonts. It is possible that in the future Myanmar
specific OpenType features will be officially defined, in which case
Myanmar specific Layout code could be written to reduce the number of
lookups in the font.

These ligatures are all really required for Myanmar, but different fonts
have chosen to use different features (not just rlig), provided they are
currently turned on by default in Uniscribe. Since liga and clig are
turned on by default in ICU when ligature processing is turned on, they
work better with it than rlig.

The reason rlig doesn't work is that it isn't included in the
ligaFeatures define at the top of OpenTypeLayoutEngine.cpp. I tried
adding this in the past with ICU layout and it seemed to work in my test
app.

> >
> > If the presence of marks is inhibiting ligature formation, that could
> > be a bug. I'd need to see some specific test cases to know what's going
> > on. In general, I would expect it to work. (I did recently fix some bugs
> > having to do w/ canonical processing that might be relevant here...)
I believe it is specific to the way OpenJDK calls ICU. Try the following
in a Java editor with DoulosSIL or another font with ffi ligatures. The
ffi ligatures should be enabled on both lines, but are only enabled on
the first until you delete the marks on the second.
fi ffi
fi ffi â̬
(I was testing using the notepad demo app tweaked to set the font and
TextAttributes)
>
> So probably taking the up to date ICU layout code into JDK would tell us
> if that's one of these recent fixed bugs. Steven is planning to do that
> but I'm not sure when it'll happen.
> So specific test cases (the text and a reference to the font, and
> an "actual" vs "expected" description) would be better.
Shall I open a bug for this and attach some screenshots of the rendering?
> But this
> may in part come down to ICU not actually having all the code it needs
> to render Myanmar script.
>
The fonts I'm using work fine with the default features enabled in
Uniscribe and mostly work in Pango. I agree that the fonts could be
simplified if Burmese reordering was handled in a standard manner by the
commonly used layout engines, but for the moment they are being written
to take advantage of features which are enabled by default in common
rendering engines.

Keith