Optimising Asian fonts for Multi-language flash sites

April 23rd 2009

So I'm sure you've done a website that needed to be in 1,238 different languages. And every time you reached Chinese, Japanese, Korean... you were surprised that the embedded font alone made your swf 9,000 kbytes big.

For this project we're working on I'm mainly doing little tools with PHP. One of them is a translations manager, so you have a little SQL database with all the keywords and languages and someone fills it with data. At any point you can export it as .xml ready to be used in the website.

Having this set up, Theo came up with the idea that, as we had control over the text that was going to be needed for each language, we could write a script to output the list of characters needed for each font.

The PHP script boils down to this:

// In this case $lines is an array of associative arrays that comes from MySQL.

$list = array();

foreach($lines as $line)
{
	$string = $line["text"];
	$string = strip_tags($string);
	// "\r" and "\n" (double quotes) are real line breaks; '\n' (single quotes) is a literal backslash followed by "n".
	$string = str_replace(array("\r", "\n", '\n'), '', $string);

	// Split the UTF-8 string into individual characters.
	preg_match_all('/./u', $string, $chars);

	foreach($chars[0] as $char)
	{
		// Only add characters that are not in the list yet.
		if (!in_array($char, $list, true))
			$list[] = $char;
	}
}

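$codes = array();
foreach($list as $item)
{
	// mb_encode_numericentity() gives e.g. "&#20840;"; substr() strips "&#" and ";" to leave the decimal code point (BMP only).
	$codes[] = "U+" . zeropad( strtoupper( dechex( (int) substr( mb_encode_numericentity ( $item, array (0x0, 0xffff, 0, 0xffff), 'UTF-8'), 2, -1 ) ) ), 4 );
}
// Joining with ", " avoids a trailing comma in the output.
echo implode(", ", $codes);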

You'll also need this:

function zeropad($num, $lim)
{
   return (strlen($num) >= $lim) ? $num : zeropad("0" . $num, $lim);
}

What this code does (once properly set up in your own project) is split the whole string into characters and check, one by one, whether each has already been added to the list of characters used; if it's new, it adds it. Then it writes a Unicode list formatted as U+XXXX. The output looks something like this:

U+0043, U+0048, U+0041, U+004E, U+0045, U+004C, U+002E, U+004F, U+004D, U+0052, U+0044, U+0049, U+0054, U+0053, U+5168, U+5C4F, U+89C2, U+770B, U+5176, U+5B83, U+8BED, U+8A00, U+6CD5, U+5F8B, U+58F0, U+660E, U+97F3, U+91CF, U+5E55, U+540E, U+82B1, U+7D6E, U+5965, U+9EDB, U+4E3D, U+2022, U+5854, U+56FE, U+0020, U+4E0E, U+8BA9, U+002D, U+76AE, U+8036, U+5C14, U+70ED, U+5185, U+62CD, U+6444, U+8BB0, U+5F55, U+73B0, U+573A, U+5F71, U+7247, U+0032, U+5206, U+0030, U+79D2, U+0036, U+00B0, U+0035, U+4F20, U+5947, U+4E3A, U+4EC0, U+4E48, U+9009, U+5851, U+9020, U+5973, U+795E, U+642D, U+4E58, U+591C, U+95F4, U+5217, U+8F66, U+7684, U+4EBA, U+6027, U+611F, U+8BF1, U+60D1, U+4F60, U+6700, U+559C, U+7231, U+955C, U+5934, U+7B2C, U+4E00, U+6B21, U+7EED, U+5199, U+8F89, U+714C, U+4EE3, U+00BA, U+9999, U+6C34, U+6C1B, U+5FC6, U+6211, U+53F7, U+2014, U+79D8, U+6570, U+5B57, U+0039, U+5948, U+513F, U+4E4B, U+5E74, U+7537, U+4E3B, U+89D2, U+5D14, U+7EF4, U+65AF, U+0660, U+8FBE, U+6587, U+6CE2, U+7279, U+8FC7, U+7A0B, U+4E2D, U+7F8E, U+597D, U+56DE, U+5609, U+4F2F, U+8389, U+5212, U+65F6, U+521B, U+4F5C, U+73CD, U+8D35, U+6735, U+539F, U+6599, U+5999, U+8C03, U+548C, U+5242, U+7A7F, U+8D8A, U+5149, U+7ECF, U+5178, U+56DB, U+79CD, U+6F14, U+7ECE
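For reference, here's a minimal sketch of the same idea on a more recent PHP (7.2 or later, with the mbstring extension), assuming you just have an array of strings. The function name unicode_range_list() is made up for this example; it isn't part of the script above.

```php
<?php
// Hypothetical helper, not part of the original script.
// Assumes PHP 7.2+ (for mb_ord) and the mbstring extension.
function unicode_range_list(array $texts): string
{
    $seen = array();

    foreach ($texts as $text) {
        // Drop markup and line breaks, then split into UTF-8 characters.
        $text = str_replace(array("\r", "\n"), '', strip_tags($text));
        $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

        foreach ($chars as $char) {
            // Array keys are unique, so duplicates collapse on their own.
            $seen[mb_ord($char, 'UTF-8')] = true;
        }
    }

    $codes = array();
    foreach (array_keys($seen) as $codepoint) {
        // %04X pads to at least four uppercase hex digits.
        $codes[] = sprintf('U+%04X', $codepoint);
    }

    return implode(', ', $codes);
}

// Prints the code points in order of first appearance.
echo unicode_range_list(array('<b>CHANEL</b>', "全屏\nCH"));
```

Using the code point as an array key replaces the inner "have I seen this one?" loop, and sprintf() does the zero padding, so the zeropad() helper isn't needed in this version.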

What's this for, you'll ask? Well, just look at this:

[Embed(source="yourfont.ttf", fontFamily="YourFont", fontWeight= "bold", fontStyle = "normal",advancedAntiAliasing="true", mimeType="application/x-font-truetype", 
unicodeRange="U+0043, U+0048, U+0041, U+004E, U+0045, U+004C, U+002E, U+004F, U+004D, U+0052, U+0044, U+0049, U+0054, U+0053, U+5168, U+5C4F, U+89C2, U+770B, U+5176, U+5B83, U+8BED, U+8A00, U+6CD5, U+5F8B, U+58F0, U+660E, U+97F3, U+91CF, U+5E55, U+540E, U+82B1, U+7D6E, U+5965, U+9EDB, U+4E3D, U+2022, U+5854, U+56FE, U+0020, U+4E0E, U+8BA9, U+002D, U+76AE, U+8036, U+5C14, U+70ED, U+5185, U+62CD, U+6444, U+8BB0, U+5F55, U+73B0, U+573A, U+5F71, U+7247, U+0032, U+5206, U+0030, U+79D2, U+0036, U+00B0, U+0035, U+4F20, U+5947, U+4E3A, U+4EC0, U+4E48, U+9009, U+5851, U+9020, U+5973, U+795E, U+642D, U+4E58, U+591C, U+95F4, U+5217, U+8F66, U+7684, U+4EBA, U+6027, U+611F, U+8BF1, U+60D1, U+4F60, U+6700, U+559C, U+7231, U+955C, U+5934, U+7B2C, U+4E00, U+6B21, U+7EED, U+5199, U+8F89, U+714C, U+4EE3, U+00BA, U+9999, U+6C34, U+6C1B, U+5FC6, U+6211, U+53F7, U+2014, U+79D8, U+6570, U+5B57, U+0039, U+5948, U+513F, U+4E4B, U+5E74, U+7537, U+4E3B, U+89D2, U+5D14, U+7EF4, U+65AF, U+0660, U+8FBE, U+6587, U+6CE2, U+7279, U+8FC7, U+7A0B, U+4E2D, U+7F8E, U+597D, U+56DE, U+5609, U+4F2F, U+8389, U+5212, U+65F6, U+521B, U+4F5C, U+73CD, U+8D35, U+6735, U+539F, U+6599, U+5999, U+8C03, U+548C, U+5242, U+7A7F, U+8D8A, U+5149, U+7ECF, U+5178, U+56DB, U+79CD, U+6F14, U+7ECE")]
public var FontClass:Class;

This way, only the characters you're actually using from the .ttf get embedded in the .swf.
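If you'd rather not paste the list by hand, the tag itself could be generated too. This is only a sketch under a few assumptions: the helper names embed_tag() and format_range() are made up, the font path and family are placeholders, and it relies on unicodeRange also accepting ranges such as U+0041-U+005A, which lets runs of consecutive code points be collapsed.

```php
<?php
// Hypothetical helpers: build an [Embed] tag from a list of integer code points.
function format_range(int $from, int $to): string
{
    return $from === $to
        ? sprintf('U+%04X', $from)
        : sprintf('U+%04X-U+%04X', $from, $to);
}

function embed_tag(array $codepoints, string $source, string $family): string
{
    // Sort and de-duplicate so consecutive code points sit next to each other.
    $codepoints = array_values(array_unique($codepoints));
    sort($codepoints);

    $ranges = array();
    $start = $prev = null;

    foreach ($codepoints as $cp) {
        if ($start === null) {
            $start = $prev = $cp;
        } elseif ($cp === $prev + 1) {
            // Still inside a run of consecutive code points.
            $prev = $cp;
        } else {
            // The run ended: store it and start a new one.
            $ranges[] = format_range($start, $prev);
            $start = $prev = $cp;
        }
    }
    if ($start !== null) {
        $ranges[] = format_range($start, $prev);
    }

    return sprintf(
        '[Embed(source="%s", fontFamily="%s", mimeType="application/x-font-truetype", unicodeRange="%s")]',
        $source,
        $family,
        implode(', ', $ranges)
    );
}

// Placeholder font file and family name.
echo embed_tag(array(0x43, 0x41, 0x42, 0x5168), 'yourfont.ttf', 'YourFont');
```

For Chinese text the ranges rarely collapse much, since the characters used are scattered, but Latin letters and digits usually do.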

In our case, Chinese went down from 9,554 kbytes to 45 kbytes. That's roughly a 99.5% reduction. Pretty cool!

Hopefully this will save someone a few sleepless nights.

23 comments written so far...

Here comes the doobster,
shrinking some embeds.
doobsterelly, with the unicooodes.
lowering fiiilesiiize.
alriiight
April 23rd 2009
Inspired12
The price of such a solution is that it's more complicated to change the text :/

interesting solution anyway :)
April 23rd 2009
grgrdvrt
Yeah. But usually the text doesn't change much at this stage.
April 23rd 2009
mr.doob
The problem (or rather the inefficiency) for such an approach is that you would have to republish the font every time there is a copy change.

My main gripe is that there is really no good solution to this. Apart from the huge size of the non-latin character sets for fonts, I feel that at some level, the font downloading should be something handled on the browser (similar to @font-face in css).
April 23rd 2009
Ronnie
Anyone wanting to learn how to create the font swf in Flash CS4, check out this Lee Brimelow tutorial:
http://www.gotoandlearn.com/play?id=102
April 24th 2009
Rob Shearing
and you're compiling the swf with the embed tag server-side any time the text changes?
April 24th 2009
sascha
No no, the text doesn't change. Although having that linked to an online CMS is an interesting idea too.
April 24th 2009
mr.doob
That's been the biggest pain for Chinese sites since forever... with AS3 it's much easier than before. For Flash projects we publish a swf that exports one or several textfields with the needed characters embedded, for storing fonts, and another textfield as a public component for actual use... it may look complicated but it does the magic. And yes, it's complicated too, to change texts and add new characters... we've already got used to it...
April 24th 2009
Ryan
It's a cool solution, but it doesn't work correctly when the text is dynamic (an admin can enter a new word that uses other characters).
That's why I'm working on another, more complex solution:
streaming the needed chars (loading only the chars you need at a given moment "t")

Demo :
http://memmie.lenglet.name/documents/lab/fontstream/waterfall_demo.html

Post (French only at the moment)
http://memmie.lenglet.name/?p=33

No source yet, but I'll release it soon.

To describe it quickly: it uses the same hack as sound generation in Flash 9 (dynamic generation of SWF file bytecode), including the font data (the loaded chars), and voilà!
April 24th 2009
Mem's
That's a really good solution too, Mem! I guess you're parsing the .ttf with PHP or something; otherwise, if you're accessing the .ttf directly, we'd have the problem of making public fonts whose license doesn't allow that.
April 24th 2009
mr.doob
So ideally, the next step would be for the server to recompile a font swf each time the copy gets changed. Is this possible?
April 24th 2009
Mike Tucker
It is not impossible. But I think Mem's approach is much more interesting (streaming the font).
April 24th 2009
mr.doob
Font licensing could be a problem, I don't know.
PHP reads a specific binary file.
This file is generated from a SWF.
Each glyph's binary data (the SHAPE type in the SWF file format, plus extras such as advance and kerning) is extracted and kept in the same form inside the generated file.
It's roughly the same as the SWF font data (DefineFont3), but reorganised for speed and more.
The client receives the same file, minus the chars it doesn't need, and so on.
April 24th 2009
Mem's
interesting.
April 25th 2009
yeson
Hey Mr Doob,

I'm pretty sure I know why you are posting this article :D

Don't worry, the end is coming very soon ;)
April 27th 2009
samoth
I wonder who you are samoth... ;D

Uhm... Arabic went in today! Arabic is tricky. Hint: If you think the characters are not being displayed properly, this is the step you're probably forgetting:

http://www.arabicode.com/en/flaraby/swf/
April 28th 2009
mr.doob
Oh! Seems like I know who you are now, samoth, "my people" found out ;)
April 28th 2009
mr.doob
nice solution, might rewrite it in flash.

I've been looking at ways of taking screen shots of non embedded fonts and then smoothing them somehow to emulate the antialiasing of embedded fonts.

Have tried increasing the size of the text, getting a bitmap, blurring it slightly, turning on smoothing and scaling it back down but it still looks very grainy and aliased.

It's annoying that _sans can't be made to look smooth in fp9.
May 20th 2009
caz
lol yeah as easy as a question to A. ;)
May 21st 2009
samoth
@caz
if you have a program like Photoshop, you can usually write the text out with the font tool onto a transparent background, resize the file to the size you want, then save it as a 24-bit .PNG file. Those have alpha, which prevents both problems, unless fp9 has horrid .PNG rendering or you try to resize them in Flash. If that's the case, you could possibly (with time and patience) export a vector version to Flash, which will (hopefully) have no problems.
May 29th 2009
Squinty
It would be very interesting to combine both your approaches + do some manual labour to get the most common chars nailed down for each language (not applicable for western languages).

You'd scan your content & generate font SWFs based on that, pretty simple stuff that could be integrated in the publish step for your CMS.

If you had text input on your site you would include the common chars & load the rest on the fly from Mems font service.

All this would be wrapped in a special TextField component that knows about what font glyphs are loaded & can initiate the new glyph fetching, so the coder would never even have to worry about it – just get the font service running.

Maybe I've just re-capped what you have said, maybe not. :)
June 19th 2009
Erik Pettersson
Hi,

Mem's approach is pretty cool!!

I worked on a system similar to the one described in this post back in 2005, using swfmill (http://swfmill.org/). Nowadays, AS3 lets you do it in a "more AS way", as mr.doob does. Btw, you can generate the font on the server: you'll need the Flex SDK installed on your server, and then you compile the SWF remotely. I have been using this in a few projects. Actually, what we did with swfmill was exactly the same:

1) Use CMS to feed your content
2) Fetch unique glyphs from content
3) Compile the SWF remotely

:)

SWFMILL is working in AS2 as well. But MR. doob's approach is much more up-to-date. And, anyway, who wants to use AS2 any longer? ;)
June 29th 2009
Edu
very useful post, really a very good trick! ;)
October 13th 2009
ángel
