flying saucer生成pdf実装句読点が行頭にない


プロジェクトの過程で,flying saucerを用いて生成されたpdfファイルでは,行頭に中国語の句読点がよく現れるという問題が見つかった.flying saucerのソースコードとテストを注意深く分析した結果、最終的な解決策は以下の通りです.
まず、私たちはこれらのものの本当の意味を理解しなければなりません.
 
Character.UnicodeBlock.CJK_UNIFIED_IDEOgraPHS:4 E 00-9 FBF:CJK統一表意記号
Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOgraPHS:F 900-AFF:CJK対応象形文字Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A:3400-4 DBF:CJK統一表意符号拡張A
CJKは「Chinese,Japanese,Korea」の略で、実は中日韓三国の象形文字のUnicodeコード
Character.UnicodeBlock.GENERAL_PUNCTUATION:2000-206 F:常用句読点Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION:3000-303 F:CJK記号と句読点Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS:FF 00-FEF:半角及び全角形式
 
flying saucerがpdfを生成する過程で、ソースコードはグループと行幅で改行され、具体的な論理は以下の通りである.
スペースをグループ化すると、隣接する2つのスペースの間の文字は改行できません.行の複数の文字の合計が行幅に最も近いとき、つまり1つの文字のグループを追加すると行幅を超えて改行されます.
 
この方法は英語には効果的ですが、中国語にはだめです.中国語はスペースで区切られていないからです.yye_JAvaeye兄さんはソースコードを改善し、中国語文字かどうかを判断し、中国語文字であれば、各中国語文字をグループ化し、グループ+行幅で改行しました.彼は2つの方法を追加しました:isChinese(char c)とgetStrRight(String s,int left)、変更後のソースコードの一部は以下の通りです.
。。。。。。
package org.xhtmlrenderer.layout;

import org.xhtmlrenderer.css.constants.IdentValue;
import org.xhtmlrenderer.css.style.CalculatedStyle;
import org.xhtmlrenderer.render.FSFont;

/**
 * A utility class that scans the text of a single inline box, looking for the 
 * next break point.
 * @author Torbj�rn Gannholm
 */
public class Breaker {

。。。。。。    
    public static void breakText(LayoutContext c, 
            LineBreakContext context, int avail, CalculatedStyle style) {
。。。。。。
        String currentString = context.getStartSubstring();
        int left = 0;
//        int right = currentString.indexOf(WhitespaceStripper.SPACE, left + 1);
        int right = getStrRight(currentString,left);
        int lastWrap = 0;
        int graphicsLength = 0;
        int lastGraphicsLength = 0;

        while (right > 0 && graphicsLength <= avail) {
            lastGraphicsLength = graphicsLength;
            graphicsLength += c.getTextRenderer().getWidth(
                    c.getFontContext(), font, currentString.substring(left, right));
            lastWrap = left;
            left = right;
//            right = currentString.indexOf(WhitespaceStripper.SPACE, left + 1);
            right = getStrRight(currentString,left+1);
        }

。。。。。。
    }

    private static boolean isChinese(char c) {
        Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
        if (ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
                || ub == Character.UnicodeBlock.GENERAL_PUNCTUATION
                || ub == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
                || ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS) {
            return true;
        }
        return false;
    }

    private static int getStrRight(String s,int left){
        if(left>=s.length())
            return -1;
        char[] ch = s.toCharArray();
        for(int i = left;i<ch.length;i++){
            if(isChinese(ch[i]) || ' ' == ch[i]){
                return i==0?i+1:i;
            }
        }
        return -1;
    }

}

 
 
この方法は確かに中国語の漢字の改行を実現したが、「句読点が行頭にある」という問題ももたらした.彼は句読点も漢字と見なしているため、句読点もグループ化されて改行され、解決策は句読点をisChinese方法から取り除くことだ.ソースコードは次のとおりです.
/*
 * Breaker.java
 * Copyright (c) 2004, 2005 Torbj  n Gannholm, 
 * Copyright (c) 2005 Wisconsin Court System
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU Lesser General Public License
 * as published by the Free Software Foundation; either version 2.1
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU Lesser General Public License for more details.
 *
 * You should have received a copy of the GNU Lesser General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 */
package org.xhtmlrenderer.layout;

import org.xhtmlrenderer.css.constants.IdentValue;
import org.xhtmlrenderer.css.style.CalculatedStyle;
import org.xhtmlrenderer.render.FSFont;

/**
 * A utility class that scans the text of a single inline box, looking for the 
 * next break point.
 * @author Torbj  n Gannholm
 */
public class Breaker {

    public static void breakFirstLetter(LayoutContext c, LineBreakContext context,
            int avail, CalculatedStyle style) {
        FSFont font = style.getFSFont(c);
        context.setEnd(getFirstLetterEnd(context.getMaster(), context.getStart()));
        context.setWidth(c.getTextRenderer().getWidth(
                c.getFontContext(), font, context.getCalculatedSubstring()));
        
        if (context.getWidth() > avail) {
            context.setNeedsNewLine(true);
            context.setUnbreakable(true);
        }
    }
    
    private static int getFirstLetterEnd(String text, int start) {
        int i = start;
        while (i < text.length()) {
            char c = text.charAt(i);
            int type = Character.getType(c);
            if (type == Character.START_PUNCTUATION || 
                    type == Character.END_PUNCTUATION ||
                    type == Character.INITIAL_QUOTE_PUNCTUATION ||
                    type == Character.FINAL_QUOTE_PUNCTUATION ||
                    type == Character.OTHER_PUNCTUATION) {
                i++;
            } else {
                break;
            }
        }
        if (i < text.length()) {
            i++;
        }
        return i;
    }    
    
    public static void breakText(LayoutContext c, 
            LineBreakContext context, int avail, CalculatedStyle style) {
        FSFont font = style.getFSFont(c);
        IdentValue whitespace = style.getWhitespace();
        
        // ====== handle nowrap
        if (whitespace == IdentValue.NOWRAP) {
        	context.setEnd(context.getLast());
        	context.setWidth(c.getTextRenderer().getWidth(
                    c.getFontContext(), font, context.getCalculatedSubstring()));
            return;
        }

        //check if we should break on the next newline
        if (whitespace == IdentValue.PRE ||
                whitespace == IdentValue.PRE_WRAP ||
                whitespace == IdentValue.PRE_LINE) {
            int n = context.getStartSubstring().indexOf(WhitespaceStripper.EOL);
            if (n > -1) {
                context.setEnd(context.getStart() + n + 1);
                context.setWidth(c.getTextRenderer().getWidth(
                        c.getFontContext(), font, context.getCalculatedSubstring()));
                context.setNeedsNewLine(true);
                context.setEndsOnNL(true);
            } else if (whitespace == IdentValue.PRE) {
            	context.setEnd(context.getLast());
                context.setWidth(c.getTextRenderer().getWidth(
                        c.getFontContext(), font, context.getCalculatedSubstring()));  
            }
        }

        //check if we may wrap
        if (whitespace == IdentValue.PRE || 
                (context.isNeedsNewLine() && context.getWidth() <= avail)) {
            return;
        }
        
        context.setEndsOnNL(false);

        String currentString = context.getStartSubstring();
        int left = 0;
//        int right = currentString.indexOf(WhitespaceStripper.SPACE, left + 1);
        int right = getStrRight(currentString,left);
        int lastWrap = 0;
        int graphicsLength = 0;
        int lastGraphicsLength = 0;

        while (right > 0 && graphicsLength <= avail) {
            lastGraphicsLength = graphicsLength;
            graphicsLength += c.getTextRenderer().getWidth(
                    c.getFontContext(), font, currentString.substring(left, right));
            lastWrap = left;
            left = right;
//            right = currentString.indexOf(WhitespaceStripper.SPACE, left + 1);
            right = getStrRight(currentString,left+1);
        }

        if (graphicsLength <= avail) {
            //try for the last bit too!
            lastWrap = left;
            lastGraphicsLength = graphicsLength;
            graphicsLength += c.getTextRenderer().getWidth(
                    c.getFontContext(), font, currentString.substring(left));
        }

        if (graphicsLength <= avail) {
            context.setWidth(graphicsLength);
            context.setEnd(context.getMaster().length());
            //It fit!
            return;
        }
        
        context.setNeedsNewLine(true);

        if (lastWrap != 0) {//found a place to wrap
            context.setEnd(context.getStart() + lastWrap);
            context.setWidth(lastGraphicsLength);
        } else {//unbreakable string
            if (left == 0) {
                left = currentString.length();
            }
            
            context.setEnd(context.getStart() + left);
            context.setUnbreakable(true);
            
            if (left == currentString.length()) {
                context.setWidth(c.getTextRenderer().getWidth(
                        c.getFontContext(), font, context.getCalculatedSubstring()));
            } else {
                context.setWidth(graphicsLength);
            }
        }
        return;
    }

    private static boolean isChinese(char c) {
        Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
        if (ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A) {
            return true;
        }
        return false;
    }

    private static int getStrRight(String s,int left){
        if(left>=s.length())
            return -1;
        char[] ch = s.toCharArray();
        for(int i = left;i<ch.length;i++){
            if(isChinese(ch[i]) || ' ' == ch[i]){
                return i==0?i+1:i;
            }
        }
        return -1;
    }    

}


 
対応するcore-renderer.JArファイルは添付ファイルにあります.