A character set conversion problem with the Boost.Locale library


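With boost::locale::conv::to_utf you can write a function that converts std::string to std::wstring, which simplifies the interface and keeps the code cross-platform.
However, a problem appeared when testing with MSVC 2010 on Windows 10. The test code below illustrates the situation.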

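#include <tchar.h>
#include <iostream>
#include <locale>
#include <string>
#include <boost/locale.hpp>

int _tmain(int argc, _TCHAR *argv[])
{
  try {
    // Locale built by Boost from the system default settings.
    std::locale loc = boost::locale::generator().generate("");

    std::wstring a = L"1: Five Chinese words[ ]_by macro L";
    // Convert a narrow string to a wide string using the encoding stored in loc.
    std::wstring b = boost::locale::conv::to_utf<wchar_t>( "2: Five Chinese words[ ]_by boost to_utf", loc );
    std::wstring c = L"3: Five Chinese words[ ]_by macro L";

    std::wcout.imbue( std::locale("") );
    std::wcout << a << std::endl;
    std::wcout << b << std::endl;
    std::wcout << c << std::endl;

    // Print the encoding that the Boost-generated locale reports.
    std::cout << "\n  charset of std::locale(\"\") is "
              << std::use_facet<boost::locale::info>(loc).encoding()
              << std::endl;

  } catch ( std::exception &e ) {
    std::cout << e.what() << std::endl;
  }
  return 0;
}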

The expected output is four lines:

1: Five Chinese words[ ]_by macro L
2: Five Chinese words[ ]_by boost to_utf
3: Five Chinese words[ ]_by macro L

  charset of std::locale("") is cp936

The actual result is as follows.

1: Five Chinese words[ ]_by macro L
2: Five Chinese words[
  charset of std::locale("") is utf-8

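Line 1 is correct.
Line 2 stops at the first Chinese character and nothing after it is displayed. Line 3 is not displayed at all.
Line 4 reports the system default character set as utf-8 (it is the third line actually printed).

From the Chinese characters in line 2 onward, wcout stops working properly, which does not match the documented behavior of Boost's to_utf function. The "utf-8" in line 4 explains the cause: in the std::locale loc generated by boost::locale::generator().generate(""), the character encoding is utf-8 rather than cp936 (the GBK character set), which is the code page Windows is actually using.

However, the locale-taking form of to_utf only accepts a locale generated by boost::locale::generator. It cannot take std::locale("") directly; doing so raises a "bad_cast" exception. On Windows, therefore, only the second form of to_utf is usable, the one that takes the character set name directly. On Linux, to_utf behaves as expected.

For these reasons Windows and Linux have to be handled separately: pass the character set name on Windows and use the standard Boost form on Linux. The complete example below does exactly that.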

#ifdef WIN32
#include <windows.h>
#endif

#include <iostream>
#include <string>
#include <boost/locale.hpp>
#include <boost/lexical_cast.hpp>

#pragma comment(lib,"libboost_thread-vc100-mt-gd-1_55.lib")
#pragma comment(lib,"libboost_system-vc100-mt-gd-1_55.lib")

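// Converts a string in the system's native encoding to a wide string.
std::wstring toutf( const std::string & src )
{
  #ifdef WIN32
    // Build the code page name (e.g. "CP936") once and cache it.
    static std::string codepage;
    if ( codepage.empty() ) {
      // Query the system's active ANSI code page (Windows only).
      CPINFOEX  cpinfo;
      GetCPInfoEx( CP_ACP, 0, &cpinfo );
      codepage = "CP" + boost::lexical_cast<std::string>(cpinfo.CodePage);
    }

    // Second form of to_utf: pass the character set name directly.
    std::wstring dst = boost::locale::conv::to_utf<wchar_t>( src, codepage );
  #else
    // The Boost-generated locale reports the right encoding on Linux.
    std::locale loc = boost::locale::generator().generate("");
    std::wstring dst = boost::locale::conv::to_utf<wchar_t>( src, loc );
  #endif

  return dst;
}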

int main(int argc, char *argv[])
{
  try {
    // Only used below to print the encoding Boost reports.
    std::locale loc = boost::locale::generator().generate("");

    std::wstring a = L"1: Five Chinese words[ ]_by macro L";
    std::wstring b = toutf( std::string("2: Five Chinese words[ ]_by boost to_utf") );
    std::wstring c = L"3: Five Chinese words[ ]_by macro L";

    std::wcout.imbue( std::locale("") );
    std::wcout << a << std::endl;
    std::wcout << b << std::endl;
    std::wcout << c << std::endl;

    std::cout << "\n  charset of std::locale(\"\") is "
      << std::use_facet<boost::locale::info>(loc).encoding()
      << std::endl;

  } catch ( std::exception &e ) {
    std::cout << e.what() << std::endl;
  }
  return 0;
}

The output is as follows.

1: Five Chinese words[ ]_by macro L
2: Five Chinese words[ ]_by boost to_utf
3: Five Chinese words[ ]_by macro L

  charset of std::locale("") is utf-8
