WTF String
1.KURL:
it's the type of String in WTF,
we use
to return the char * in the KURL.
And how does it come out? a KURL is defined in WebCore/Platform/KURL.h, it has a member function string().
b KURL.string(), return a class of String, which was defined in wtf/text/WTFString.h, it has a member function utf8()
c String.utf8() return a class of CString , which was defined in wtf/text/CString.h, it has a member function data().
d CString.data() return the type of char* , which can be printed directly.
to be continued
2. String in WTF
a. StringImplBase
defined in "WTF/text/StringImplBase.h"
it has no concrete implementation
b.StringImpl
defined in "WTF/text/StringImpl.h"
StringImplBase<--StringImpl
it hold a member UChar (wchar_t)//16 or 32 bit.
c.String
defined in "WTF/text/WTFString.h"
String hold a member of "StringImpl".
it has function to return it's data in CString
was defined in "WTF/text/CString.h"
it has a function "data()"to return the type of "char*".
how is the UChar come to char *
the most imporant function was String.utf8() in "WTF/text/WTFString.h"
using "convertUTF16ToUTF8".//so UChar is 16 bit.
which was defined in "WTF/unicode/utf8.cpp"
so the most common type used in webcore is "UChar",
the most common type we use to print is "char*"
the most common way for the conversion is "UChar->utf8->char*".
definition
for 2.3
it's the type of String in WTF,
we use
KURL.string().utf8().data();
to return the char * in the KURL.
And how does it come out? a KURL is defined in WebCore/Platform/KURL.h, it has a member function string().
b KURL.string(), return a class of String, which was defined in wtf/text/WTFString.h, it has a member function utf8()
c String.utf8() return a class of CString , which was defined in wtf/text/CString.h, it has a member function data().
d CString.data() return the type of char* , which can be printed directly.
to be continued
2. String in WTF
a. StringImplBase
defined in "WTF/text/StringImplBase.h"
it has no concrete implementation
b.StringImpl
defined in "WTF/text/StringImpl.h"
StringImplBase<--StringImpl
it hold a member UChar (wchar_t)//16 or 32 bit.
c.String
defined in "WTF/text/WTFString.h"
String hold a member of "StringImpl".
it has function to return it's data in CString
CString ascii() const;
CString latin1() const;
CString utf8(bool strict = false) const;
d. CString was defined in "WTF/text/CString.h"
it has a function "data()"to return the type of "char*".
how is the UChar come to char *
the most imporant function was String.utf8() in "WTF/text/WTFString.h"
using "convertUTF16ToUTF8".//so UChar is 16 bit.
which was defined in "WTF/unicode/utf8.cpp"
ConversionResult convertUTF16ToUTF8(
const UChar** sourceStart, const UChar* sourceEnd,
char** targetStart, char* targetEnd, bool strict)
{
ConversionResult result = conversionOK;
const UChar* source = *sourceStart;
char* target = *targetStart;
while (source < sourceEnd) {
UChar32 ch;
unsigned short bytesToWrite = 0;
const UChar32 byteMask = 0xBF;
const UChar32 byteMark = 0x80;
const UChar* oldSource = source; // In case we have to back up because of
//target overflow.
ch = static_cast<unsigned short>(*source++);
// If we have a surrogate pair, convert to UChar32 first.
if (ch >= 0xD800 && ch <= 0xDBFF) {
// If the 16 bits following the high surrogate are in the source buffer...
if (source < sourceEnd) {
UChar32 ch2 = static_cast<unsigned short>(*source);
// If it's a low surrogate, convert to UChar32.
if (ch2 >= 0xDC00 && ch2 <= 0xDFFF) {
ch = ((ch - 0xD800) << 10) + (ch2 - 0xDC00) + 0x0010000;
++source;
} else if (strict) { // it's an unpaired high surrogate
--source; // return to the illegal value itself
result = sourceIllegal;
break;
}
} else { // We don't have the 16 bits following the high surrogate.
--source; // return to the high surrogate
result = sourceExhausted;
break;
}
} else if (strict) {
// UTF-16 surrogate values are illegal in UTF-32
if (ch >= 0xDC00 && ch <= 0xDFFF) {
--source; // return to the illegal value itself
result = sourceIllegal;
break;
}
}
// Figure out how many bytes the result will require
if (ch < (UChar32)0x80) {
bytesToWrite = 1;
} else if (ch < (UChar32)0x800) {
bytesToWrite = 2;
} else if (ch < (UChar32)0x10000) {
bytesToWrite = 3;
} else if (ch < (UChar32)0x110000) {
bytesToWrite = 4;
} else {
bytesToWrite = 3;
ch = 0xFFFD;
}
target += bytesToWrite;
if (target > targetEnd) {
source = oldSource; // Back up source pointer!
target -= bytesToWrite;
result = targetExhausted;
break;
}
switch (bytesToWrite) { // note: everything falls through.
case 4: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6;
case 3: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6;
case 2: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6;
case 1: *--target = (char)(ch | firstByteMark[bytesToWrite]);
}
target += bytesToWrite;
}
*sourceStart = source;
*targetStart = target;
return result;
}
so the most common type used in webcore is "UChar",
the most common type we use to print is "char*"
the most common way for the conversion is "UChar->utf8->char*".
definition
ConversionResult convertUTF8ToUTF16(
const char** sourceStart, const char* sourceEnd,
UChar** targetStart, UChar* targetEnd, bool strict = true);
ConversionResult convertUTF16ToUTF8(
const UChar** sourceStart, const UChar* sourceEnd,
char** targetStart, char* targetEnd, bool strict = true);
example: ConversionResult result = convertUTF16ToUTF8(&characters, characters + length,
&buffer, buffer + bufferVector.size(), strict);
for 2.3
#include "PlatformString.h"
#include "CString.h"
to include the head file