Invalid byte 2 of 2-byte UTF-8 sequence. Nested exception: Invalid byte 2 of 2-b

XML parsing encoding error

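Error raised while parsing the interface:
Invalid byte 2 of 2-byte UTF-8 sequence. Nested exception: Invalid byte 2 of 2-byte UTF-8 sequence.
This is clearly an encoding problem that occurs while reading the XML file!
During testing I found that the main cause is a mismatch: the encoding declared in the XML file is not the encoding the file was actually saved in.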
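
To make the mismatch concrete, here is a minimal sketch using only the JDK. The class name EncodingMismatchDemo and the method isValidUtf8 are hypothetical and not part of the original code. It feeds the GBK bytes of "中文" (0xD6 0xD0 0xCE 0xC4) to a strict UTF-8 decoder: 0xD6 looks like the first byte of a 2-byte UTF-8 sequence, but 0xD0 is not a valid second byte, which is the situation the parser's message describes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    /** Returns true if the bytes decode as strict UTF-8 without any error. */
    public static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "中文" as saved in GBK (bytes written out by hand)
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        // The same text as saved in UTF-8
        byte[] utf8Bytes = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(gbkBytes));  // false
        System.out.println(isValidUtf8(utf8Bytes)); // true
    }
}
```

A file that declares UTF-8 but contains the first byte pattern fails in the same way inside the XML parser.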
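There are several ways to fix this. These are the two I tested.
The first is to change the encoding declared in the XML file from UTF-8 to GBK or GB2312; after that the file can be read directly as a file.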
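
Before editing the declaration it helps to see what the file currently declares. The following is a rough sketch, not part of the original code: the class DeclaredEncoding and its method of are hypothetical names, and it assumes the file uses an ASCII-compatible encoding (it will not find the declaration in a UTF-16 file).

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeclaredEncoding {
    // Matches <?xml ... encoding="NAME" or encoding='NAME' at the start of the text
    private static final Pattern DECL = Pattern.compile(
            "^<\\?xml[^>]*\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the encoding named in the XML declaration, or null if none is found. */
    public static String of(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so this decode never fails
        String text = new String(head, StandardCharsets.ISO_8859_1);
        // A UTF-8 BOM (EF BB BF) shows up as these three chars; skip it
        if (text.startsWith("\u00EF\u00BB\u00BF")) {
            text = text.substring(3);
        }
        Matcher m = DECL.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] head = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(of(head)); // UTF-8
    }
}
```

If the declared name differs from the encoding the file was really saved in, either the declaration or the file has to change.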
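The second applies when the XML is fetched from a URL as an InputStream and converted to a Document object. Here the fix is to download the file and save it locally first, which is easy to do by writing it through an OutputStream into the target directory. The downloaded file is then parsed again with SAXReader saxReader = new SAXReader().
After that, and before calling Document document = saxReader.read(new File(file));, handle the XML file's encoding by calling the methods below.
Methods: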


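/**
 * Guesses the character encoding of a file.
 * A byte order mark decides the result when present. Otherwise the bytes are
 * scanned for a well-formed three-byte UTF-8 sequence; if none is found the
 * default "GBK" is returned.
 * Needs java.io.BufferedInputStream, java.io.File and java.io.FileInputStream.
 */
	public static String get_charset(File file) {
		String charset = "GBK";
		byte[] first3Bytes = new byte[3];
		// try-with-resources closes the stream on every path, including the early return
		try (BufferedInputStream bis = new BufferedInputStream(
				new FileInputStream(file))) {
			boolean checked = false;
			bis.mark(3); // the read limit must cover the 3 bytes consumed before reset()
			int read = bis.read(first3Bytes, 0, 3);
			if (read == -1)
				return charset; // empty file: keep the default
			if (first3Bytes[0] == (byte) 0xFF && first3Bytes[1] == (byte) 0xFE) {
				charset = "UTF-16LE";
				checked = true;
			} else if (first3Bytes[0] == (byte) 0xFE
					&& first3Bytes[1] == (byte) 0xFF) {
				charset = "UTF-16BE";
				checked = true;
			} else if (first3Bytes[0] == (byte) 0xEF
					&& first3Bytes[1] == (byte) 0xBB
					&& first3Bytes[2] == (byte) 0xBF) {
				charset = "UTF-8";
				checked = true;
			}
			bis.reset();
			if (!checked) {
				// No BOM: scan the bytes until one of them settles the question
				while ((read = bis.read()) != -1) {
					if (read >= 0xF0) // four-byte lead or invalid byte: treated as not UTF-8
						break;
					if (0x80 <= read && read <= 0xBF) // lone continuation byte: not UTF-8, keep GBK
						break;
					if (0xC0 <= read && read <= 0xDF) {
						read = bis.read();
						if (0x80 <= read && read <= 0xBF)
							// the pattern (0xC0-0xDF)(0x80-0xBF) is valid in
							// both UTF-8 and GBK, so keep scanning
							continue;
						else
							break;
					} else if (0xE0 <= read && read <= 0xEF) {
						// three-byte UTF-8 lead; a full match can still be a
						// false positive, but that is unlikely
						read = bis.read();
						if (0x80 <= read && read <= 0xBF) {
							read = bis.read();
							if (0x80 <= read && read <= 0xBF) {
								charset = "UTF-8";
								break;
							} else
								break;
						} else
							break;
					}
				}
			}
		} catch (Exception e) {
			e.printStackTrace();
		}

		return charset;
	}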
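
One caveat when the encoding was identified by a UTF-8 byte order mark: Java's UTF-8 decoder does not remove the BOM, so a Reader built over such bytes hands the character U+FEFF to the parser before the XML declaration, and the parser may reject it. A minimal sketch of removing it from bytes that are already in memory follows; the class BomStripper and the method stripUtf8Bom are hypothetical helpers, not part of the original code.

```java
import java.util.Arrays;

class BomStripper {
    /** Returns the data without a leading UTF-8 BOM (EF BB BF), or the data unchanged. */
    public static byte[] stripUtf8Bom(byte[] data) {
        boolean hasBom = data.length >= 3
                && data[0] == (byte) 0xEF
                && data[1] == (byte) 0xBB
                && data[2] == (byte) 0xBF;
        return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, (byte) '<', (byte) 'a', (byte) '/', (byte) '>'};
        System.out.println(stripUtf8Bom(withBom).length); // 4
    }
}
```

The result can be wrapped in a ByteArrayInputStream and an InputStreamReader in the same way as any other byte array.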


/**
 * Downloads the resource at strUrl and saves it as fileName inside filePath.
 */
public static void writeFile(String strUrl, String filePath, String fileName) {
		File dir = new File(filePath);
		dir.mkdirs();
		// try-with-resources closes both streams even if the copy fails
		try (InputStream is = new URL(strUrl).openStream();
				OutputStream os = new FileOutputStream(new File(dir, fileName))) {
			int bytesRead;
			byte[] buffer = new byte[8192];

			while ((bytesRead = is.read(buffer, 0, buffer.length)) != -1) {
				os.write(buffer, 0, bytesRead);
			}
		} catch (Exception e2) {
			e2.printStackTrace();
		}
	}
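
The same download step can also be written with java.nio.file.Files.copy, which removes the manual buffer loop. This is an alternative sketch rather than the original method: the class XmlDownloader and its methods save and download are hypothetical names, and I/O failures are rethrown as UncheckedIOException instead of being printed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class XmlDownloader {
    /** Copies the stream into dir/fileName, creating dir if needed, and returns the target. */
    public static Path save(InputStream in, Path dir, String fileName) {
        try {
            Files.createDirectories(dir);
            Path target = dir.resolve(fileName);
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Downloads strUrl into dir/fileName; the URL stream is closed when done. */
    public static Path download(String strUrl, Path dir, String fileName) {
        try (InputStream in = new URL(strUrl).openStream()) {
            return save(in, dir, fileName);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Demonstrates save() with in-memory data so that no network access is needed
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "xml-download-demo");
        Path saved = save(new ByteArrayInputStream("<root/>".getBytes()), dir, "demo.xml");
        System.out.println(saved.toFile().length()); // 7
    }
}
```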


/**
 * Reads the entire InputStream into a byte array.
 */
private static byte[] InputStreamToByte(InputStream is) throws IOException {
		ByteArrayOutputStream byteArrOut = new ByteArrayOutputStream();
		byte[] temp = new byte[1024];
		int len = 0;
		while ((len = is.read(temp, 0, 1024)) != -1) {
			byteArrOut.write(temp, 0, len);
		}
		byteArrOut.flush();
		byte[] bytes = byteArrOut.toByteArray();
		return bytes;
	}

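InputStream is: the source can be an existing InputStream object or a file path; convert it yourself as needed.
In the test class,
the first snippet below can be changed into the second (your own code may look different):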


SAXReader sax = new SAXReader(); // dom4j reader (before the change)
Document document=sax.read(new File(file));
Element element=document.getRootElement();
System.out.println(element.getName());


SAXReader saxReader = new SAXReader();
// After the change: read the raw bytes and decode them explicitly as GBK
byte[] bytes = InputStreamToByte(new FileInputStream(file));
InputStream in = new ByteArrayInputStream(bytes);
InputStreamReader strInStream = new InputStreamReader(in, "GBK");
Document root = saxReader.read(strInStream);
Element element = root.getRootElement();
System.out.println(element.getName());

With this change the output is correct.
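
The reason a Reader helps is that a SAX parser given a character stream reads the characters directly and disregards the encoding named in the XML declaration. The sketch below shows the same idea with the JDK's built-in DOM parser swapped in for dom4j, so that it runs without extra libraries. The class ExplicitCharsetParse and the method rootName are hypothetical, and the demo assumes the JDK ships the GBK charset.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExplicitCharsetParse {
    /** Parses XML bytes decoded with charsetName and returns the root element name. */
    public static String rootName(byte[] xmlBytes, String charsetName) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), Charset.forName(charsetName))) {
            // An InputSource built from a Reader bypasses the declared encoding
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(reader));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            throw new IllegalStateException("XML parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Declares UTF-8 but is actually encoded as GBK, as in the failing case
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>\u4e2d\u6587</root>";
        byte[] gbkBytes = xml.getBytes(Charset.forName("GBK"));
        System.out.println(rootName(gbkBytes, "GBK")); // root
    }
}
```

Instead of hard-coding "GBK", the charset name could come from a detection step such as get_charset, at the cost of trusting that heuristic.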
Most importantly, this approach of converting the stream's encoding is much simpler than several of the solutions found online!
