Java: how to avoid losing punctuation when extracting data from a MySQL database using JDBC?

First, I'm using:

Java 1.7.0_02
MySQL 5.1.50
Zend Server CE (if that matters)

The JDBC driver I use to connect from Java to MySQL is com.mysql.jdbc.Driver. The connection to the database works properly.

My connection string is:

jdbc:mysql://localhost:3306/table

To try to solve the problem, I added

?useUnicode=true&characterEncoding=UTF-8

to the connection string.
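A minimal sketch of how such a URL might be assembled and handed to DriverManager. The class name, the helper names and the user/password arguments are hypothetical; `useUnicode` and `characterEncoding` are the MySQL Connector/J properties from the question, and the database name `table` is taken verbatim from the URL above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class Utf8ConnectionExample {
    // Hypothetical helper: builds a Connector/J URL that asks the driver
    // to use UTF-8 for the client side of the connection.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    // Hypothetical helper: opens a connection with that URL.
    // Needs the Connector/J jar on the classpath at runtime.
    static Connection open(String user, String password) throws SQLException {
        return DriverManager.getConnection(buildUrl("localhost", 3306, "table"), user, password);
    }

    public static void main(String[] args) {
        // Only prints the URL; no database is contacted here.
        System.out.println(buildUrl("localhost", 3306, "table"));
    }
}
```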

I'm working with a Wikipedia dump. All of the text is in MediaWiki markup. I'm using JWPL to parse the content, which works very nicely for me, and I'm extracting and parsing the text and displaying it as HTML. However, I've lost characters such as '–' and single quotation marks, which come out as blocks of garbage characters instead.

After some testing, I've concluded that the characters are not being encoded/decoded correctly somewhere between the MySQL query and the processing of the strings in Java. I say this because the text in the database (stored as a mediumblob) contains the correct characters, as it should, whereas printing the strings in Java immediately after the DB call shows the characters corrupted or lost ('?' instead of Japanese characters, etc.).
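One way such question marks can appear (a sketch of a general mechanism, not a diagnosis of this particular setup): when a String is encoded into a single-byte charset such as ISO-8859-1 that cannot represent a character, Java substitutes the charset's replacement byte '?', and the original character is gone for good. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Encodes text into ISO-8859-1 and decodes it back. Characters outside
    // Latin-1 (Japanese, the en dash, ...) cannot be mapped, so getBytes()
    // replaces each of them with the single byte '?' (0x3F).
    static String throughLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Three Japanese characters written as escapes to keep the source ASCII.
        System.out.println(throughLatin1("\u65E5\u672C\u8A9E")); // prints ???
    }
}
```

Unlike the multi-character garbage blocks, this kind of loss cannot be undone afterwards, because only the '?' byte survives.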

I have checked System.getProperty("file.encoding"); it is UTF-8, so the JVM should encode the strings correctly when printing them (unless something goes wrong in the JVM's UTF-8 > UTF-16 > UTF-8 conversion).
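A small sketch for checking the JVM's default charset and for printing through a PrintStream whose encoding is set explicitly to UTF-8, so that console output does not depend on `file.encoding` at all. The class and method names are hypothetical; whether the terminal then displays UTF-8 correctly is a separate question.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class Utf8OutputDemo {
    // Wraps any OutputStream in a PrintStream that always encodes as UTF-8,
    // independent of the platform default charset.
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The property from the question, plus the charset the JVM actually uses by default.
        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        PrintStream out = utf8Stream(System.out);
        out.println("en dash: \u2013, Japanese: \u65E5\u672C\u8A9E");
    }
}
```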

I also created a UTF-8 table with UTF-8 columns and copied the data into it for testing, but that did not solve anything. Another attempted fix was to replace

return result.getString("old_text");

which pulls the text out of the result set, with:

return new String(result.getString("old_text").getBytes("utf8"),"utf8");

This gives me exactly the same result as the original statement.
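Why this round trip cannot help can be shown in a few lines (a sketch; the class and method names are hypothetical): encoding a String to UTF-8 and decoding the bytes with the same charset returns an equal String, so any damage already done by getString() survives unchanged.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Same idea as new String(s.getBytes("utf8"), "utf8"): encode and decode
    // with one charset. For any well-formed String this returns an equal String.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00E2 U+20AC U+201C: the UTF-8 bytes of an en dash decoded as windows-1252.
        String damaged = "\u00E2\u20AC\u201C";
        System.out.println(roundTrip(damaged).equals(damaged)); // prints true
    }
}
```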

Can this loss of character data be avoided when accessing MySQL through JDBC? If not, is there a way to post-process the text and restore the correct characters for display? Blocks of two or three random characters in place of standard punctuation ruin the user experience.
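On the second question: if the bytes arrived intact but were decoded with the wrong single-byte charset, the damage is often reversible (unlike the '?' case). The sketch below assumes the wrong charset was ISO-8859-1, which the question does not establish. An en dash is 3 bytes in UTF-8 (E2 80 93), so decoding those bytes as ISO-8859-1 yields a block of 3 characters; re-encoding that block as ISO-8859-1 and decoding it as UTF-8 restores the dash. Class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Simulates the bug: UTF-8 bytes decoded as ISO-8859-1.
    // Each non-ASCII character becomes 2 to 4 Latin-1 characters.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses it: ISO-8859-1 maps each char U+0000..U+00FF back to exactly
    // one byte, so the original UTF-8 bytes are recovered and decoded properly.
    static String repair(String garbled) {
        byte[] bytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String dash = "\u2013"; // en dash
        String garbled = misdecode(dash);
        System.out.println(garbled.length());             // prints 3
        System.out.println(repair(garbled).equals(dash)); // prints true
    }
}
```

Applying repair() to text that was not garbled in exactly this way will corrupt it further, so it is a last resort rather than a substitute for reading the data correctly.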

Edit

A small note: the data in the database is fine. The characters are there and all of them display correctly; accessing the data through phpMyAdmin returns it with correctly encoded characters. The problem lies between MySQL and Java, maybe in JDBC. I'm looking for a setting or a solution that works (the ones I've tried don't) and that will prevent these characters from being lost.

Solution

After some research and reading, I found a solution to my problem. I can't say exactly why, but the cause seems to have been the conversion of the mediumblob to a String in Java.

This is how I was returning the text from the result set:

if (result.next())
    return result.getString("old_text");
else
    return null;

I haven't done much with JDBC in the past and didn't realize there was a Blob class, so I changed the code to:

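// Requires java.io.DataInputStream and java.sql.Blob.
if (result.next()) {
    Blob blob = result.getBlob("old_text");
    // Size the buffer from the BLOB length; available() is only an estimate.
    byte[] bytes = new byte[(int) blob.length()];
    // readFully() keeps reading until the buffer is full; one read() may stop early.
    try (DataInputStream is = new DataInputStream(blob.getBinaryStream())) {
        is.readFully(bytes);
    }
    // The stored bytes are UTF-8, so decode them explicitly as UTF-8.
    return new String(bytes, "UTF-8");
}
else
    return null;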

And it works very well.
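One caveat with `new String(bytes, "UTF-8")`: it silently replaces any invalid byte sequence with U+FFFD, so a wrong assumption about the stored encoding would go unnoticed. If it is useful to confirm that the blob really holds UTF-8, a strict decoder can be used instead. A sketch with hypothetical names, using only `java.nio.charset`:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class StrictUtf8 {
    // Decodes bytes as UTF-8 but throws instead of silently inserting
    // replacement characters when the input is not valid UTF-8.
    static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        byte[] valid = {(byte) 0xE2, (byte) 0x80, (byte) 0x93}; // en dash in UTF-8
        byte[] invalid = {(byte) 0x96};                        // en dash in windows-1252
        try {
            System.out.println(decodeStrict(valid).equals("\u2013")); // prints true
            decodeStrict(invalid);
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8: " + e);
        }
    }
}
```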
