UTF-8 / GBK lookup in Ruby

GBK and UTF-8 are the two most popular charsets on Chinese websites. UTF-8 is supposed to be the standard, but many sites still use GBK; one reason is that it is the Windows default.

Converting a string between the two is not too hard, but in my situation I need a lookup between their codes. For example, “探” is 63a2 as a Unicode code point (the value that UTF-8 encodes), while its GBK code is CCBD. The usual display formats are \u63a2 for the Unicode side and %CC%BD for GBK.
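Those two values can be checked directly in Ruby. This is a minimal sketch, assuming Ruby 1.9 or later and a UTF-8 source file; the variable names are only for illustration.

```ruby
char = "探"

# Unicode code point as four lowercase hex digits
codepoint_hex = char.ord.to_s(16).rjust(4, "0")

# GBK bytes of the same character as uppercase hex
gbk_hex = char.encode("GBK").bytes.map { |b| format("%02X", b) }.join

puts codepoint_hex  # 63a2
puts gbk_hex        # CCBD
```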

Code example:


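# Input: text containing a literal \uXXXX escape
s = '\u63a2'

# Get the four hex digits after "\u" (the backslash must be escaped in the regex)
hex = s[/\\u(\h{4})/, 1]

# Decode the code point into a UTF-8 string
utf = [hex.to_i(16)].pack('U*')

# Convert to GBK; String#encode replaces Iconv, which was removed in Ruby 2.0
gbk = utf.encode('GBK')

# Percent-encode each GBK byte; URI.encode was removed in Ruby 3.0
gbk_encode = gbk.bytes.map { |b| format('%%%02X', b) }.join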
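The reverse direction, from a percent-encoded GBK string back to the \uXXXX form, might be sketched as follows. The method name gbk_percent_to_unicode_escape is hypothetical, and the sketch assumes Ruby 1.9 or later and input in which every byte is percent-encoded (any unencoded ASCII characters would be ignored).

```ruby
# Hypothetical helper: turn a percent-encoded GBK string such as "%CC%BD"
# into \uXXXX escapes. Assumes every byte of the input is percent-encoded.
def gbk_percent_to_unicode_escape(encoded)
  # Turn each %XX triplet back into its raw byte value
  bytes = encoded.scan(/%(\h{2})/).flatten.map { |h| h.to_i(16) }

  # Label the raw bytes as GBK, then transcode them to UTF-8
  utf = bytes.pack("C*").force_encoding("GBK").encode("UTF-8")

  # Format each character's code point as a literal \uXXXX escape
  utf.each_char.map { |c| format('\u%04x', c.ord) }.join
end

puts gbk_percent_to_unicode_escape("%CC%BD")
```

Bytes that do not form a valid GBK sequence will make encode raise an exception, so untrusted input would need a rescue clause or the invalid: :replace option.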

Reference: http://www.herongyang.com/gb2312/
