Running cron job with custom gem lib on dreamhost

Problem:

Installed the parse-ruby-client gem on a DreamHost VPS.

The Ruby script runs OK in console mode (user logged in).

Crontab jobs don't work; they complain that ‘parse-ruby-client’ is not found.

Reason:

The gem is installed in the user's ~/.gem folder, which is not on the search path of system processes like cron.

Solution:

Add these lines to ~/.bash_profile:
export GEM_HOME="$HOME/.gems"
export GEM_PATH="/usr/lib/ruby/gems/1.8:$GEM_HOME"

Add this to ~/.bashrc:
if [ -f ~/.bash_profile ]; then
.  ~/.bash_profile
fi

Those two should cover your own shells, but cron does not read these files by itself, so just in case, source ~/.bashrc in your shell script before calling Ruby:
. ~/.bashrc
/usr/bin/ruby $HOME/my_ruby_script.rb
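Cron runs jobs with a much smaller environment than a login shell, so one way to confirm the fix is to run a small diagnostic from the crontab and compare its output with a run from your terminal. This is a minimal sketch; gem_loadable? is a hypothetical helper, and the file name you save it under is up to you.

```ruby
require 'rubygems'   # needed on Ruby 1.8; harmless on newer Rubies

# Hypothetical helper: true if `name` can be required in this process.
def gem_loadable?(name)
  require name
  true
rescue LoadError
  false
end

# Run once from cron and once from a shell, then compare the two outputs.
if __FILE__ == $0
  puts "GEM_HOME=#{ENV['GEM_HOME'].inspect}"
  puts "GEM_PATH=#{ENV['GEM_PATH'].inspect}"
  puts "Gem.path=#{Gem.path.inspect}"
  puts "parse-ruby-client loadable? #{gem_loadable?('parse-ruby-client')}"
end
```

Redirecting the cron run's output to a file makes the comparison easy.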


Revisiting Date.parse, time zones, DST, and SQLite issues

I thought Date.parse() was easy, until I ran into some time zone and DST (daylight saving time) issues today.

First problem: a missing time zone means UTC

DateTime.parse("2012-04-19 12:12:12") defaults to UTC, so I need to figure out the correct time zone. MDT works for now, but what happens when DST ends?

That means I can't hard-code MDT in the date string.

I need a way to check whether date_to_parse falls in the DST period or not.

This assumes the machine running the code is in the same time zone as date_to_parse.

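        require 'date'

        date_s = "2012-04-19 12:12:12"
        date = DateTime.parse(date_s)   # no zone in the string, so this is UTC
        # ask the local clock whether DST is in effect on that day
        time_zone = Time.local(date.year, date.month, date.day).isdst ? "MDT" : "MST"
        puts DateTime.parse("#{date_s} #{time_zone}")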

It works well, and I don't have to remember to switch the time zone back when winter comes.
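A variation on the same idea, assuming the server's zone rules are correct: instead of choosing between the strings "MDT" and "MST", let Ruby's Time pick the offset for that particular date and carry it as a number (DateTime.parse also accepts a suffix like "-06:00"). parse_local and local_offset are hypothetical helper names, and the TZ value is set only to make the demo reproducible. Like the isdst check, this cannot resolve the one hour that repeats when DST ends.

```ruby
require 'time'

# Parse a zone-less "YYYY-MM-DD HH:MM:SS" string as local wall-clock time.
# Time.parse uses the process's zone rules (ENV['TZ'] or the system default),
# so the UTC offset already reflects DST for that date.
def parse_local(date_s)
  Time.parse(date_s)
end

# Numeric offset such as "-06:00", usable where an "MDT"/"MST" suffix was used.
def local_offset(date_s)
  parse_local(date_s).strftime("%:z")
end

if __FILE__ == $0
  ENV['TZ'] = 'MST7MDT,M3.2.0,M11.1.0'    # demo only: US Mountain rules, POSIX form
  puts parse_local("2012-04-19 12:12:12")  # summer date, MDT offset
  puts parse_local("2012-01-19 12:12:12")  # winter date, MST offset
end
```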

Second issue: dates in SQLite3.

SQLite3 doesn't have an internal datetime data type; dates are stored as strings, integers, or (less commonly) REAL Julian day numbers.

So when saving a date into SQLite3, the right way is either to convert the date to an integer (Unix epoch),

Time.parse(date_s).to_i

or to convert it to a string WITH THE TIME ZONE.

date.strftime("%Y-%m-%d %H:%M:%S %Z")
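A minimal sketch of both storage formats, assuming the values are Ruby Time objects. With a Time, %Z produces an abbreviation such as MDT, which SQLite's date functions do not understand (they accept numeric [+-]HH:MM offsets or Z), so this sketch writes the offset with %:z. The helper names are hypothetical.

```ruby
require 'time'

# Hypothetical helpers for the two zone-safe ways to store a time in SQLite.
def to_epoch(t)
  t.to_i                                 # INTEGER column: seconds since 1970 UTC
end

def to_sqlite_text(t)
  t.strftime("%Y-%m-%d %H:%M:%S %:z")    # TEXT column with a numeric offset
end

def from_epoch(i)
  Time.at(i)
end

def from_sqlite_text(s)
  Time.parse(s)
end

t = Time.new(2012, 4, 19, 12, 12, 12, "-06:00")
puts to_sqlite_text(t)                          # 2012-04-19 12:12:12 -06:00
puts from_sqlite_text(to_sqlite_text(t)) == t   # true: same instant
puts from_epoch(to_epoch(t)) == t               # true
```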

Third issue: comparing dates in SQLite3

Remember there is no date_diff in SQLite3, and a date is either a string (not comparison-friendly) or an integer, so convert both sides to epoch seconds with strftime('%s', ...) and compare the numbers. Here the window is 10 hours:

select date,
       strftime('%s', date) as orig_date,
       strftime('%s', '#{occurrence.date.strftime("%Y-%m-%d %H:%M:%S %Z")}') as new_date
from occurrence
where abs(orig_date - new_date) < 3600 * 10
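The #{...} interpolation builds the SQL from a string. Here is a sketch of the same check with a bound parameter instead, assuming the occurrence table and date column above and a Ruby Time value. nearby_occurrences_query is a hypothetical helper; with the sqlite3 gem the pair would be passed as db.execute(sql, params).

```ruby
require 'time'

# Hypothetical helper: the "within N hours" query with bound parameters.
def nearby_occurrences_query(time, hours = 10)
  sql = <<-SQL
    select date
    from occurrence
    where abs(strftime('%s', date) - strftime('%s', ?)) < ?
  SQL
  # numeric offset (%:z) so SQLite's strftime can parse the zone
  [sql, [time.strftime("%Y-%m-%d %H:%M:%S %:z"), hours * 3600]]
end

sql, params = nearby_occurrences_query(Time.new(2012, 4, 19, 12, 12, 12, "-06:00"))
puts sql
p params   # ["2012-04-19 12:12:12 -06:00", 36000]
```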

xml2json in Ruby

I've been addicted to JSON recently and can't stand passing XML around between services, so it's time to switch from XML to JSON in my Ruby code. Google told me this post is the best.

As simple as:

require 'active_support/all'   # provides Hash.from_xml and Hash#to_json
puts Hash.from_xml(response).to_json
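If ActiveSupport versions are a headache, here is a rough stand-in using REXML and json, both shipped with Ruby (REXML is a bundled gem since Ruby 3.0). It is a sketch, not a copy of Hash.from_xml: attributes and type hints are ignored, text becomes strings, and repeated child elements become arrays. The method names are hypothetical.

```ruby
require 'rexml/document'
require 'json'

# Convert an element to a String (leaf) or a Hash of its children.
def element_to_value(el)
  children = el.elements.to_a
  return el.text.to_s.strip if children.empty?
  children.each_with_object({}) do |child, h|
    v = element_to_value(child)
    if h.key?(child.name)
      # second occurrence of a tag: switch to an array
      h[child.name] = [h[child.name]] unless h[child.name].is_a?(Array)
      h[child.name] << v
    else
      h[child.name] = v
    end
  end
end

def xml_to_json(xml)
  root = REXML::Document.new(xml).root
  JSON.generate(root.name => element_to_value(root))
end

puts xml_to_json("<user><name>Ann</name><tag>a</tag><tag>b</tag></user>")
# {"user":{"name":"Ann","tag":["a","b"]}}
```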

Chances are you have mixed versions of activesupport installed on your server and dev machine. I decided to remove the higher version from my dev machine, but RubyMine doesn't support uninstalling gems from the GUI. What a shock!

I had to go back to my old buddy NetBeans to do the GUI gem uninstall.

Got this free RubyMine license during a promotion last month. Loving it, but it badly needs some improvements:

  1. The spec template is not as nice as the one in NetBeans; maybe I can customize it. Not a big deal.
  2. Gem management is a joke. NetBeans is way more professional, although recently NetBeans keeps giving me no-response errors.
  3. Load path management is not easy to find. I'll get used to it someday.

Killer features:

  1. Being able to run a selected spec/story; I never found this feature in NetBeans.
  2. The same shortcut scheme as IntelliJ, so you can reuse your ReSharper muscle memory.
  3. The output panel charset defaults to UTF-8. Woohoo! I've been looking for this for years! It can also switch to many other charsets.

Character encoding in Ruby

About character encoding: why do we need encodings?

Classic character encoding (ANSI or ASCII) only supports a small set of standard English characters. Non-English characters need a different encoding, which essentially extends ANSI/ASCII by using more than one byte to represent non-ASCII characters, like the copyright sign.

Different encoding standards

UTF-8 is supposed to be the standard encoding, but for Chinese characters GBK/GB2312 is still popular because of its efficiency. UTF-8 uses 1-4 bytes per character (the original specification allowed up to 6), and most Chinese characters need 3 bytes, while GBK takes only 2.
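A quick way to see the size difference is to encode the same string both ways with String#encode (Ruby 1.9+). byte_report is a hypothetical helper name.

```ruby
# encoding: utf-8
# Compare storage cost of the same characters in UTF-8 and GBK.
def byte_report(str)
  utf8 = str.encode("UTF-8")
  gbk  = str.encode("GBK")
  {
    chars:      str.length,
    utf8_bytes: utf8.bytesize,
    gbk_bytes:  gbk.bytesize,
    utf8_hex:   utf8.unpack("H*").first,
    gbk_hex:    gbk.unpack("H*").first
  }
end

r = byte_report("中文")
puts r[:utf8_bytes]   # 6 (3 bytes per character)
puts r[:gbk_bytes]    # 4 (2 bytes per character)
```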

Think of each digit as a byte: given a stream like ‘111000111000’, GBK reads Chinese characters two bytes at a time (11|10|00|11|10|00), while UTF-8 reads them three bytes at a time (111|000|111|000). Knowing the right string encoding is a must to avoid garbled display.
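The same effect with real bytes: Ruby's force_encoding only changes the label on a string, so it shows what happens when bytes are read with the wrong encoding. A small sketch:

```ruby
# encoding: utf-8
# Same bytes, two readings: the encoding is a label, not part of the bytes.
utf8_bytes = "中文".encode("UTF-8")                 # E4 B8 AD E6 96 87
as_gbk     = utf8_bytes.dup.force_encoding("GBK")   # read in 2-byte steps
gbk_bytes  = "中文".encode("GBK")                    # D6 D0 CE C4
as_utf8    = gbk_bytes.dup.force_encoding("UTF-8")  # read as UTF-8

puts as_gbk.length            # not 2: the 6 bytes split into different "characters"
puts as_utf8.valid_encoding?  # false: D6 D0 is not a valid UTF-8 sequence
```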

Unicode

When developers want to write a character by its code point in source code, they usually use a \u (or U+) prefix followed by the hex code. This is what a Unicode escape looks like.

For example, “中文” and “\u4e2d\u6587” are the same text; the latter just spells it out with Unicode escapes.
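In Ruby 1.9+ source, a double-quoted "\u4e2d" is already the character, while "\\u4e2d" (escaped backslash) is six plain ASCII characters that still need converting. A short sketch of both, plus getting the code points back:

```ruby
# encoding: utf-8
puts "\u4e2d\u6587" == "中文"                           # true: escape in a literal
p "中文".codepoints.map { |cp| "\\u%04x" % cp }          # ["\\u4e2d", "\\u6587"]
puts "\\u4e2d".length                                   # 6: backslash, u, 4 hex digits
```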

Conversion

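Ruby 1.9 can convert between encodings with Iconv:

Iconv.iconv("utf-8","unicode",escaped)

Note that Iconv converts raw bytes from one encoding to another; it does not turn literal \uXXXX escape text into characters. Ruby 1.9.3 warns that Iconv is deprecated in favor of String#encode, and Ruby 2.0 removed it from the standard library.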
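For reference, here is a String#encode version of a byte-level conversion (Ruby 1.9 and later). Like Iconv, it only changes the byte representation: "中文" in UTF-16BE is exactly the code points 4e2d 6587 written as bytes.

```ruby
# encoding: utf-8
text  = "中文"
utf16 = text.encode("UTF-16BE")       # two bytes per BMP character
puts utf16.unpack("H*").first         # 4e2d6587
puts utf16.encode("UTF-8") == text    # true: round trip back to UTF-8
```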

In Ruby 1.8, there are a few solutions.

Solution 1) Using the JSON library:

require 'json'
escaped = "\\u4e2d\\u6587"
JSON.parse( %Q{["#{escaped}"]} )[0].should == "中文"

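Solution 2) Manually convert:

   require 'cgi'

   def unicode_utf8(unicode_string)
     # replace each literal \uXXXX escape with its UTF-8 bytes
     unicode_string.gsub(/\\u\w{4}/) do |s|
       str = s.sub(/\\u/, "").hex.to_s(2)   # code point as a bit string
       if str.length < 8
         # 7-bit code point: a single ASCII byte
         CGI.unescape(str.to_i(2).to_s(16).insert(0, "%"))
       else
         # split the bits into 6-bit groups, most significant group first
         arr = str.reverse.scan(/\w{0,6}/).reverse.select{|a| a != ""}.map{|b| b.reverse}
         hex = lambda do |bits|
           # lead byte: one "1" per byte, then "0"; continuation bytes: "10"
           (arr.first == bits ? "1" * arr.length + "0" * (8 - arr.length - bits.length) + bits : "10" + bits).to_i(2).to_s(16).insert(0, "%")
         end
         CGI.unescape(arr.map(&hex).join)
       end
     end
   end

   escaped = "\\u4e2d\\u6587"
   unicode_utf8(escaped).should == "中文"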
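A shorter alternative, assuming only 4-hex-digit \uXXXX escapes (the Basic Multilingual Plane): let Array#pack("U") build the UTF-8 bytes, which works on Ruby 1.8 and later. It also sidesteps an edge case in the bit-splitting version above, which emits two bytes instead of three for code points U+0800 to U+0FFF. unescape_unicode is a hypothetical name.

```ruby
# encoding: utf-8
# Replace each literal \uXXXX escape with the character it names.
def unescape_unicode(escaped)
  escaped.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.hex].pack("U") }
end

puts unescape_unicode("\\u4e2d\\u6587")   # 中文
```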

Encoding in JSON

JSON doesn't have a HEAD section, so there is nowhere to set a charset meta tag. Using Unicode escapes is recommended; otherwise the client won't know how to display the text. In the JSON library for Ruby, this can be done by just turning on the ascii_only option.

    json_string = JSON.fast_generate(@sut,
      :ascii_only => true
      )
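A self-contained version of the same option, using JSON.generate on a plain hash instead of the @sut object (the hash is a made-up stand-in):

```ruby
# encoding: utf-8
require 'json'

# ascii_only makes the generator emit \uXXXX escapes for non-ASCII text.
payload = JSON.generate({ "title" => "中文" }, ascii_only: true)
puts payload
puts payload.ascii_only?                     # true
puts JSON.parse(payload)["title"] == "中文"   # true: the parser restores it
```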

Another way (without JSON) to get Unicode escapes from a UTF-8 string in Ruby 1.8:

p "\\u"+@sut.title.unpack("U*").map{|c|"%04x" %c}.join("\\u")

Complete code demo:

  it "should convert different encoding" do
    @sut.title = "中文"
    unicoded_title = "\\u4e2d\\u6587"

    utf8_to_unicode(@sut.title).should == unicoded_title

    json_string = JSON.fast_generate(@sut,
      :ascii_only => true
      )

    JSON.parse(json_string)['title'].should == @sut.title

    JSON.parse( %Q{["#{unicoded_title}"]} )[0].should == @sut.title

    unicode_to_utf8(unicoded_title).should == @sut.title
  end

   def unicode_to_utf8(unicode_string)
    unicode_string.gsub(/\\u\w{4}/) do |s|
      str = s.sub(/\\u/, "").hex.to_s(2)
      if str.length < 8
        CGI.unescape(str.to_i(2).to_s(16).insert(0, "%"))
      else
        arr = str.reverse.scan(/\w{0,6}/).reverse.select{|a| a != ""}.map{|b| b.reverse}
        hex = lambda do |s|
          (arr.first == s ? "1" * arr.length + "0" * (8 - arr.length - s.length) + s : "10" + s).to_i(2).to_s(16).insert(0, "%")
        end
        CGI.unescape(arr.map(&hex).join)
      end
    end
  end

  def utf8_to_unicode(string) # :nodoc:
      '\\u'+string.unpack("U*").map{|c|"%04x" %c}.join('\\u')
  end

About GBK
When fetching a GBK-encoded web page in Ruby with net/http, sometimes all the characters come out garbled. It happens with cURL as well.
Switching to wget to save the page to a file, and then parsing the file, works fine.
I don't know why wget deals with different encodings better.
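I can't say for sure what was going on, but one plausible cause (an assumption, not verified against those pages) is that Net::HTTP hands back the body as raw bytes and nothing ever declares them to be GBK. A sketch of decoding them explicitly; gbk_body_to_utf8 is a hypothetical helper, and it assumes the page really is GBK (check the Content-Type header or the page's meta charset).

```ruby
# encoding: utf-8
# Treat raw response bytes as GBK and re-encode them to UTF-8.
# Undecodable bytes become "?" instead of raising.
def gbk_body_to_utf8(raw)
  raw.dup.force_encoding("GBK").encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
end

# Usage with net/http (URL is a placeholder):
#   require 'net/http'
#   body = Net::HTTP.get(URI("http://example.com/"))
#   puts gbk_body_to_utf8(body)
```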

Run eRuby on shared host

For small jobs like tiny web services, I don't want to bother with Ruby on Rails. I used to pick PHP to build a quick RESTful web service, but we can also use Ruby as a CGI platform to do the same job, as described in the book.

One difference for shared-host users is that we can't modify the httpd config file directly; instead we should use .htaccess, as the DreamHost wiki says.

Another problem is how to set environment variables such as RUBYOPT and GEM_HOME so that developer-installed gem packages get loaded.

Some Google results say it's impossible, at least if you try to set them in code, since that only affects subprocesses, as mentioned in this one and this one.

Here is another post on how to achieve this by tweaking the CGI setup; a bit messy, but it works. I still don't know how to do it for plain CGI (I might need to change dispatch.cgi somewhere), but for eRuby, creating a shell wrapper around eruby.cgi that pre-sets the environment variables is quite easy.
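A sketch of that wrapper idea, written in Ruby rather than shell to match the rest of the post. Everything host-specific here is an assumption: the eruby path, the home directory (CGI processes may not have $HOME set), and the system gem directory. It sets the variables and then replaces itself with eruby via Kernel#exec, so eruby and the gems it loads inherit them.

```ruby
#!/usr/bin/env ruby
# Hypothetical eruby wrapper: pre-set the gem environment, then exec eruby.
ERUBY    = "/home/USERNAME/bin/eruby"   # hypothetical path, adjust to your host
HOME_DIR = "/home/USERNAME"             # hypothetical; CGI may not set $HOME

def gem_env(home, system_gems = "/usr/lib/ruby/gems/1.8")
  gem_home = File.join(home, ".gems")
  {
    "GEM_HOME" => gem_home,
    "GEM_PATH" => "#{gem_home}:#{system_gems}",
    "RUBYOPT"  => "rubygems"            # Ruby 1.8: auto-require rubygems
  }
end

# exec(env, cmd, *args) adds env to the new process's environment (Ruby 1.9+).
if __FILE__ == $0 && File.executable?(ERUBY)
  exec(gem_env(HOME_DIR), ERUBY, *ARGV)
end
```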


Update: ran into gems not loading in eRuby on a new VPS. I couldn't figure out why, so I just did a simple gem update, and the problem was solved.

The gem/RubyGems version before the update was 1.3.7.

It may also be worth running gem update --system, just in case.
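To check whether a host is still on the RubyGems version I had trouble with, here is a tiny sketch. The 1.3.7 threshold comes from this note; treating it as the last bad version is my assumption, and rubygems_needs_update? is a hypothetical name.

```ruby
require 'rubygems'

# Flag RubyGems versions at or below the one that showed the eRuby problem.
def rubygems_needs_update?(version = Gem::VERSION, last_bad = "1.3.7")
  Gem::Version.new(version) <= Gem::Version.new(last_bad)
end

puts "RubyGems #{Gem::VERSION}; update suggested: #{rubygems_needs_update?}"
```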

UTF-8 GBK lookup in ruby

GBK and UTF-8 are the two most popular charsets on Chinese websites. UTF-8 is supposed to be the standard, but people use GBK for various reasons, one being that it is the Windows default.

Converting strings between the two is not too hard, but my situation is that I need a lookup between them. For example, “探” is U+63A2 in Unicode, while its GBK code is CCBD. Usually they are written as \u63a2 (Unicode escape) and %CC%BD (URL-encoded GBK bytes).

Code example:


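#get the hex code point from a \uXXXX escape
s = "\\u63a2"
hex = s.slice(/\\u(\h{4})/, 1)

#utf-8 decode: code point -> UTF-8 character
utf = [hex.to_i(16)].pack('U*')

#convert to GBK (on Ruby 1.8: Iconv.conv('gbk', 'UTF-8', utf))
gbk = utf.encode('GBK')

#percent-encode each GBK byte (URI.encode was removed in Ruby 3.0)
gbk_encode = gbk.bytes.map { |b| "%%%02X" % b }.join   # "%CC%BD"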
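Since what I need is a lookup in both directions, here is a sketch that wraps the steps above into two hypothetical helpers, using String#encode (Ruby 1.9+) instead of Iconv. The reverse direction parses the %XX bytes, labels them as GBK, and converts back to code points.

```ruby
# encoding: utf-8
# "\\u63a2" -> "%CC%BD": Unicode escape to URL-encoded GBK bytes.
def unicode_escape_to_gbk(escape)
  char = [escape[/\\u(\h{4})/, 1].hex].pack("U")
  char.encode("GBK").bytes.map { |b| "%%%02X" % b }.join
end

# "%CC%BD" -> "\\u63a2": URL-encoded GBK bytes back to Unicode escapes.
def gbk_to_unicode_escape(pct)
  bytes = pct.scan(/%(\h\h)/).map { |(h)| h.hex }.pack("C*")
  utf8  = bytes.force_encoding("GBK").encode("UTF-8")
  utf8.codepoints.map { |cp| "\\u%04x" % cp }.join
end

puts unicode_escape_to_gbk("\\u63a2")   # %CC%BD
puts gbk_to_unicode_escape("%CC%BD")    # \u63a2
```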

Ref: http://www.herongyang.com/gb2312/

ruby can not load file -- ?

I kept getting this error in my Ruby environment on DreamHost; hopefully this post will be the final answer to it.

To install RubyGems after installing Ruby:

  1. wget RubyGems from http://rubyforge.org/frs/?group_id=126
  2. ruby setup.rb --prefix=$HOME
  3. Add export RUBYOPT=rubygems to $HOME/.bash_profile or .bashrc, whichever you prefer; you'll want to chain them together anyway.
  4. Add . ~/.bashrc to the Ruby shell script if it runs from crontab. (This was new to me after migrating from Ruby 1.8.5 to 1.8.7; I might have forgotten something simple during the migration. According to this post, this is the standard way in crontab.)