Parse and generate podcast rss feed using Ruby

I am a podcast listener, audio only. The first thing I do every morning after waking up is sync my iPod/iPod shuffle. Recently I noticed that one of my favorite channels failed to update. Later I figured out why: the podcast owner put too many items in their RSS feed. It was only 66 items, but because the feed is hosted in Singapore, the slow network connection was enough to kill the podcast sync.

So I needed to find a way to regenerate their podcast RSS feed myself.

I started out thinking of PHP, then Python, and finally chose Ruby, which seemed to have a few more resources than the other two. Examples of parsing and generating a regular RSS feed with Ruby are not hard to find, but I found only one demo of generating an iTunes feed, and it uses the rss library created by a Japanese developer, 須藤功平 (Kouhei Sutou).

My regeneration code looks like this:

require 'rss'       # RSS 2.0 + iTunes extension (a bundled gem since Ruby 3.0)
require 'open-uri'

# URL of the original RSS feed
source = "http://podcast.overseakids.com/moviecafe/index.xml"

# read the raw content of the RSS feed
# (URI.open is required on Ruby 3.0+, where Kernel#open no longer fetches URLs)
content = URI.open(source) { |s| s.read }

# parse without validation (second argument false)
rss = RSS::Parser.parse(content, false)

# start regenerating the podcast RSS feed
feed = RSS::Rss.new("2.0")
feed.channel = rss.channel

# only keep the top 5 items, slice off the rest
feed.channel.items.slice!(5..-1)

puts feed.to_s
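To check the trimming step without touching the network, here is a small sketch that runs the same parse, slice! and to_s steps on an inline feed. The helper trim_feed and the sample feed are made up for illustration, and it assumes the items are already listed newest first, as they are in most podcast feeds. On Ruby 3.0 and later the rss library is a bundled gem, so it may need a "gem install rss" first.

```ruby
require 'rss'

# Hypothetical helper: keep only the first `keep` items of an RSS 2.0 feed.
# Assumes the feed lists its items newest first.
def trim_feed(xml, keep = 5)
  rss = RSS::Parser.parse(xml, false)  # false: skip validation
  rss.channel.items.slice!(keep..-1)   # drop everything from index `keep` on
  rss.to_s
end

# A tiny stand-in feed with 8 items, so no network access is needed.
items = (1..8).map do |i|
  "<item><title>Episode #{i}</title>" \
  "<enclosure url=\"http://example.com/ep#{i}.mp3\" length=\"1000\" type=\"audio/mpeg\"/></item>"
end
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example podcast</title>
      <link>http://example.com/</link>
      <description>Test feed</description>
      #{items.join("\n")}
    </channel>
  </rss>
XML

trimmed = trim_feed(xml)
puts RSS::Parser.parse(trimmed, false).items.map(&:title)
```

Running it should print Episode 1 through Episode 5; the enclosures of the kept items survive the round trip.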

When I tried to run this code on my DH shared host, it failed because the rss library was not installed. I don't have root access, so I found another way: installing my own copy of Ruby on DH.

One lesson I learned last night: after downloading the latest Ruby source package with wget, I installed it with the following commands:

./configure --prefix=[YOUR_OWN_RUBY_PREFIX]
make
make install

Then I moved the ruby folder to another place I thought was more appropriate. That extra step almost drove me nuts: later, whenever I tried to run any 'setup.rb' script, I got an 'Unable to find rbconfig.rb' error.

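I looked into the ruby/lib folder and found that some files have the installation path hard-coded in them, so a Ruby built for one prefix cannot simply be moved elsewhere. I redid the installation with the prefix I actually wanted, and the problem went away.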

To tell which ruby you are running, run: which ruby. What a natural command!
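"which ruby" tells you which ruby your shell will pick from the PATH. From inside a script, RbConfig (defined in rbconfig.rb, the file the error above could not find) reports the interpreter that is actually running and the paths it was set up with. A minimal sketch; the "library dir under prefix" check assumes a standard layout and is only a rough hint that an install was moved.

```ruby
require 'rbconfig'

# The interpreter running this script, and the paths it uses.
puts "interpreter: #{RbConfig.ruby}"
puts "prefix:      #{RbConfig::CONFIG['prefix']}"
puts "lib dir:     #{RbConfig::CONFIG['rubylibdir']}"

# Rough sanity check (assumption: standard layout, lib dir sits under prefix).
unless RbConfig::CONFIG['rubylibdir'].start_with?(RbConfig::CONFIG['prefix'])
  warn "rubylibdir is outside prefix; was this Ruby moved after installation?"
end
```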


2 thoughts on “Parse and generate podcast rss feed using Ruby”

  1. class AudioItem
       attr_accessor :page_url, :mp3_id, :mp3_url, :title,
                     :description, :pub_date, :author, :mp3_file_size,
                     :image_url
     end

     class Collection
       # image_url is the channel's "album art" URL (nil to omit the image)
       attr_accessor :title, :link, :description, :author, :image_url, :audio_items
     end

     require 'rss'  # loads RSS 2.0, the iTunes extension and the parser

     class FeedMaster
       # Read the feed this app created earlier, so new items from the
       # source can be compared against (and merged with) the old ones.
       # feed_content is the raw XML string; false disables validation.
       def self.read_podcast_feed(feed_content)
         RSS::Parser.parse(feed_content, false)
       end

       # Generate an RSS 2.0 + iTunes feed from collection data, then merge
       # it with the items of old_rss (an RSS::Rss object, or nil).
       def self.generate_podcast_feed(collection, old_rss)
         feed = RSS::Rss.new("2.0")
         feed.encoding = 'utf-8'

         channel = RSS::Rss::Channel.new

         # iTunes category "Arts" with sub-category "Literature"
         category = RSS::ITunesChannelModel::ITunesCategory.new("Arts")
         category.itunes_categories <<
           RSS::ITunesChannelModel::ITunesCategory.new("Literature")
         channel.itunes_categories << category

         # title, link and description are mandatory in RSS 2.0
         channel.title = collection.title
         channel.description = collection.description
         channel.link = collection.link
         # channel.language = 'zh'
         # channel.copyright = '2007'
         # channel.lastBuildDate = Time.now

         # the RSS "album art"; url, title and link are all required.
         # (The old code read @channel_image_url, which is always nil
         # inside a class method unless set elsewhere.)
         unless collection.image_url.nil?
           channel.image = RSS::Rss::Channel::Image.new
           channel.image.url = collection.image_url
           channel.image.title = collection.title
           channel.image.link = collection.link
         end

         # optional iTunes channel tags:
         # channel.itunes_author = collection.author
         # channel.itunes_owner = RSS::ITunesChannelModel::ITunesOwner.new
         # channel.itunes_owner.itunes_name = collection.author
         # channel.itunes_owner.itunes_email = 'no@email.com'
         # channel.itunes_keywords = %w(Common Misspellings of Key Words)
         # channel.itunes_subtitle = 'subtitle'
         # channel.itunes_summary = 'summary'
         # channel.itunes_explicit = 'clean'  # or 'yes' / 'no'

         (collection.audio_items || []).each do |r|
           item = RSS::Rss::Channel::Item.new
           item.title = r.title
           item.link = r.mp3_url
           item.description = r.description
           item.itunes_author = r.author
           # RFC 822 date string; the rss library converts it to a Time
           item.pubDate = r.pub_date.strftime("%a, %d %b %Y %H:%M:%S %z")

           # I use guid to save the image url instead (a hack: an image
           # URL is not really a permanent link to the episode).
           item.guid = RSS::Rss::Channel::Item::Guid.new
           item.guid.content = r.image_url
           item.guid.isPermaLink = true

           # TODO: add itunes_duration once we can compute it somehow
           item.enclosure =
             RSS::Rss::Channel::Item::Enclosure.new(r.mp3_url, r.mp3_file_size, 'audio/mpeg')
           channel.items << item
         end

         if old_rss.nil?
           puts 'old_rss is nil'
         else
           channel.items.concat(old_rss.items)
           # plain uniq! compares the item objects themselves, so
           # de-duplicate by link; the first (newly built) item wins
           channel.items.uniq! { |i| i.link }
           # re-sort, newest first
           channel.items.sort! { |a, b| b.pubDate <=> a.pubDate }
         end

         # to keep only the newest 5 items, as in the post:
         # channel.items.slice!(5..-1)
         feed.channel = channel
         feed
       end
     end
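The merge step at the end of generate_podcast_feed (concatenate, de-duplicate, sort newest first) can be tried on its own. Below is a sketch assuming items are identified by their link; plain uniq relies on the item objects' own equality, which as far as I can tell is object identity for these RSS classes, so two items with the same link would both survive. The helpers merge_items and make_item are hypothetical and the dates are made-up sample data; it also keeps only the newest 5 items, as the original post does.

```ruby
require 'rss'

# Hypothetical helper: merge new and old items, new ones win on a
# duplicate link, newest first, at most `keep` items.
def merge_items(new_items, old_items, keep = 5)
  (new_items + old_items)
    .uniq(&:link)          # first occurrence (the new item) wins
    .sort_by(&:pubDate)    # oldest first...
    .reverse               # ...then flip to newest first
    .first(keep)
end

# Hypothetical helper: build a stand-in item with sample data.
def make_item(n, day)
  item = RSS::Rss::Channel::Item.new
  item.title = "Episode #{n}"
  item.link = "http://example.com/ep#{n}.mp3"
  item.pubDate = Time.utc(2008, 1, day)
  item
end

old_items = (1..4).map { |n| make_item(n, n) }
new_items = [make_item(4, 4), make_item(5, 5), make_item(6, 6)]
merged = merge_items(new_items, old_items)
puts merged.map(&:title)
```

With episode 4 present in both lists, it should print Episodes 6, 5, 4, 3 and 2, and the episode 4 kept is the newly built one.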
    
