Parsing encoded content in an RSS feed

I was transferring one of my WordPress blogs to Blogger using the Google API. When parsing the WordPress RSS feed, I found the post text wrapped in CDATA sections:

<description><![CDATA[..........
<content:encoded><![CDATA[.............

I wanted to parse the content element, but because of the 'encoded' keyword in its name I almost failed to parse it while struggling with a 'serialize' problem. The correct way to do it in Python is:

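import feedparser

# Fetch and parse the feed from my site.
feed = feedparser.parse(my_wp_rss_url)

for item in feed['items']:
    post_title = item['title']
    post_description = item['description']
    # <content:encoded> shows up as a list under 'content'; take the first entry's value.
    post_content = item['content'][0].value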

…….
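To see where the 'encoded' part of the name comes from, here is a minimal sketch that reads the same element with only the standard library instead of feedparser. The sample feed string and the function name extract_items are made up for illustration; the namespace URI is the one WordPress feeds normally bind to the content prefix, so check it against your own feed.

```python
import xml.etree.ElementTree as ET

# Namespace usually bound to the "content" prefix in WordPress feeds
# (an assumption here; check the xmlns:content attribute of your feed).
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical sample feed standing in for a real WordPress feed.
SAMPLE_RSS = """<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Hello</title>
      <description><![CDATA[Short summary]]></description>
      <content:encoded><![CDATA[<p>Full <b>post</b> body</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def extract_items(rss_text):
    """Return a list of (title, description, content) tuples, one per <item>."""
    root = ET.fromstring(rss_text)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title")
        description = item.findtext("description")
        # ElementTree spells a namespaced tag as {namespace-uri}localname.
        content = item.findtext("{%s}encoded" % CONTENT_NS)
        results.append((title, description, content))
    return results


for title, description, content in extract_items(SAMPLE_RSS):
    print(title, "|", description, "|", content)
```

The CDATA wrapper disappears during parsing, so the content comes back as an ordinary string of HTML. feedparser does the same unwrapping and additionally groups the result under 'content'.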

When I ran this Python script from a DOS prompt or a Linux shell I hit some other encoding-related problems, such as GBK conversion and ASCII charset errors. When I switched to running it in IDLE, the Python shell program, there was no problem at all.
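I did not dig into the cause, but errors like these usually mean the console's encoding (GBK or ASCII here) cannot represent some character in the post. One possible workaround, sketched below for Python 3, is to replace the characters the console cannot show instead of letting the print fail. The helper name console_safe is hypothetical, not part of any library.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Fall back to the console's own encoding, then to ASCII if that is unknown.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)


# Force ASCII to show the replacement behaviour regardless of the real console.
print(console_safe("caf\u00e9 \u4e2d\u6587", "ascii"))
```

This loses the original characters in the printed output only; the strings you send on to Blogger are untouched.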
