Sean O'Donnells Weblog
The sad news that Yahoo plans to shut down del.icio.us reached me this week (although there's still hope). I use del.icio.us pretty much every day and was a little traumatized upon hearing this. Once I had finished wailing and gnashing my teeth, I set out looking for somewhere to go.
There are many bookmarking sites/services out there, but I fear change, and pinboard.in seemed like the closest thing to a plain replacement. It even supports the same API as del.icio.us. There's a small charge for signing up, but no recurring fee, so I broke out the credit card and joined up.
The next step was to figure out how to migrate my bookmarks. del.icio.us provides an export-to-HTML feature in its settings area, but a quick look at the export revealed some data was missing (mostly extended descriptions). Rabid googling revealed a lesser-known XML export mechanism. To use it, visit https://api.del.icio.us/v1/posts/all, enter your username and password, and save the resulting XML file.
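If you want to check that the XML export really does carry the extended descriptions before migrating, a rough Python 3 sketch like the one below should do. It assumes the export has one child element per bookmark under the root, with the data held in the attributes my migration script reads (href, description, extended, tag, time, shared). The sample document and the read_posts helper are made up for illustration; they are not taken from a real export.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape the migration script expects:
# one child element per bookmark, with the data stored in attributes.
SAMPLE = """<posts user="example">
  <post href="http://example.com/" description="Example"
        extended="A longer note" tag="test demo"
        time="2010-12-17T10:00:00Z" shared="no"/>
  <post href="http://example.org/" description="Another"
        extended="" tag="demo" time="2010-12-18T09:30:00Z"/>
</posts>"""


def read_posts(xml_text):
    """Return one dict of attributes per bookmark element."""
    root = ET.fromstring(xml_text)
    # Iterating an Element yields only its child elements,
    # so whitespace between them is skipped automatically.
    return [dict(child.attrib) for child in root]


posts = read_posts(SAMPLE)
with_notes = [p for p in posts if p.get("extended")]
print("%d bookmarks, %d with extended descriptions" % (len(posts), len(with_notes)))
```

For a real backup you would pass the contents of the saved file to read_posts instead of SAMPLE.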
Now to get my bookmarks into pinboard.in. I broke out my trusty text editor and battered together the script below, which works just fine. A few hours later all my bookmarks are in pinboard.in, their bookmarklets are installed in my browser, and I'm loving their read later features. Sean is a happy geek again.
You can download my migration script. To use it:
python delmigrate.py backup.xml username password
Here's the source for the curious.
from xml.dom import minidom
import sys
import urllib
import urllib2
import time

user = sys.argv[2]
password = sys.argv[3]
endpoint = "https://api.pinboard.in"
url = "/v1/posts/add?"

# open the xml file to import from and parse it
f = open(sys.argv[1], "r")
doc = minidom.parse(f).documentElement

# keep count of how many urls have been imported
urlcount = 0
count = 0
ellength = len(doc.childNodes)
failcount = 0

while count < ellength:
    e = doc.childNodes[count]
    if e.nodeType == e.ELEMENT_NODE:
        print "import url %s" % urlcount
        # get the attributes from the xml
        href = e.getAttribute("href")
        description = e.getAttribute("description")
        extended = e.getAttribute("extended")
        tags = e.getAttribute("tag")
        dt = e.getAttribute("time")
        rargs = dict(url=href, description=description,
                     extended=extended, tags=tags, dt=dt)
        shared = e.getAttribute("shared")
        if shared.strip() == 'no':
            rargs['shared'] = 'no'
        # encode the unicode values as utf-8 so urlencode can cope with them
        rargs = dict([k, v.encode('utf-8')] for k, v in rargs.items())
        print rargs
        # build the request to send
        # set up http auth for pinboard.in
        # doing this for every request may seem wasteful, but urllib2
        # seems to forget the auth details after a half dozen requests
        # if you don't
        password_manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_manager.add_password(None, endpoint, user, password)
        auth_handler = urllib2.HTTPBasicAuthHandler(password_manager)
        opener = urllib2.build_opener(auth_handler)
        urllib2.install_opener(opener)
        request = urllib2.Request(endpoint + url + urllib.urlencode(rargs))
        # set the user agent
        request.add_header('User-Agent', 'SeansDeliciousMigrater')
        try:
            # send the request and read the response
            r = opener.open(request)
            response = minidom.parse(r).documentElement.getAttribute("code")
        except Exception, err:
            response = str(err)
        # if we get an invalid response we are probably throttled:
        # retry the same url, and abort after five failures in a row
        if response != "done":
            failcount += 1
            print "Failure: Invalid response: %s" % response
            if failcount > 4:
                print "Aborting: Invalid response %s" % response
                break
            else:
                print "waiting for 30 seconds and retrying"
                time.sleep(30)
        else:
            failcount = 0
            count += 1
            # put in a delay between requests to reduce odds of throttling
            time.sleep(1)
            urlcount += 1
    else:
        count += 1

print "%s urls imported" % urlcount
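The script above is Python 2. For anyone on Python 3, here is a rough sketch of how the request-building step might look with urllib.request. It only builds the request and does not send it. The build_request helper and the example bookmark are made up for illustration, and sending the Authorization header up front (instead of using a password manager as the script does) is my assumption about what the server will accept, not something tested against pinboard.in.

```python
import base64
import urllib.parse
import urllib.request

# Same endpoint and path as the Python 2 script.
ENDPOINT = "https://api.pinboard.in/v1/posts/add?"


def build_request(post, user, password):
    """Build (but do not send) a posts/add request for one bookmark.

    `post` is a dict keyed by the attribute names used in the XML export.
    """
    args = {
        "url": post["href"],
        "description": post.get("description", ""),
        "extended": post.get("extended", ""),
        "tags": post.get("tag", ""),
        "dt": post.get("time", ""),
    }
    if post.get("shared", "").strip() == "no":
        args["shared"] = "no"
    # In Python 3 urlencode accepts str values and percent-encodes them as UTF-8.
    request = urllib.request.Request(ENDPOINT + urllib.parse.urlencode(args))
    # Assumption: basic auth sent pre-emptively, so no password manager is needed.
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    request.add_header("User-Agent", "SeansDeliciousMigrater")
    return request


# Hypothetical bookmark, just to show the resulting URL.
example = {"href": "http://example.com/?q=1", "description": "Café", "shared": "no"}
req = build_request(example, "user", "secret")
print(req.full_url)
```

Sending it would be a matter of passing req to urllib.request.urlopen, wrapped in the same wait-and-retry loop as the original script.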
Hi Tim, take a look at the output from the HTML backup vs the XML backup: not as much information is preserved and carried across.
Ah, cool - that answers that :)
Aside from the fun in using python to do this, why not simply follow the instructions at http://pinboard.in/howto#import ? No scripting needed! :-)