GIDForums  

#1  trihaitran, 10-Aug-2008, 11:21

Help with cookies/authentication


Hi, I am trying to pull some data from a Web site: schoolfinder.com

The issue is that I want to use the advanced search feature, which requires logging into the Web site. I have a username and password; however, I want to connect programmatically from Python. I have done data capture from the Web before, so the only thing new to me here is the authentication. The site needs cookies, as this page describes: schoolfinder.com/login/login.asp

I already know how to add POST/GET data to a request, but how do I deal with cookies/authentication? I have read a few articles without success:

urllib2:
voidspace.org.uk/python/articles/urllib2.shtml#id6

urllib2 Cookbook:
personalpages.tds.net/~kent37/kk/00010.html

basic authentication:
voidspace.org.uk/python/articles/authentication.shtml#id19

cookielib:
voidspace.org.uk/python/articles/cookielib.shtml

Is there some other resource I am missing? Could someone set up a basic script that would allow me to connect to schoolfinder.com with my username and password? My username is "greenman", password is "greenman". All I need to know is how to access pages as if I had logged in with a Web browser.

Thank you very much.

#2  Howard_L, 12-Aug-2008, 00:29

Have you gotten anywhere with this yet?

#3  trihaitran, 30-Aug-2008, 15:02

I was able to solve this problem after a lot more research and tinkering. Here is the solution for anyone interested.

Python Code:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

import cookielib
import sys
import urllib
import urllib2

# Keep cookies between requests so the login session persists.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open('http://schoolfinder.com') # visit the home page first to save a cookie

theurl = 'http://schoolfinder.com/login/login.asp' # the login page the form data is posted to
body = {'usr': 'greenman', 'pwd': 'greenman'} # login form fields
txdata = urllib.urlencode(body) # encode the form fields; passing data makes this a POST request
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'} # fake a user agent, some websites don't like automated exploration


try:
    req = urllib2.Request(theurl, txdata, txheaders) # create a request object
    handle = opener.open(req) # open it through the cookie-aware opener
    HTMLSource = handle.read()
    f = open('test.html', 'w') # save the returned page for inspection
    f.write(HTMLSource)
    f.close()

except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
    sys.exit(1) # stop on either kind of failure

else:
    print 'Here are the headers of the page :'
    print handle.info() # handle.geturl() returns the true url of the page fetched, in case a redirect was followed
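
For anyone reading this on Python 3, where cookielib and urllib2 became http.cookiejar and urllib.request, the same idea might look roughly like the sketch below. The helper names (make_session, build_login_request, login_and_fetch, describe_error) are made up for illustration, the form fields are whatever the site's login form expects ('usr' and 'pwd' in the script above), and none of this has been checked against schoolfinder.com. Unlike the script above, it does not visit the home page first.

```python
import http.cookiejar
import urllib.error
import urllib.parse
import urllib.request

# Same User-agent string as in the script above.
USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"


def make_session():
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", USER_AGENT)]
    return opener, jar


def build_login_request(url, fields):
    """Build a POST request carrying URL-encoded form fields."""
    # Supplying a bytes body is what turns the request into a POST.
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data)


def login_and_fetch(opener, login_url, fields, page_url):
    """Log in, then fetch another page through the same opener.

    Hypothetical helper: it assumes the site keeps the session purely in
    cookies, which the opener returns automatically on the second request.
    """
    with opener.open(build_login_request(login_url, fields)) as resp:
        resp.read()  # discard the login page; cookies are now in the jar
    with opener.open(page_url) as resp:
        return resp.read()


def describe_error(exc):
    """Summarise a failure; check HTTPError first since it subclasses URLError."""
    if isinstance(exc, urllib.error.HTTPError):
        return "server returned HTTP %d" % exc.code
    if isinstance(exc, urllib.error.URLError):
        return "could not reach server: %s" % exc.reason
    return "unexpected error: %r" % (exc,)
```

Usage would be along the lines of calling make_session() once, then login_and_fetch(opener, login_url, {'usr': ..., 'pwd': ...}, page_url) inside a try/except that passes any urllib.error.URLError to describe_error. Reusing one opener for every request is the important part, since that is what carries the cookies from the login over to later pages.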
 
 
