GIDForums  

  #1  
04-Sep-2008, 22:29
trihaitran

Using mechanize to do website authentication


I am writing a web scraper and am having trouble accessing pages that require authentication. I am trying to use the mechanize library for this, without success so far. The site I am trying to log in to is http://www.princetonreview.com/Login3.aspx?uidbadge=

user: bugmenot2008@yahoo.com
pass: letmeinalready

Previously I did something similar for another site, schoolfinder.com. Here is my code for that:

Python Code:
import cookielib
import sys
import urllib
import urllib2

# Cookie-aware opener: cookies set by the first request are sent with the login POST
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
resp = opener.open('http://schoolfinder.com') # save a cookie

theurl = 'http://schoolfinder.com/login/login.asp' # the URL the login form posts to
body = {'usr': 'greenman', 'pwd': 'greenman'}
txdata = urllib.urlencode(body) # passing encoded data makes this a POST request
txheaders = {'User-agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'} # fake a user agent, some websites (like google) don't like automated exploration

try:
    req = urllib2.Request(theurl, txdata, txheaders) # create a request object
    handle = opener.open(req) # and open it to return a handle on the url
    HTMLSource = handle.read()
    f = open('test.html', 'w')
    f.write(HTMLSource)
    f.close()

except IOError, e:
    print 'We failed to open "%s".' % theurl
    if hasattr(e, 'code'):
        print 'We failed with error code - %s.' % e.code
    elif hasattr(e, 'reason'):
        print "The error object has the following 'reason' attribute :", e.reason
        print "This usually means the server doesn't exist, is down, or we don't have an internet connection."
    sys.exit(1) # stop on any failure, not only when a 'reason' is present

else:
    print 'Here are the headers of the page :'
    print handle.info() # handle.geturl() returns the true url of the page fetched (in case urlopen has followed any redirects, which it sometimes does)
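
For anyone reading this on Python 3, here is a minimal sketch of the same cookie-plus-POST setup using the renamed standard-library modules (cookielib became http.cookiejar, urllib2 became urllib.request). The helper name build_login_request is made up for illustration, and the sketch only builds the request without sending it, so it makes no claim about whether the site accepts the login.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_login_request(url, fields, user_agent):
    """Return (opener, request) for a cookie-aware form POST; nothing is sent."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # A bytes body turns the request into a POST.
    data = urllib.parse.urlencode(fields).encode("ascii")
    request = urllib.request.Request(url, data=data, headers={"User-Agent": user_agent})
    return opener, request


# URL, field names, and user agent are taken from the Python 2 code above.
opener, request = build_login_request(
    "http://schoolfinder.com/login/login.asp",
    {"usr": "greenman", "pwd": "greenman"},
    "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)",
)
print(request.get_method(), request.full_url)
print(request.data)
# Calling opener.open(request) would perform the actual login attempt.
```

Keeping the opener and the request separate makes it easy to inspect exactly what will be posted before any network traffic happens.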

This method does not work on the Princeton Review site, however. Interestingly, I cannot even get mechanize to access the schoolfinder.com site. Here is the mechanize code I am using:

Python Code:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import mechanize

theurl = 'http://www.princetonreview.com/Login3.aspx?uidbadge='
mech = mechanize.Browser()
mech.open(theurl) # fetch the login page

mech.select_form(nr=0) # pick the first form on the page
mech["ctl00$MasterMainBodyContent$txtUsername"] = "bugmenot2008@yahoo.com"
mech["ctl00$MasterMainBodyContent$txtPassword"] = "letmeinalready"
results = mech.submit().read() # submit the form and read the response body

f = open('test.html', 'w')
f.write(results) # write to a test file
f.close()

This code is so short and I just cannot figure out what I am doing wrong. What is incorrect about this? Thank you in advance.
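
A possible first debugging step, offered as a sketch rather than a confirmed fix, is to list every input the login form contains and compare that with what the script fills in. ASP.NET WebForms pages normally include hidden inputs such as __VIEWSTATE, and the server may also look at which submit button was used. The sample HTML below is a hypothetical, heavily shortened stand-in for the real page: the __VIEWSTATE value and the btnLogin button name are invented, and only the two text field names come from the code above. The example is written for Python 3 and uses only the standard library.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect (name, type, value) for every named <input> in a page."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("name"):
            self.inputs.append(
                (attr["name"], attr.get("type", "text"), attr.get("value", ""))
            )


# Hypothetical stand-in for the login page; not the real markup.
SAMPLE_PAGE = """
<form method="post" action="Login3.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTA4" />
  <input type="text" name="ctl00$MasterMainBodyContent$txtUsername" />
  <input type="password" name="ctl00$MasterMainBodyContent$txtPassword" />
  <input type="submit" name="ctl00$MasterMainBodyContent$btnLogin" value="Log in" />
</form>
"""

collector = InputCollector()
collector.feed(SAMPLE_PAGE)
for name, kind, value in collector.inputs:
    print(kind, name, repr(value))
```

Running the same collector over the HTML saved in test.html would show whether the form selected with nr=0 is really the login form and which fields it expects.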
Last edited by admin : 05-Sep-2008 at 08:17. Reason: Please insert your example Python codes between [PY] and [/PY] tags
  #2  
05-Sep-2008, 08:13
crystalattice

Re: Using mechanize to do website authentication


I wasn't aware of the mechanize module until this post, so I can't offer any specific advice.

However, I can point you towards http://wwwsearch.sourceforge.net/mechanize/, in case you haven't looked at it yet. Unfortunately, it doesn't say too much, but it does offer some examples that may help you.

Sorry I can't be more help.
  #3  
05-Sep-2008, 13:30
trihaitran

Re: Using mechanize to do website authentication


I did see that page and unfortunately the documentation was not very helpful to me. I was just wondering if anyone else had experience with the mechanize library.
 
 

All times are GMT -6.