Simple Python Web Scraper

I needed a simple html only scraper. (This doesn’t use js, won’t pull down data via AJAX). I found an example on another site, thetaranights.com, but it wasn’t exactly what I needed. It only pulled the data and printed it to screen. I added a list to loop through and auto saving by url name to a html file.

import mechanize  #pip install mechanize

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [("User-agent","Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]

sign_in = br.open("https://this.example.com/login")  #the login url

br.select_form(nr = 0) #accessing form by their index. Since we have only one form in this example, nr =0.
#br.select_form(name = "form name") Alternatively you may use this instead of the above line if your form has name attribute available.

br["email"] = "email or username" #the key "username" is the variable that takes the username/email value

br["password"] = "password"    #the key "password" is the variable that takes the password value

logged_in = br.submit()   #submitting the login credentials

logincheck = logged_in.read()  #reading the page body that is redirected after successful login

urls = ["https://this.example.com/some/page","https://this.example.com/some/page2"]

for url in urls:
	req = br.open(url).read()
	filename = url.split('/')[-1] + ".html"
	f = open(filename, 'w')
	f.write(req)
	f.close()

Which produces 2 files:
page.html
page2.html

How to call a python function by using a variable python

I needed to, in a programmatic way, determine the function name and then call the function.

Here’s an example use case. I have a bunch of functions called, [“function1”, “function2”, “function3”, “function4”, etc] Then I have a function that takes a number as a param. Then should return the proper function.

class my_awesome_class():
    def function1(self, number):
        # do something cool
        pass

    def function2(self, number):
        # do something cool
        pass

[...]

    def test_function(self, number, data):
        function_string = ''.join(['function',str(number)])
        try:
            t = getattr(my_awesome_class(), function_string)
            return t
        except AttributeError:
            print 'function not found "%s" (%s)' % (function_string, data)