GIDForums  

  #1  
02-Dec-2006, 14:15
vbnet2005

Writing A Basic Spam Filter


Hi there. I have a small exercise to write a basic spam filter, but I'm stuck on a few parts.

Python Code:
#!/usr/bin/python

import getpass, imaplib, string
from StringIO import StringIO

#
#  getBody - strips off the header from an email message
#            Only the body of the message is returned.
#

def getBody(message):
    body=""
    seenStart = 0
    for line in StringIO(message).readlines():
        l = string.strip(line)
        if (seenStart == 1):
            body = body + "\n" + l
        elif l == "":
            seenStart = 1
    return body

#
#  getField - returns the field value "field" in the header
#             section of the message
#

def getField(message, field):
    for line in StringIO(message).readlines():
        l = string.strip(line)
        if (string.find(l, field) != -1):
            words = string.split(l, field)
            return words[1]
    return "unknown"


#
#  containsTag - returns True if an HTML tag <t or </t is found in l
#
            
def containsTag(l, t):
    return ((string.find(l, "<" + t) != -1) or
            (string.find(l, "</" + t) != -1))

#
#  isSpam - returns 1 if the message looks like spam
#           returns 0 if the message looks ok
#

def isSpam(message):
    total = 0
    htmlFound = 0
    for line in StringIO(getBody(message)).readlines():
        print line
    # incomplete: at the moment everything is treated as
    # non-spam
    return 0
        

#
# postEmail - sends message to the smtp port
#

def postEmail(message):
    print "message from", getField(message, "From:")
    print "subject is", getField(message, "Subject:")
    # incomplete

#
#  handleEmail - tests whether the message is spam or not.
#                If spam then it adds it to a file
#                otherwise post it to the smtp port
#

def handleEmail(message):
    if (isSpam(message)):
        print "message is spam"
    else:
        print "message is good"
        postEmail(message)
    # incomplete
    

#
#  collectEmail - retrieves the email from the imap server
#

def collectEmail():
    m = imaplib.IMAP4_SSL('url here')
    m.login(getpass.getuser(), getpass.getpass())
    m.select()
    typ, data = m.search(None, 'ALL')
    print data
    for num in string.split(data[0]):
        print num
        typ, data = m.fetch(num, '(RFC822)')
        handleEmail(data[0][1])
    m.logout()

collectEmail()

The functions I have to complete are listed below.

def isSpam(message):
def postEmail(message):
def handleEmail(message):

Basically, I have to check each line of the email, and if HTML makes up more than 60% of the email, it is considered spam.

I've started on some of the missing code, but I don't know how to finish it off. Any help is appreciated. Thanks a lot.
Last edited by LuciWiz : 03-Dec-2006 at 04:30. Reason: Please insert your Python code between [py] & [/py] tags

  #2  
02-Dec-2006, 16:05
crystalattice

It looks like you're almost there. You're already calling readlines() and looping over each line in the body. All you have to do is call containsTag() on each line as you go and increment htmlFound when it returns true. You'll also need to keep a running total (for example, the number of lines in the body) so you can divide htmlFound by that total; that tells you whether the HTML share is above 60%.
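
Roughly, the counting could look like the sketch below. It is written for Python 3 rather than the Python 2 of your script, it takes the body text directly (what getBody() returns), and the HTML_TAGS list is only my guess at which tags are worth checking; the exercise may specify its own. It also counts lines, which is one reading of "60% of the email".

```python
# Assumed list of tag names to look for; adjust it to whatever the exercise requires.
HTML_TAGS = ("html", "body", "table", "div", "font", "img", "br")


def contains_tag(line, tag):
    """True if the line holds an opening or closing tag that starts with `tag`."""
    lowered = line.lower()
    return ("<" + tag) in lowered or ("</" + tag) in lowered


def html_ratio(body):
    """Fraction of non-blank body lines that contain at least one listed tag."""
    total = 0
    html_found = 0
    for line in body.splitlines():
        if not line.strip():
            continue  # blank lines count neither for nor against
        total += 1
        if any(contains_tag(line, tag) for tag in HTML_TAGS):
            html_found += 1
    if total == 0:
        return 0.0  # empty body: avoid dividing by zero
    return html_found / total


def is_spam(body, threshold=0.6):
    """Treat the message as spam when more than `threshold` of its lines are HTML."""
    return html_ratio(body) > threshold
```

In your script this would be called as is_spam(getBody(message)).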

Do you need help on the actual parsing of the lines?
__________________
Start Programming with Python-A beginner's guide to programming and the Python language.
-------------
Common Sense v2.0-Striving to make the world a little bit smarter.

  #3  
03-Dec-2006, 07:08
vbnet2005

Quote:
Originally Posted by crystalattice
Do you need help on the actual parsing of the lines?

That would be a great help. I'm still new to Python and a bit confused about what I'm doing; I've spent a while on this, working from the documentation to write the code so far. Thanks for your help.

  #4  
03-Dec-2006, 14:16
crystalattice

Here's an example of searching strings from one of my books:
Python Code:
# Fig. 13.5: fig13_05.py
# Searching strings for a substring.

# counting the occurrences of a substring
string1 = "Test1, test2, test3, test4, Test5, test6"

print '"test" occurs %d times in \n\t%s' % \
   ( string1.count( "test" ), string1 )
print '"test" occurs %d times after 18th character in \n\t%s' % \
   ( string1.count( "test", 18, len( string1 ) ), string1 )
print

# finding a substring in a string
string2 = "Odd or even"

print '"%s" contains "or" starting at index %d' % \
   ( string2, string2.find( "or" ) )

# find index of "even"
try:
   print '"even" index is', string2.index( "even" )
except ValueError:
   print '"even" does not occur in "%s"' % string2

if string2.startswith( "Odd" ):
   print '"%s" starts with "Odd"' % string2

if string2.endswith( "even" ):
   print '"%s" ends with "even"\n' % string2

# searching from end of string 
print 'Index from end of "test" in "%s" is %d' \
   % ( string1, string1.rfind( "test" ) )
print

# find rindex of "Test"
try:
   print 'First occurrence of "Test" from end at index', \
      string1.rindex( "Test" )
except ValueError:
   print '"Test" does not occur in "%s"' % string1

print

# replacing a substring
string3 = "One, one, one, one, one, one"

print "Original:", string3
print 'Replaced "one" with "two":', \
   string3.replace( "one", "two" )
print "Replaced 3 maximum:", string3.replace( "one", "two", 3 )

########################################################################## 
# (C) Copyright 2002 by Deitel & Associates, Inc. and Prentice Hall.     #
# All Rights Reserved.                                                   #
#                                                                        #
# DISCLAIMER: The authors and publisher of this book have used their     #
# best efforts in preparing the book. These efforts include the          #
# development, research, and testing of the theories and programs        #
# to determine their effectiveness. The authors and publisher make       #
# no warranty of any kind, expressed or implied, with regard to these    #
# programs or to the documentation contained in these books. The authors #
# and publisher shall not be liable in any event for incidental or       #
# consequential damages in connection with, or arising out of, the       #
# furnishing, performance, or use of these programs.                     #
##########################################################################

Here's an example of regular expressions, in case you need to use them:
Python Code:
# Fig. 13.7: fig13_07.py
# Simple regular-expression example.

import re

# list of strings to search and expressions used to search
testStrings = [ "Hello World", "Hello world!", "hello world" ]
expressions = [ "hello", "Hello", "world!" ]

# search every expression in every string
for string in testStrings:

   for expression in expressions:

      if re.search( expression, string ):
         print expression, "found in string", string
      else:
         print expression, "not found in string", string

   print         

########################################################################## 
# (C) Copyright 2002 by Deitel & Associates, Inc. and Prentice Hall.     #
# All Rights Reserved.                                                   #
#                                                                        #
# DISCLAIMER: The authors and publisher of this book have used their     #
# best efforts in preparing the book. These efforts include the          #
# development, research, and testing of the theories and programs        #
# to determine their effectiveness. The authors and publisher make       #
# no warranty of any kind, expressed or implied, with regard to these    #
# programs or to the documentation contained in these books. The authors #
# and publisher shall not be liable in any event for incidental or       #
# consequential damages in connection with, or arising out of, the       #
# furnishing, performance, or use of these programs.                     #
##########################################################################
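
Applied to the spam filter, a regular expression can pick out anything that looks like a tag instead of testing for tag names one by one. The sketch below is written for Python 3 (the book examples above use Python 2). The pattern TAG_PATTERN and the helper names are my own illustrative assumptions, and measuring the share of characters inside tags is only one possible reading of "more than 60% HTML".

```python
import re

# Assumed pattern: "<", an optional "/", a letter, then anything up to the next ">".
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")


def find_tags(line):
    """Return every substring of the line that looks like an HTML tag."""
    return TAG_PATTERN.findall(line)


def html_char_ratio(body):
    """Share of characters that belong to tags, measured over stripped lines."""
    tag_chars = 0
    total_chars = 0
    for line in body.splitlines():
        stripped = line.strip()
        total_chars += len(stripped)
        tag_chars += sum(len(tag) for tag in find_tags(stripped))
    return tag_chars / total_chars if total_chars else 0.0
```

Because the pattern requires a letter right after "<", a line such as "1 < 2 and 3 > 2" is not mistaken for markup.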

  #5  
03-Dec-2006, 15:20
vbnet2005

Okay, thanks.

So I need to use this to complete the isSpam(message) function?

  #6  
04-Dec-2006, 10:51
crystalattice

Yes. You want to call readlines() on the message body (as your isSpam loop already does), then do a substring search on each line, looking for the HTML tags.

  #7  
07-Dec-2006, 05:43
vbnet2005

Hmm. I'm still not sure what I'm supposed to be doing :S

All this programming has gone to my head.

  #8  
07-Dec-2006, 10:52
crystalattice

It should go roughly like this:
  1. Open the file
  2. Get lines from file
  3. Process each line
    Python Code:
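    textFile = open("foo.bar", "r")
    textLines = textFile.readlines()
    textFile.close()
    for line in textLines:
        # finding a substring in the current line (not in the whole list)
        marker = line.strip()
        index = marker.find( "<" )
        # find() returns -1 when the substring is absent
        if index != -1:
            print '"%s" contains "<" starting at index %d' % \
               ( marker, index )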

This isn't the exact way to do it since I'm trying to merge two different code snippets from my book, but it should give you an idea of how to proceed.
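
The remaining two functions, handleEmail and postEmail, can follow the comments in the original script: spam is appended to a file, everything else goes to the SMTP port. Here is a rough Python 3 sketch. The host, port, addresses, and the spam.txt file name are all placeholders rather than values from the exercise, and passing the spam test and the sender in as arguments is just my way of keeping the routing easy to try without a mail server.

```python
import smtplib


def post_email(message, from_addr, to_addr, host="localhost", port=25):
    """Send the raw message text through an SMTP server (placeholder host and port)."""
    with smtplib.SMTP(host, port) as server:
        server.sendmail(from_addr, [to_addr], message)


def handle_email(message, is_spam, post, spam_path="spam.txt"):
    """Append spam to a file; hand everything else to `post`.

    `is_spam` and `post` are callables supplied by the caller, so the
    routing logic can be exercised without a mail server.
    """
    if is_spam(message):
        with open(spam_path, "a") as spam_file:
            spam_file.write(message + "\n")
        return "spam"
    post(message)
    return "good"
```

A call might look like handle_email(raw, my_spam_test, lambda m: post_email(m, "me@example.com", "you@example.com")), where my_spam_test stands for whatever isSpam ends up being. Note that imaplib in Python 3 returns the fetched message as bytes, so it would need decoding before being written to a text file this way.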