GIDForums  

  #1  
28-Jan-2003, 17:43
Elmseeker
Join Date: Jan 2003
Posts: 87

Stripping SESSIONS for spiders?!


Hmm... I just used the simulated spider at www.searchengineworld.com, which shows you what a search engine would see if it hit your site. My site links show up like:

www.kid-stop.com
aef98c202f7bb9332d82e71e84aa41

How can I get rid of the session ID stuff? It doesn't show that way in normal browsers like IE, Netscape, etc.
  #2  
28-Jan-2003, 23:29
JdS
Join Date: Aug 2001
Location: KUL, Malaysia
Posts: 3,371

Sample: A simple session id removal code


It really depends on your code and how the session data affects the rest of the page... but basically the idea is this:

PHP Code:

<?php
// where you start the session, edit it to:

// you can use a function in the line below if you put all
// your 'spiders' in an array and if there are too many...
// both tests must pass (&&): the user agent must match NEITHER bot
if( !stristr($_SERVER['HTTP_USER_AGENT'], 'googlebot') && !stristr($_SERVER['HTTP_USER_AGENT'], 'slurp/cat') ):
  // regular traffic, so start session
  session_name('s');
  session_start();
endif;
?>

  #3  
28-Jan-2003, 23:40
jrobbio
Join Date: Jan 2003
Location: Loughborough, England
Posts: 840
It didn't occur to me until I tried something like that, and then I realised that my meta tag description wasn't appearing. I had left the quotation marks (" ") out and the spider didn't see it. I also have two sets of meta tags, for the same reason: the spider completely bypassed the ones with no gap and went to the ones with spaces. I'm sure this isn't the most important thing to be worrying about, but every little helps, eh?
As for the session IDs, I did a quick scout, though I don't know whether any of this is useful at all:
http://www.webmasterworld.com/forum3...php+session+id
sorry
  #4  
29-Jan-2003, 01:31
JdS
Join Date: Aug 2001
Location: KUL, Malaysia
Posts: 3,371
Let me explain what 'removing session ids' means in this context. Session ids, by themselves, are quite okay for progressive search bots / spiders like googlebot and slurp to index. The problem IS the fact that most sessions, in other words 'visits', generate NEW session ids, so a bot gets different URLs for the same pages each time it comes back.

The only 'smart' thing Google (and some others) can do (at this moment) is to stop crawling the rest of your website. I doubt very much that Google or any other decent search engine would ban a site for this.

The solution is either:

remove them (session ids) altogether,

OR

add some code to generate one constant session id for all bots (this is too much work, and the links that appear on the SERPs will contain the session id, something you'd probably like to avoid).
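
For what it's worth, the second option could be sketched roughly as below. This is only an illustration under assumptions: the function name sessionIdForAgent, the bot list you pass in and the 'botsession' id are all made up for the example and do not come from anyone's real code. Note that every matching bot would then share a single session and its data.

```php
<?php
// Sketch of the "one constant session id for all bots" option.
// All names here are hypothetical. Returns a fixed session id for
// known bots, or NULL for regular visitors.
function sessionIdForAgent( $ua, array $bots, $botSessionId = 'botsession' )
{
  foreach( $bots as $bot )
  {
    // stripos() is case-insensitive and returns FALSE when there is no match
    if( stripos($ua, $bot) !== FALSE )
    {
      return( $botSessionId );
    }
  }
  return( NULL );
}

// Usage, before session_start():
//   $id = sessionIdForAgent( $ua, array('googlebot', 'slurp') );
//   if( $id !== NULL ) { session_id( $id ); }
//   session_start();
```

As the post says, the id would still show up in the links the bots index, so removing the session for bots is usually the simpler route.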
  #5  
29-Jan-2003, 04:22
Garth Farley
Join Date: May 2002
Location: Ireland
Posts: 638
And the session id doesn't show up in browsers' address bars because browsers accept cookies, which hold the SID. But if the client doesn't accept cookies, as bots don't, PHP detects this and automatically appends the SID to the end of the URL to keep the session running. It just looks nasty for bots.

GF
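
The URL rewriting described here is controlled by PHP's session.use_trans_sid setting. A minimal sketch of switching it off at runtime follows. It assumes a PHP version that allows both settings to be changed with ini_set(), and the function name keepSessionIdOutOfUrls is made up for the example.

```php
<?php
// Sketch, assuming both settings can be changed at runtime.
// Call this before session_start().
function keepSessionIdOutOfUrls()
{
  // do not rewrite links and forms to carry the session id
  ini_set('session.use_trans_sid', '0');
  // only accept the session id from a cookie, never from the URL
  ini_set('session.use_only_cookies', '1');
}
```

The trade-off is that any visitor without cookies, bots included, gets a fresh session on every request, so this only suits pages that still work without session data.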
  #6  
29-Jan-2003, 10:04
Elmseeker
Join Date: Jan 2003
Posts: 87
Yeah, I understand why the bots aren't crawling the whole site; what I need to know is how to remove them. I had thought of the idea Jay posted, but there are like 500+ bots out there and I don't know a way to get the user agents for all of them, not to mention it would be one long if statement to check them all...
  #7  
29-Jan-2003, 15:18
JdS
Join Date: Aug 2001
Location: KUL, Malaysia
Posts: 3,371
Elm, you don't need all 500 bots! Just the ones that belong to the major SEs, like:

PHP Code:

<?php
// excerpt from a class: $this only works inside one of its methods
$this->bots = array(
  'googlebot',   // Google
  'ask jeeves',  // Ask Jeeves / Teoma
  'slurp',       // Inktomi
  'fast',        // Alltheweb / Fast
  'scooter',     // AltaVista
  'zyborg',      // Looksmart
  'msnbot'       // MSN Search
);
?>


Then, instead of one long statement checking the USER_AGENT, use a function:

PHP Code:

<?php
// method of the same class: it reads the $this->bots list
function _isBot( $ua ) // $ua = $_SERVER['HTTP_USER_AGENT']
{
  foreach( $this->bots as $bot )
  {
    if( stristr($ua, $bot) )
    {
      return( TRUE );
    }
  }
  return( FALSE );
}
?>


You might have to change this a bit since this is just code I cut and pasted from my class file.
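
For anyone pasting this outside a class, a minimal standalone sketch of the same check follows. The function name isBot, passing the list in as a parameter, and the switch from stristr() to stripos() are assumptions made for the example, not the original class code.

```php
<?php
// Standalone sketch (assumed names): same substring test as _isBot(),
// but with the bot list passed in instead of read from $this->bots.
function isBot( $ua, array $bots )
{
  foreach( $bots as $bot )
  {
    // stripos() is case-insensitive and returns FALSE when there is no match
    if( stripos($ua, $bot) !== FALSE )
    {
      return( TRUE );
    }
  }
  return( FALSE );
}

// the list from the post above
$bots = array( 'googlebot', 'ask jeeves', 'slurp', 'fast', 'scooter', 'zyborg', 'msnbot' );
```

You would then call session_start() only when isBot($ua, $bots) returns FALSE, with $ua taken from $_SERVER['HTTP_USER_AGENT'] (or an empty string if the header is missing). Keep in mind that short entries such as 'fast' match any user agent containing that word, so they may catch more than the intended bot.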
  #8  
02-Dec-2004, 03:03
stevie_t_uk
Join Date: Dec 2004
Posts: 3
Hi JDS,

I'm building an e-commerce site, which uses mod_rewrite and some other SEO techniques. I was wondering if the bot list you posted is up-to-date and, if not, whether you would be able to post the current list, please?

I imagine your detection script would do something like the following:

PHP Code:

<?php
if( !_isBot($_SERVER['HTTP_USER_AGENT']) )
{
  session_start();
}
?>



Also, in your experience, are there any other techniques one might use when developing, to ensure the pages are search engine friendly? For instance, DevShed seem to have 2 versions of each forum post, one of which the search engines are more likely to pick up. Please see forums.devshed.com. Note the "read with formatting" option.

Thanks for your time,

Steve (fellow PHP developer struggling to ensure bots are kept happy)
  #9  
02-Dec-2004, 16:51
JdS
Join Date: Aug 2001
Location: KUL, Malaysia
Posts: 3,371
Hello stevie_t_uk,

As you may already know, there are a million bots running around the Net these days... but if you ask me, you only need to "feed" just 3 or 4.

The only ones I would put any real effort into pleasing would be the following:
  • googlebot
  • slurp // yahoo's
  • msnbot // for the future
  • ask // ask jeeves (but expect very little or no traffic)

I won't comment on what another forum or website is doing with regard to SE-optimising its web pages. I personally would NEVER serve duplicate content to any search engine spider/bot. Take a look at my robots.txt: you will see that I deliberately block their access to showthread.php, which is simply another version of this very page.
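
The actual robots.txt isn't reproduced in this thread, but a rule of that kind would look roughly like the fragment below. The path is only assumed from the script name mentioned above, and the real file may well differ.

```
User-agent: *
Disallow: /showthread.php
```

Disallow matches by URL prefix, so this would also cover showthread.php requests that carry a query string.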
  #10  
05-Dec-2004, 08:12
JasonMichael
Join Date: Jul 2004
Posts: 135
You might also want to feed the AdSense bot, if you're running Google AdSense ads on your site:

Mediapartners-Google
 
 

All times are GMT -6.