#1
Stripping SESSIONS for spiders?!
Hmm... I just used a simulated spider at http://www.searchengineworld.com/cgi-bin/sim_spider.cgi, which shows you what a search engine would see if it hit your site. My site links show up like:
http://www.kid-stop.com/index.php?page=privpol&PHPSESSID=afaef98c202f7bb9332d82e71e84aa41
How can I get rid of the session ID stuff? It doesn't show that way on normal browsers like IE, Netscape, etc.
#2
Sample: A simple session id removal code
It really depends on your code and how the session data affects the rest of the page... but basically the idea is this:
PHP Code:
__________________
J de Silva Learning Journal | GIDForums™ | GIDNetwork™ | GIDWebhosts™ | GIDSearch™ |
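A minimal sketch of the idea in post #2, for illustration only and not the code originally posted. The function name `start_session_unless_googlebot` is hypothetical, and the sketch assumes the spider can be recognised by the substring `googlebot` in its User-Agent header.

```php
<?php
// Hypothetical helper: start a session for everyone except Googlebot.
// Returns true when a session was started, false when it was skipped.
function start_session_unless_googlebot(string $userAgent): bool
{
    // Assumption: the spider identifies itself with "googlebot" in its User-Agent.
    if (stripos($userAgent, 'googlebot') !== false) {
        // No session for the spider, so PHP has no PHPSESSID to append to links.
        return false;
    }

    // Ordinary visitors get a session as usual.
    return session_start();
}
```

A page would call `start_session_unless_googlebot($_SERVER['HTTP_USER_AGENT'] ?? '')` before sending any output. Whether this is safe depends on how much of the page relies on session data, as the post notes.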
#3
It didn't occur to me until I tried something like that, and then I realised that my meta tag description wasn't appearing. I had left the spaces out, so the spider didn't see it. I also have two sets of meta tags, and for the same reason it completely bypassed the ones with no gap and went to the ones with spaces. I'm sure this isn't the most important thing to be worrying about, but every little helps, eh?
As for the session IDs, I did a quick scout and didn't know if any of this was useful at all: http://www.webmasterworld.com/forum3...php+session+id sorry
#4
Let me explain what 'removing session ids' means in this context. Actually, session ids, by themselves, are quite okay for progressive search bots / spiders like googlebot and slurp to index. The fact that most sessions, in other words 'visits', generate NEW session ids IS the problem: the same page turns up under a different URL on every crawl.
The only 'smart' thing Google (and some others) can do (at this moment) is to stop itself from crawling the rest of your website. I doubt very much that Google or any other decent search engine would ban a site for this. The solution is either to remove them (session ids) altogether, OR to put in some code that generates one constant session id for all bots. (The latter is too much work, and the links that appear on the SERPs will contain the session id, something you'd probably like to avoid.)
__________________
J de Silva Learning Journal | GIDForums™ | GIDNetwork™ | GIDWebhosts™ | GIDSearch™ |
#5
And the session id doesn't show up in browsers' address bars because they accept cookies, which hold the SID. But if the client doesn't accept cookies, as is the case with bots, PHP detects this and (when session.use_trans_sid is enabled) automatically appends the SID to the end of the URL to keep the session running. It just looks nasty for bots.
GF
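The behaviour described in post #5 is controlled by two PHP session settings, so one option not spelled out in the thread is to switch off the URL fallback entirely. A hedged sketch, assuming the site can live with cookie-only sessions:

```php
<?php
// Both settings must be applied before session_start() is called.

// Accept the session ID from cookies only, never from the URL.
ini_set('session.use_only_cookies', '1');

// Stop PHP from rewriting links to carry PHPSESSID.
ini_set('session.use_trans_sid', '0');

session_start();
```

The trade-off is that visitors who refuse cookies lose their session state between pages, which may matter for things like a shopping cart.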
#6
Yeah, I understand why the bots aren't crawling the whole site; what I need to know is how to remove them. I had thought of the idea Jay posted, but there are like 500+ bots out there and I don't know a way to get the user agents for all of them, not to mention it would be one long if statement to check them all...
#7
Elm, you don't need all 500 bots! Just the ones that belong to the major SEs, like:
PHP Code:
Then, instead of one long statement checking the USER_AGENT, use a function:
PHP Code:
You might have to change this a bit, since this is just code I cut and pasted from my class file.
__________________
J de Silva Learning Journal | GIDForums™ | GIDNetwork™ | GIDWebhosts™ | GIDSearch™ |
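The approach in post #7, a short list of bot names plus a function that checks the User-Agent against it, might look roughly like this. It is a sketch and not the poster's class code: `SEARCH_BOT_SIGNATURES` and `is_search_bot` are hypothetical names, and the list holds only the three bot names mentioned in this thread.

```php
<?php
// Substrings to look for in the User-Agent header.
// Assumption: these three names (all taken from this thread) are enough;
// extend the list to suit the search engines you care about.
const SEARCH_BOT_SIGNATURES = ['googlebot', 'slurp', 'mediapartners-google'];

// Hypothetical helper: true when the User-Agent contains any listed signature.
function is_search_bot(string $userAgent, array $signatures = SEARCH_BOT_SIGNATURES): bool
{
    foreach ($signatures as $signature) {
        // stripos() is case-insensitive, so "Googlebot" and "googlebot" both match.
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}
```

Keeping the names in one array means adding a bot is a one-line change rather than another clause in an if statement.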
#8
Quote:
Hi JDS,
I'm building an e-commerce site, which uses mod_rewrite and some other SEO techniques. I was wondering if the bot list you posted is up to date and, if not, whether you would be able to post the current list, please? I imagine your detection script would do something like the following:
PHP Code:
Also, in your experience, are there any other techniques one might use when developing to ensure the pages are search engine friendly? For instance, DevShed seems to have 2 versions of each forum post, one of which the search engines are more likely to pick up. Please see forums.devshed.com. Note the "read with formatting" option.
Thanks for your time,
Steve (fellow PHP developer struggling to ensure bots are kept happy)
#9
Hello stevie_t_uk,
As you may already know, there are a million bots running around the Net these days... but if you ask me, you only need to "feed" just 3 or 4. The only ones I would focus any real effort to please would be the following:
I won't comment on what another forum or website is doing with regard to SE optimising their web pages. I personally would NEVER serve duplicate content to any search engine spider/bot. Take a look at my robots.txt: you will see that I deliberately block their access to showthread.php, which is simply another version of this very page.
__________________
J de Silva Learning Journal | GIDForums™ | GIDNetwork™ | GIDWebhosts™ | GIDSearch™ |
#10
You might also want to feed the AdSense bot, if you're running Google AdSense ads on your site:
Mediapartners-Google