#1
Building a search engine and handling HTML entities

I am trying to build a small search engine of my own for my next site. I am currently working out how to store the data that my crawler will extract from my own web pages, for all my present and future sites.
The question I am asking myself, and which I hope you will help me decide, is this: what am I supposed to do with HTML entities such as &amp;, &quot;, &nbsp;, &trade;, &gt; or &lt;? Do I remove them from the content, or do I translate them before saving the content to the search index?
__________________
J de Silva Learning Journal | GIDForums™ | GIDNetwork™ | GIDWebhosts™ | GIDSearch™
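
For illustration only, here is a minimal sketch of what the "translate before saving" option from the question could look like in Python. It is not taken from the thread: the helper name `normalize_for_index` and the sample string are hypothetical, and it assumes the crawler has already stripped the HTML tags from the page text.

```python
import html
import re


def normalize_for_index(text: str) -> str:
    """Hypothetical helper: decode HTML entities and tidy whitespace
    before the text is written to the search index."""
    # html.unescape turns named and numeric entities into characters,
    # e.g. "&amp;" becomes "&" and "&trade;" becomes "™".
    decoded = html.unescape(text)
    # "&nbsp;" decodes to a non-breaking space (U+00A0); treat it as a
    # normal space so that it separates words like any other blank.
    decoded = decoded.replace("\u00a0", " ")
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", decoded).strip()


sample = "Fish &amp; Chips&nbsp;&trade; &lt;b&gt;"
print(normalize_for_index(sample))  # Fish & Chips ™ <b>
```

Decoding after the tags are stripped matters in this sketch: an escaped `&lt;b&gt;` in the page text ends up in the index as the literal characters `<b>` rather than being mistaken for markup.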
#2
If you are going to serve cached pages, you will need to leave the entities as they are.
Can you explain a bit more why you would not want them?
__________________
Mr. Bob's Web Design - Tirelessly looking for ways to enhance the customer base of your business.
#3
No, no cached pages...