1. An Internet spider is a program designed to "crawl" over the World Wide Web, the portion of the Internet most familiar to general users, and retrieve locations of Web pages. It is sometimes referred to as a webcrawler. Many search engines use webcrawlers to obtain links, which are filed away in an index. When a user asks for information on a particular subject, the search engine pulls up pages retrieved by the Internet spider. Without spiders, the vast richness of the Web would be all but inaccessible to most users, rather as the Library of Congress would be if the books were not organized.
Some search engines are human-based, meaning that they rely on humans to submit links and other information, which the search engine categorizes, catalogues, and indexes. Most search engines today use a combination of human and crawler input. Crawler-based engines send out spiders: computer programs that have sometimes been likened to viruses because they appear to move from site to site, although, unlike viruses, a spider runs only on the search engine's own servers and simply requests pages the way a browser does.
Spiders visit Web sites, record the information there, read the meta tags that identify a site according to subjects, and follow the site's links to other pages. Because of the many links between pages, a spider can start at almost any point on the Web and keep moving. Eventually it returns the data gathered on its journey to the search engine's central depository of information, where it is organized and stored. Periodically the crawler will revisit the sites to check for changed information, but until it does so, the material in the search engine's index remains the same. It is for this reason that a search at any time may yield "dead" Web pages, or ones that can no longer be found.
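As a rough sketch of that loop, the toy crawler below (plain Python standard library; a real spider would also respect robots.txt, throttle its requests, and persist its index) starts from a seed URL, records each page's keywords meta tag, and queues the links it finds:

from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class PageParser(HTMLParser):
    """Collects outgoing links and the keywords meta tag from one page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self.keywords = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            # The meta tag by which a site identifies its subjects.
            self.keywords = attrs.get("content", "")

def crawl(seed_url, max_pages=10):
    """Breadth-first crawl: visit a page, record it, queue its links."""
    index = {}                # url -> keywords: the central depository
    queue = deque([seed_url])
    seen = {seed_url}
    while queue and len(index) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue          # a "dead" page: skip it
        parser = PageParser()
        parser.feed(html)
        index[url] = parser.keywords
        for link in parser.links:
            absolute = urljoin(url, link)
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index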
No two search engines are exactly the same, one reason (among others) being differences in the algorithms by which their indices are searched. Algorithms can be adjusted to scan for the frequency of certain keywords, and even to circumvent attempts at keyword stuffing or "spamdexing," the insertion of irrelevant search terms intended simply to draw traffic to a site.
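As an illustration of both ideas, here is a toy keyword-frequency scorer with a crude stuffing guard; the density cap and its value are my own assumptions for the sketch, not any real engine's rule:

import re

def keyword_score(page_text, query_terms, density_cap=0.05):
    """Score a page by query-term frequency, crediting each term only
    up to a density cap so that keyword stuffing stops paying off."""
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    if not words:
        return 0.0
    score = 0.0
    for term in query_terms:
        count = words.count(term.lower())
        capped = min(count, int(density_cap * len(words)) + 1)
        score += capped / len(words)
    return score

# A stuffed page scores no better than a normal one:
normal = "the little scientist kit teaches a little scientist real chemistry " * 5
stuffed = "little scientist " * 200
print(keyword_score(normal, ["scientist"]))
print(keyword_score(stuffed, ["scientist"]))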
2. A meta-search engine acts as an aggregator: it receives your input (a search request), forwards the request to several other search engines and/or databases, and combines the results into a single list or displays them according to their source. Metasearch engines enable users to enter search criteria once and access several search engines simultaneously. They operate on the premise that the Web is too large for any one search engine to index it all, and that more comprehensive search results can be obtained by combining the results from several search engines. This also saves the user from having to use multiple search engines separately.
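A minimal sketch of that fan-out-and-merge flow, assuming two hypothetical engine functions that stand in for real search APIs (actual engines require their own endpoints and keys):

from concurrent.futures import ThreadPoolExecutor

def engine_a(query):
    # Placeholder: a real version would call one engine's search API.
    return ["http://example.com/a?q=" + query, "http://example.com/shared"]

def engine_b(query):
    # Placeholder for a second engine.
    return ["http://example.com/shared", "http://example.org/b?q=" + query]

def metasearch(query, engines):
    """Send one query to every engine at once, then merge the result
    lists into a single list, dropping duplicate URLs."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

print(metasearch("little scientist", [engine_a, engine_b]))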
3. I know this from my personal experience in SEO (without paying anyone for the top spots):
Tried "Little Scientist" or "BOYTOYS", my listings are at the top.
Meta keyword tags: put lots of relevant keywords within the tags.
Use lots of text in the content body, sometimes with links to outside websites, and also have external websites (e.g. blogs, forums, and community sites such as LinkedIn, YouTube, and Wikipedia) link back. I have also tried building sites on a Google- and SEO-friendly CMS like Drupal, which is extremely friendly to search engines, and having them link to the sites.
In the past, both Yahoo and Google allowed a customer to enter a URL to get it listed.
The name of the page is also important; avoid redirects, as spiders do not like them. (A rough self-check covering several of these points is sketched below.)
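For what it's worth, a page can be self-audited for the on-page points above (title, meta tags, amount of body text, redirects). The script below is only a sketch, and what counts as "lots of text" is left to the reader:

from html.parser import HTMLParser
from urllib.request import urlopen

class SEOCheck(HTMLParser):
    """Pulls the on-page signals mentioned above out of one page."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}          # keywords / description tag contents
        self.text_chars = 0     # rough measure of body copy
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name in ("keywords", "description"):
            self.meta[name] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        else:
            self.text_chars += len(data.strip())

def check_page(url):
    response = urlopen(url, timeout=5)
    checker = SEOCheck()
    checker.feed(response.read().decode("utf-8", "replace"))
    return {
        "title": checker.title.strip(),
        "meta": checker.meta,
        "body_text_chars": checker.text_chars,
        "redirected": response.geturl() != url,  # spiders dislike redirects
    }

print(check_page("http://example.com/"))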