Search Mechanics - Perfect ongoing registration with search engines.

Relevancy, Spam, Technology, Cloaking and Ranking : The Ethical Guide


Relevancy

The search engines referred to by this document work by sending an automated browser, known as a spider, around the Web to find pages. Examples of such search engines are AltaVista, Excite, Fast and Google. The search engine makes a copy of each page found and so builds a copy of the Web. When a human searches using a search engine, the search engine searches its copy of the Web and displays a list of results. This list of results is in order of relevancy, with the most relevant results first.
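The copy-then-search process described above can be illustrated with a minimal sketch. The URLs and page text are hypothetical stand-ins for what a spider might have collected, and real search engines are far more elaborate than this.

```python
from collections import defaultdict

def build_index(pages):
    """Map each lowercase word to the set of URLs whose copied text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs whose copied text contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages standing in for the search engine's copy of the Web.
pages = {
    "http://example.com/car-insurance": "cheap car insurance quotes",
    "http://example.com/insurance": "home and life insurance",
    "http://example.com/cars": "used car reviews",
}
index = build_index(pages)
matches = search(index, "car insurance")
```

Note that the search runs against the stored copy, not the live Web, and that this sketch only finds matching pages; ordering them by relevancy is a separate step.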

Relevancy is the search engine’s measure of how well a particular Web page matches a search. For example, a search for "car insurance" would list pages on the topic of car insurance above pages about insurance in general or cars in general, because the pages about car insurance would be more relevant. Some search engines quote relevancy as a percentage, but many don’t quote it at all.

Each search engine uses a different algorithm to calculate relevancy. Most use a combination of on-the-page and off-the-page factors.

Examples of on-the-page factors are whether the keywords (the words searched for) appear in the page title or meta tags, and how early and often the keywords appear in the body copy of the page.

Examples of off-the-page factors are the number and quality of other Web pages that link to the page, or whether somebody has paid to have the page included in the search engine's index.
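As a rough illustration of how on-the-page and off-the-page factors might be combined, here is a toy scoring sketch. The weights, the function names and the formula are all assumptions made for this example; no search engine's actual algorithm is implied, and meta tags, link quality and paid inclusion are left out for brevity.

```python
def on_page_score(keywords, title, body):
    """Score title matches, plus how early and how often keywords appear in the body."""
    title_words = title.lower().split()
    body_words = body.lower().split()
    score = 0.0
    for keyword in keywords:
        kw = keyword.lower()
        if kw in title_words:
            score += 2.0  # assumed bonus for a keyword in the page title
        positions = [i for i, word in enumerate(body_words) if word == kw]
        if positions:
            score += len(positions) / len(body_words)  # how often
            score += 1.0 / (1 + positions[0])          # how early
    return score

def relevancy(keywords, page, inbound_links, link_weight=0.5):
    """Add a simple off-the-page factor (inbound link count) to the on-page score."""
    on_page = on_page_score(keywords, page["title"], page["body"])
    return on_page + link_weight * inbound_links

# Hypothetical pages for the "car insurance" search used earlier.
keywords = ["car", "insurance"]
page_a = {"title": "Car insurance quotes", "body": "compare car insurance prices online"}
page_b = {"title": "Insurance", "body": "home and life insurance explained"}
```

With no inbound links, `page_a` scores higher than `page_b` because both keywords appear in its title and early in its body. Give `page_b` enough inbound links and it overtakes `page_a`, which shows how off-the-page factors can outweigh on-the-page ones.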

Spam

In a perfect world, each search engine would produce the most relevant set of results for its audience. This may not be the same set of results for all search engines as each search engine may tailor its algorithm to a different audience. Unfortunately the world is not perfect, and some webmasters attempt to trick search engines into listing their pages (instead of the most relevant pages) in response to a keyword search. They do this by feeding spam to the search engine spider. Spam causes problems for search engines because it does not allow them to deliver the most relevant results to their audiences. Their ability to calculate relevancy is distorted by the spam. Spam has caused many search engines to move heavily towards using off-the-page factors to calculate relevancy. Nevertheless, spam is still rife.

To help search engines deliver good results, do not feed spam to their spiders! How can you tell if what you are doing is spam, rather than just "search engine friendly"? What exactly is spam?

Unfortunately, there are no hard and fast guidelines on what is and is not spam. Some search engines, for example, will insist that you must feed the same content to their spiders as would be seen by all human visitors to your pages. However, if you already deliver different content from the same URL to different human visitors, then under these search engines' rules you must be delivering spam to them. This seems unreasonable.

Spammers, on the other hand, often insist that they deliver relevant content to search engine spiders. The problem with this argument is that it is up to the search engine, and nobody else, to determine relevance. If the search engine is to use on-the-page factors to calculate relevance, it must use the page that is shown to humans; otherwise, what is the point of using on-the-page factors at all?

We have prepared this step-by-step guide to help you determine if you are delivering spam to search engine spiders:
  1. If the fact that search engines exist has no bearing on the way you write your Web pages, or have structured your Web site or your Web serving architecture, then you are not delivering spam.
  2. If you are paying careful attention to the keywords that your prospects are likely to be searching for whilst writing your title, meta tags and body copy, that is not spam – you simply know how to write good, well-structured documents.
  3. If you are trying to influence the appearance of your listing on a search engine (e.g. using a carefully crafted title, meta description, or opening sentence) then, as long as your intended listing accurately describes what visitors to the page will see, that is not spam. It’s only fair that if your pages are adding value to a search engine index, you have some right to control how you are represented by that search engine. Your efforts will also help the search engine’s users to determine whether to click through to your page. If, however, your intended listing does not accurately describe what visitors to the page will see, that is spam.
  4. If you are delivering content to a search engine spider where the spider is not owned or paid for by you; and that content is designed to influence the search engine's relevancy calculations; and that content is not designed to be viewed (or is actively prevented from being viewed) by a human visitor from a location typical of the search engine’s audience and using a browser of equivalent capabilities to the search engine spider, that is spam.
  5. If none of the above applies to you, then you are probably not delivering spam (unless we forgot something!).
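
The guide above can be expressed as a small decision function. This is one possible reading of the five steps, not an official test: the field names are invented for this sketch, and step 2 (careful keyword use) is treated as never producing spam on its own, so it has no field.

```python
from dataclasses import dataclass

@dataclass
class Practice:
    """Hypothetical answers to the questions posed by the guide."""
    ignores_search_engines: bool = False           # step 1
    shapes_listing: bool = False                   # step 3
    listing_is_accurate: bool = True               # step 3
    spider_is_third_party: bool = True             # step 4: not owned or paid for by you
    designed_to_influence_relevancy: bool = False  # step 4
    viewable_by_equivalent_human: bool = True      # step 4

def classify(practice):
    """Return a verdict following the order of the guide's steps."""
    if practice.ignores_search_engines:
        return "not spam"
    if practice.shapes_listing and not practice.listing_is_accurate:
        return "spam"
    if (practice.spider_is_third_party
            and practice.designed_to_influence_relevancy
            and not practice.viewable_by_equivalent_human):
        return "spam"
    return "probably not spam"  # step 5

hidden_keywords = Practice(designed_to_influence_relevancy=True,
                           viewable_by_equivalent_human=False)
verdict = classify(hidden_keywords)
```

All three conditions of step 4 must hold together: content designed to influence relevancy is only classed as spam here when an equivalent human visitor could not see it.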

Technology

There are many different technologies that can be used to deliver spam. The problem is that, in almost all cases, the same technology can also be used for perfectly valid reasons. The table below shows some of the technologies which can be used to deliver spam. It is not an exhaustive list. The third column of the table gives examples of why the technology may not always be delivering spam.

| Technology | Description | Why it might not be spam |
|---|---|---|
| Invisible text | For example, white text on a white background, or text in a DHTML layer that is never displayed. Humans can’t see the text, search engine spiders can. | Text may overlay an image or table cell or advanced use of DHTML or JavaScript may be deployed. Not all search engine spiders (few, in fact) support all these technologies. |
| Tiny text | Text in a tiny font, often put at the bottom of the page following many line breaks so it can’t be seen. | Detailed copyright or disclaimer messages or annotations may be displayed in this way. |
| Meta tags | <TITLE>, <META NAME="DESCRIPTION"> and <META NAME="KEYWORDS"> tags that can be placed in the <HEAD> section of a HTML document. | The same title and meta tags have to be used for lots of different purposes – e.g. bookmarks, navigation stacks, site search engines, external search engines. The primary purpose may not be to satisfy external search engines. |
| Frames | Putting different content in the NOFRAMES section of a frameset than is displayed by the frames that comprise the frameset. NOFRAMES content is read by search engines but isn’t displayed by most browsers – instead the frames content is. | NOFRAMES can be used to deliver different yet entirely appropriate content to less capable browsers. Search engine spiders are equivalent to less capable browsers. This is a kind of poor relation to Agent-based delivery… |
| Agent-based delivery | Delivers variable content according to the client platform capabilities. Once this is done, the search engine spider has to be classified as a platform and will receive different content from some other platforms. | Content is often tailored to a client’s platform (e.g. text / WAP / TV / PC) to provide a better user experience. The search engine platform may be treated identically to some human clients, e.g. users of the Lynx browser. |
| IP Delivery | Delivers variable content according to the location of the client, determined by the client IP address. Once this is done, the search engine spider’s IP address has to be classified as representing a particular location and will receive different content from some other locations. | Content is often tailored to a client’s location (e.g. UK / USA / India / China) to provide a better user experience. The search engine’s location may be treated identically to some human clients, e.g. residents of the USA. IP delivery may also be used to provide secure communication between two willing parties. |
| IP Cloaking (a.k.a. Stealth) | Builds and maintains, or buys from a third party, a database of IP addresses used by search engine spiders, then delivers "cloaked" content to those spiders designed to influence ranking. Human visitors do not see the "cloaked" content. They see different content, often via redirection to another site. | This is always spam. Spam is content delivered specifically to a search engine spider with the intention of influencing relevancy calculations, and hiding the same content from human visitors to the page. That is what IP Cloaking is designed to do. IP Cloaking is used to provide secure communication between a willing party – the cloaker – and an unwilling party – the search engine. If the search engine is willing it is not IP Cloaking but IP Delivery. |

From the above table, we can see that it is possible to spam "accidentally", i.e. pages that do not contain spam may be interpreted as containing spam. If this occurs, those innocent pages, the site they are on, and even the IP address that the site is on (that may be shared by many other innocent sites) can be permanently barred from search engines. This is a major problem caused by spam. Because spam exists, the innocent can be penalised as well as the guilty. In fact, the innocent are more likely to be penalised because they don’t know they are breaking any rules. A lot of good content on those innocent sites will never be found.

IP Cloaking

The only technology with which it is not possible to spam "accidentally" is IP Cloaking. IP Cloaking always delivers spam, and it cannot be done by accident, since you cannot "accidentally" harvest search engine spider IP addresses.

However, somebody from Corporation A who is not technically minded may commission Company B to do their Web positioning without realising the reputational damage that Corporation A may suffer because Company B delivers spam using IP Cloaking. The real issue is the delivery of spam rather than IP Cloaking itself but, using our reasonable definition of spam, IP Cloaking always delivers spam. What’s more, IP Cloaking is the only technology that can deliver spam without anybody from Corporation A (including technical staff), or anybody else, being able to check that the representation of Corporation A (itself, its clients, its associates and its competitors) being made by Company B is fair or even legal. This is a problem with IP Cloaking that goes far beyond the issue of spam.

Note: with our definition of spam, exactly the same content could be delivered with agent-based delivery, where it might not be spam, and IP Cloaking, where it definitely would be spam. The reason for this is that the use of IP Cloaking technology turns meaningful content into spam, since content that is not readable is not meaningful.
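
The difference in the note above can be made concrete with a short sketch that contrasts the two selection mechanisms. Every name, User-Agent fragment and IP address in it is a hypothetical placeholder; the point is only to show who is able to see which content.

```python
# Content variants keyed by platform (hypothetical).
CONTENT = {
    "text": "plain text page",
    "wap": "WAP page",
    "pc": "full graphical page",
}

def platform_for(user_agent):
    """Agent-based delivery: classify a client by its declared capabilities."""
    ua = user_agent.lower()
    if "lynx" in ua or "spider" in ua or "bot" in ua:
        return "text"  # spiders are treated like other text-only clients
    if "wap" in ua:
        return "wap"
    return "pc"

def agent_based_delivery(user_agent):
    """Any client on the same platform, human or spider, gets the same content."""
    return CONTENT[platform_for(user_agent)]

# Placeholder addresses from a documentation range, standing in for a spider IP database.
SPIDER_IPS = {"192.0.2.10", "192.0.2.11"}

def ip_cloaking(client_ip):
    """IP Cloaking: only listed spider addresses ever receive the cloaked content."""
    if client_ip in SPIDER_IPS:
        return "cloaked page"
    return "page shown to humans"

same_for_human = agent_based_delivery("Lynx/2.8") == agent_based_delivery("ExampleSpider/1.0")
```

Under agent-based delivery, a human using a text browser can read exactly what the spider reads, so the content can be checked. Under IP Cloaking, no human visitor can ever reach the cloaked branch, which is why the same content becomes unreadable and therefore spam.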

Ranking


What can you do if you are convinced that your content is the best, most relevant on the Web for certain keywords and you deserve a number one ranking, but search engines don’t seem to agree with you? For a start, don’t resort to spam or stealth! Instead:
  1. Write search engine friendly content. Ensure that each page is structured so it is obvious to a search engine that the page is relevant to a keyword. Ensure that your pages and site can be read by a search engine spider. Use the Prepare features of Search Mechanics to do this.
  2. Create a Web "centre of excellence" associated with your keywords. A "centre of excellence" is a balance of an authority (has lots of links to it from other sites and pages associated with the keyword) and a hub (has links to lots of other sites and pages associated with the keyword). Actively try to get other sites to link to you. A good place to start is by registering with major Web directories such as Yahoo, Looksmart and DMOZ. Use the Promote features of Search Mechanics to do this.
  3. Check that you are not being penalised unfairly for content that may be interpreted as spam, or for being on the same IP address as someone who has been barred from search engines for delivering spam. If possible, contact the search engines that are not listing you and check what is wrong.
  4. Let market forces decide. Market forces should mean that those search engines that produce the best, most relevant results will flourish, and those that don’t will ultimately die. The current prevalence of spam is preventing this from happening.
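
The "centre of excellence" idea in step 2 can be illustrated by counting links in each direction. This is a deliberately simple sketch with made-up site names: it assumes every link in the list is associated with the keyword, and it counts links rather than judging their quality.

```python
from collections import defaultdict

def link_counts(links):
    """Count inbound links (authority side) and outbound links (hub side) per site."""
    inbound = defaultdict(int)
    outbound = defaultdict(int)
    for source, target in links:
        outbound[source] += 1
        inbound[target] += 1
    return inbound, outbound

# Hypothetical (source, target) links between sites about the same keyword.
links = [
    ("directory.example", "you.example"),
    ("partner.example", "you.example"),
    ("you.example", "partner.example"),
    ("you.example", "resource.example"),
]
inbound, outbound = link_counts(links)
```

Here `you.example` has two inbound and two outbound links, the balance of authority and hub that step 2 describes.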

Finally, if you think you can calculate relevancy better than the current crop of search engines, don’t deliver spam to attempt to control their listings. The honest thing to do would be to start a search engine of your own!

Interested parties are invited to submit their comments to whitepapers@ebrandmanagement.com.

Written by Alan Perkins of SilverDisc Search Marketing.