- Conceive, design and organize your site to be exactly what it is: a web site.
The Web is not print. While this may seem like an overly obvious statement, designers, programmers and users trip up on this very issue every day. It's a common misconception, which branches off into notions like "a site author can control how a site looks to the pixel" and "a well-written web page will look exactly the same on all browsers." Let go of these ideas from the very start. Accessing a web site is a client-server interaction whose outcome depends on several variables, not least connection speed and the client's hardware, software and configuration.
The rest of these recommendations aim to help achieve this end.
- Validate your markup.
There are rules which specify how to create documents renderable by web clients. Follow them. Markup is either correctly written or it's not. Whether you're using pure CSS and XHTML or just plain ol' HTML, make sure your markup is correctly written by using a validator: software which checks markup for mistakes. This one and this one work just fine. Valid markup has the best chance of rendering on the widest array of clients. If your markup is invalid, don't assume that everyone else's browser is as forgiving as yours.
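To give a feel for what a validator does, here is a minimal sketch of the very first check one might run on XHTML: is the document well-formed XML at all? This is not full validation (it ignores the DTD, so it won't catch a misplaced or misspelled element), and the function name is made up for this example. It simply assumes Python's standard XML parser is available.

```python
import xml.etree.ElementTree as ET

def well_formed_error(xhtml):
    """Return None if the markup parses as XML, else the parser's complaint.

    Well-formedness only: tags must nest and close properly. A real
    validator goes further and checks the document against its DTD.
    """
    try:
        ET.fromstring(xhtml)
    except ET.ParseError as error:
        return str(error)
    return None

print(well_formed_error("<p>All <b>good</b> here.</p>"))
print(well_formed_error("<p>An <b>unclosed tag.</p>"))
```

The first call prints None; the second prints the parser's error message for the mismatched tag. Treat this as a quick smoke test before handing the page to a proper validator.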
- Avoid frames and splash pages.
Frames on a web site are not ideal for lots of reasons. Frames prevent the user from being able to bookmark individual documents on a site. They present related information in separate documents, which keeps search engines from associating related information. They require that a browser make more than one document request per document, which increases client-server connections and eats server CPU cycles, network bandwidth and users' time. Frames are also, coincidentally, being deprecated.
When I say "splash page", I am referring to a welcome page with one link on it to "enter" the site. Splash pages are unnecessary and meaningless. The first time a user goes to your site, it might seem like a nice effect. But every time after that, a splash page just gets in the way. For a search engine, it pushes the bulk of your site one needless step deeper into the hierarchy. Don't make users, robots and your server work harder than necessary to deliver the content on your site.
- Optimize your site to be as small a download as possible.
Making a user wait for your site to download is the best way to get him or her to go elsewhere. While creating your site, remember that in February of 2003 more than half the web surfers in the US connected over dial-up at 56k or slower. Entire books and web sites are dedicated to the subject of how to optimize a site, so I won't even attempt to cover the subject here.
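A back-of-the-envelope calculation shows why page weight matters on dial-up. The sketch below assumes a best case: the full 56,000 bits per second, no latency and no protocol overhead, so real waits would be longer. The page sizes are hypothetical, picked only for illustration.

```python
def download_seconds(size_bytes, link_bits_per_second=56_000):
    """Best-case transfer time: 8 bits per byte, no latency or overhead."""
    return size_bytes * 8 / link_bits_per_second

# Hypothetical page weights, for illustration only.
for label, size in [("lean page", 30_000), ("image-heavy page", 300_000)]:
    print(f"{label}: {download_seconds(size):.1f} s")
# lean page: 4.3 s
# image-heavy page: 42.9 s
```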
- Make your site URLs as short, descriptive, static, technology-agnostic and permanent as possible.
Remember that your site's navigation URLs can be totally independent of the physical file system on your server. What I mean is, if you have a
file on your server, the URL to the about section does not have to be (and should not be)
Decide on your site URL structure before you begin creating the documents which will present the information. Make them as short and descriptive of the content as possible, and avoid any indicators of the technology behind them. Avoid file extensions (like .php, .htm, .html, .asp) and don't expose query string parameters. Google specifically recommends using "static" (querystring-less) links to every document on your site. For example, if you have a section which describes the staff of a company, don't use
to point to the staff page. Use
instead. Then use
for Joe Smith's page, instead of
Once you've determined the URLs for your site, use server-side technology to make them work.
Finally, once you create a URL which points to a section on your site, stick to it. If you follow these suggestions from the start and then re-organize your site, your URLs don't have to change. However, if you absolutely must change a URL, make sure the original URL redirects or points to the new section, so that cached search engine referrals and bookmarks still work.
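Here is a minimal sketch of the idea of decoupling public URLs from files and keeping retired URLs alive. Every path, file name and function name in it is hypothetical, and it is not tied to any particular server. It only shows the lookup logic a server-side script or rewrite rule would carry out.

```python
# Public URL -> file that actually renders it (both sides hypothetical).
ROUTES = {
    "/staff": "templates/staff_list.html",
    "/staff/joe-smith": "templates/staff_member.html",
}

# Retired URL -> its permanent replacement (hypothetical).
MOVED = {
    "/people": "/staff",
}

def resolve(path):
    """Return (HTTP status, target) for a requested path."""
    clean = path.rstrip("/") or "/"   # treat "/staff/" and "/staff" alike
    if clean in ROUTES:
        return 200, ROUTES[clean]     # serve the mapped file
    if clean in MOVED:
        return 301, MOVED[clean]      # permanent redirect to the new URL
    return 404, None

print(resolve("/staff/joe-smith"))  # (200, 'templates/staff_member.html')
print(resolve("/people"))           # (301, '/staff')
```

Because visitors and robots only ever see the keys of these tables, the files on the right-hand side can be renamed or moved to a different technology without a single public URL changing.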
- Present all of your site's information as text.
Images on a web document, while meaningful to human eyes, are actually just a collection of 1s and 0s to search engine indexing software and non-graphical browsers. Make sure all of the information on your site exists in a text format. For example, if your site has a masthead which is an image that contains the title of your site, make sure you set the alt attribute to describe the content of the image. You should even ensure that the most relevant information on a page appears first in your markup, and make other elements (navigation, etc.) follow.
Short of installing a text browser like Lynx, a good test to see what your site looks like to an indexing robot or a non-graphical browser is to turn off images in your browser. To do this in Internet Explorer, choose Internet Options in the Tools menu, go to Multimedia on the Advanced tab, and uncheck "Show pictures." In Mozilla, go to Tools, Image Manager, and choose "Block Images from this Site." Then view your site, and make sure that without images, all information is adequately represented. This same concept applies to all objects (like Flash movies and Java applets).
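The manual test can be complemented by a small script. The sketch below, which assumes nothing beyond Python's standard library, lists every image in a page that has no alt attribute at all. The class name, function name and sample markup are invented for this example.

```python
from html.parser import HTMLParser

class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag that lacks an alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src") or "(no src)")

def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing

sample = '<img src="masthead.png"><img src="logo.png" alt="Site title">'
print(images_missing_alt(sample))  # ['masthead.png']
```

A script like this can only tell you that alt text is missing, not whether existing alt text is any good, so it doesn't replace viewing the page with images turned off.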
- Actively direct search engine indexing robots.
Search engine robots want instructions on how to correctly index your site, so give 'em to 'em. Read up on search engine guidelines and features (like caching site text and image search). Determine how and what areas of your site should be indexed. The use of meta tags and the robots.txt file are the most common methods of directing robots to your content. Use this robots.txt validator to ensure your robots.txt file is correct.
For example, Google has been Scribbling.net's biggest referrer since day one, but I noticed that users from Google would often land on pages that weren't the most relevant to their search terms. So I checked out how the Googlebot indexes sites. I want robots to index only the permanent locations of posts, but not the front page (as it constantly changes to show the latest post). I don't want any of the images or text cached and presented out of context. I also have a page or two that I don't want anyone to find via a search at all. So here's my robots.txt file which lays out some of these instructions. Additionally, the robots meta tag on my front page says "noindex,follow,noarchive", which effectively tells robots to follow links but not to index or archive the front page. The same tag on any post page says "index,follow,noarchive", which tells robots to index the content on that page but not to archive it.
This way, if the day the Googlebot indexes my site is the day I have a post on the front page about a dog, with a link to the dog post's permanent URL, the Googlebot will index only the permanent location of the dog post. Four days later, when my front page has a post on it about a cat, and someone searches for site:scribbling.net dog, the only pages returned should be the dog post (and any associated documents) and not the front page.
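To make the setup above concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content and the paths are hypothetical stand-ins, not the actual Scribbling.net file. Only the two meta tag values come from the description above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all robots out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(path, agent="Googlebot"):
    """Would a robot that obeys robots.txt fetch this path?"""
    return parser.can_fetch(agent, path)

def robots_meta(is_front_page):
    """Meta tag per page type, using the directive values quoted above."""
    if is_front_page:
        directives = "noindex,follow,noarchive"
    else:
        directives = "index,follow,noarchive"
    return f'<meta name="robots" content="{directives}">'

print(allowed("/posts/dog"))     # True
print(allowed("/private/page"))  # False
print(robots_meta(is_front_page=True))
```

Note the division of labor: robots.txt keeps robots out of whole areas, while the meta tag gives per-page instructions to robots that are allowed in. A page blocked in robots.txt is never fetched, so any meta tag on it goes unread.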
- Serve "friendly" error messages.
The most unhelpful, dead-end message you can get from a web server is:
404 Not Found
The web server cannot find the file or script you asked for. Please check the URL to ensure that the path is correct. Please contact the server's administrator if this problem persists.
A usable web site does a lot better than that. Hook up friendly error messages which include navigation to documents that do exist or don't throw an error, a search box and/or a contact email address. Get creative.
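As one possible starting point, the sketch below builds the body of a friendlier error page with the three ingredients mentioned above: links to pages that exist, a search box and a contact address. The function name, the /search URL and the sample links are assumptions made for this example, and how you hook the page into your server depends on the server.

```python
from html import escape

def friendly_404(requested_path, nav_links, contact_email):
    """Build the HTML body for a 404 response that offers ways forward."""
    links = "\n".join(
        f'  <li><a href="{escape(url)}">{escape(label)}</a></li>'
        for label, url in nav_links
    )
    return (
        "<h1>Sorry, we couldn't find that page</h1>\n"
        f"<p>Nothing lives at <code>{escape(requested_path)}</code>.</p>\n"
        "<p>These pages do exist:</p>\n"
        f"<ul>\n{links}\n</ul>\n"
        # Hypothetical search endpoint; point this at your own search page.
        '<form action="/search" method="get">\n'
        '  <input type="text" name="q"> <input type="submit" value="Search">\n'
        "</form>\n"
        f'<p>Still stuck? Email <a href="mailto:{escape(contact_email)}">'
        f"{escape(contact_email)}</a>.</p>\n"
    )

page = friendly_404("/abuot", [("Home", "/"), ("Staff", "/staff")],
                    "webmaster@example.com")
print(page)
```

The requested path is escaped before it is echoed back, since it comes straight from the visitor. Whatever the page looks like, the server should still send the 404 status code with it so robots know the URL doesn't exist.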
- Don't "click here."