xisto Community
FirefoxRocks

Converting HTML 4.01 Sites to XHTML 1.0

Converting from HTML 4.01 Transitional (Loose) to XHTML 1.0 Transitional

Many of you who write HTML are probably writing HTML 4.01 Transitional. This includes all of the deprecated elements and attributes such as <font> and bgcolor="". This tutorial covers how to convert from HTML 4.01 Transitional to XHTML 1.0 Transitional. The main reason to do so is cleaner code. This tutorial does not cover why XHTML is better or worse than HTML; you can Google that for more information if you wish.

1. Add a document type declaration.

This is the first step to being valid. If your document has an existing document type declaration, it should look like this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

It should be the very first line in your document. If you have that line, replace it with:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

If you did not have either of those lines before, add the XHTML one at the top.

2. The <html> tag needs the xmlns attribute.

You may have something like <html> in your document. It needs to become:

<html xmlns="http://www.w3.org/1999/xhtml">

It is also a good idea to add the language of the web page to the <html> tag. Therefore, you can use:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-CA" xml:lang="en-CA">

en-CA is for Canadian English. Use the appropriate language code for the web page in both the lang and xml:lang attributes. Lists of language codes are available online.

3. The optional HTML structure tags have to be added.

If you do not have <html>, <head>, <title> and <body>, they need to be added at the appropriate places in the document. They also need to be closed properly. The last line of your document should always be </body></html>.

4. Some tags are now self-closing.

In HTML, some tags are considered "empty" and do not need closing tags. In XHTML, all tags need to be properly closed. These tags are self-closing in XHTML:

<area>, <base>, <basefont>, <br>, <col>, <frame>, <hr>, <img>, <input>, <isindex>, <link>, <meta>, <param>

Self-closing means adding a slash inside the tag to close it off. Examples:

<hr> becomes <hr />
<link rel="stylesheet" type="text/css" href="styles.css"> becomes <link rel="stylesheet" type="text/css" href="styles.css" />
<meta name="keywords" content="xyz"> becomes <meta name="keywords" content="xyz" />

5. Alternate text in images is now mandatory.

Some of you may still specify images without any alternate text. This is bad practice: screen readers will not know what the image is, and nothing (or an X) will be shown when the image cannot be loaded (e.g. broken URIs, users turning off images, etc.). This is easy to fix:

<img src="dogs.jpg" /> becomes <img src="dogs.jpg" alt="Dogs playing in my yard" />

If for any reason the image does not need any alternate text (e.g. decorative images), you can just use alt="". However, decorative images should really be specified in CSS, but that's another tutorial.

6. Tags and attributes must be lower case.

Older HTML books and tutorials may have specified tags in uppercase letters. In XHTML, all tag and attribute names must be written in lowercase letters.

<TABLE BORDER="0" RULES="rows"> becomes <table border="0" rules="rows">

7. Attribute values must be quoted.

In HTML, you didn't need to quote attribute values containing only letters, numbers, dots (periods) or hyphens. In XHTML, all attribute values need to be quoted.

<th scope=row> becomes <th scope="row">

It doesn't matter whether you use single quotes or double quotes, but if the value contains one type of quote then you must use the opposite type to quote the value. In this example, you can only use double quotes because the value contains an apostrophe:

<p title="Don't touch that">

Similarly, in this example you can only use single quotes because the value contains a quotation mark:

<p title='Exponential "growth" pattern'>

8. Minimized attributes need to be rewritten in the non-minimized form.

These attributes can be minimized in HTML:

compact, checked, readonly, disabled, selected, defer, ismap, nohref, noshade, nowrap, multiple, noresize

To write proper XHTML, you need to do this:

<input type="checkbox" checked /> becomes <input type="checkbox" checked="checked" />
<select multiple> becomes <select multiple="multiple">
<script type="text/javascript" defer> becomes <script type="text/javascript" defer="defer">

There is nothing wrong with using the full version of a minimized attribute in HTML 4.01.

9. Encode special characters.

These characters should be encoded in HTML: < > & and "

If you use these characters within the text of your HTML, they may not display properly because they have special purposes in HTML. They should be written as:

< becomes &lt;
> becomes &gt;
& becomes &amp;
" becomes &quot;

10. Don't use embed, blink or marquee.

When you "embed" Flash content, you may be using the <embed> tag. This is wrong; you should use the object tag, like this:

<object type="application/x-shockwave-flash" data="someMovie.swf" width="640" height="480">
<param name="movie" value="someMovie.swf" />
</object>

<blink> has been replaced by CSS; it was never really a valid tag.

<marquee> is something that I think should never be used, but if you wish, use a JavaScript equivalent of <marquee>. There is a marquee module in CSS 3, but at the time of writing it has not been released and no browser supports it.

11. Paragraphs, list items and table cells need to be closed.

When you were writing HTML, it was perfectly fine to write this:

<p>Paragraph 1.
<p>This is a new paragraph

Or this:

<ul>
<li>Item 1
<li>Item 2
</ul>

Or even this:

<tr>
<td>Cell 1
<td>Cell 2
</tr>

But in XHTML, all tags need to be closed. Therefore:

<p>Paragraph 1.</p>
<p>This is a new paragraph</p>

<ul>
<li>Item 1</li>
<li>Item 2</li>
</ul>

And this:

<tr>
<td>Cell 1</td>
<td>Cell 2</td>
</tr>

12. Validate.

Validate your page at http://validator.w3.org/. If you have any other errors, either post here or learn how to fix them. The more valid pages you produce, the better you'll get at it. Also, valid XHTML 1.0 is easier to read, especially if you have an editor which supports syntax highlighting (e.g. Notepad++, HTML-Kit, gedit, or bigger IDEs such as NetBeans).

I may write future tutorials on how to convert from Transitional to Strict or from XHTML 1 to HTML 5, but I hope this helps clean up your pages a little.
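Several of the steps above are mechanical enough to script. As a rough sketch only, the Python below uses the standard library's html.parser to rewrite markup with lowercase names, quoted attributes, expanded minimized attributes, self-closed empty elements and re-encoded special characters (steps 4, 6, 7, 8 and 9). The names XhtmlWriter and to_xhtml_syntax are made up for illustration. It does not add missing closing tags (step 11), a doctype or alt text, and it would wrongly escape the contents of <script> and <style>, so treat it as a starting point and still validate the result.

```python
from html import escape
from html.parser import HTMLParser

# Elements that HTML 4.01 defines as empty; XHTML writes them as <tag />.
EMPTY_ELEMENTS = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}


class XhtmlWriter(HTMLParser):
    """Hypothetical helper: re-serialise parsed HTML using XHTML syntax rules."""

    def __init__(self):
        # convert_charrefs=True gives us decoded text, which we re-escape ourselves.
        super().__init__(convert_charrefs=True)
        self.out = []

    def _format_tag(self, tag, attrs, empty):
        # HTMLParser has already lower-cased the tag and attribute names.
        parts = [tag]
        for name, value in attrs:
            if value is None:  # minimized attribute such as "checked"
                value = name
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        return "<" + " ".join(parts) + (" />" if empty else ">")

    def handle_starttag(self, tag, attrs):
        self.out.append(self._format_tag(tag, attrs, tag in EMPTY_ELEMENTS))

    def handle_startendtag(self, tag, attrs):
        # Input that was already written as <tag /> stays self-closed.
        self.out.append(self._format_tag(tag, attrs, True))

    def handle_endtag(self, tag):
        if tag not in EMPTY_ELEMENTS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Re-encode <, > and & in text content.
        self.out.append(escape(data, quote=False))

    def handle_decl(self, decl):
        # The doctype is passed through unchanged; step 1 is still manual.
        self.out.append("<!%s>" % decl)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)


def to_xhtml_syntax(html_text):
    """Return html_text re-serialised with XHTML-style syntax."""
    writer = XhtmlWriter()
    writer.feed(html_text)
    writer.close()
    return "".join(writer.out)


if __name__ == "__main__":
    print(to_xhtml_syntax("<INPUT TYPE=checkbox CHECKED>"))
```

The parser is forgiving about sloppy input, which is why it was chosen here over a regular-expression approach; a dedicated tool such as HTML Tidy will do a far more complete job.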

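Step 5 is easy to miss on a large page, so a small check can help before running the validator. The following is a minimal sketch, assuming Python's standard html.parser; MissingAltFinder and images_missing_alt are hypothetical names, not an existing tool. It lists the src of every <img> that has no alt attribute at all, while accepting alt="" for decorative images.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Hypothetical helper: collect the src of each <img> lacking an alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # Also reached for <img ... />, because the default
        # handle_startendtag() calls handle_starttag().
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:  # alt="" counts as present
                self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html_text):
    """Return a list of image sources that have no alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing
```

It only reports the problem; writing meaningful alternate text still has to be done by hand.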

Nice read. I just wanted to add my two cents: there are some quite good tools that do these things automatically. For example, I found this one and have used it several times after getting bored of doing it by hand:

http://www.it.uc3m.es/jaf/html2xhtml/

Of course it saves time, but you still need to go through and fix some things by hand to make the page look the same, as the tags sometimes aren't closed where you would want them to be. Usually it works well, though. ;]


Firefox users can download an add-on called "HTML Tidy". It checks whether your website validates against the W3C standards. A valid website may also help search engines such as Google index your pages. It's an awesome tool for web developers and can help you pick up simple mistakes or typos.


One of the things not mentioned in this is serving the correct MIME type for XHTML.

Here's a PHP script that I wrote to serve the correct MIME type, but only if the browser supports it. It also serves the right MIME type to web bots, which a lot of the scripts on the internet do not cater for, so this would definitely help you pass W3C validation.

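<?php
define('CHARSET', 'UTF-8');

// Define DATE_RFC2822 only on PHP versions that lack it.
// defined() takes the constant's name as a string, so it must be quoted.
if (!defined('DATE_RFC2822')) {
    define('DATE_RFC2822', 'D, d M Y H:i:s O');
}

// Default to text/html. Switch to XHTML when the client says it accepts it,
// or sends no Accept header at all (as some web bots do).
$isXHTML = false;
$content_type = 'text/html; charset=' . CHARSET;
if (!isset($_SERVER['HTTP_ACCEPT'])
    || strpos($_SERVER['HTTP_ACCEPT'], 'application/xhtml+xml') !== false) {
    // strpos() returns 0 for a match at the start, so compare against false.
    $isXHTML = true;
    $content_type = 'application/xhtml+xml; charset=' . CHARSET;
}
// Send the header in both cases so the charset is always declared.
header('Content-Type: ' . $content_type);

// Optional caching headers.
header('Expires: ' . gmdate(DATE_RFC2822, gmmktime(0, 0, 0, 1, 1, 1970)));
header('Last-Modified: ' . gmdate(DATE_RFC2822, getlastmod()));
header('ETag: "' . dechex(getmyinode()) . '-' . dechex(filesize($_SERVER['SCRIPT_FILENAME'])) . '-' . dechex(getlastmod()) . '"');

// The XML declaration is only sent to clients that get XHTML.
if ($isXHTML) {
    echo '<' . '?xml version="1.0" encoding="' . CHARSET . '"?' . '>' . "\n";
}
?>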

Some bits of this code are not needed, such as sending the Expires, Last-Modified and ETag headers, but I left them in in case anyone wants them. I also echo the XML declaration from PHP rather than writing it directly in the page, because IE6 is thrown into quirks mode if anything comes before the DOCTYPE. This way, it is only served to browsers that support XHTML.
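For anyone not on PHP, the decision the script makes can be sketched in a few lines of Python. This is only an illustration of the same idea, not a drop-in replacement: choose_content_type and xml_declaration are hypothetical names, the Accept header is assumed to be passed in by whatever framework you use (None when the client sent none), and q-values in the header are ignored.

```python
def choose_content_type(accept_header, charset="UTF-8"):
    """Return (content_type, is_xhtml) for a request's Accept header.

    accept_header is None when the client sent no Accept header at all;
    as in the PHP script, that case is treated as XHTML-capable.
    """
    if accept_header is None or "application/xhtml+xml" in accept_header.lower():
        return "application/xhtml+xml; charset=" + charset, True
    return "text/html; charset=" + charset, False


def xml_declaration(is_xhtml, charset="UTF-8"):
    """Return the XML declaration for XHTML responses, else an empty string."""
    if is_xhtml:
        return '<?xml version="1.0" encoding="%s"?>\n' % charset
    return ""


if __name__ == "__main__":
    content_type, is_xhtml = choose_content_type("text/html, */*")
    print(content_type)
    print(repr(xml_declaration(is_xhtml)))
```

Keeping the choice in one function means the Content-Type header and the XML declaration cannot disagree with each other.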


Cheers,


MC


Yes, I would definitely recommend HTML Tidy, as the add-on is still actively developed and can also run the accessibility checks that a machine can do automatically. Experimental HTML5 support was added recently, which is good as we move from HTML 4.01/XHTML 1.0 sites to HTML5.

