
Converting HTML over to XHTML: Crossing over to the darkside


Allow for alterations

 

Well, I've just had to convert another HTML 4.01 Transitional website over to XHTML (eXtensible HyperText Markup Language) 1.0 Transitional. I will convert it to XHTML 1.0 Strict later on, as soon as I write the CSS (Cascading Style Sheets) file for it, but this had to be a quick update, and removing all the presentational elements and moving them into a CSS file is not as quick as just altering some tags. That's where I got the idea to write a How to Convert HTML over to XHTML.

 

Let's begin

 

I'm having trouble knowing where to start, so let's write an HTML 4.01 Transitional page containing most of the elements that need altering (with a few mistakes left in on purpose), and then I can explain how to fix it up so it will be a valid XHTML 1.0 Transitional page.

 

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<HTML>
<head>
  <title>HTML Basic</title>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <base href="http://localhost">
  <link rel="stylesheet" type="text/css" href="/style.css">
</head>
<body bgcolor="#FFFFFF" link="#003399">
  <hr>
  <form method="post" action="/cgi-bin/script.cgi" name="Script">
    <table border=0 cellspacing=0 cellpadding=2 align="center" width=
    "95%" summary="Script Entry Form" height="44">
      <tr>
        <td height="44" colspan="2"><img src="script.png"
        width="142" height="28"></td>
      </tr>
      <tr>
        <td width="66%">Fill in the script entries.  After you submit your
        entry, you will be returned. <br> The * mean required fields.</td>
        <td width="34%" align="right"> </td>
      </tr>
      <tr>
        <td colspan="2">
          <table border="0" cellspacing="1" cellpadding="4" width="95%"
          summary="Time for Input">
            <tr>
              <td bgcolor="#EFEFEF" width="179">Input* :</td>
              <td bgcolor="#EFEFEF" width="460"><input type="text" name=
              "inputtext" size="40" maxlength="40"></td>
            </tr>
          </table>
        </td>
      </tr>
    </table>
  </form>
</body>
</HTML>

So here's our base HTML file. How do we go about altering it so that it's a valid XHTML 1.0 Transitional file? Let's start from the top. The first line, which W3C recommends, should be the XML declaration (often loosely called the XML processing instruction line), so let's add that.

<?xml version="1.0" encoding="iso-8859-1"?>

Next we have to update our Document Type to reflect XHTML 1.0 Transitional, so we alter it to:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

This is the order they should go in: for HTML the doctype used to come first, but for XHTML the XML declaration line comes first, then the doctype.

 

Next we alter the <HTML> tag. Because XHTML is a combination of XML and HTML, its rules reflect both. XML requires exactly one root element per document, and in XHTML that root is the <html> element, so every other element must be enclosed within it.

 

Now to alter <HTML>

<html xmlns="http://www.w3.org/1999/xhtml" lang="EN">

xmlns is the attribute for the XML namespace; it associates the elements of our document with XHTML. The lang attribute gives the language of the document (the XHTML 1.0 compatibility guidelines suggest adding xml:lang with the same value as well).
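To see what the namespace actually does, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The tiny document and the variable names are made up for illustration; the only thing taken from the tutorial is the xmlns value.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A tiny, made-up document used only for this illustration.
doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en">'
    "<head><title>Namespace demo</title></head>"
    "<body><p>Hello</p></body>"
    "</html>"
)

root = ET.fromstring(doc)

# ElementTree reports namespaced names as "{namespace}localname".
print(root.tag)

# Child elements inherit the default namespace declared on <html>,
# so they have to be looked up with the namespace as well.
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
print(title.text)
```

An XML parser treats every element under <html> as belonging to the XHTML namespace, which is how XML tools can tell XHTML elements apart from same-named elements in other vocabularies.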

 

All XHTML element and attribute names should be in lower case. That means <HTML> should be rewritten as <html>, and likewise the end tag </HTML> should be </html>, so fix that now.
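If there are a lot of pages to fix, the lower-casing can be scripted. The following is only a rough Python sketch based on regular expressions; the function name lowercase_markup is my own, and it assumes simple markup with no comments, scripts or CDATA sections containing tag-like text. Attribute values are deliberately left alone, since those can be case-sensitive (file names, for example).

```python
import re

# A start or end tag: optional slash, element name, then the attribute text.
# Quoted values are matched as a unit so a ">" inside quotes does not end the tag.
TAG_RE = re.compile(
    r"""<(/?)([A-Za-z][A-Za-z0-9]*)((?:[^<>"']|"[^"]*"|'[^']*')*)>"""
)

# One attribute: its name, then an optional "= value" part that is kept as-is.
ATTR_RE = re.compile(
    r"""([A-Za-z_:][-A-Za-z0-9_:.]*)(\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+))?"""
)


def lowercase_markup(html):
    """Lower-case element and attribute names, leaving attribute values untouched."""

    def fix_attr(match):
        return match.group(1).lower() + (match.group(2) or "")

    def fix_tag(match):
        slash, name, attrs = match.groups()
        return "<" + slash + name.lower() + ATTR_RE.sub(fix_attr, attrs) + ">"

    return TAG_RE.sub(fix_tag, html)


print(lowercase_markup('<BODY BGCOLOR="#FFFFFF" Link="#003399"></BODY>'))
```

The doctype line is not touched, because "<!" does not look like the start of an element name to the pattern.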

 

Next we will fix up the empty tags. In XHTML, empty tags must be closed with /> (leave a space before the slash so that older browsers cope with it).

 

The empty tags in our HTML that need fixing up are:

 

<meta> tags to <meta />

<base> tag to <base />

<link> tags to <link />

<hr> tags to <hr />

<img> tags to <img />

<br> tags to <br />

<input> tags to <input />
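The list above can also be applied mechanically. Here is a hedged Python sketch; the function name close_empty_tags and the VOID tuple are my own, and the tuple lists the elements that the HTML 4.01 DTD declares as empty and that are common in practice, not just the ones in our page. It assumes plain markup without comments or scripts that contain tag-like text.

```python
import re

# Elements declared EMPTY in HTML 4.01 that commonly show up in pages.
VOID = ("area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param")

# Tag name, then attribute text (quoted values matched as a unit),
# then optional whitespace and an optional existing slash before ">".
VOID_RE = re.compile(
    r"<(" + "|".join(VOID) + r""")\b((?:[^<>"']|"[^"]*"|'[^']*')*?)\s*/?>""",
    re.IGNORECASE,
)


def close_empty_tags(html):
    """Rewrite <br>, <img ...>, etc. as <br />, <img ... />."""
    return VOID_RE.sub(lambda m: "<" + m.group(1) + m.group(2) + " />", html)


print(close_empty_tags('<hr><img src="script.png" width="142" height="28"><br>'))
```

Tags that are already closed with /> come out unchanged, so running the function twice does no harm. It keeps whatever case the tag name had, so it makes sense to lower-case the tags first.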

 

We should have been finished with it, but I left in some other problems.

 

Here are some more rules:

 

XHTML attribute names should also be in lowercase, and all attribute values must be inside quote marks. In the first <table> tag I left out the quote marks for some of the values. This is still valid HTML but not valid XHTML, which requires every value to be quoted, so do that now.
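Quoting the values can be scripted too. This Python sketch is only an illustration under assumptions: the function name quote_attributes is mine, it expects plain markup (no comments or scripts with tag-like text), and it should be run on the HTML before the empty tags are given their closing slash, otherwise an unquoted value right before a "/>" could swallow the slash.

```python
import re

# A start tag: element name, then the attribute text up to the closing ">".
TAG_RE = re.compile(
    r"""<([A-Za-z][A-Za-z0-9]*)((?:[^<>"']|"[^"]*"|'[^']*')*)>"""
)

# name = value, where the value is double-quoted, single-quoted or bare.
ATTR_RE = re.compile(
    r"""([A-Za-z_:][-A-Za-z0-9_:.]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)"""
)


def quote_attributes(html):
    """Wrap bare attribute values in double quotes; quoted values stay as they are."""

    def fix_attr(match):
        name, value = match.groups()
        if value[0] in "\"'":
            return match.group(0)  # already quoted, leave untouched
        return name + '="' + value + '"'

    def fix_tag(match):
        return "<" + match.group(1) + ATTR_RE.sub(fix_attr, match.group(2)) + ">"

    return TAG_RE.sub(fix_tag, html)


print(quote_attributes('<table border=0 cellspacing=0 cellpadding=2 width="95%">'))
```

Only text inside start tags is examined, so an equals sign in the visible page text is never touched.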

 

Another problem with the first <table> tag: XHTML does not support the height attribute on <table> (strictly speaking the HTML 4.01 DTD doesn't list it either, although browsers accept it), so we will cut it from there. The height belongs on our next <td> tag instead, since <td> does support the height attribute.

 

One more problem: it is mandatory (absolutely needed) that we have the alt attribute on any image we have on our site. This is to help those who use text-based browsers, or assistive browsers such as screen readers, so that users with disabilities can enjoy our sites too. So now fix our img tag to have an alt attribute; I will be calling it Script Logo. Another thing, which is not a problem but should be considered, is having a summary for every table we make, again to help others enjoy our site better. It's much appreciated if we do this.
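On a bigger site it is easy to miss an image. Here is a small Python sketch that lists the images lacking an alt attribute, using the standard library's html.parser. The class and function names are my own invention, not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collects the src of every <img> that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        # <img ... /> also ends up here, because the default
        # handle_startendtag calls handle_starttag.
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src"))


def images_missing_alt(markup):
    finder = MissingAltFinder()
    finder.feed(markup)
    finder.close()
    return finder.missing


page = '<td><img src="script.png" width="142" height="28"></td>'
print(images_missing_alt(page))
```

An empty alt="" counts as present here, which is the usual choice for purely decorative images.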

 

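Last but not least, the name attribute is being deprecated (slowly phased out) and its replacement is the id attribute. What we do now is search for the whole word "name", matching case exactly, and replace it with "id". Be aware that XHTML 1.0 only deprecates name on a, applet, form, frame, iframe, img and map; form controls such as <input> still depend on name for their value to be submitted, so keep name there (alongside id) if a script needs to read the field. That was our last step.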

 

Now our site should be fixed up and what we now have should look like this

 

<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="EN">
<head>
  <title>HTML Converted to XHTML</title>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
  <base href="http://localhost" />
  <link rel="stylesheet" type="text/css" href="/style.css" />
</head>
<body bgcolor="#FFFFFF" link="#003399">
  <hr />
  <form method="post" action="/cgi-bin/script.cgi" id="Script">
    <table border="0" cellspacing="0" cellpadding="2" align="center" width=
    "95%" summary="Script Entry Form">
      <tr>
        <td height="44" colspan="2"><img src="script.png" alt=
        "Script Logo" width="142" height="28" /></td>
      </tr>
      <tr>
        <td width="66%">Fill in the script entries.  After you submit your
        entry, you will be returned. <br /> The * mean required fields.</td>
        <td width="34%" align="right"> </td>
      </tr>
      <tr>
        <td colspan="2">
          <table border="0" cellspacing="1" cellpadding="4" width="95%"
          summary="Time for Input">
            <tr>
              <td bgcolor="#EFEFEF" width="179">Input* :</td>
              <td bgcolor="#EFEFEF" width="460"><input type="text" id=
              "inputtext" size="40" maxlength="40" /></td>
            </tr>
          </table>
        </td>
      </tr>
    </table>
  </form>
</body>
</html>

You can check both the HTML and the XHTML versions for yourself using the validator over at W3C; the XHTML one should come out as a valid Transitional document.

 

Well that's it for me, just remember all the keys and rules and you'll have no problem with the transition.

 

 

Cheers, MC


or if you have DreamWeaver you can just go to File | Convert and choose XHTML... I think



overture,

 

I can confirm Dreamweaver MX converts to XHTML, and it works near perfectly, except that it added a PHP script to output the XML declaration line, which shouldn't happen in a file with an HTML extension. It also still leaves the name attribute in. The name attribute hasn't been fully removed from XHTML, so the suggestion is to use id and name together, but I'm trying to force its removal by leaving it out and hoping all browsers now support the id attribute.

 

At least now you can get an understanding of the differences between XHTML and HTML.

 

 

Cheers, MC


Ahh OK, I did not know that Dreamweaver doesn't remove the name attribute. Thanks for telling me; I rarely use Dreamweaver to do anything. Oh, and I already know XHTML :)


I have a question... What makes XHTML better than HTML? I'm not getting it... I'm so lost lol... Is XHTML more flexible? I've never capitalized my HTML in things like <html></html>... does that impact anything?




The answer to that is YOU. XHTML is simply what W3C (World Wide Web Consortium) recommend, and the plan is for it to eventually replace HTML, so it would be wise to move over. You choose whatever you are comfortable with. XHTML helps developers by keeping their code tidy, with stricter syntax and without a lot of duplicate tags that do exactly the same thing. It's also quite strict, to make sure you all stay in line; it follows the rules of XML.

 

What I'm saying is you don't have to learn another web language. If you know HTML 4.01, you can easily convert over; it's just like HTML but cleaner and easier to use in some circumstances.

 

Capitalising tags in HTML 4.01 is no problem; it's still considered valid. It is, however, not allowed in XHTML. If you view many different sites you will find a mixture of uppercase and lowercase tags, poor indentation, sloppy code and so on. This is why W3C, as well as many other developers, felt the need to tighten things up so that everyone conforms to the same standards. That way developers can work with each other without being sloppy, knowing that the sites will be well-formed.

 

You can look at XHTML as the newer version of HTML if you like. I do believe HTML will be dropped for XHTML, so it can be seen that way; I'm not expecting another HTML standard. It's just an improvement, and at least now the browser wars have minimised their attacks on the standards. XHTML could be seen as stepping backwards, but if you remember HTML 3.2 (during the browser wars), HTML 4.01 was a definite step back too, as every browser had a lot of features it wanted.

 

 

Cheers, MC


Lovely tutorial, will definitely look into making my site XHTML... I'm already following a lot of the rules so it shouldn't be that big a problem.

Question 1: The thing about name and id. I look at your example and I only see it being used in the input and form tags. How about anchor tags? When you specify an anchor the attribute is also "name", so does that mean I have to do <a id="anchor"> now, instead of <a name="anchor">?

Question 2: Just checking, is XHTML more compatible with all browsers than HTML?


To answer a few questions, and to add input of my own...

 

The whole XHTML conversion is about bringing HTML towards XML. The main focus is on what the data is and not how it looks. If any of you have seen an XML file then you'll know it only contains data and doesn't display anything in any special way. The reason for this is the advancement in technology: the internet is no longer available only to computers but also to PDAs and mobile phones. They have to display the same data on a different screen and will not use the same rendering browsers as computers.

 

Internet Explorer automatically closes some tags and whatnot, which means a lot of website owners thought their site was completely fine, while other devices and other browsers displayed the pages like they got beat with an ugly tree.

 

So the internet moved towards more standards-compliant code, in which the markup holds only the data and has to follow a fixed set of rules. XML was working towards this goal, so instead of completely destroying all current websites, they made XHTML, which is read the same way by most browsers. It now means that your code is more than likely going to display the same thing in many browsers.

 

The main advantage of XHTML is its recommended integration with CSS. XHTML holds the data and the CSS describes how the data should look. So although it is NOT against the rules to include the "bgcolor", "color" and "size" attributes, as well as the <font> and <br /> tags, doing so defeats the meaning of XHTML and what it's aiming for.

 

Consider someone who has poor eyesight. Do they want to be squinting to read your tiny text, painfully trying to stare past the glare, or trying to make sense of your funky text? No, they want to be able to literally customise the appearance of the whole internet to suit their needs. Screen readers for the completely blind will read out well-formed XHTML more sensibly than badly formed markup. By styling your font or page layout in the actual XHTML, you remove this ability and, possibly, disappoint people.

 

Another advantage is that you get to describe how your website prints out on paper. Say you have loads of good-quality information on a subject, and it contains "click here" links to other passages on the same subject. When that is printed out and given to someone who may want to look up that information, what does "click here" mean? With XHTML, you give the links a title and, using CSS skills, tell the printer to print out the title beside the link text along with the actual link address.

 

So although the tutorial was very informative, I strongly recommend that you not just follow the mandatory rules, but understand what XHTML is all about and build your site in that spirit, so that in the future your site will continue to work effortlessly. Although <font>, <br> and <hr> are technically still allowed, DON'T use them, because they carry no meaning other than for display purposes.

 

Also use other tags that are not widely known, such as blockquote and q (quotes), code, dfn (definitions), abbr and acronym. If you want to read more go to http://www.htmldog.com/.

 

Learn when to use the correct tags. Don't use tables for layout, only use them to display structured data. Put menu items into an unordered list as they are effectively lists of navigation links. Small things like that.

 

XHTML is friendlier to all browsers because they should all read it the same way. How they display it is a different matter, and depends on each browser's CSS support or lack of it. But making it valid XHTML will ensure your site is more accessible, as it doesn't rely on the browser knowing how to close tags for you.

 

And about the name and id attribute question. Name is being deprecated, but it is nowhere near phased out. If you are familiar with PHP, you'll know that to read the value of a form input box you need the name attribute. I spent four hours debugging my PHP code to find out why it wasn't working, only to discover that I hadn't given the input boxes a name, only an id. So my advice is to include both, since it doesn't do any harm.
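To show why the missing name bites, here is a simplified Python sketch of what a browser puts into the submitted data. It is an approximation under stated assumptions: the class FormFields and the function submitted_query are made-up names, and only <input> elements with a value attribute are considered (no checkboxes, selects or textareas).

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFields(HTMLParser):
    """Collects (name, value) pairs the way a browser would for simple inputs."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:  # controls without a name are not submitted at all
            self.pairs.append((name, attributes.get("value") or ""))


def submitted_query(markup):
    parser = FormFields()
    parser.feed(markup)
    parser.close()
    return urlencode(parser.pairs)


form = (
    '<form method="post" action="/cgi-bin/script.cgi">'
    '<input type="text" id="only_id" value="lost" />'
    '<input type="text" id="both" name="both" value="kept" />'
    "</form>"
)
print(submitted_query(form))
```

The field that only has an id never reaches the server, so a PHP script looking for it finds nothing, which matches the four hours of debugging described above.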

 

And lastly, to add a bit to the tutorial: XHTML must be well-formed, which includes being properly nested. If you open a bold tag and then an italic tag, you must close the italic tag first and then the bold tag for the page to be valid.

 

<b><i>Hello</b></i> isn't valid.

<b><i>Hello</i></b> is.
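A quick way to check nesting without a full validator is to hand the markup to any XML parser. Here is a minimal Python sketch using the standard library's xml.etree.ElementTree; the helper name is_well_formed is my own. Note that this only tests well-formedness, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<p><b><i>Hello</b></i></p>"))  # badly nested
print(is_well_formed("<p><b><i>Hello</i></b></p>"))  # properly nested
print(is_well_formed("<p>line one<br>line two</p>"))  # unclosed empty tag
```

The first and third calls report False and the second reports True, since an XML parser rejects both overlapping tags and an unclosed <br>.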

 

To close with a related note, I made a mini site last month that worked on WAP on mobile phones. It was a pretty cool experiment to see the website on both a phone and a computer's browser.


If you Google several keywords about XHTML and HTML, you'll see that there is much controversy about XHTML, notably over Appendix C of the XHTML 1.0 specification, which states that XHTML 1.0 may be served as text/html. (The thing is, when it is served as text/html, the browser still treats it as HTML.)
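One common workaround is for the server to pick the MIME type based on what the browser says it accepts. The Python sketch below is only an illustration: the function name choose_content_type is made up, and it deliberately ignores q-values in the Accept header, which a real implementation would need to honour.

```python
def choose_content_type(accept_header):
    """Serve application/xhtml+xml only to clients that list it explicitly."""
    accepted = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    if "application/xhtml+xml" in accepted:
        return "application/xhtml+xml"
    # Fall back to text/html, where the page is parsed as ordinary HTML.
    return "text/html"


print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_content_type("text/html, */*"))
```

Only in the first case does the browser run the page through its XML parser, and that is the only situation where XHTML's stricter rules are actually enforced by the browser.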

Can it do more? XHTML 1.0 and HTML 4.01 are equivalent in functions in transitional, frameset and strict modes.
Does it have more options? It can make your code look cleaner, but you can do the same in HTML.
Is it easier? That depends. It certainly isn't harder, but it isn't easier either.

A famous article is this: http://hixie.ch/advocacy/xhtml

In fact, the controversy about XHTML and HTML has led browser developers, a working group called the WHATWG and (later) the W3C to develop a new version of HTML called HTML5. XHTML2 isn't suitable for today's web applications apparently, so HTML5 is hoping to remedy that problem.


XHTML can be used with XML, for example if you're using XML requests and responses, but if you just have an ordinary site you can stay with HTML4 and you won't have problems. Because XHTML can be used with XML, it has to be stricter than HTML4: it has to have end tags and it needs to validate, or there may be a lot of errors. A lot of designers still use HTML4 and they are happy, because usually the main thing is for the client to see the site and not the source.

