Many tags take text as the value of attributes, and many elements take text as content. Text is what you think it would be: plain character strings in the encoding of your choice. You can always use character entity references (such as "&ntilde;" for the letter "ñ") wherever you can use text, even in attribute values.
HTML differentiates between two types of text: unparsed character data (CDATA) and parsed character data (PCDATA). Software reading an HTML document is supposed to take CDATA as is and not try to interpret it (with the exception of any character entity references, as noted above). The values of attributes are usually CDATA. In theory, this would allow you to put HTML tags inside URIs, for example. In XHTML, though, all delimiter characters (such as the less-than symbol and the ampersand) have to be escaped, so if you wanted to link to a file named "five<seven.html", you must use <a href="five&lt;seven.html"> and not <a href="five<seven.html">.
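As a minimal sketch of the escaping rule above, the helper below replaces the delimiter characters in a string before it goes into an attribute value. The name escapeAttribute is made up for this illustration (it is not a standard function), and the sketch assumes the value will be wrapped in double quotes.

```javascript
// Hypothetical helper: escape the XML delimiter characters in a string
// that will be placed inside a double-quoted attribute value.
function escapeAttribute(value) {
  return String(value)
    .replace(/&/g, "&amp;") // must run first, or the entities added below would be escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the link to the awkwardly named file from the text.
const link = '<a href="' + escapeAttribute("five<seven.html") + '">';
console.log(link); // <a href="five&lt;seven.html">
```

The ampersand is handled first on purpose: if it came last, the "&" in "&lt;" would itself be turned into "&amp;".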
PCDATA, on the other hand, is processed by HTML software. This processing is just to ensure that you do not stick tags in places where they are not allowed. The content of the <title> element, for example, is PCDATA. Had it been CDATA, you might be able to sneak in markup between <title> and </title> (markup is not allowed between these two tags) and then defiantly claim, "Ha! You can't do anything about that, since you can't notice the markup inside because you're not allowed to think about CDATA." In practice, though, this is a moot point, because, as explained before, you can't have delimiters inside CDATA in XHTML anyway, so you couldn't have put markup between <title> and </title> regardless of whether the content is CDATA or PCDATA.
Trying to write an XHTML document that contains valid markup and a valid script or style sheet is infuriating. The problem is that the syntax of XHTML differs so much from that of scripts and style sheets. It's hard enough keeping the document valid before and after the scripts are run. Trying to keep scripts and style sheets from triggering the XML processor, and trying to keep markup from triggering the scripting and style-sheet engines, is an exercise in madness.
In XHTML, scripts contained between <script> and </script> and style sheets contained between <style> and </style> are PCDATA. This insane rule means that anything that looks remotely like markup will confuse the XML parser and invalidate your document. (Remember that, in theory, an XML document must be parsed before it can be rendered by a user agent, and that if the document is not well formed, the parser must give up and die instead of letting the user agent try to guess what the author intended.) Here is an example of what will not work:
<script type="text/javascript"> if (d < e) f(); </script>
The parser will think that the less-than operator starts a tag (because, after all, it is PCDATA that's inside the script), so the parser will get confused, give up, and die. Back in the days of HTML 4, the above fragment of code would have been perfectly legal, because scripts were CDATA. (HTML 4, however, infuriates in its own little way: it ends a script or style sheet at the first occurrence of "</", even if the sequence doesn't really correspond to the end of the script.)
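One common way around that HTML 4 quirk, sketched here with made-up variable names, is to write the sequence as "<\/" inside JavaScript string literals. The scripting engine drops the backslash, so the string is unchanged, but the HTML parser no longer finds "</" in the source text.

```javascript
// In a JavaScript string literal, "\/" is just "/", so both literals
// below produce exactly the same string.
const risky = "<b>bold</b>";  // the source text contains "</"
const safe = "<b>bold<\/b>";  // the source text does not

console.log(risky === safe); // true
```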
Even more insidious, however, is that the parser will try to find entity references inside your scripts and style sheets. Thus, this fragment is lethal to your parser as well:
<script type="text/javascript"> if (h && i) j(); </script>
The parser thinks the boolean operator && is the start of a character entity reference. Sadly, many of the most popular characters in programming have a special meaning in XML.
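To see which characters cause the trouble, here is a rough sketch of a checker that lists every less-than sign and ampersand in a piece of script text. The function name findXmlHazards is invented for this example, and the check is deliberately simple: it does not try to be a real XML parser.

```javascript
// Hypothetical checker: list the characters in a script that an XML
// parser would treat as markup if the script were left unescaped.
function findXmlHazards(code) {
  const hazards = [];
  for (let i = 0; i < code.length; i++) {
    const ch = code[i];
    if (ch === "<" || ch === "&") {
      hazards.push({ index: i, character: ch });
    }
  }
  return hazards;
}

console.log(findXmlHazards("if (d < e) f();"));  // one hazard: the "<"
console.log(findXmlHazards("if (h && i) j();")); // two hazards: each "&"
```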
There really isn't an easy solution. Encasing scripts and style sheets in comment delimiters (<!-- -->) does not officially work. According to the W3C, the parser may remove all comments before passing the code on to the user agent. In addition, C-like languages, including JavaScript, have a decrement operator ("--") that just happens to be the SGML comment delimiter.
You could try writing scripts and style sheets using entity references, trusting the parser to convert them before passing them on to the scripting and style-sheet engine. The last example might be rewritten like this:
<script type="text/javascript"> if (h &amp;&amp; i) j(); </script>
Unfortunately, few browsers really run a parser on your document first and then feed the results to the scripting engine. Besides, such an arrangement would break HTML 4 pages, which have scripts as unparsed CDATA. Most browsers instead would pass the &amp;'s and any other character entity references straight to the scripting engine.
Interestingly, XML has a special construct designed to deal with the script and style sheet problem. Anything wrapped between "<![CDATA[" and "]]>" is treated as CDATA. Thus, using the same example, the fragment of code could be rewritten this way:
<script type="text/javascript"> <![CDATA[ if (h && i) j(); ]]> </script>
The problem with this solution is that not many browsers understand this syntax either. You might try wrapping the CDATA markers inside comments. (Use the comments of your scripting or style-sheet language, mind you. If you use the SGML-style comments, all sorts of nastiness may ensue.) The other problem is that if your script or style sheet actually contains the sequence "]]>", you're out of luck again.
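The comment-wrapping trick can be sketched as follows. The helper name wrapScriptForXhtml is invented for illustration. It hides the CDATA markers behind JavaScript line comments and refuses any script that contains "]]>", since that sequence would end the CDATA section early.

```javascript
// Hypothetical helper: wrap script text in a CDATA section whose markers
// are hidden from the scripting engine behind JavaScript line comments.
function wrapScriptForXhtml(code) {
  if (code.includes("]]>")) {
    // "]]>" would close the CDATA section in the middle of the script.
    throw new Error('script contains "]]>" and cannot be wrapped as is');
  }
  return [
    '<script type="text/javascript">',
    "//<![CDATA[",
    code,
    "//]]>",
    "</script>",
  ].join("\n");
}

console.log(wrapScriptForXhtml("if (h && i) j();"));
```

For a style sheet the same idea would use CSS comments instead, that is, /* <![CDATA[ */ and /* ]]> */.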
The last problem with embedding scripts and style sheets inside webpages is that the XML parser must compact a string of spaces into a single space, and, yet again, because scripts and style sheets are PCDATA, the parser is free to mutilate them. You can prevent this by using the CDATA markers just described, or by simply setting the xml:space attribute in <script> or <style> to "preserve".
Lastly, your best solution may be just to use external scripts and style sheets, avoiding this whole big mess.
There's a catch to external scripts, though: you can't use them in event handlers. If you insist on putting code of any complexity inside an event handler, you're in trouble again. It is true that the value of attributes, including event handlers, is CDATA. The wrinkle is that, inside attributes, and only inside attributes, CDATA is parsed for character entity references. And as explained earlier, CDATA inside attributes is also parsed for delimiters like the less-than symbol. Rather than trying to whip up a new bag of tricks to deal with these problems, your best bet is probably to put event handlers in an external script and call them from inside the webpage using function calls. (Or you can try to be clever and program handicapped, without a few of your favorite operators.)
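A minimal sketch of that arrangement, with an invented file name and function name: the logic lives in an external file, and the attribute holds nothing but a function call, so it contains no less-than signs or ampersands to escape.

```javascript
// handlers.js (hypothetical external file), loaded with:
//   <script type="text/javascript" src="handlers.js"></script>
// and called from the page with:
//   <input type="button" value="Check" onclick="checkRange(3, 7);" />

// All the operators that XML dislikes stay in this file.
function checkRange(d, e) {
  return d < e && e < 10;
}

console.log(checkRange(3, 7)); // true
```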
A common mistake that many beginners to HTML (including me!) make is to put text willy-nilly all over a web page. This is prohibited! You can only put text where I say (or the W3C says, actually) you can put text. The <p> element, for example, may contain inline elements and text, so you may put text inside. The <body> element, on the other hand, may only contain heading, block, and list elements, so <body>Welcome to my web site</body> is quite illegal.
And of course, in addition to the rules for text, every element has rules about whether it may be nested inside another element. These rules are not always obvious, which is one of the reasons I wrote this guide.
<html>; <head>, <title>, <base>, <link>, <meta>, <script>, <style>; <body>
<h1>, <h2>, <h3>, <h4>, <h5>, <h6>
<address>, <blockquote>; <del>, <ins>; <div>, <fieldset>, <form>, <hr>, <noscript>, <p>, <pre>, <script>, <table>
<dl>, <ol>, <ul>; <dt>, <dd>, <li>
<a>; <abbr>, <acronym>, <cite>, <code>, <dfn>, <em>, <kbd>, <samp>, <strong>, <var>; <b>, <big>, <i>, <small>, <sub>, <sup>, <tt>; <bdo>, <br>, <button>; <del>, <ins>; <img>, <input>, <label>, <map>, <noscript>, <object>, <q>, <ruby>, <select>, <script>, <span>, <textarea>
<caption>, <col>, <colgroup>; <thead>, <tbody>, <tfoot>; <td>, <th>, <tr>
<legend>, <optgroup>, <option>
<area>
<param>
<rb>, <rbc>, <rp>, <rt>, <rtc>
<!-- -->