Tidy is a validation and repair utility that lets you identify and fix HTML errors in a file or a string of HTML. You can use Tidy in either procedural or object-oriented style. A tidy resource returned by a procedural function can be treated as a tidy object, and a tidy object returned by an OOP method can be treated as a tidy resource.
Tidy is not available in every PHP installation. You can check whether you have it with phpinfo(). If you don't, your administrator will need to install libtidy, which can be found at the HTML Tidy Project Page.
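Besides reading the phpinfo() page, you can test for the extension from code. The sketch below relies only on PHP's built-in extension_loaded() and function_exists(); the helper name tidy_is_available() is made up for this example and is not part of Tidy.

```php
<?php
// Hypothetical helper (not part of Tidy): true when the tidy extension is loaded.
function tidy_is_available(): bool
{
    return extension_loaded('tidy') && function_exists('tidy_repair_string');
}

if (tidy_is_available()) {
    echo "Tidy is available\n";
} else {
    echo "Tidy is not available; ask your administrator to install it\n";
}
```

Running a check like this before calling any tidy_* function avoids a fatal "undefined function" error on servers without the extension.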
To repair a string, assign the string to a variable and pass it to tidy_repair_string(). To repair a file, assign the file name to a variable and pass it to tidy_repair_file(). If the file name is a local path and the file contains PHP, Tidy will not touch the PHP code. You can use full URLs to repair non-local files.
<?php
# HTML with mis-nested <b> and <i> tags
$string = "<html>
<body>
<p>
<b><i>test</b></i>
</body>
</html>";
?>
<br /><br />
<?php
# show repaired HTML
$tidy = tidy_repair_string($string);
echo $tidy;
?>
The code above will output
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN"> <html> <head> <title></title> </head> <body> <p><b><i>test</i></b></p> </body> </html>
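The introduction notes that the procedural and object-oriented styles can be mixed. As a rough sketch of that, the example below repairs the same string through a tidy object and then reads the result with the procedural tidy_get_output(). The wrapper function repair_with_object() is a hypothetical name used only here, and it returns null when the extension is missing.

```php
<?php
// Same mis-nested markup as in the procedural example.
$string = "<html>
<body>
<p>
<b><i>test</b></i>
</body>
</html>";

// Hypothetical wrapper for this example: returns the repaired HTML,
// or null when the tidy extension is not installed.
function repair_with_object(string $html): ?string
{
    if (!class_exists('tidy')) {
        return null;
    }
    $tidy = new tidy();
    $tidy->parseString($html);      // load and parse the markup
    $tidy->cleanRepair();           // apply the repairs
    return tidy_get_output($tidy);  // procedural function used on an OOP object
}

$repaired = repair_with_object($string);
echo $repaired ?? "Tidy is not installed\n";
```

The procedural tidy_repair_string() does the parse and repair steps in one call; the object form is more convenient when you also want to inspect the document or its error buffer afterwards.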
For the file examples, I'll use example.html, which has several errors. Code with errors...
<html>
<body>
<p style="text-indent:1em; color:blue"
<b><i>test</b></i> <font color="red" size=4>big red text</font>
</p>
<br><br>
<table border="5" bgcolor="aaaaff">
<tr>
<td>
Table
</td>
</tr>
</table>
</html>
Cleaning the code...
<?php
$file = "example.html";
?>
<br /><br />
<?php
# show repaired HTML
$tidy = tidy_repair_file($file);
echo $tidy;
?>
Output...
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> <html> <head> <title></title> </head> <body> <p style="text-indent:1em; color:blue"><b><i>test</i></b> <font color="red" size="4">big red text</font></p> <br> <br> <table border="5" bgcolor="#AAAAFF"> <tr> <td>Table</td> </tr> </table> </body> </html>
Tidy has many optional configuration settings that you can change. In the examples above, the output is a single line. If you save it to a file like that, it will be hard to read and edit. The settings that make the output easier to read are known as "pretty print" options.
Both tidy_repair_string() and tidy_repair_file() accept an optional argument for configuration. The config parameter can be an array or a string naming a configuration file. In the example below, I use an array named $config to hold the configuration settings. $config tells Tidy to wrap lines at 54 characters, indent the output by one space per level to show the document tree, add a new line before each <br>, and indent attributes.
<pre>
<?php
$file = "example.html";
# set some configurations
$config = array(
    'wrap' => 54,
    'indent' => true,
    'indent-spaces' => 1,
    'break-before-br' => true,
    'indent-attributes' => true,
);
# show repaired HTML, escaped so the browser displays the markup itself
$tidy = tidy_repair_file($file, $config);
$tidy = htmlspecialchars($tidy);
echo $tidy;
?>
</pre>
Output...
<!DOCTYPE
html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
</head>
<body>
<p style="text-indent:1em; color:blue">
<b><i>test</i></b> <font color="red"
size="4">big red text</font>
</p>
<br>
<br>
<table border="5"
bgcolor="#AAAAFF">
<tr>
<td>
Table
</td>
</tr>
</table>
</body>
</html>
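The config argument can also be a string naming a configuration file instead of an array. A minimal sketch of that variant follows. It assumes Tidy's usual "option: value" file format with yes/no for boolean options, writes the file to a temporary path, and uses a hypothetical helper name, repair_with_config_file(), that is not part of Tidy.

```php
<?php
// Hypothetical helper: writes a Tidy configuration file to a temporary
// location and repairs $html with it. Returns null if Tidy is missing
// or the repair fails.
function repair_with_config_file(string $html): ?string
{
    if (!function_exists('tidy_repair_string')) {
        return null;
    }
    // One "option: value" pair per line, mirroring the $config array above.
    $settings = "wrap: 54\n"
              . "indent: yes\n"
              . "indent-spaces: 1\n"
              . "break-before-br: yes\n"
              . "indent-attributes: yes\n";

    $path = tempnam(sys_get_temp_dir(), 'tidy');
    file_put_contents($path, $settings);

    // A string second argument is treated as a configuration file name.
    $repaired = tidy_repair_string($html, $path);

    unlink($path);
    return $repaired === false ? null : $repaired;
}

$out = repair_with_config_file("<p>first line<br>second line");
echo $out ?? "Tidy is not installed\n";
```

A configuration file is handy when several scripts should share the same settings, since the options then live in one place rather than in each script's array.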
In the two previous examples, the output is HTML 4.01 Transitional. You can change the configuration to change the output type. The next example outputs XHTML Strict. The "clean" option replaces presentational tags and attributes like <font> and <center> with style rules and structural markup.
<?php
$file = "example.html";
# set some configurations
$config = array(
    'indent' => true,
    'wrap' => 54,
    'indent-spaces' => 1,
    'break-before-br' => true,
    'indent-attributes' => true,
    'output-xhtml' => true,
    'doctype' => 'strict',
    'clean' => true,
);
# show repaired HTML, escaped so the browser displays the markup itself
$tidy = tidy_repair_file($file, $config);
$tidy = htmlspecialchars($tidy);
echo $tidy;
?>
Output...
<!DOCTYPE
html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title></title>
<style type="text/css">
/*<![CDATA[*/
table.c4 {background-color: #AAAAFF}
p.c3 {text-indent:1em; color:blue}
span.c2 {color: red; font-size: 120%}
b.c1 {font-style: italic}
/*]]>*/
</style>
</head>
<body>
<p class="c3">
<b class="c1">test</b> <span class="c2">big red
text</span>
</p>
<br />
<br />
<table class="c4"
border="5">
<tr>
<td>
Table
</td>
</tr>
</table>
</body>
</html>
If you want a list of the corrections that will be or have been made, you can use tidy_get_error_buffer(). tidy_get_error_buffer() returns a string containing all warnings and errors, but when echoed into an HTML page it appears as one long unformatted string, so you will probably want to at least add line breaks to it. If the output is an HTML page, you will also need to use htmlspecialchars() to replace < and > with entities.
<?php
$file = "example.html";
# set some configurations
$config = array(
    'indent' => true,
    'wrap' => 54,
    'indent-spaces' => 1,
    'break-before-br' => true,
    'indent-attributes' => true,
    'output-xhtml' => true,
    'doctype' => 'strict',
    'clean' => true,
);
# parse the file, then collect the warnings and errors Tidy found
$tidy = tidy_parse_file($file, $config);
$errors = tidy_get_error_buffer($tidy);
$errors = htmlspecialchars($errors);
# start each "line N column N" entry on its own line, in bold
$errors = preg_replace('@line \d+ column \d+@', '<br /><br /><b>$0</b>', $errors);
echo $errors;
echo "<pre>";
echo htmlspecialchars((string) $tidy);
echo "</pre>";
?>
Output ...
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 3 column 1 - Warning: <p> missing '>' for end of tag
line 4 column 4 - Warning: replacing unexpected b by </b>
line 4 column 15 - Warning: inserting implicit <i>
line 2 column 1 - Warning: inserting missing 'title' element
line 8 column 1 - Warning: <table> attribute "bgcolor" had invalid value "aaaaff" and has been replaced
line 8 column 1 - Warning: <table> lacks "summary" attribute
line 4 column 15 - Warning: trimming empty <i>
<html> <head> <title></title> </head> <body> <p style="text-indent:1em; color:blue"> <b><i>test</i></b> big red text </p> <br /> <br /> <table border="5" bgcolor="#AAAAFF"> <tr> <td> Table </td> </tr> </table> </body> </html>
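As an alternative to the regular expression, the error buffer can be split into individual messages and escaped one at a time. The sketch below assumes the buffer separates entries with newlines. It runs on a hard-coded sample string, so it does not need Tidy installed, and format_error_buffer() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: splits an error buffer into separate messages
// and escapes each one for display in an HTML page.
function format_error_buffer(string $buffer): array
{
    // Split on any newline sequence and drop empty entries.
    $lines = preg_split('/\R/', trim($buffer));
    $lines = array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
    return array_map('htmlspecialchars', $lines);
}

// Sample text copied from the warnings shown above.
$sample = "line 1 column 1 - Warning: missing <!DOCTYPE> declaration\n"
        . "line 3 column 1 - Warning: <p> missing '>' for end of tag\n";

foreach (format_error_buffer($sample) as $line) {
    echo $line, "<br />\n";
}
```

Working with an array of messages also makes it easy to count the warnings or to filter them before showing them to the user.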