DJ Mike's Tutorials: PHP, Tidy



Tidy Functions And Options

Tidy

Introduction

Tidy is a validation and repair utility that lets you identify and fix HTML errors in a file or a string of HTML. You can use Tidy in either procedural or object-oriented style. A tidy resource returned by a procedural function can be treated as a tidy object, and a tidy object returned by an OOP method can be treated as a tidy resource.
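To illustrate how the two styles can be mixed, here is a minimal sketch. The variable names and the sample markup are my own, and the whole thing is skipped if the tidy extension is not loaded.

```php
<?php
// Deliberately broken markup: <b> and <i> are closed in the wrong order.
$html = "<p><b><i>test</b></i>";

$procedural = null;
$oop = null;

if (extension_loaded('tidy')) {
    // Procedural style: each function takes the tidy document as its argument.
    $doc = tidy_parse_string($html);
    tidy_clean_repair($doc);
    $procedural = tidy_get_output($doc);

    // Object-oriented style: the same steps as methods on a tidy object.
    $tidy = new tidy();
    $tidy->parseString($html);
    $tidy->cleanRepair();

    // Mixing styles: a procedural function applied to an object made with "new".
    $oop = tidy_get_output($tidy);

    echo $procedural === $oop ? "Same output\n" : "Outputs differ\n";
}
?>
```

Both halves run the same parse and repair steps, so which style you pick is mostly a matter of taste.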



The Tidy extension is not enabled by default in PHP. You can find out whether you have it with phpinfo(). If you don't have it, your administrator will need to install libtidy, which can be found at the HTML Tidy Project Page, and enable the extension.
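If you would rather not read through the whole phpinfo() page, a quick check like the sketch below should also work. The helper name tidy_available() is my own invention, not part of the extension.

```php
<?php
// Hypothetical helper: true when the tidy extension is loaded and usable.
function tidy_available(): bool
{
    return extension_loaded('tidy') && function_exists('tidy_repair_string');
}

echo tidy_available()
    ? "Tidy is available\n"
    : "Tidy is not available; ask your administrator to install libtidy\n";
?>
```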

Repairing Strings And Files

To repair a string, assign the string to a variable and pass it to tidy_repair_string(). To repair a file, assign the file name to a variable and pass it to tidy_repair_file(). If the file name is a local path and the file contains PHP, Tidy will not touch the PHP code. You can use full URLs to repair non-local files.

String Example

<?php
# HTML with a missing </p> and badly nested <b> and <i> tags
$string = "<html>
<body>
<p>
<b><i>test</b></i>
</body>
</html>";
?>
<br /><br />
<?php
# show repaired HTML
$tidy = tidy_repair_string($string);
echo $tidy;
?>


The code above will output



<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
<title></title>
</head>
<body>
<p><b><i>test</i></b></p>
</body>
</html>

File Example

For the file examples I'll use example.html which has several errors. Code with errors...



<html>
<body>
<p style="text-indent:1em; color:blue"
<b><i>test</b></i> <font color="red" size=4>big red text</font>

</p>
<br><br>
<table border="5" bgcolor="aaaaff">
<tr>
<td>
Table
</td>
</tr>
</table>
</html>


Cleaning the code...



<?php
$file = "example.html";
?>
<br /><br />
<?php
# show repaired HTML
$tidy = tidy_repair_file($file);
echo $tidy;
?>


Output...



<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title></title>
</head>
<body>
<p style="text-indent:1em; color:blue"><b><i>test</i></b>
<font color="red" size="4">big red text</font></p>
<br>
<br>
<table border="5" bgcolor="#AAAAFF">
<tr>
<td>Table</td>
</tr>
</table>
</body>
</html>

Configuration

Tidy has many configuration options that you can change. In the examples above, the output is a single line. If you save it to a file like that, it will be hard to read and edit. The options that make the output easier to read are called "print pretty" options.

Print Pretty

Both tidy_repair_string() and tidy_repair_file() accept an optional argument for configuration. The config parameter can be an array or a string naming a configuration file. In the example below, I use an array named $config to hold the configuration settings. $config tells Tidy to wrap lines at 54 characters, indent the output one space per level to show the document tree, add a line break before each <br>, and start each attribute on its own indented line.



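<pre>
<?php
$file = "example.html";
# set some configurations
$config = array(
    'wrap'              => 54,    # wrap lines at 54 characters
    'indent'            => true,  # indent block-level tags
    'indent-spaces'     => 1,     # one space per indent level
    'break-before-br'   => true,  # line break before each <br>
    'indent-attributes' => true,  # start each attribute on a new line
);

# show repaired HTML, escaped so the browser displays the markup
$tidy = tidy_repair_file($file, $config);
$tidy = htmlspecialchars($tidy);
echo $tidy;
?>
</pre>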


Output...



<!DOCTYPE 
html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
 <head>
  <title></title>
 </head>
 <body>
  <p style="text-indent:1em; color:blue">
   <b><i>test</i></b> <font color="red"
      size="4">big red text</font>
  </p>
  <br>
  <br>
  <table border="5"
         bgcolor="#AAAAFF">
   <tr>
    <td>
     Table
    </td>
   </tr>
  </table>
 </body>
</html>
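The configuration can also be given as a string naming a configuration file instead of an array. Here is a rough sketch of that form. The temporary file and its contents are my own example, written in Tidy's "option: value" file format, and the repair step is skipped if the tidy extension is not loaded.

```php
<?php
$html = "<p><b><i>test</b></i>";

// Hypothetical config file, written to a temporary location for this demo.
$configFile = tempnam(sys_get_temp_dir(), 'tidy');
file_put_contents($configFile, "indent: yes\nindent-spaces: 1\nwrap: 54\n");

$fromFile = null;
if (extension_loaded('tidy')) {
    // A string as the second argument is taken as a configuration file name.
    $fromFile = tidy_repair_string($html, $configFile);
    echo htmlspecialchars($fromFile);
}

// Clean up the temporary file.
unlink($configFile);
?>
```

A configuration file is handy when several scripts should share the same settings. An array keeps everything in one place for a single script.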

Change Output Type

In the two previous examples, the output is HTML 4.01 Transitional. You can change the output type through the configuration. The next example outputs XHTML Strict. The "clean" option replaces presentational tags and attributes such as <font> and <center> with style rules and structural markup.



<?php
$file = "example.html";
# set some configurations
$config = array(
    'indent'            => true,
    'wrap'              => 54,
    'indent-spaces'     => 1,
    'break-before-br'   => true,
    'indent-attributes' => true,
    'output-xhtml'      => true,      # write XHTML instead of HTML
    'doctype'           => 'strict',  # use the Strict DTD
    'clean'             => true,      # replace presentational markup with CSS
);
# show repaired HTML
$tidy = tidy_repair_file($file, $config);
$tidy = htmlspecialchars($tidy);
echo $tidy;
?>


Output...



<!DOCTYPE 
html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title></title>
  <style type="text/css">
/*<![CDATA[*/
  table.c4 {background-color: #AAAAFF}
  p.c3 {text-indent:1em; color:blue}
  span.c2 {color: red; font-size: 120%}
  b.c1 {font-style: italic}
  /*]]>*/
  </style>
 </head>
 <body>
  <p class="c3">
   <b class="c1">test</b> <span class="c2">big red
   text</span>
  </p>
  <br />
  <br />
  <table class="c4"
         border="5">
   <tr>
    <td>
     Table
    </td>
   </tr>
  </table>
 </body>
</html>

Show Errors And Warnings

If you want a list of the corrections that Tidy will make or has made, you can use tidy_get_error_buffer(). tidy_get_error_buffer() returns a string containing a list of all warnings and errors, but it will be one long unformatted string, so you will probably want to at least add line breaks to it. If the output is an HTML page, you will need to use htmlspecialchars() to replace < and > with entities.



<?php
$file = "example.html";
# set some configurations
$config = array(
    'indent'            => true,
    'wrap'              => 54,
    'indent-spaces'     => 1,
    'break-before-br'   => true,
    'indent-attributes' => true,
    'output-xhtml'      => true,
    'doctype'           => 'strict',
    'clean'             => true,
);
# parse the file, then fetch the warnings and errors Tidy recorded
$tidy = tidy_parse_file($file, $config);
$errors = tidy_get_error_buffer($tidy);
$errors = htmlspecialchars($errors);
# start each "line N column N" entry on its own line, in bold
$errors = preg_replace('@line \d+ column \d+@', '<br /><br /><b>$0</b>', $errors);
echo $errors;
echo "<pre>";
echo htmlspecialchars("$tidy");
echo "</pre>";
?>


Output ...





line 1 column 1 - Warning: missing <!DOCTYPE> declaration

line 3 column 1 - Warning: <p> missing '>' for end of tag

line 4 column 4 - Warning: replacing unexpected b by </b>

line 4 column 15 - Warning: inserting implicit <i>

line 2 column 1 - Warning: inserting missing 'title' element

line 8 column 1 - Warning: <table> attribute "bgcolor" had invalid value "aaaaff" and has been replaced

line 8 column 1 - Warning: <table> lacks "summary" attribute

line 4 column 15 - Warning: trimming empty <i>
<html>
 <head>
  <title></title>
 </head>
 <body>
  <p style="text-indent:1em; color:blue">
   <b><i>test</i></b> big red text
  </p>
  <br />
  <br />
  <table border="5"
         bgcolor="#AAAAFF">
   <tr>
    <td>
     Table
    </td>
   </tr>
  </table>
 </body>
</html>
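As a simpler alternative to the preg_replace() step, you could let nl2br() add the line breaks. This is only a sketch: it assumes the error buffer separates its messages with newline characters (which a browser collapses), and format_tidy_errors() is a helper name I made up. The sample text imitates the messages shown above and is not real Tidy output.

```php
<?php
// Hypothetical helper: escape the buffer, then turn newlines into <br /> tags.
function format_tidy_errors(string $buffer): string
{
    return nl2br(htmlspecialchars($buffer));
}

// Sample text shaped like the messages above (not real Tidy output).
$sample = "line 1 column 1 - Warning: missing <!DOCTYPE> declaration\n"
        . "line 8 column 1 - Warning: <table> lacks \"summary\" attribute";

echo format_tidy_errors($sample);
?>
```

With a real document you would pass in the result of tidy_get_error_buffer($tidy) instead of $sample.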





Created by DJ Mike from Santa Barbara

DJ Mike