HTML Tidier
HTML Tidy is a great program for making HTML perfectly XML compliant, but it doesn't clean up redundancy. HTML Tidier is a PHP script that not only makes the HTML pretty but also removes redundant markup.
Download
Click Here for the Source Code. This is saved as a .txt file to keep the web server from messing with it. You will want to save it with either a .php or .inc extension.
Usage
To clean dirty HTML, use:
$clean_html = html_tidier($dirty_html);
If the HTML is in a file, you can use:
$clean_html = html_tidier_file($filename);
If you just have to send it straight to a file, you can use:
$fp = fopen($clean_file, 'w');
fputs($fp, html_tidier_file($dirty_file));
fclose($fp);
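The file-to-file case above has no error handling, so a failed read or write passes silently. The following is a minimal sketch of a wrapper that checks both steps. The name tidy_to_file is hypothetical and not part of HTML Tidier; the cleaning function is passed in as a callable, on the assumption that html_tidier_file takes a filename and returns the cleaned HTML as a string, as the usage above suggests.

```php
<?php
// Hypothetical wrapper, not part of HTML Tidier itself.
// $tidier is any callable that takes a filename and returns cleaned HTML.
// In real use it would be 'html_tidier_file' from the downloaded script.
function tidy_to_file(string $dirty_file, string $clean_file, callable $tidier): void
{
    if (!is_readable($dirty_file)) {
        throw new RuntimeException("Cannot read $dirty_file");
    }

    $clean_html = $tidier($dirty_file);

    // file_put_contents() returns false on failure,
    // otherwise the number of bytes written.
    if (file_put_contents($clean_file, $clean_html) === false) {
        throw new RuntimeException("Cannot write $clean_file");
    }
}
```

With the HTML Tidier source included, a call might look like tidy_to_file($dirty_file, $clean_file, 'html_tidier_file'); passing the function by name keeps the wrapper independent of the script and easy to test with a stand-in.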
What is "dirty" HTML?
I have clients who use WYSIWYG editors for their websites. These editors produce garbage like:
<b><strong><b><strong>some text</b></strong></b></strong>
That is just after they edit a file one or two times. After they edit it 30-40 times it gets much worse. They've produced files that are over 10 MB and contain only a few sentences of real output. HTML Tidier will turn the above HTML into:
<b>some text</b>
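To illustrate the kind of reduction involved, here is a minimal sketch that collapses nested bold markup by tracking nesting depth. It is not HTML Tidier's actual implementation, and the function name collapse_nested_bold is hypothetical. It assumes that <b> and <strong> can be treated as equivalent and that the tags carry no attributes; anything else passes through untouched.

```php
<?php
// Illustrative only: not the code used by HTML Tidier.
// Treats <b> and <strong> as the same thing and keeps only the
// outermost pair, so any depth of nesting collapses to one <b>...</b>.
function collapse_nested_bold(string $html): string
{
    // Split into bold tags and everything else, keeping the tags as tokens.
    $tokens = preg_split(
        '~(</?(?:b|strong)>)~i',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $depth = 0;   // how many bold tags are currently open
    $out = '';

    foreach ($tokens as $token) {
        if (preg_match('~^<(?:b|strong)>$~i', $token)) {
            // Emit an opening tag only when entering bold from outside.
            if ($depth === 0) {
                $out .= '<b>';
            }
            $depth++;
        } elseif (preg_match('~^</(?:b|strong)>$~i', $token)) {
            if ($depth === 0) {
                // Stray closing tag: leave it as it was.
                $out .= $token;
            } else {
                $depth--;
                // Emit a closing tag only when leaving bold entirely.
                if ($depth === 0) {
                    $out .= '</b>';
                }
            }
        } else {
            $out .= $token;   // text or any other markup
        }
    }

    // Close bold that the input left open.
    if ($depth > 0) {
        $out .= '</b>';
    }

    return $out;
}

echo collapse_nested_bold('<b><strong><b><strong>some text</b></strong></b></strong>');
// prints: <b>some text</b>
```

Counting depth instead of matching tag names means the misordered closing tags in the example do not matter, since only the first opening and the last closing tag are kept.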
HTML Tidier does not replace HTML Tidy
HTML Tidier is not a replacement for HTML Tidy. As the name implies, it makes HTML that has been tidied by HTML Tidy even tidier. However, you may use HTML Tidier without using HTML Tidy. The end result will be the removal of HTML garbage without the additional code to ensure HTML 4.0 compliance.