by anders pearson Fri 21 Nov 2003 12:05:13

making websites validate can be a pain. especially if your authoring software or CMS generates invalid markup. if you can't change your authoring software or CMS, but you still want your site to validate, you'll need to find another solution.

one way or another, that solution usually will end up involving HTML Tidy, which is a handy command-line tool for automatically cleaning up and fixing messy, invalid markup. it does an amazing job of turning ugly, tag-soup, pseudo-HTML into clean, nicely formatted XHTML. the usual pattern is to figure out some way to run tidy over your pages in batch mode.

now you can also do it live, on demand, right as the page is served by apache. i've written a nice little module that just wraps HTML Tidy and makes it available to a mod_perl enabled apache 1.x server: Apache::Tidy.
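for static pages, the setup could look roughly like the fragment below. this is a minimal sketch, assuming Apache::Tidy is installed as an ordinary mod_perl 1.x content handler; the `\.html$` pattern is only an illustration, and the module's own documentation is the authority on the exact directives.

```apache
# httpd.conf (apache 1.x with mod_perl): a sketch, not copied from the module's docs
# assumption: Apache::Tidy is installed somewhere in mod_perl's @INC
PerlModule Apache::Tidy

# hand every .html file to mod_perl and let Apache::Tidy clean it up as it is served
# (the file pattern here is a hypothetical example)
<Files ~ "\.html$">
    SetHandler perl-script
    PerlHandler Apache::Tidy
</Files>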

eg, here's the original static, badly formed, invalid HTML and here's the tidied version.

i also wrote Apache::Tidy to be Apache::Filter compliant, so you can also use it to automatically correct dynamically generated content as long as it is generated by another Apache::Filter aware perl module. eg, some perl code being handled by Apache::RegistryFilter, here is the tidied output.
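chaining it behind a dynamic handler might look like the sketch below. it follows the usual Apache::Filter convention of turning on `Filter` and listing the handlers in the order they should run; the `\.pl$` pattern is an assumption for illustration, and the exact setup for Apache::Tidy should be checked against its documentation.

```apache
# httpd.conf (apache 1.x with mod_perl): a sketch of an Apache::Filter chain
PerlModule Apache::Filter
PerlModule Apache::RegistryFilter
PerlModule Apache::Tidy

# assumption: the perl scripts to be tidied end in .pl
<Files ~ "\.pl$">
    SetHandler perl-script
    # tell Apache::Filter-aware handlers to pass their output down the chain
    PerlSetVar Filter On
    # Apache::RegistryFilter runs the script, then Apache::Tidy cleans its output
    PerlHandler Apache::RegistryFilter Apache::Tidy
    Options +ExecCGI
</Files>
```

the order on the `PerlHandler` line matters: each handler reads what the one before it produced, so Apache::Tidy goes last.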

Mark Pilgrim points out that if you are running apache2, mod_tidy is available. mod_tidy is even nicer than Apache::Tidy because apache2 provides a native filtering framework so you can use mod_tidy to clean up dynamic content generated by anything, not just mod_perl handlers.

TAGS: perl html validation xhtml apache apache:tidy mod_tidy html tidy


Looks great, wish it could work directly with apache as an apache module so that my php would be tidied. But yeah, you rock. I love tidy :)
