Resin includes an HTML parser. Parsing HTML is useful when you need to:
- Parse documents created by HTML editors.
- Parse documents on the web.
- Provide an HTML interface for web designers.
Parsing HTML works much like parsing XML through the JAXP interface,
except that you call Resin's own API.
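For comparison, here is a minimal sketch of the JAXP pattern mentioned above, using only the JDK. The class name JaxpParseSketch and the sample markup are made up for illustration. Unlike an HTML parser, a stock JAXP parser accepts only well-formed XML, so markup such as an unclosed <img> is rejected.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Resin
final class JaxpParseSketch {
    /** Parses well-formed markup into a DOM Document using plain JAXP. */
    static Document parse(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            // JAXP parsers refuse input that is not well-formed XML
            throw new IllegalArgumentException("could not parse: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><p>Hello</p></body></html>");
        // Print the name of the root element
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

The overall shape (create a parser, call parse, receive an org.w3c.dom Document) is the part that carries over to the HTML case.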
Printing HTML follows different rules from printing XML (for example,
<img> has no end tag), so use printHtml instead of print.
Parsing and printing HTML
import java.io.*;
import org.w3c.dom.*;
import com.caucho.xml.*;
...
Html parser = new HtmlParser();
// Parse the file into a DOM Document (org.w3c.dom)
Document doc = parser.parse("test.html");
// Open the output file
FileOutputStream os = new FileOutputStream("out.xml");
// Create a printer that writes to it (com.caucho.xml)
XmlPrinter printer = new XmlPrinter(os);
// Print the document using HTML rules
printer.printHtml(doc);
os.close();
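The difference between HTML and XML output rules can also be seen with the JDK alone. The sketch below is not Resin's printer. It uses the standard javax.xml.transform identity transformer and switches its output method between "xml" and "html". The class name HtmlVersusXmlOutput and the sample document are assumptions made for illustration.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class, not part of Resin
final class HtmlVersusXmlOutput {
    /** Builds a small in-memory document: a p element holding one empty img. */
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            Element img = doc.createElement("img");
            img.setAttribute("src", "a.png");
            p.appendChild(img);
            doc.appendChild(p);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the document with the given output method ("xml" or "html"). */
    static String serialize(Document doc, String method) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, method);
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.INDENT, "no");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = sample();
        // XML rules: the empty img element is written self-closed
        System.out.println(serialize(doc, "xml"));
        // HTML rules: img is written as a start tag with no end tag
        System.out.println(serialize(doc, "html"));
    }
}
```

An XML-style printer would leave a self-closing img in the output, which is why the HTML-specific print call matters when the result is meant for browsers.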
You can also use Resin's VFS API to parse documents directly from
the web:
Parsing HTML from the web
import java.io.*;
import org.w3c.dom.*;
import com.caucho.vfs.*;
import com.caucho.xml.*;
...
Html parser = new HtmlParser();
// Look up the URL through the VFS (com.caucho.vfs)
Path yahoo = Vfs.lookup("http://www.yahoo.com");
// Parse the page into a DOM Document (org.w3c.dom)
Document doc = parser.parse(yahoo);
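Once a page has been parsed, the result is an ordinary org.w3c.dom Document, so the standard DOM calls apply. The sketch below collects link targets from such a document. The LinkCollector class is hypothetical, and it assumes element names appear in lower case ("a"); an HTML parser may normalize names differently. To stay runnable with only the JDK, it builds its sample Document from well-formed markup through JAXP rather than through Resin.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Resin
final class LinkCollector {
    /** Returns the href value of every a element that has one, in document order. */
    static List<String> hrefs(Document doc) {
        List<String> result = new ArrayList<>();
        // Assumes lower-case element names
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            if (a.hasAttribute("href")) {
                result.add(a.getAttribute("href"));
            }
        }
        return result;
    }

    /** Stand-in for a parsed page: builds a Document from well-formed markup. */
    static Document parseWellFormed(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseWellFormed(
            "<html><body><a href=\"/one\">1</a><a name=\"x\">x</a>"
            + "<a href=\"/two\">2</a></body></html>");
        System.out.println(hrefs(doc));
    }
}
```

The same hrefs method should work on any Document, whichever parser produced it, provided the element-name assumption holds.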
Copyright © 1998-2002 Caucho Technology, Inc. All rights reserved.
Resin® is a registered trademark,
and HardCore™ and Quercus™ are trademarks of Caucho Technology, Inc.