DOM

Wrappers around the PHP DOM classes <http://php.net/manual/en/book.dom.php>__ that handle the common DOM extension pitfalls.

.. image:: https://travis-ci.com/kuria/dom.svg?branch=master :target: https://travis-ci.com/kuria/dom

.. contents::

Features

HTML documents
- encoding sniffing
- optional tidy support (automatically fix broken HTML)
HTML fragments
XML documents
XML fragments
XPath queries
creating documents from scratch
optional error suppression
helper methods for common tasks, such as:
- querying multiple or a single node
- checking for containment
- removing a node
- removing all nodes from a list
- prepending a child node
- inserting a node after another node
- fetching <head> and <body> elements (HTML)
- fetching root elements (XML)

Requirements

PHP 7.1+

Container methods

These methods are shared by both HTML and XML containers.

Loading documents

.. code:: php

<?php use Kuria\Dom\HtmlDocument; // or XmlDocument, HtmlFragment, etc. // using loadString() $dom = new HtmlDocument(); $dom->setLibxmlFlags($customLibxmlFlags); // optional $dom->setIgnoreErrors($ignoreErrors); // optional $dom->loadString($html); // using static loadString() shortcut $dom = HtmlDocument::fromString($html); // using existing document instance $dom = new HtmlDocument(); $dom->loadDocument($document); // using static loadDocument() shortcut $dom = HtmlDocument::fromDocument($document); // creating an empty document $dom = new HtmlDocument(); $dom->loadEmpty(); Getting or changing document encoding ===================================== .. code:: php <?php // get encoding $encoding = $dom->getEncoding(); // set encoding $dom->setEncoding($newEncoding); .. NOTE:: The DOM extension uses UTF-8 encoding. This means that text nodes, attributes, etc.: - will be encoded using UTF-8 when read (e.g. ``$elem->textContent``) - should be encoded using UTF-8 when written (e.g. ``$elem->setAttribute()``) The encoding configured by ``setEncoding()`` is used when saving the document, see `Saving documents`_. Saving documents ================ .. code:: php <?php // entire document $content = $dom->save(); // single element $content = $dom->save($elem); // children of a single element $content = $dom->save($elem, true); Getting DOM instances ===================== After a document has been loaded, the DOM instances are available via getters: .. code:: php <?php $document = $dom->getDocument(); $xpath = $dom->getXpath(); Running XPath queries ===================== .. code:: php <?php // get a DOMNodeList $divs = $dom->query('//div'); // get a single DOMNode (or null) $div = $dom->query('//div'); // check if a query matches $divExists = $dom->exists('//div'); Escaping strings ================ .. code:: php <?php $escapedString = $dom->escape($string); DOM manipulation and traversal helpers ====================================== Helpers for commonly needed tasks that aren't easily achieved via existing DOM methods: .. code:: php <?php // check if the document contains a node $hasNode = $dom->contains($node); // check if a node contains another node $hasNode = $dom->contains($node, $parentNode); // remove a node $dom->remove($node); // remove a list of nodes $dom->removeAll($nodes); // prepend a child node $dom->prependChild($newNode, $existingNode); // insert a node after another node $dom->insertAfter($newNode, $existingNode); Usage examples ************** HTML documents ============== Loading an existing document ---------------------------- .. code:: php <?php use Kuria\Dom\HtmlDocument; $html = <<<HTML <!doctype html> <html> <head> <meta charset="UTF-8"> <title>Example document</title> </head> <body> <h1>Hello world!</h1> </body> </html> HTML; $dom = HtmlDocument::fromString($html); var_dump($dom->queryOne('//title')->textContent); var_dump($dom->queryOne('//h1')->textContent); Output: :: string(16) "Example document" string(12) "Hello world!" Optionally, the markup can be fixed by `Tidy <http://php.net/manual/en/book.tidy.php>`_ prior to being loaded. .. code:: php <?php $dom = new HtmlDocument(); $dom->setTidyEnabled(true); $dom->loadString($html); .. NOTE:: HTML documents ignore errors by default, so there is no need to call ``$dom->setIgnoreErrors(true)``. Creating an new document ------------------------ .. code:: php <?php use Kuria\Dom\HtmlDocument; // initialize empty document $dom = new HtmlDocument(); $dom->loadEmpty(['formatOutput' => true]); // add <title> $title = $dom->getDocument()->createElement('title'); $title->textContent = 'Lorem ipsum'; $dom->getHead()->appendChild($title); // save echo $dom->save(); Output: :: <!DOCTYPE html> <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> <title>Lorem ipsum</title> </head> <body> </body> </html> HTML fragments ============== Loading an existing fragment ---------------------------- .. code:: php <?php use Kuria\Dom\HtmlFragment; $dom = HtmlFragment::fromString('<div id="test"><span>Hello</span></div>'); $element = $dom->queryOne('/div[@id="test"]/span'); if ($element) { var_dump($element->textContent); } Output: :: string(5) "Hello" .. NOTE:: HTML fragments ignore errors by default, so there is no need to call ``$dom->setIgnoreErrors(true)``. Creating a new fragment ----------------------- .. code:: php <?php use Kuria\Dom\HtmlFragment; // initialize empty fragment $dom = new HtmlFragment(); $dom->loadEmpty(['formatOutput' => true]); // add <a> $link = $dom->getDocument()->createElement('a'); $link->setAttribute('href', 'http://example.com/'); $link->textContent = 'example'; $dom->getBody()->appendChild($link); // save echo $dom->save(); Output: :: <a href="http://example.com/">example</a> XML documents ============= Loading an existing document ---------------------------- .. code:: php <?php use Kuria\Dom\XmlDocument; $xml = <<<XML <?xml version="1.0" encoding="utf-8"?> <library> <book name="Don Quixote" author="Miguel de Cervantes" /> <book name="Hamlet" author="William Shakespeare" /> <book name="Alice's Adventures in Wonderland" author="Lewis Carroll" /> </library> XML;

$dom = XmlDocument::fromString($xml);

foreach ($dom->query('/library/book') as $book) { /** @var \DOMElement $book */ var_dump("{$book->getAttribute('name')} by {$book->getAttribute('author')}"); }

Output:

string(34) "Don Quixote by Miguel de Cervantes" string(29) "Hamlet by William Shakespeare" string(49) "Alice's Adventures in Wonderland by Lewis Carroll"

Creating a new document

.. code:: php

<?php use Kuria\Dom\XmlDocument; // initialize empty document $dom = new XmlDocument(); $dom->loadEmpty(['formatOutput' => true]); // add <users> $document = $dom->getDocument(); $document->appendChild($document->createElement('users')); // add some users $bob = $document->createElement('user'); $bob->setAttribute('username', 'bob'); $bob->setAttribute('access-token', '123456'); $john = $document->createElement('user'); $john->setAttribute('username', 'john'); $john->setAttribute('access-token', 'foobar'); $dom->getRoot()->appendChild($bob); $dom->getRoot()->appendChild($john); // save echo $dom->save(); Output: :: <?xml version="1.0" encoding="UTF-8"?> <users> <user username="bob" access-token="123456"/> <user username="john" access-token="foobar"/> </users>

Handling XML namespaces in XPath queries

.. code:: php

<?php use Kuria\Dom\XmlDocument; $xml = <<<XML <?xml version="1.0" encoding="UTF-8"?>

<lib:root xmlns:lib="http://example.com/"> <lib:book name="Don Quixote" author="Miguel de Cervantes" /> <lib:book name="Hamlet" author="William Shakespeare" /> <lib:book name="Alice's Adventures in Wonderland" author="Lewis Carroll" /> </lib:root> XML;

$dom = XmlDocument::fromString($xml);

// register namespace in XPath $dom->getXpath()->registerNamespace('lib', 'http://example.com/');

// query using the prefix foreach ($dom->query('//lib:book') as $book) { /** @var \DOMElement $book */ var_dump($book->getAttribute('name')); }

Output:

string(11) "Don Quixote" string(6) "Hamlet" string(32) "Alice's Adventures in Wonderland"

XML fragments

Loading an existing fragment

.. code:: php

<?php use Kuria\Dom\XmlFragment; $dom = XmlFragment::fromString('<fruits><fruit name="Apple" /><fruit name="Banana" /></fruits>'); foreach ($dom->query('/fruits/fruit') as $fruit) { /** @var \DOMElement $fruit */ var_dump($fruit->getAttribute('name')); } Output: :: string(5) "Apple" string(6) "Banana" Creating a new fragment ----------------------- .. code:: php <?php use Kuria\Dom\XmlFragment; // initialize empty fragment $dom = new XmlFragment(); $dom->loadEmpty(['formatOutput' => true]); // add a new element $person = $dom->getDocument()->createElement('person'); $person->setAttribute('name', 'John Smith'); $dom->getRoot()->appendChild($person); // save echo $dom->save(); Output: :: <person name="John Smith"/>

Dom

Install / Use

README

Loading documents

Creating a new document

Handling XML namespaces in XPath queries

XML fragments

Loading an existing fragment