Suppose we have an HTML page (excerpt) with the following content:
$html = <<<HTML
<div class="container">
<img class="logo img" id="img1" src="/images/img1.jpg" />
<img class="icon img" id="img2" src="/images/img2.jpg" />
<img class="icon use" id="img3" src="/images/img3.jpg" />
<p class="icon" id="content">Welcome PHP!</p>
</div>
HTML;
Here it is assigned to the string variable $html. We load $html into a DOMDocument object and then process it with DOMXPath:
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
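Real-world pages often contain HTML5 tags such as <section> that libxml's HTML parser may not recognise, and loadHTML() can then print warnings. The sketch below, using a hypothetical fragment, collects those parse messages internally with libxml_use_internal_errors() instead of printing them, then restores the previous setting. Whether a given tag triggers a message depends on the libxml2 version, so treat this as a defensive pattern rather than a requirement.

```php
<?php
// Hypothetical fragment using an HTML5 tag.
$html5 = '<section><img class="logo" src="/images/a.jpg" /></section>';

$dom = new DOMDocument();
// Collect libxml parse messages internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html5);
// The collected messages can be inspected here if needed, then discarded.
$errors = libxml_get_errors();
libxml_clear_errors();
// Restore whatever error mode was active before.
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
// The unknown <section> element is still kept in the tree.
$src = $xpath->evaluate('string(//section//img/@src)');
echo $src, PHP_EOL; // /images/a.jpg
```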
1 Using DOMXPath
Next we parse the document with DOMXPath's methods.
DOMXPath has two core parts: the expression you pass in and the value it returns.
The expression is a standard W3C XPath expression; for the syntax see https://www.w3schools.com/xml/xpath_syntax.asp.
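The return value of query() is a DOMNodeList object, which has several properties and methods, also defined by a W3C standard: https://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030226/DOM3-Core.html. evaluate() returns a DOMNodeList too when the expression selects nodes, but returns a plain string, number, or boolean for expressions such as string(...).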
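A short sketch of the two return styles, on a hypothetical two-image fragment: query() gives a DOMNodeList with a length property and an item() method, while evaluate() returns a float, bool, or string for count(), boolean(), and string() expressions.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div><img src="/a.jpg" /><img src="/b.jpg" /></div>');
$xpath = new DOMXPath($dom);

// query() always returns a DOMNodeList.
$imgs = $xpath->query('//img');
echo $imgs->length, PHP_EOL;                          // 2
echo $imgs->item(1)->getAttribute('src'), PHP_EOL;    // /b.jpg

// evaluate() returns a typed value for non-node-set expressions.
$count = $xpath->evaluate('count(//img)');        // float
$has   = $xpath->evaluate('boolean(//img)');      // bool
$first = $xpath->evaluate('string(//img/@src)');  // string
var_dump($count, $has, $first);
```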
2 Getting img src
Get the src of the first image:
echo $src = $xpath->evaluate('string(//img/@src)');
/*输出:
/images/img1.jpg
*/
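To pick an image other than the first, a positional predicate can be applied to the whole node-set. In this sketch (same three src values as the article), (//img)[2] means "the second <img> in document order", whereas //img[2] would mean "an <img> that is the second <img> child of its parent", which only coincides here because all three images share one parent.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div><img src="/images/img1.jpg" /><img src="/images/img2.jpg" /><img src="/images/img3.jpg" /></div>');
$xpath = new DOMXPath($dom);

// Parentheses apply the position to the whole result set.
$second = $xpath->evaluate('string((//img)[2]/@src)');
$last   = $xpath->evaluate('string((//img)[last()]/@src)');
echo $second, PHP_EOL; // /images/img2.jpg
echo $last, PHP_EOL;   // /images/img3.jpg
```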
Get the src of every image:
$nodeList = $xpath->query("//img");
$srcList = [];
foreach ($nodeList as $node) {
$srcList[] = $node->attributes->getNamedItem('src')->nodeValue;
}
print_r($srcList);
/*输出:
Array
(
[0] => /images/img1.jpg
[1] => /images/img2.jpg
[2] => /images/img3.jpg
)
*/
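The loop above assumes every <img> has a src; if one does not, getNamedItem('src') returns null and reading ->nodeValue on it triggers a warning. One alternative, sketched here on a hypothetical fragment with a src-less image, is to select the attribute nodes directly with //img/@src, so only existing attributes are returned as DOMAttr objects.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div><img src="/images/img1.jpg" /><img src="/images/img2.jpg" /><img alt="no src" /></div>');
$xpath = new DOMXPath($dom);

// //img/@src yields DOMAttr nodes; images without src are simply absent.
$srcList = [];
foreach ($xpath->query('//img/@src') as $attr) {
    $srcList[] = $attr->value;
}
print_r($srcList); // two entries: /images/img1.jpg and /images/img2.jpg
```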
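3 Getting DOM nodes by class
Get the id of every node whose class is exactly icon. This is an exact string comparison, so the class attribute must contain only the single value icon (class="icon img" does not match):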
$nodeList = $xpath->query('//*[@class="icon"]');
$result = [];
foreach ($nodeList as $node) {
$result[] = $node->attributes->getNamedItem('id')->nodeValue;
}
print_r($result);
/*输出:
Array
(
[0] => content
)
*/
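Because @class="..." compares the whole attribute string, matching a multi-class element requires writing its full class value, in the same order. The sketch below uses a hypothetical helper closure, $ids, to show both cases on a reduced fragment.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div><img class="icon img" id="img2" /><p class="icon" id="content">Hi</p></div>');
$xpath = new DOMXPath($dom);

// Hypothetical helper: run an expression and collect the id attributes.
$ids = function (string $expr) use ($xpath): array {
    $out = [];
    foreach ($xpath->query($expr) as $node) {
        $out[] = $node->getAttribute('id');
    }
    return $out;
};

print_r($ids('//*[@class="icon"]'));     // only content
print_r($ids('//*[@class="icon img"]')); // only img2
```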
Get the id of every node whose class contains icon:
$nodeList = $xpath->query('//*[contains(@class, "icon")]');
$result = [];
foreach ($nodeList as $node) {
$result[] = $node->attributes->getNamedItem('id')->nodeValue;
}
print_r($result);
/*输出:
Array
(
[0] => img2
[1] => img3
[2] => content
)
*/
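Note that contains() is a plain substring test, so a class such as iconic would also match. A common XPath 1.0 idiom for matching icon as a whole class token is to normalise the whitespace, pad the class list with spaces, and search for " icon ". The sketch below adds a hypothetical decoy element with class="iconic" to show the difference.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div><img class="icon img" id="img2" /><span class="iconic" id="decoy"></span><p class="icon" id="content">Hi</p></div>');
$xpath = new DOMXPath($dom);

// Collect the id attribute of every node matched by $expr.
function idsOf(DOMXPath $xpath, string $expr): array {
    $out = [];
    foreach ($xpath->query($expr) as $node) {
        $out[] = $node->getAttribute('id');
    }
    return $out;
}

// Substring match: also picks up class="iconic".
$loose = idsOf($xpath, '//*[contains(@class, "icon")]');
// Whole-token match: " icon " must appear in the space-padded class list.
$token = idsOf($xpath, '//*[contains(concat(" ", normalize-space(@class), " "), " icon ")]');

print_r($loose); // img2, decoy, content
print_r($token); // img2, content
```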
Get the full HTML of every node whose class contains icon:
$nodeList = $xpath->query('//*[contains(@class, "icon")]');
$result = [];
foreach ($nodeList as $node) {
$result[] = $dom->saveHTML($node);
}
print_r($result);
/*输出:
Array
(
[0] => <img class="icon img" id="img2" src="/images/img2.jpg">
[1] => <img class="icon use" id="img3" src="/images/img3.jpg">
[2] => <p class="icon" id="content">Welcome PHP!</p>
)
*/
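saveHTML($node) serialises the node including its own tags. If only the text or the "inner HTML" is needed, a sketch like the following can be used: textContent returns all descendant text with tags stripped, and concatenating saveHTML() over the node's children leaves out the outer element. The <b> tag in the fragment is added here only for illustration.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div><p class="icon" id="content">Welcome <b>PHP</b>!</p></div>');
$xpath = new DOMXPath($dom);

$p = $xpath->query('//p[@id="content"]')->item(0);

// All descendant text, tags removed.
$text = $p->textContent;

// Inner HTML: serialise each child node and join them.
$inner = '';
foreach ($p->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

echo $text, PHP_EOL;  // Welcome PHP!
echo $inner, PHP_EOL; // Welcome <b>PHP</b>!
```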