dom - Parse text and link pairs from HTML into a PHP array, preserving order


Consider this HTML, littered with whitespace and irrelevant tags such as div and span:

<div> <span><a href="#1">title 1</a></span> <p>paragraph 2</p> <p>outside 3 <a href="#4">title 4</a> </p> </div> 

How can I convert it into a PHP array of (link, text) pairs, in the same order as they appear in the HTML?

{"#1", "title 1"},
{null, "paragraph 2"},
{null, "outside 3"},
{"#4", "title 4"}

The problem is that DOM searches such as $html->find("a, p") capture 4 twice: once on its own, and once inside 3.

I'm wondering if the solution is to traverse the document "linearly", the way a human would read it, element by element from left to right, and, if a node has text, pick up the parent node's href, if any.

If this is viable, how do I go through the DOM like this? I would like a solution preferably using Simple HTML DOM Parser or a simple regexp, or alternatively something built into PHP.

I would look at https://github.com/salathe/spl-examples/wiki/recursivedomiterator to recursively traverse the DOM structure.

// RecursiveDOMIterator is the class from the spl-examples repository linked above; it is not bundled with PHP.
$dom = new DOMDocument();
// Wrap the initial HTML in <html></html>, since it has to be well-formed.
$dom->loadHTML('<html>'.$htmlstring.'</html>');
$dit = new RecursiveIteratorIterator(new RecursiveDOMIterator($dom));
$result = array();
foreach ($dit as $node) {
    // Keep only non-empty last-level (leaf) nodes.
    if (trim($node->nodeValue) == "" || $node->hasChildNodes()) {
        continue;
    }
    // Default the href to null so that every entry is a (link, text) pair.
    $r = array(null, trim($node->nodeValue));
    $parent = $node->parentNode;
    if ($parent->nodeName == 'a') {
        $r[0] = $parent->getAttribute('href');
    }
    $result[] = $r;
}
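If pulling in the RecursiveDOMIterator class is not an option, the same "linear" walk can probably be done with PHP's bundled DOM extension alone. The following is a minimal sketch, not a definitive implementation: the function name extractLinkTextPairs and the XPath expression are my own assumptions, not part of the answer above. It relies on XPath returning text nodes in document order, and it looks for the nearest enclosing <a> rather than only the direct parent, so text nested inside e.g. <a><b>...</b></a> would still pick up the link.

```php
<?php
// Hypothetical helper (name chosen for illustration): returns [href|null, text]
// pairs for every non-blank text node, in document order.
function extractLinkTextPairs(string $htmlString): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment so that it is parsed as a complete document.
    $dom->loadHTML('<html><body>' . $htmlString . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = array();

    // normalize-space() filters out whitespace-only text nodes.
    foreach ($xpath->query('//body//text()[normalize-space()]') as $textNode) {
        $href = null;
        // Walk up the ancestors until an <a> element (or the document root) is reached.
        for ($p = $textNode->parentNode; $p !== null; $p = $p->parentNode) {
            if ($p instanceof DOMElement && $p->nodeName === 'a') {
                $href = $p->getAttribute('href');
                break;
            }
        }
        $result[] = array($href, trim($textNode->nodeValue));
    }

    return $result;
}

$html = '<div> <span><a href="#1">title 1</a></span> <p>paragraph 2</p> '
      . '<p>outside 3 <a href="#4">title 4</a> </p> </div>';
$pairs = extractLinkTextPairs($html);
var_export($pairs);
```

For the sample HTML from the question this should yield the four pairs in the requested order. Note that an <a> without an href attribute produces an empty string rather than null, and that loadHTML may misread non-ASCII text unless the encoding is declared, so both cases may need extra handling.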
