about preg_match_all statement

**pbmods** · Jul 27 '08, 01:50 PM

Heya, Swethak.

What is your code doing now that is different from what you want it to do?

**swethak** · Jul 28 '08, 04:29 AM

It is for capturing the images from a website. I want to capture the text information from the website.

**Gulzor** · Jul 28 '08, 08:08 AM

If you are working with PHP5, you can use the DOM API for that.

Adapt this to your needs:
[php]
<?php
// Load the raw HTML from a URL or a local file
$htmlString = file_get_contents('url_or_path_to_html_file');

// loadHTML() is an instance method: create the document first, then load into it
$htmlDoc = new DOMDocument();
$htmlDoc->loadHTML($htmlString);
$xpath = new DOMXPath($htmlDoc);

/* fetch the content of all <p> tags */
$pNodesList = $xpath->query('//p');
for ($i = 0; $i < $pNodesList->length; $i++) {
    $pNode = $pNodesList->item($i);
    echo $pNode->nodeValue, "\n";
}

?>
[/php]

It may not be the best method, but I prefer handling HTML documents with the DOM API instead of banging my head against the wall with regex :P

**swethak** · Jul 28 '08, 09:59 AM

I used it that way and got the errors below. Please tell me what the mistake is.

Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseEntityRef: expecting ';' in Entity, line: 34 in C:\wamp\www\test\textdata.php on line 3

Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseEntityRef: expecting ';' in Entity, line: 34 in C:\wamp\www\test\textdata.php on line 3

Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseEntityRef: expecting ';' in Entity, line: 34 in C:\wamp\www\test\textdata.php on line 3

Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseEntityRef: expecting ';' in Entity, line: 34 in C:\wamp\www\test\textdata.php on line 3

**Gulzor** · Jul 28 '08, 10:37 AM

These are "just" warnings caused by malformed or unsupported HTML entities in the page (for example, an "&" that is not followed by a valid entity name and a ";"). It's practically impossible to parse real-world HTML without getting warnings like these...
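If you just want to keep these warnings off the page, one option is libxml's internal error buffer: libxml_use_internal_errors(true) makes the parser collect problems instead of emitting PHP warnings, and libxml_get_errors() / libxml_clear_errors() let you inspect or discard them. A minimal sketch, assuming a made-up sample string in place of the real page (whether a given libxml version actually complains about it may vary):

```php
<?php
// Hypothetical sample markup: the bare "&" is the kind of thing that
// triggers htmlParseEntityRef warnings in many libxml2 versions.
$htmlString = '<html><body><p>Tom & Jerry</p><p>Second paragraph</p></body></html>';

// Collect libxml parse errors internally instead of printing them as warnings
$previous = libxml_use_internal_errors(true);

$htmlDoc = new DOMDocument();
$htmlDoc->loadHTML($htmlString);

// Log whatever the parser complained about (or skip this loop to ignore them)
foreach (libxml_get_errors() as $error) {
    // Each entry is a LibXMLError object with ->message, ->line, ->level, ...
    error_log('libxml: ' . trim($error->message) . ' on line ' . $error->line);
}
libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the previous setting

// Same extraction as before: text of every <p> element
$xpath = new DOMXPath($htmlDoc);
$texts = [];
foreach ($xpath->query('//p') as $pNode) {
    $texts[] = $pNode->nodeValue;
}
print_r($texts);
```

The parser still recovers from the bad entity (the "&" is kept as literal text), so the extracted paragraphs are unaffected; only the noise goes away.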

If your texts are not between <p> tags, you can replace //p with //td. Like I said, you need to adapt it to your needs.

**swethak** · Jul 28 '08, 10:55 AM

If I use the condition, then when the data is between <p> tags it shows the data; otherwise it doesn't give any error. How do I write the condition for that? Please help me.

**Gulzor** · Jul 28 '08, 11:53 AM

I don't understand what your problem is now... <p> tags are not the only ones that hold text. <li>, <td>, and others do too.
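To pick up text from several element types at once, XPath's union operator `|` can combine location paths in a single query. A minimal sketch, using a made-up sample string in place of a real page:

```php
<?php
// Hypothetical sample page where text lives in several kinds of elements
$htmlString = '<html><body><p>Intro</p><ul><li>Item one</li></ul>'
            . '<table><tr><td>Cell text</td></tr></table></body></html>';

libxml_use_internal_errors(true); // keep parse warnings quiet
$htmlDoc = new DOMDocument();
$htmlDoc->loadHTML($htmlString);
libxml_clear_errors();

$xpath = new DOMXPath($htmlDoc);

// "|" unions several location paths into one node list
$texts = [];
foreach ($xpath->query('//p | //li | //td') as $node) {
    $texts[] = trim($node->nodeValue);
}
print_r($texts);
```

Add or remove paths in the query string (for example `//h1` or `//div`) to match wherever the text actually lives on the target site.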

**mobs** · Aug 6 '08, 03:34 PM

Say I just wanted to retrieve the number 30735 from the following code. How would you go about doing that?

Code:
```
<a href="/?item=30735">River Runner</a>
```

**pbmods** · Aug 6 '08, 11:03 PM

Heya, Mobs. Welcome to Bytes!

The only part that we really care about is:

Code:
```
<a href="/?item=30735
```

Now, we have to make a couple of assumptions:

The URL might have a path and/or other query variables prepended. E.g.:
Code:
```
<a href="/path/to/some.php?file=test&item=123456"
```
The URL might have other query variables after the item ID. E.g.:
Code:
```
<a href="/?item=654321&amp;visitor=1"
```
The anchor tag might have attributes before the href attribute. E.g.:
Code:
```
<a target="_blank" href="/?item=13579"
```

We are going to assume that the tag is well-formed (ends with a '>' and the href attribute is properly double-quoted, with any quotes inside it percent- or ampersand-escaped).

With that in mind, we need to be able to skip over anything we don't care about and focus only on what we want:

[code=regexp]
/<a[^>]*href="[^"]+item=(\d+)/
[/code]

This should be enough to harvest item IDs from anchor tags on the page.
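To see the pattern in action, here is a minimal PHP sketch that runs it through preg_match_all() over the sample anchors quoted in this thread. Like the regex itself, it assumes double-quoted href values.

```php
<?php
// Sample anchors taken from the examples in this thread
$html = <<<'HTML'
<a href="/?item=30735">River Runner</a>
<a href="/path/to/some.php?file=test&item=123456">Other</a>
<a href="/?item=654321&amp;visitor=1">Another</a>
<a target="_blank" href="/?item=13579">Last</a>
HTML;

// Group 1 captures the digits following "item=" inside the href value
preg_match_all('/<a[^>]*href="[^"]+item=(\d+)/', $html, $matches);

// With the default PREG_PATTERN_ORDER, $matches[0] holds the full matches
// and $matches[1] holds just the captured IDs
$ids = $matches[1];
print_r($ids);
```

Each ID comes back as a string; cast with (int) if you need numbers.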

**swethak** · Aug 7 '08, 06:08 AM

Hi,

I wrote some code to capture all the information between <p> tags. But between those tags there are also some <img> tags. I want a condition so that I capture all the information between the <p> tags but not the <img> tag information. How do I write the condition for that? Please help me.

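[php]
<?php
$content = file_get_contents('http://www.website.com');

// Non-greedy (.*?) so each <p>...</p> pair is matched separately;
// the /s modifier lets "." match newlines inside a paragraph
preg_match_all('/<p>(.*?)<\/p>/s', $content, $match, PREG_PATTERN_ORDER);

echo "Capture Images : ";
echo " ";
// $match[0] = full matches (including the <p> tags), $match[1] = inner content only
print_r($match[0]);
?>
[/php]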

In that preg_match_all() pattern, how do I add a condition so that it does not take the image tags? Could anybody please reply?
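One possible approach, sketched with a made-up sample string: keep the preg_match_all() capture, then strip the <img> tags out of each captured paragraph with preg_replace(). The patterns assume <p> tags without attributes and <img> tags with no '>' inside their attribute values.

```php
<?php
// Hypothetical sample input: paragraphs that mix text with <img> tags
$content = '<p>First <img src="a.png" alt="A"> paragraph</p>'
         . '<p><img src="b.png">Second paragraph</p>';

// Capture the inner content of every <p>...</p> pair (non-greedy, /s for newlines)
preg_match_all('/<p>(.*?)<\/p>/s', $content, $match);

$texts = [];
foreach ($match[1] as $inner) {
    // Remove any <img ...> tag (case-insensitive) and keep the surrounding text
    $texts[] = preg_replace('/<img\b[^>]*>/i', '', $inner);
}
print_r($texts);
```

If you want to drop every tag rather than only <img>, PHP's built-in strip_tags() does that. The DOM approach from earlier in the thread also skips images, since nodeValue only contains text.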
