Discussion:
[Xmltv-devel] [ xmltv-Bugs-3382396 ] _huro: strip sponsored links not work after site change.
SourceForge.net
2011-07-29 19:26:54 UTC
Bugs item #3382396, was opened at 2011-07-29 21:26
Message generated for change (Tracker Item Submitted) made by ullmus
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=424135&aid=3382396&group_id=39046

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: tv_grab_huro
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: nagy ullmus (ullmus)
Assigned to: Zsolt Bagoly (zbagoly)
Summary: _huro: strip sponsored links not work after site change.

Initial Comment:
_huro: stripping the sponsored links from port.ro, which break validation (fixed as of Revision 1.42), no longer works after a site change.

Example from channel id 10199.port.ro:

&nbsp;<span class="spons_link_in_event_box"><a onclick="window.open('http://ad2.ip.ro/please/redirect/5405/1/1/10/?param=165779/169144_0_','_blank','scrollbars=yes,location=yes,menubar=yes,resizable=yes,toolbar=yes,width=860,height=580');return false;" href="http://ad2.ip.ro/please/redirect/5405/1/1/10/?param=165779/169144_0_"><font color="blue"><b>Pariaţi LIVE pe rezultate!</b></font></a></span>

<div style="display: none;">
<!-- Adserver zone (write): 62818, port_bet365_av_mero -->
<script type="text/javascript">
// <![CDATA[
if(!window.goA)document.write('<sc'+'ript src="http://imgs.adverticum.net/scripts/gwloader.js?ord='+Math.floor(Math.random()*1000000000)+'" type="text/javascript"><\/sc'+'ript>');
// ]]>
</script><script type="text/javascript">
// <![CDATA[
if(window.goA)goA.addZone(62818,{displayOptions:{bannerhome:'http://ad.adverticum.net'}});
// ]]>
</script><script charset="iso-8859-2" src="http://ad.adverticum.net/js.prm?zona=62818&amp;ord=u8Fa8Nk2Ox2Be4Rx3K&amp;re=http%3A%2F%2Fport.ro%2F"></script>
<noscript><a href="http://ad.adverticum.net/click.prm?zona=62818" target="_blank" title="Click here!"><img border="0" src="http://ad.adverticum.net/img.prm?zona=62818" alt="Advertisement" /></a></noscript>
</div>

Please fix this again.
Thanks
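A minimal sketch of stripping the sponsored-link fragment shown above, using core Perl only. It assumes the span keeps the `spons_link_in_event_box` class and contains no nested `<span>`. A later site change could break either assumption, just as the site change reported here broke the Revision 1.42 fix. `strip_sponsored_links` is a hypothetical helper, not the grabber's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: removes an optional leading &nbsp; plus the whole
# <span class="spons_link_in_event_box">...</span> fragment.
# Non-greedy .*? with /s stops at the first </span>, so this assumes
# the sponsored span contains no nested <span> elements.
sub strip_sponsored_links {
    my ($html) = @_;
    $html =~ s{(?:&nbsp;)?<span\s+class="spons_link_in_event_box">.*?</span>}{}gs;
    return $html;
}

# Made-up listing line with the sponsored fragment from the report.
my $sample = q{Title&nbsp;<span class="spons_link_in_event_box"><a href="http://ad2.ip.ro/please/redirect/5405/1/1/10/?param=165779/169144_0_"><font color="blue"><b>Pariaţi LIVE pe rezultate!</b></font></a></span> rest};

print strip_sponsored_links($sample), "\n";    # prints "Title rest"
```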

----------------------------------------------------------------------

SourceForge.net
2011-07-29 19:47:53 UTC
Message generated for change (Settings changed) made by ullmus
Assigned to: Attila Nagy (attila_nagy)
----------------------------------------------------------------------

SourceForge.net
2011-07-30 01:55:32 UTC
Message generated for change (Comment added) made by miklosistvan

Comment By: miklos istvan (miklosistvan)
Date: 2011-07-30 03:55

Message:
The CDATA marker can be looked upon as a comment in HTML.

According to the HTML::Parser documentation at
http://search.cpan.org/~gaas/HTML-Parser-3.56/Parser.pm
you have to keep the strict_comment switch disabled (its default) for such tags to be stripped:

$p->strict_comment( $bool )
By default, comments are terminated by the first occurrence of "-->".
This is the behaviour of most popular browsers (like Mozilla, Opera
and MSIE), but it is not correct according to the official HTML
standard. Officially, you need an even number of "--" tokens before
the closing ">" is recognized and there may not be anything but
whitespace between an even and an odd "--".

The official behaviour is enabled by enabling this attribute.

Enabling 'strict_comment' also disables recognizing these forms as
comments:

</ comment>
<! comment>

Notice how the second form resembles <![CDATA[ ... ]]>, which also
begins with "<!" and ends with ">".
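A minimal sketch of the HTML::Parser approach, assuming HTML::Parser 3.x is installed. It leaves `strict_comment` off as described above and discards comments with an empty handler. Because HTML::Parser passes the body of a `<script>` element (including the `// <![CDATA[` lines) through as literal text rather than as a comment, the sketch also uses `ignore_elements`, a separate HTML::Parser feature, to drop `<script>` and `<noscript>` elements together with their content. The function name `strip_scripts_and_comments` is hypothetical, not part of tv_grab_huro.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: copies the input through unchanged, except that
# comments and everything inside <script>/<noscript> elements are dropped.
sub strip_scripts_and_comments {
    my ($html) = @_;
    my $out = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # Every event without its own handler is copied verbatim.
        default_h => [ sub { $out .= shift }, 'text' ],
        # Comments get a handler that does nothing, so they are discarded.
        comment_h => [ sub { }, 'text' ],
    );
    $p->strict_comment(0);    # the default: a comment ends at the first "-->"
    # Suppress the start tag, the end tag and everything in between,
    # including the //<![CDATA[ ... //]]> JavaScript.
    $p->ignore_elements(qw(script noscript));
    $p->parse($html);
    $p->eof;
    return $out;
}

my $sample = <<'END_HTML';
<div style="display: none;">
<!-- Adserver zone (write): 62818, port_bet365_av_mero -->
<script type="text/javascript">
// <![CDATA[
if(window.goA)goA.addZone(62818,{displayOptions:{bannerhome:'http://ad.adverticum.net'}});
// ]]>
</script><noscript><img border="0" src="http://ad.adverticum.net/img.prm?zona=62818" alt="Advertisement" /></noscript>
</div>
END_HTML

print strip_scripts_and_comments($sample);
```

The hidden `<div>` itself survives; removing it as well would need a `start` handler that inspects the `style` attribute.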

----------------------------------------------------------------------

SourceForge.net
2011-07-30 02:03:42 UTC
Message generated for change (Comment added) made by ullmus

Comment By: nagy ullmus (ullmus)
Date: 2011-07-30 04:03

Message:
Using HTML::Parser to strip HTML tags from files is a good idea.
I noticed that //<![CDATA[ ... //]]> and the JavaScript between them
are not stripped.
Any idea how to do this?
I'm new to Perl and I need some help, please.

----------------------------------------------------------------------

SourceForge.net
2011-07-30 11:20:34 UTC
Message generated for change (Comment added) made by miklosistvan

Comment By: miklos istvan (miklosistvan)
Date: 2011-07-30 13:20

Message:
Although the regex you ask for is straightforward, it is seldom a good
idea to parse HTML with regular expressions.

Apparently minor changes to your data or requirements can break them
completely.

It is safer to use one of the several HTML-parsing modules from CPAN.

If you are confident that this will not happen, here is an example:

--------------Code------------------------------------------------------------------------

use strict;
use warnings;
use Readonly;

my $html = <<'END_HTML';

<div style="display: none;">

<!-- Adserver zone (write): 62818, port_bet365_av_mero -->
<script type="text/javascript">
// <![CDATA[
if(!window.goA)document.write('<sc'+'ript src="http://imgs.adverticum.net/scripts/gwloader.js?ord='+Math.floor(Math.random()*1000000000)+'" type="text/javascript"><\/sc'+'ript>');
// ]]>
</script><script type="text/javascript">
// <![CDATA[
if(window.goA)goA.addZone(62818,{displayOptions:{bannerhome:'http://ad.adverticum.net'}});

// ]]>
</script><script charset="iso-8859-2" src="http://ad.adverticum.net/js.prm?zona=62818&amp;ord=c8Gh1Xu1Gp2Ob5Fi7Q&amp;re=http%3A%2F%2Fport.ro%2F"></script>

<noscript><a href="http://ad.adverticum.net/click.prm?zona=62818" target="_blank" title="Click here!"><img border="0" src="http://ad.adverticum.net/img.prm?zona=62818" alt="Advertisement" /></a></noscript>
</div>

END_HTML

# Opening tag of the hidden ad container only; [^>]* cannot run past the tag.
Readonly::Scalar my $div_tag     => qr{<div[^>]*display:\s*none[^>]*>};
# Non-greedy, and /s lets . match newlines: stop at the FIRST </div>.
# This breaks if the ad container ever contains a nested <div>.
Readonly::Scalar my $div_text    => qr{.*?}s;
Readonly::Scalar my $div_end_tag => qr{</div>};
Readonly::Scalar my $div         => qr{$div_tag$div_text$div_end_tag};

$html =~ s/$div//g;
print $html;
------------------------------------------------------------------------------------------

But the maintainer of this grabber will probably fix the bug some other
way (the best solution).
Greetings.
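To illustrate the fragility warned about above: a pattern of the shape `<div.*>` followed by a greedy `.*` (under /s) runs from the first `<div` to the last `</div>` in the whole string, so on a full page it would also remove programme data. A minimal comparison on a made-up two-div string, core Perl only; the input and the `prog` class are invented for the illustration.

```perl
use strict;
use warnings;

# Made-up input: the hidden ad block followed by a div with real data.
my $page = '<div style="display: none;">ad</div><p>news</p><div class="prog">20:00 Film</div>';

# Greedy: matches from the first "<div" to the LAST "</div>".
(my $greedy = $page) =~ s{<div.*>.*</div>}{}s;

# Non-greedy and anchored on the hidden div's style attribute:
# stops at the first "</div>" after it (assumes no nested divs).
(my $lazy = $page) =~ s{<div[^>]*display:\s*none[^>]*>.*?</div>}{}s;

print "greedy: '$greedy'\n";    # greedy: ''
print "lazy:   '$lazy'\n";      # lazy:   '<p>news</p><div class="prog">20:00 Film</div>'
```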

----------------------------------------------------------------------

SourceForge.net
2011-07-31 11:10:16 UTC
Message generated for change (Comment added) made by pojar-george

Comment By: Pojar George (pojar-george)
Date: 2011-07-31 14:10

Message:
I made a patch for this bug.

I hope it will be accepted.

Greetings.

----------------------------------------------------------------------

SourceForge.net
2011-08-01 15:13:10 UTC
Message generated for change (Comment added) made by ullmus
Comment By: nagy ullmus (ullmus)
Date: 2011-08-01 17:13

Message:
I applied the changes from the patch file and it seems to work.
Thank you very much.

----------------------------------------------------------------------
