Opened 19 years ago

Closed 18 years ago

Last modified 18 years ago

#91 closed defect (released)

javascript can block html indexing

Reported by: maarten
Owned by: Olly Betts
Priority: normal
Milestone:
Component: Omega
Version: 0.9.6
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All


When using "omindex" to index a directory of HTML files, some JavaScript stops the page body from being indexed. For example, with the following simple page:

<html>
<head>
<script language="Javascript">
function test(i) {
  if(1<i) row=2;
}
</script>
</head>
<body>
The javascript bug
</body>
</html>

This page can't be found afterwards. The problem lies in the "<": the program thinks it is opening a tag and therefore ignores all of the following text, since the "tag" is never closed for the rest of the document.
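The reported failure can be modelled with a small extractor. This is a hypothetical Python sketch, not Omega's actual parser. It rests on two assumptions: text between <script> and </script> is skipped, and every "<" starts a tag that runs to the next ">". Under those assumptions, the "<i" inside the script swallows "</script>" into one bogus tag. The parser therefore never leaves script mode, and the body text is dropped.

```python
def extract_text_naive(html: str) -> str:
    """Hypothetical tag-skipping extractor (not Omega's code) showing the failure mode."""
    out = []
    in_script = False  # True between <script> and a recognised </script> tag
    i = 0
    while i < len(html):
        if html[i] == "<":
            # Naive assumption: every '<' opens a tag that runs to the next '>'.
            end = html.find(">", i + 1)
            if end == -1:
                break  # unterminated "tag": the rest of the document is dropped
            parts = html[i + 1:end].split()
            name = parts[0].lower() if parts else ""
            if name == "script":
                in_script = True
            elif name == "/script":
                in_script = False
            # For "<i) row=2; } </script>" the name is "i)", so in_script stays True.
            i = end + 1
        else:
            if not in_script:
                out.append(html[i])
            i += 1
    # Collapse runs of whitespace so the result is easy to compare.
    return " ".join("".join(out).split())
```

With the example page above, extract_text_naive returns an empty string. An ordinary page such as "<p>Hello <b>world</b></p>" still yields "Hello world".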

Attachments (1)

omega-htmlparser-ignore-javascript-lessthan.patch (1.5 KB) - added by Olly Betts 18 years ago.
Patch to fix this bug


Change History (6)

comment:1 by Olly Betts, 19 years ago

op_sys: Linux → All
rep_platform: PC → All
Status: new → assigned

Hmm, indeed - we're going to need to treat <script> specially I think.
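One way to treat <script> specially is sketched below as a hypothetical Python extractor. It is not the change actually committed to Omega. Once a <script> tag is seen, the extractor stops looking for tags and scans for the literal "</script" (case-insensitively), which is roughly how HTML treats script content as raw text.

```python
import re

# Case-insensitive search for the end of a script element (hypothetical helper).
SCRIPT_END = re.compile(r"</script", re.IGNORECASE)


def extract_text(html: str) -> str:
    """Hypothetical extractor (not Omega's patch) that treats <script> content as raw text."""
    out = []
    i = 0
    while i < len(html):
        if html[i] != "<":
            out.append(html[i])
            i += 1
            continue
        end = html.find(">", i + 1)
        if end == -1:
            break  # unterminated tag at the end of the input
        parts = html[i + 1:end].split()
        name = parts[0].lower() if parts else ""
        i = end + 1
        if name == "script":
            # Raw-text mode: ignore any '<' inside the script and jump to "</script".
            match = SCRIPT_END.search(html, i)
            if match is None:
                break  # script never closed: nothing indexable follows
            close_end = html.find(">", match.end())
            i = len(html) if close_end == -1 else close_end + 1
    # Collapse runs of whitespace so the result is easy to compare.
    return " ".join("".join(out).split())
```

On the example page from the report this returns "The javascript bug". The trade-off is that a "</script" string literal inside the script would still end it early, which matches how browsers treat such a literal.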

comment:2 by Olly Betts, 18 years ago

Resolution: → fixed
Status: assigned → closed

Fixed in SVN (rev 7176).

I'll attach the patch so you can verify it, and use it if you wish.

Attachment omega-htmlparser-ignore-javascript-lessthan.patch added by Olly Betts, 18 years ago

Patch to fix this bug

comment:3 by maarten, 18 years ago

Thx, that did the trick.

comment:4 by Olly Betts, 18 years ago

Resolution: fixed → verified

comment:5 by Olly Betts, 18 years ago

Operating System: → All
Resolution: verified → released