Opened 16 years ago

Closed 16 years ago

#293 closed defect (fixed)

sample field is not converted to html in non-utf8 html documents

Reported by: ruslan shevchenko
Owned by: Olly Betts
Priority: normal
Milestone: 1.0.8
Component: Omega
Version: 1.0.7
Severity: normal
Keywords:
Cc:
Blocked By:
Blocking:
Operating System: All

Description (last modified by Olly Betts)

The problem is that the sample field is not converted to UTF-8: Omega sets the sample from the value of the description attribute, but the HTML parser does not convert attribute values to UTF-8.

A patch to fix this is attached (against 1.0.7, but I can't see 1.0.7 in the Trac version options list).

Attachments (1)

omega-rssh-293.patch (762 bytes) - added by ruslan shevchenko 16 years ago.

Change History (8)

by ruslan shevchenko, 16 years ago

Attachment: omega-rssh-293.patch added

comment:1 by Olly Betts, 16 years ago

Description: modified (diff)
Summary: sample field is not converted to html8 in non-utf8 htmpl documents → sample field is not converted to html in non-utf8 htmpl documents
Version: other → 1.0.7

1.0.7 is in the list, but not next to 1.0.6. We imported data from bugzilla. The import script orders the versions with the newest first, but trac adds new entries to the end. I don't know how to correct this stupidity.

This patch doesn't look right to me either. It would lead to a double conversion to UTF-8 in some situations.
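For readers unfamiliar with the failure mode, here is a minimal sketch of what a double conversion to UTF-8 looks like in general. It is not taken from the patch or from Omega: `latin1_to_utf8` is a hypothetical helper that assumes ISO-8859-1 input, and the ticket does not say in which situations a double conversion was expected.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (an assumption, not Omega's convert_to_utf8):
// treat every input byte as an ISO-8859-1 code point and emit its
// UTF-8 encoding.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char ch : in) {
        if (ch < 0x80) {
            // ASCII bytes are unchanged in UTF-8.
            out += static_cast<char>(ch);
        } else {
            // Code points U+0080..U+00FF need two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (ch >> 6));
            out += static_cast<char>(0x80 | (ch & 0x3F));
        }
    }
    return out;
}
```

Applying the helper once to the single byte 0xE9 gives the two-byte UTF-8 sequence for "é". Applying it a second time treats those two bytes as separate characters and yields four bytes, which is the corruption a double conversion produces.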

I'm suspecting that this and #292 are actually the same issue and we need to parse the document to find any meta http-equiv which specifies the character set, then convert the document to that and reparse.

Could you supply a sample document which shows this problem too? Or a single document which shows both.
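The first step of the approach suggested above (find the character set named in a meta http-equiv tag) could be sketched roughly as follows. This is only an illustration: `charset_from_content_type` is a hypothetical name, not a function in Omega, and it assumes the simple `charset=name` form without spaces around the equals sign.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (an assumption, not part of Omega): extract the
// charset parameter from the value of the content attribute of a
// <meta http-equiv="Content-Type"> tag. Returns the name in lower case,
// or an empty string if no charset parameter is found.
std::string charset_from_content_type(const std::string& content) {
    // Lower-case a copy so that the search is case-insensitive.
    std::string lower;
    for (unsigned char ch : content) {
        lower += static_cast<char>(std::tolower(ch));
    }

    const std::string key = "charset=";
    std::string::size_type pos = lower.find(key);
    if (pos == std::string::npos) return std::string();
    pos += key.size();

    // Skip an optional opening quote.
    if (pos < lower.size() && (lower[pos] == '"' || lower[pos] == '\'')) ++pos;

    // The name ends at a quote, a semicolon, whitespace, or end of input.
    std::string::size_type end = pos;
    while (end < lower.size()) {
        unsigned char ch = static_cast<unsigned char>(lower[end]);
        if (ch == '"' || ch == '\'' || ch == ';' || std::isspace(ch)) break;
        ++end;
    }
    return lower.substr(pos, end - pos);
}
```

A caller would pass the detected name to whatever converter it uses and then reparse the converted document. That second step is not shown here.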

comment:2 by Olly Betts, 16 years ago

Summary: sample field is not converted to html in non-utf8 htmpl documents → sample field is not converted to html in non-utf8 html documents

comment:3 by Olly Betts, 16 years ago

Well, I fixed the versions at least - it turns out that trac sorts them by date so by setting the dates for all the old releases, they now sort sensibly.

comment:4 by ruslan shevchenko, 16 years ago

Example of document is:

<html>

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251"> <description content="Моя сторінка (My page)" </description>

</html>

As I understand it, process_text is called only for text fragments in htmlparse.cc.

And htmlparse.cc contains one and only one call to convert_to_utf8, just before process_text (line 216); attributes are passed to open_tag as is, without conversion.

So I still think the patch is correct.
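The attached patch is not reproduced in this ticket, so the following is only a sketch of the idea described above: convert attribute values before they reach the tag handler, the same way text fragments are converted. All names here (`cp1251_to_utf8`, `Attributes`, `convert_attributes`) are hypothetical, and the converter is a deliberately partial stand-in that handles only ASCII and the windows-1251 bytes 0xC0 to 0xFF.

```cpp
#include <cassert>
#include <map>
#include <string>

// Append the UTF-8 encoding of a code point below U+0800.
static void append_utf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical stand-in for a real charset converter (an assumption, not
// Omega's convert_to_utf8). Handles ASCII and the windows-1251 range
// 0xC0..0xFF, which maps to U+0410..U+044F. Any other byte becomes '?'.
std::string cp1251_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char ch : in) {
        if (ch < 0x80) {
            append_utf8(out, ch);
        } else if (ch >= 0xC0) {
            append_utf8(out, 0x0410 + (ch - 0xC0));
        } else {
            out += '?';
        }
    }
    return out;
}

typedef std::map<std::string, std::string> Attributes;

// Convert every attribute value before it is handed to the tag handler,
// so a value such as content="..." is treated like body text.
Attributes convert_attributes(const Attributes& attrs) {
    Attributes result;
    for (const auto& kv : attrs) {
        result[kv.first] = cp1251_to_utf8(kv.second);
    }
    return result;
}
```

With this sketch, a content value holding the windows-1251 bytes for "Моя (My page)" comes out as valid UTF-8. The full sample string from the document above contains a byte outside the range this stand-in handles, so a real converter would be needed for it.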

comment:5 by ruslan shevchenko, 16 years ago

> I'm suspecting that this and #292 are actually the same issue and we need to parse the document to find any meta http-equiv which specifies the character set, then convert the document to that and reparse.

Yes, this could be a third solution for #292 (but not for #293).

comment:6 by Olly Betts, 16 years ago

Milestone: 1.0.8

Ah, I'd misread the code around the call to process_text().

Fixed in trunk [11162].

comment:7 by Olly Betts, 16 years ago

Resolution: fixed
Status: new → closed

Backported to 1.0 branch [11167].
