Ticket #293 (closed defect: fixed)

Opened 3 months ago

Last modified 3 months ago

sample field is not converted to html in non-utf8 html documents

Reported by: rssh
Owned by: olly
Priority: normal
Milestone: 1.0.8
Component: Omega
Version: 1.0.7
Severity: normal
Keywords:
Cc:
Blocked By:
Operating System: All
Blocking:

Description (last modified by olly) (diff)

The problem is that the sample field is not converted to UTF-8: Omega sets the sample from the value of the description attribute, but the HTML parser does not convert attribute values to UTF-8.

A patch to fix this is attached (against 1.0.7, but I can't see 1.0.7 in the Trac version options list).

Attachments

omega-rssh-293.patch (0.7 kB) - added by rssh 3 months ago.

Change History

Changed 3 months ago by rssh

Changed 3 months ago by olly

  • version changed from other to 1.0.7
  • description modified (diff)
  • summary changed from sample field is not converted to html8 in non-utf8 htmpl documents to sample field is not converted to html in non-utf8 htmpl documents

1.0.7 is in the list, but not next to 1.0.6. We imported data from bugzilla. The import script orders the versions with the newest first, but trac adds new entries to the end. I don't know how to correct this stupidity.

This patch doesn't look right to me either. It would lead to a double conversion to UTF-8 in some situations.
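The double-conversion concern can be illustrated with a minimal sketch. This is not Omega's code: `latin1_to_utf8` and `double_conversion_corrupts` are hypothetical helpers, and ISO-8859-1 is used only because its mapping to Unicode is trivial. Running the conversion on text that is already UTF-8 treats each UTF-8 byte as a separate character, so any non-ASCII text is corrupted.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Omega): treat every byte of `in` as an
// ISO-8859-1 character and encode it as UTF-8.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char ch : in) {
        if (ch < 0x80) {
            out += static_cast<char>(ch);                  // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (ch >> 6));    // lead byte
            out += static_cast<char>(0x80 | (ch & 0x3F));  // continuation byte
        }
    }
    assert(out.size() >= in.size());  // conversion never shrinks the text
    return out;
}

// True if converting `raw` twice differs from converting it once, i.e. a
// second pass over already-converted text would damage it.
bool double_conversion_corrupts(const std::string& raw) {
    const std::string once = latin1_to_utf8(raw);
    const std::string twice = latin1_to_utf8(once);
    return once != twice;
}
```

For example, the single byte E9 ("é" in ISO-8859-1) becomes C3 A9 after one pass and C3 83 C2 A9 after two, while pure ASCII input is unaffected.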

I'm suspecting that this and #292 are actually the same issue and we need to parse the document to find any meta http-equiv which specifies the character set, then convert the document to that and reparse.
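The approach suggested here (find any meta http-equiv which names a character set, convert, then reparse) would start with a sniffing pass over the raw bytes. A rough sketch of that first step is below. `sniff_meta_charset` is a hypothetical name, not a function in Omega, and the plain string search is a simplification: it does not handle a `>` inside an attribute value and would also match tags that merely start with `<meta`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper (not part of Omega): return the lower-cased charset
// named inside a <meta ...> tag, or an empty string if none is found.
std::string sniff_meta_charset(const std::string& html) {
    // Lower-case a copy so the search is case-insensitive.
    std::string lower(html);
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    std::string::size_type pos = 0;
    while ((pos = lower.find("<meta", pos)) != std::string::npos) {
        // The tag is assumed to end at the next '>' (or at end of input).
        std::string::size_type end = lower.find('>', pos);
        if (end == std::string::npos) end = lower.size();

        std::string::size_type cs = lower.find("charset=", pos);
        if (cs != std::string::npos && cs < end) {
            cs += 8;  // length of "charset="
            assert(cs <= end);
            // Skip an optional opening quote.
            if (cs < end && (lower[cs] == '"' || lower[cs] == '\'')) ++cs;
            // Collect the characters that can appear in a charset name.
            std::string::size_type stop = cs;
            while (stop < end) {
                const unsigned char c = static_cast<unsigned char>(lower[stop]);
                if (!std::isalnum(c) && c != '-' && c != '_' && c != '.' && c != ':') break;
                ++stop;
            }
            if (stop > cs) return lower.substr(cs, stop - cs);
        }
        pos = end;  // continue after this tag
    }
    return std::string();
}
```

A caller would then convert the raw document from the returned character set to UTF-8 and run the real parser over the converted text, skipping the conversion when no charset is found or it is already UTF-8.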

Could you supply a sample document which shows this problem too? Or a single document which shows both.

Changed 3 months ago by olly

  • summary changed from sample field is not converted to html in non-utf8 htmpl documents to sample field is not converted to html in non-utf8 html documents

Changed 3 months ago by olly

Well, I fixed the versions at least - it turns out that trac sorts them by date so by setting the dates for all the old releases, they now sort sensibly.

Changed 3 months ago by rssh

An example document is:

<html>

<meta http-equiv="Content-Type" content="text/html; charset=windows-1251">
<meta name="description" content="Моя сторінка (My page)">

</html>

As I understand it, process_text is called only for text fragments in htmlparse.cc.

And htmlparse.cc contains one and only one call to convert_to_utf8, just before process_text (line 216); attributes are passed to open_tag as is, without conversion.

So I still think the patch is correct.
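To make the point concrete, here is a small sketch of converting an attribute value the same way text is converted. It is not the attached patch and not Omega's code: `cp1251_to_utf8` and `sample_from_description` are hypothetical names, and the windows-1251 mapping is deliberately partial (ASCII, the contiguous А..я range, and the two letters І and і); every other byte becomes U+FFFD.

```cpp
#include <cassert>
#include <string>

// Append code point `cp` (below U+10000) to `out` as UTF-8.
static void append_utf8(std::string& out, unsigned cp) {
    assert(cp < 0x10000);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical, partial windows-1251 decoder (an assumption for this sketch).
std::string cp1251_to_utf8(const std::string& in) {
    std::string out;
    for (unsigned char ch : in) {
        unsigned cp;
        if (ch < 0x80)       cp = ch;                     // ASCII
        else if (ch >= 0xC0) cp = 0x0410 + (ch - 0xC0);   // А..я are contiguous
        else if (ch == 0xB2) cp = 0x0406;                 // І
        else if (ch == 0xB3) cp = 0x0456;                 // і
        else                 cp = 0xFFFD;                 // not covered by this sketch
        append_utf8(out, cp);
    }
    return out;
}

// Hypothetical stand-in for the code that takes the sample from the
// description attribute: convert the raw attribute value before storing it.
std::string sample_from_description(const std::string& content_attr,
                                    const std::string& charset) {
    if (charset == "windows-1251") return cp1251_to_utf8(content_attr);
    return content_attr;  // assumed to be UTF-8 already
}
```

With this in place, the bytes CC EE FF from a windows-1251 description ("Моя") come out as the UTF-8 sequence D0 9C D0 BE D1 8F, whereas passing the attribute through unconverted would store the raw single-byte text in the sample.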

Changed 3 months ago by rssh

> I'm suspecting that this and #292 are actually the same issue and we need to parse the document to find any meta http-equiv which specifies the character set, then convert the document to that and reparse.

Yes, this could be a third solution for #292 (but not for #293).

Changed 3 months ago by olly

  • milestone set to 1.0.8

Ah, I'd misread the code around the call to process_text().

Fixed in trunk [11162].

Changed 3 months ago by olly

  • status changed from new to closed
  • resolution set to fixed

Backported to 1.0 branch [11167].
