Commit 50c2c16a4d8ca52c4abcbef638f5105a9b0d1ee0

Authored by Gabriel Mazetto
1 parent 48a36851

Better algorithm to deal with encodings. Moved fallback rescue message from view to encode library.

This helps fix cases where UTF-8 is wrongly identified as ISO-8859-1. We only try to convert a string when the detector reports 100% confidence in the charset; otherwise, we fall back to treating it as UTF-8.
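The rule described in the message can be sketched without the detection library. This is a minimal illustration, not the code from this commit: `to_utf8` and the `detector` lambda are hypothetical names, the lambda stands in for the real detector and is assumed to return a hash with `:encoding` and `:confidence` keys, and Ruby's built-in `String#encode` replaces the library converter. The sketch reads the charset from the detection result itself.

```ruby
# Hypothetical helper: convert only when the detector is fully confident,
# otherwise treat the bytes as UTF-8.
# `detector` is an assumed stand-in: any callable returning a hash such as
# { encoding: "ISO-8859-1", confidence: 100 }.
def to_utf8(message, detector)
  return nil unless message

  # A detector failure is treated the same as "nothing detected".
  detected = begin
    detector.call(message) || {}
  rescue StandardError
    {}
  end

  if detected[:encoding] && detected[:confidence] == 100
    # Fully confident: transcode from the detected charset to UTF-8.
    message.encode("UTF-8", detected[:encoding])
  else
    # Not confident: assume the bytes are already UTF-8 and only relabel them.
    # `dup` keeps the caller's string untouched.
    message.dup.force_encoding("UTF-8")
  end
rescue StandardError
  # Keep the caller alive on conversion errors and report the charset tried.
  "--broken encoding: #{detected && detected[:encoding]}"
end

confident = ->(_msg) { { encoding: "ISO-8859-1", confidence: 100 } }
unsure    = ->(_msg) { { encoding: "ISO-8859-1", confidence: 60 } }

to_utf8("caf\xE9".b, confident) # => "café"
to_utf8("caf\u00E9", unsure)    # => "café" (left as UTF-8, no conversion)
```

Because the fallback message is produced inside the helper, a view can call it without its own `rescue`, which is the move this commit makes in `_commit.html.haml`.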
app/views/commits/_commit.html.haml
@@ -8,7 +8,7 @@
 %strong.cgray= commit.author_name
 –
 = image_tag gravatar_icon(commit.author_email), :class => "avatar", :width => 16
-%span.row_title= truncate(commit.safe_message, :length => 50) rescue "--broken encoding"
+%span.row_title= truncate(commit.safe_message, :length => 50)
 
 %span.right.cgray
 = time_ago_in_words(commit.committed_date)
... ...
lib/gitlabhq/encode.rb
@@ -8,16 +8,19 @@ module Gitlabhq
     def utf8 message
       return nil unless message
 
-      encoding = detect_encoding(message)
-      if encoding
+      detect = CharlockHolmes::EncodingDetector.detect(message) rescue {}
+
+      # It's better to default to UTF-8 as sometimes it's wrongly detected as another charset
+      if detect[:encoding] && detect[:confidence] == 100
         CharlockHolmes::Converter.convert(message, encoding, 'UTF-8')
       else
         message
       end.force_encoding("utf-8")
+
     # Prevent app from crash cause of
     # encoding errors
     rescue
-      ""
+      "--broken encoding: #{encoding}"
     end
 
     def detect_encoding message
... ...