#2695 URI UTF-8 Encoding

SlimerDude Thu 3 May 2018

This is closely related to the UTF-8 Encoding thread, but the problem here is a different one.

I came across this URL - http://www.recipesee.com/how-to-make-sausage-at-home-%e0%a4%b8%e0%a4%b8%e0%a5%87%e0%a4%9c-%e0%a4%b0%e0%a5%87%e0%a4%b8%e0%a4%bf%e0%a4%aa%e0%a5%80-easy-sausage-recipe-yummy-food-world-%f0%9f%8d%b4-91/ which is perfectly valid - but when you try to decode it in Fantom:

url := "http://www.recipesee.com/how-to-make-sausage..."
Uri.decode(url)

sys::ParseErr: Invalid Uri: Invalid UTF-8 encoding
  fan.sys.Uri.decode (Uri.java:55)
  fan.sys.Uri.decode (Uri.java:45)

But unlike the earlier thread, I do not think the dodgy characters in question should be replaced with the Unicode replacement character. That's because (as the name suggests) URIs are identifiers, and if you change an ID then it is no longer valid as it now identifies something else.

But it would be more accurate if the error message were changed to:

Invalid Uri: Unsupported UTF-8 encoding

That way the two scenarios (invalid and unsupported) could be told apart and handled differently.
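
To make the invalid / unsupported distinction concrete, here is a minimal sketch in Python (not Fantom, and not how `Uri.decode` is implemented). It assumes the failure is triggered by the four-byte sequence `%f0%9f%8d%b4` near the end of the URL, which is valid UTF-8 for U+1F374 but lies outside the Basic Multilingual Plane; the post itself does not confirm that this is the cause. The function name and the three labels are made up for illustration.

```python
from urllib.parse import unquote_to_bytes


def classify_percent_encoding(text: str) -> str:
    """Classify the percent-encoded bytes in text.

    Returns one of three hypothetical labels:
      "invalid"       - the bytes are not well-formed UTF-8
      "supplementary" - well-formed UTF-8 with a code point above U+FFFF
                        (a four-byte sequence, assumed here to be the
                        "unsupported" case)
      "bmp"           - well-formed UTF-8, all code points at or below U+FFFF
    """
    raw = unquote_to_bytes(text)
    try:
        # Strict decoding raises on truncated or overlong sequences.
        decoded = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid"
    if any(ord(ch) > 0xFFFF for ch in decoded):
        return "supplementary"
    return "bmp"


if __name__ == "__main__":
    for sample in ("%e0%a4%b8", "%f0%9f%8d%b4", "%f0%9f"):
        print(sample, classify_percent_encoding(sample))
```

A decoder that made the same distinction could report the first case as "Invalid UTF-8 encoding" and the second as "Unsupported UTF-8 encoding", without ever substituting a replacement character into the identifier.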
