Convert UTF8 strings to ASCII and back (without alien libs)

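I decided to take another look at handling UTF-8 strings on AmigaOS4. Utility library has had functions for converting strings between UTF-8 and UCS-4 for a little while, and I always wondered what this kind of conversion is good for. Then I realized that the low UCS-4 code points line up with 8-bit ASCII (ISO-8859-15 on my system). The point is that UTF-8 is variable size (1 to 4 bytes per character), while UCS-4 is fixed size (4 bytes per character). UTF-8 is byte-compatible with 7-bit ASCII only. UCS-4 values 0 to 255 match ISO-8859-1, which is the same as ISO-8859-15 except for eight positions (the euro sign among them), so for most 8-bit text a plain narrowing or widening copy is enough.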
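
To make the size difference concrete, here is a minimal sketch in portable C (standard library only, not the utility.library API). The helper names `utf8_seq_len` and `utf8_char_count` are made up for this illustration. It reads the length of each UTF-8 sequence from its lead byte, which is why the number of characters can be smaller than the number of bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence that starts with this lead byte,
   or 0 if the byte cannot start a sequence (continuation or invalid byte). */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: plain 7-bit ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;
}

/* Count the characters in a NUL-terminated UTF-8 string.
   Returns (size_t)-1 if a lead byte is invalid or a sequence is cut short. */
static size_t utf8_char_count(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    while (*s != '\0') {
        size_t len = utf8_seq_len((unsigned char)*s);
        size_t i;
        if (len == 0) return (size_t)-1;
        /* Every following byte of the sequence must look like 10xxxxxx. */
        for (i = 1; i < len; i++) {
            if (((unsigned char)s[i] & 0xC0) != 0x80) return (size_t)-1;
        }
        s += len;
        count++;
    }
    return count;
}
```

For example, `utf8_char_count("\xC3\xA4")` is 1 because "ä" takes two bytes in UTF-8, whereas in UCS-4 every character occupies exactly four bytes. The check is deliberately loose: it does not reject overlong encodings or surrogates, so it is not a full validator.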

  1. STRPTR ConvertToASCII(CONST_STRPTR inbuffer, uint32 inbuff_size)
  2. {
  3. STRPTR ascii_buffer=NULL;
  4. uint32 r=0;
  5.  
  6. uint32 ucs_size=(inbuff_size+1)*sizeof(int32); // UCS4 buffer holds int32 values // +1 element is for null terminator
  7.  
  8. int32 *ucs_buffer=(int32 *)IExec->AllocVecTags(ucs_size,TAG_DONE);
  9. if (ucs_buffer!=NULL)
  10. {
  11. int32 conv_chars_cnt=IUtility->UTF8toUCS4(inbuffer,ucs_buffer,ucs_size,UTF_INVALID_SUBST_FFFD);
  12.  
  13. STRPTR ascii_buffer=(STRPTR)IExec->AllocVecTags(conv_chars_cnt+1,TAG_DONE);
  14. if (ascii_buffer!=NULL)
  15. {
  16. // Copy UCS4 chars to ASCII buffer
  17. for (r=0; r<conv_chars_cnt; r++) ascii_buffer[r]=((uint32)ucs_buffer[r]<=0xFF) ? (char)ucs_buffer[r] : '?'; // code points above 0xFF have no 8-bit equivalent
  18. ascii_buffer[conv_chars_cnt]=0;
  19. }
  20.  
  21. IExec->FreeVec(ucs_buffer);
  22. }
  23.  
  24. return ascii_buffer;
  25. }
  26.  
  27. // It works the other way around as well
  28.  
  29. STRPTR ConvertToUTF8(CONST_STRPTR inbuffer, uint32 inbuff_size)
  30. {
  31. STRPTR utf8_buffer=NULL;
  32. uint32 r=0;
  33.  
  34. uint32 ucs_size=(inbuff_size+1)*sizeof(int32); // UCS4 buffer holds int32 values // +1 element is for null terminator
  35.  
  36. int32 *ucs_buffer=(int32 *)IExec->AllocVecTags(ucs_size,TAG_DONE);
  37. if (ucs_buffer!=NULL)
  38. {
  39. // Copy ASCII chars to UCS4 buffer
  40. for (r=0; r<inbuff_size; r++) ucs_buffer[r]=(int32)(uint8)inbuffer[r]; // go through uint8 so chars above 0x7F are not sign-extended
  41. ucs_buffer[inbuff_size]=0;
  42.  
  43. uint32 utf8_buff_size=IUtility->UCS4Count(ucs_buffer,FALSE)*4+1; // UTF8 char size can be 1-4 bytes so reserve room for four bytes per char // +1 is for null terminator
  44. utf8_buffer=(STRPTR)IExec->AllocVecTags(utf8_buff_size,TAG_DONE);
  45. if (utf8_buffer!=NULL)
  46. {
  47. int32 conv_chars_cnt=IUtility->UCS4toUTF8(ucs_buffer,utf8_buffer,utf8_buff_size,UTF_INVALID_SUBST_FFFD); // third argument is the destination buffer size, as in UTF8toUCS4() above
  48. // You can save utf8_buffer to a file or do whatever you like
  49. }
  50.  
  51. IExec->FreeVec(ucs_buffer);
  52. }
  53.  
  54. return utf8_buffer;
  55. }
  56.  
  57. // The IUtility->UTF8Count() and IUtility->UCS4Count() functions contain a validator that checks the validity of the strings.

I tested with a-z, Scandinavian letters and a couple of accented letters.
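
A test along these lines can be sketched in portable C without utility.library. The functions below (`latin1_to_utf8`, `utf8_to_latin1` and `round_trips`, all names invented here) handle only code points 0 to 255, an assumption that matches the 8-bit text discussed above. They stand in for the UCS-4 step and do not claim to reproduce what UTF8toUCS4() and UCS4toUTF8() do internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode n 8-bit characters (treated as code points 0..255) as UTF-8.
   out needs room for 2*n+1 bytes. Returns bytes written, excluding the NUL. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n, char *out)
{
    size_t i, o = 0;
    assert(in != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[o++] = (char)c;                    /* 7-bit ASCII: one byte */
        } else {
            out[o++] = (char)(0xC0 | (c >> 6));    /* lead byte 0xC2 or 0xC3 */
            out[o++] = (char)(0x80 | (c & 0x3F));  /* continuation byte */
        }
    }
    out[o] = '\0';
    return o;
}

/* Decode NUL-terminated UTF-8 back to 8-bit characters.
   out needs room for strlen(in)+1 bytes. Anything that is not a code point
   in 0..255 becomes '?'. Returns the number of characters written. */
static size_t utf8_to_latin1(const char *in, unsigned char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t o = 0;
    assert(in != NULL && out != NULL);
    while (*p != 0) {
        if (*p < 0x80) {
            out[o++] = *p++;
        } else if ((*p == 0xC2 || *p == 0xC3) && (p[1] & 0xC0) == 0x80) {
            out[o++] = (unsigned char)(((*p & 0x03) << 6) | (p[1] & 0x3F));
            p += 2;
        } else {
            out[o++] = '?';
            p++;
            while ((*p & 0xC0) == 0x80) p++;       /* skip continuation bytes */
        }
    }
    out[o] = 0;
    return o;
}

/* Round-trip check for short inputs without embedded NULs (n <= 64):
   returns 1 if the text survives the trip unchanged, 0 otherwise. */
static int round_trips(const unsigned char *in, size_t n)
{
    char utf8[2 * 64 + 1];
    unsigned char back[2 * 64 + 1];
    if (n > 64) return 0;
    latin1_to_utf8(in, n, utf8);
    return utf8_to_latin1(utf8, back) == n && memcmp(in, back, n) == 0;
}
```

Calling `round_trips((const unsigned char *)"a\xE5\xE4\xF6z", 5)` checks "aåäöz" in its 8-bit form. Note that two bytes per character are enough in this direction, so the four-bytes-per-character reservation in ConvertToUTF8() is generous for 8-bit input.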

Comments

Hi,

Whilst looking at your code, which btw is nice and clear, I came across a point of concern, a possibly shadowing declaration:
- Line 13 contains a shadowing declaration of line 3
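
For readers unfamiliar with the effect, here is a minimal sketch in portable C (not AmigaOS-specific; `dup_shadowed`, `dup_fixed` and `fixed_returns_copy` are made-up names). Redeclaring a variable inside an inner block creates a second variable, so the outer one keeps its initial NULL and that is what gets returned. Assigning instead of redeclaring is one way to fix it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Buggy: the inner declaration creates a second variable named copy. */
static char *dup_shadowed(const char *s)
{
    char *copy = NULL;
    assert(s != NULL);
    if (s[0] != '\0') {
        char *copy = malloc(strlen(s) + 1);  /* shadows the outer copy */
        if (copy != NULL) strcpy(copy, s);
        free(copy);  /* freed only so this demo does not leak the block */
    }
    return copy;     /* the outer variable: still NULL */
}

/* Fixed: assign to the variable declared at the top of the function. */
static char *dup_fixed(const char *s)
{
    char *copy = NULL;
    assert(s != NULL);
    if (s[0] != '\0') {
        copy = malloc(strlen(s) + 1);        /* no type here, so no new variable */
        if (copy != NULL) strcpy(copy, s);
    }
    return copy;
}

/* Helper for checking dup_fixed(): 1 if it returned an equal copy. */
static int fixed_returns_copy(const char *s)
{
    char *p = dup_fixed(s);
    int ok = (p != NULL && strcmp(p, s) == 0);
    free(p);
    return ok;
}
```

With GCC, compiling with `-Wshadow` reports this kind of declaration.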

Regards,
OldFart