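I decided to have another go at handling UTF-8 strings on AmigaOS4. utility.library has had functions for converting strings between UTF-8 and UCS-4 for a while, and I had always wondered what this kind of conversion was good for. Then I realized that UCS-4 is compatible with 8-bit text: code points 0-255 are identical to ISO-8859-1 (Latin-1), and ISO-8859-15, which my system uses, differs from Latin-1 in only eight positions (the euro sign at 0xA4, for example). The point is that UTF-8 is variable size (1-4 bytes per character) while UCS-4 is fixed size (one 32-bit value per character). UTF-8 is byte-compatible with 7-bit ASCII only; characters 0x80-0xFF already take two bytes. UCS-4, on the other hand, covers the full 8-bit range with one value per character.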
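To make the variable-size point concrete, here is a minimal portable sketch in plain C that encodes a single UCS-4 code point as UTF-8. It uses no AmigaOS calls, and utf8_encode_cp is a hypothetical helper, not a utility.library function. ASCII stays one byte, Latin-1 letters such as ä (U+00E4) need two bytes, and the euro sign (U+20AC) needs three.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of bytes written to out (1-4), or 0 if cp is
 * not a valid Unicode scalar value. */
static size_t utf8_encode_cp(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                    /* 7-bit ASCII: 1 byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* includes Latin-1 0x80-0xFF: 2 bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* UTF-16 surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                 /* rest of the BMP, e.g. the euro sign: 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {               /* supplementary planes: 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                           /* beyond the Unicode range */
}
```

A UCS-4 buffer always spends four bytes per character, whereas the UTF-8 length depends on the character, which is why the UTF-8 side needs worst-case sizing.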
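STRPTR ConvertToASCII(CONST_STRPTR inbuffer, uint32 inbuff_size)
{
    STRPTR ascii_buffer = NULL;
    int32 r;

    // Each UTF-8 byte yields at most one code point, so inbuff_size
    // UCS4 chars is an upper bound. The UCS4 buffer is int32*, so the
    // null terminator needs one extra int32 element, not one extra byte.
    uint32 ucs_size = (inbuff_size + 1) * sizeof(int32);
    int32 *ucs_buffer = (int32 *)IExec->AllocVecTags(ucs_size, TAG_DONE);

    if (ucs_buffer != NULL)
    {
        // ucs_size is the destination buffer size in bytes; check the
        // utility.library autodoc for the exact unit this argument expects
        int32 conv_chars_cnt = IUtility->UTF8toUCS4(inbuffer, ucs_buffer, ucs_size, UTF_INVALID_SUBST_FFFD);

        if (conv_chars_cnt >= 0)   // never use a negative count as a size
        {
            // Assign to the outer variable; redeclaring "STRPTR ascii_buffer"
            // here would shadow it and the function would always return NULL
            ascii_buffer = (STRPTR)IExec->AllocVecTags(conv_chars_cnt + 1, TAG_DONE);
            if (ascii_buffer != NULL)
            {
                // Copy UCS4 chars to the 8-bit buffer. Code points 0-255 map
                // directly to Latin-1; anything above (including the 0xFFFD
                // substitution char) becomes '?' instead of being truncated
                for (r = 0; r < conv_chars_cnt; r++)
                    ascii_buffer[r] = (ucs_buffer[r] >= 0 && ucs_buffer[r] <= 0xFF) ? (char)ucs_buffer[r] : '?';
                ascii_buffer[conv_chars_cnt] = 0;
            }
        }
        IExec->FreeVec(ucs_buffer);
    }
    return ascii_buffer;
}

// It works the other way around as well
STRPTR ConvertToUTF8(CONST_STRPTR inbuffer, uint32 inbuff_size)
{
    STRPTR utf8_buffer = NULL;
    uint32 r;

    // One UCS4 element per input char, +1 element for the null terminator
    uint32 ucs_size = (inbuff_size + 1) * sizeof(int32);
    int32 *ucs_buffer = (int32 *)IExec->AllocVecTags(ucs_size, TAG_DONE);

    if (ucs_buffer != NULL)
    {
        // Copy 8-bit chars to the UCS4 buffer; going through uint8 keeps
        // bytes 0x80-0xFF from being sign-extended into negative values
        for (r = 0; r < inbuff_size; r++)
            ucs_buffer[r] = (int32)(uint8)inbuffer[r];
        ucs_buffer[inbuff_size] = 0;

        // A UTF-8 char can be 1-4 bytes, so reserve room for four bytes per char
        // +1 is for the null terminator
        uint32 utf8_buff_size = IUtility->UCS4Count(ucs_buffer, FALSE) * 4 + 1;
        utf8_buffer = (STRPTR)IExec->AllocVecTags(utf8_buff_size, TAG_DONE);
        if (utf8_buffer != NULL)
        {
            // The size argument is the size of the UTF-8 destination buffer
            IUtility->UCS4toUTF8(ucs_buffer, utf8_buffer, utf8_buff_size, UTF_INVALID_SUBST_FFFD);
            // You can save utf8_buffer to a file or do whatever you like
        }
        IExec->FreeVec(ucs_buffer);
    }
    return utf8_buffer;
}

// IUtility->UTF8Count() and IUtility->UCS4Count() contain a validator to check the validity of the strings.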
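For comparison, here is a hedged sketch of what ConvertToASCII() does, written in portable C without utility.library so it can be tried on any machine. utf8_to_latin1 is a hypothetical name, malloc() stands in for AllocVecTags(), and it decodes straight to 8 bits instead of going through a UCS-4 buffer. Code points above 0xFF, overlong encodings and malformed sequences all become '?'.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Smallest code point each sequence length may encode; anything
 * below it is an overlong (non-canonical) encoding. */
static const uint32_t min_cp[5] = { 0, 0, 0x80, 0x800, 0x10000 };

/* Hypothetical portable counterpart of ConvertToASCII(): decodes
 * in_size bytes of UTF-8 into a newly allocated, null-terminated
 * Latin-1 string. The caller frees the result with free(). */
char *utf8_to_latin1(const char *in, size_t in_size)
{
    const unsigned char *s = (const unsigned char *)in;
    char *out = malloc(in_size + 1);   /* never more chars than input bytes */
    size_t i = 0, o = 0;

    if (out == NULL)
        return NULL;

    while (i < in_size) {
        unsigned char c = s[i];
        uint32_t cp;
        size_t len, k;

        if (c < 0x80)                { cp = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; len = 4; }
        else {                         /* stray continuation or invalid lead byte */
            out[o++] = '?';
            i++;
            continue;
        }

        if (len > in_size - i) {       /* sequence cut off at end of input */
            out[o++] = '?';
            break;
        }
        for (k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                break;                 /* expected a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (k < len) {                 /* malformed: resync at the offending byte */
            out[o++] = '?';
            i += k;
            continue;
        }
        i += len;
        out[o++] = (cp >= min_cp[len] && cp <= 0xFF) ? (char)cp : '?';
    }
    out[o] = '\0';
    return out;
}
```

Usage: char *s = utf8_to_latin1(text, strlen(text)); then free(s) when done. The worst-case output size equals the input size, so a single allocation up front is enough.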
I tested this with a-z, Scandinavian letters and a couple of accented letters.
Comments
Submitted by OldFart on
Hi,
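Whilst looking at your code, which by the way is nice and clear, I came across a point of concern: a possibly shadowing declaration.
- Line 13 (STRPTR ascii_buffer=... inside the if block) shadows the declaration of ascii_buffer on line 3, so ConvertToASCII() returns the outer ascii_buffer, which is still NULL, and the inner buffer is leaked.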
Regards,
OldFart
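A minimal portable illustration of why the shadowing matters (the function names are hypothetical and there are no AmigaOS calls): a declaration inside the inner block creates a second variable, so the outer one that gets returned is never assigned. GCC's -Wshadow option warns about this pattern; it is not enabled by -Wall.

```c
#include <stddef.h>

/* Same shape as the original ConvertToASCII(): the inner declaration
 * creates a second variable, so the outer one is never assigned. */
static const char *shadowed_version(int ok)
{
    const char *buffer = NULL;
    if (ok) {
        const char *buffer = "converted";  /* new variable that shadows the outer one */
        (void)buffer;                       /* silence the unused-variable warning */
    }
    return buffer;                          /* always NULL */
}

/* Fixed: assign to the outer variable instead of redeclaring it. */
static const char *fixed_version(int ok)
{
    const char *buffer = NULL;
    if (ok)
        buffer = "converted";
    return buffer;
}
```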