http://stackoverflow.com/questions/1775859/how-to-convert-a-unichar-value-to-an-nsstring-in-objective-c
I've got an international character stored in a unichar variable. This character does not come from a file or URL. The variable itself only stores an unsigned short (0xce91), which is the UTF-8 byte sequence (0xCE 0x91) for the Greek capital letter 'Α'. I'm trying to put that character into an NSString variable but I fail miserably.
I've tried two different ways, both of which were unsuccessful:
unichar greekAlpha = 0xce91; // could have written greekAlpha = 'Α' instead
NSString *theString = [NSString stringWithFormat:@"Greek Alpha: %C", greekAlpha];
No good. I get some weird Chinese characters. As a side note, this works perfectly with English characters.
Then I also tried this:
NSString *byteString = [[NSString alloc] initWithBytes:&greekAlpha
                                                length:sizeof(unichar)
                                              encoding:NSUTF8StringEncoding];
But this doesn't work either. I'm obviously doing something terribly wrong, but I don't know what. Can someone help me please? Thanks!
Since 0xce91 holds UTF-8 bytes and %C expects a UTF-16 code unit, a simple solution like the one above won't work. For stringWithFormat:@"%C" to work you need to pass 0x391, which is the UTF-16 code unit (and Unicode code point) for 'Α'.
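For example, this prints the character correctly (a minimal sketch of the %C approach; U+0391 is GREEK CAPITAL LETTER ALPHA):
unichar greekAlpha = 0x0391; // UTF-16 code unit, not packed UTF-8 bytes
NSString *theString = [NSString stringWithFormat:@"Greek Alpha: %C", greekAlpha];
NSLog(@"%@", theString); // Greek Alpha: Α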
In order to create a string from the UTF-8 encoded unichar you need to first split the value into its individual octets and then use initWithBytes:length:encoding:.
unichar utf8char = 0xce91;
char chars[2];
int len = 1;
if (utf8char > 127) {
    // two UTF-8 bytes packed into the value: split into high and low octets
    chars[0] = (utf8char >> 8) & 0xFF; // 0xCE
    chars[1] = utf8char & 0xFF;        // 0x91
    len = 2;
} else {
    // plain ASCII: a single byte
    chars[0] = utf8char;
}
NSString *string = [[NSString alloc] initWithBytes:chars
                                            length:len
                                          encoding:NSUTF8StringEncoding];
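A quick sanity check (a sketch, just logging the result of the snippet above):
NSLog(@"%@ (length %lu)", string, (unsigned long)[string length]); // Α (length 1)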
The above answer is great but doesn't account for UTF-8 sequences longer than two bytes, e.g. the ellipsis symbol, whose bytes are 0xE2, 0x80, 0xA6. Three packed bytes no longer fit in a 16-bit unichar, so the variable needs a wider integer type, and the chars buffer needs room for four bytes. Here's a tweak to the code:
uint32_t utf8char = 0xE280A6; // three UTF-8 bytes packed into a 32-bit value ('…')
char chars[4];
if (utf8char > 65535) {
    chars[0] = (utf8char >> 16) & 255;
    chars[1] = (utf8char >> 8) & 255;
    chars[2] = utf8char & 255;
    chars[3] = 0x00;
} else if (utf8char > 127) {
    chars[0] = (utf8char >> 8) & 255;
    chars[1] = utf8char & 255;
    chars[2] = 0x00;
} else {
    chars[0] = utf8char;
    chars[1] = 0x00;
}
NSString *string = [[[NSString alloc] initWithUTF8String:chars] autorelease];
Note the different string initialisation method, which doesn't require a length parameter but does expect a NUL-terminated C string (hence the trailing 0x00 bytes above).
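Wrapped up as a reusable function (a sketch; the name StringFromPackedUTF8 is mine, not from the answer, and ARC is assumed, so the autorelease is dropped):
static NSString *StringFromPackedUTF8(uint32_t utf8char) {
    char chars[4];
    if (utf8char > 65535) {      // three packed UTF-8 bytes
        chars[0] = (utf8char >> 16) & 255;
        chars[1] = (utf8char >> 8) & 255;
        chars[2] = utf8char & 255;
        chars[3] = 0x00;
    } else if (utf8char > 127) { // two packed UTF-8 bytes
        chars[0] = (utf8char >> 8) & 255;
        chars[1] = utf8char & 255;
        chars[2] = 0x00;
    } else {                     // plain ASCII
        chars[0] = utf8char;
        chars[1] = 0x00;
    }
    return [[NSString alloc] initWithUTF8String:chars];
}

NSLog(@"%@", StringFromPackedUTF8(0xCE91));   // Α
NSLog(@"%@", StringFromPackedUTF8(0xE280A6)); // …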
Here is an algorithm for UTF-8 encoding of a single character. Note that, unlike the snippets above, it takes a Unicode code point (e.g. 0x391 for 'Α') rather than pre-packed UTF-8 bytes, and sets the UTF-8 lead and continuation bits itself:
uint32_t utf8char = 0x391;   // example input: the code point for 'Α'
char chars[5] = {0};         // up to four UTF-8 bytes plus a NUL terminator
if (utf8char < 0x80) {
    // one byte: 0xxxxxxx
    chars[0] = utf8char & 0x7F;
} else if (utf8char < 0x0800) {
    // two bytes: 110xxxxx 10xxxxxx
    chars[0] = ((utf8char >> 6) & 0x1F) | 0xC0;
    chars[1] = (utf8char & 0x3F) | 0x80;
} else if (utf8char < 0x010000) {
    // three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    chars[0] = ((utf8char >> 12) & 0x0F) | 0xE0;
    chars[1] = ((utf8char >> 6) & 0x3F) | 0x80;
    chars[2] = (utf8char & 0x3F) | 0x80;
} else if (utf8char < 0x110000) {
    // four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    chars[0] = ((utf8char >> 18) & 0x07) | 0xF0;
    chars[1] = ((utf8char >> 12) & 0x3F) | 0x80;
    chars[2] = ((utf8char >> 6) & 0x3F) | 0x80;
    chars[3] = (utf8char & 0x3F) | 0x80;
}
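Feeding the result into NSString then works as before (a sketch, assuming the snippet above ran with the example input):
NSString *string = [[NSString alloc] initWithUTF8String:chars];
NSLog(@"%@", string); // Α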