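Today's lecture: characters and strings

1. The hardware has primitives to represent integers of various sizes and floating-point numbers. How do we represent text characters?

ASCII is a convention that maps each character to a one-byte number.
- Developed in the 1960s, based on the English alphabet.
- Includes a-z, A-Z, 0-9, and some symbols (, . - ^ > < =).
- Encodes 128 characters (0 to 127), so the most significant bit of each byte is always zero.

    0x00         (null, '\0')
    0x30 ('0')   ...   0x39 ('9')
    0x41 ('A')   0x42 ('B')   ...   0x5A ('Z')
    0x61 ('a')   0x62 ('b')   ...   0x7A ('z')

Unicode is the modern standard. It includes more than 120,000 characters, covering scripts such as Chinese and Korean. UTF-8 is the most common encoding of Unicode. It is variable-length: each character takes 1 to 4 bytes. ASCII text is also valid UTF-8, because each ASCII character is encoded as the same single byte.

2. Strings

    char *h = "hello";

A string is represented as an array of characters. But C does not store an array's length, so how do we know where the string ends? A string is an array of characters terminated by the null character '\0'. For example, "hello" occupies 6 bytes: 'h' 'e' 'l' 'l' 'o' '\0'.

    printf("string is %s\n", h);   // %s prints characters up to, not including, the '\0'

    // Count the characters before the terminating '\0'.
    int length(char *s)
    {
        int i = 0;
        while (s[i] != '\0') {
            i++;
        }
        return i;
    }

Another example: how do we convert a string to lower case?

    #include <stdio.h>

    // Convert each upper-case ASCII letter in s to lower case, in place.
    void ToLower(char *s)
    {
        int i = 0;
        while (s[i] != '\0') {
            if ((s[i] >= 'A') && (s[i] <= 'Z')) {
                printf("s[i] %c %d\n", s[i], i);   // trace: the letter and its index
                s[i] = s[i] + 'a' - 'A';           // 'a' - 'A' is 32 in ASCII
            }
            i++;
        }
    }

    int main()
    {
        char x[20] = "heLLo World";   // an array, so its contents can be modified
        ToLower(x);
        printf("%s\n", x);            // hello world
        return 0;
    }

How do we implement atoi()?

    // Convert the leading decimal digits of s to an int.
    // Stops at the first non-digit; signs and overflow are not handled.
    int Atoi(char *s)
    {
        int i = 0;
        int result = 0;
        while (s[i] != '\0') {
            if (s[i] >= '0' && s[i] <= '9') {
                result = result * 10 + (s[i] - '0');   // s[i] - '0' is the digit's value
            } else {
                break;
            }
            i++;   // move on to the next character
        }
        return result;
    }

3. Strings and pointers

    char *w;
    w = h;   // does this make another copy of "hello"?

No. The assignment copies only the pointer: w and h now point to the same characters. To duplicate the string, copy the characters one at a time:

    // Copy src, including its terminating '\0', into dst.
    // dst must point to writable memory large enough to hold src.
    void copy(char *src, char *dst)
    {
        int i = 0;
        while (src[i] != '\0') {
            dst[i] = src[i];
            i++;
        }
        dst[i] = '\0';   // terminate the copy
    }

    copy(h, w);   // correct?

Not with char *w: an uninitialized w points nowhere valid, and after w = h it points at the string literal, which must not be modified. Either way the writes are undefined behavior. Give w its own storage instead: change char *w to char w[100];

The null termination is a convention, not something the language enforces:

    char h[] = "hello";            // writable array: 'h' 'e' 'l' 'l' 'o' '\0'
    h[5] = 'a';                    // overwrite the terminator
    printf("string is %s\n", h);   // ??? undefined: reads past the end of h
    length(h);                     // ??? undefined

(With char *h = "hello", the write h[5] = 'a' is itself undefined, because it modifies a string literal.)

4. Array of pointers, pointers to pointers

    char *names[3] = {"alice", "bob", "clark"};

names is an array of three pointers; each points to a separate null-terminated string:

    names[0] --> |a|l|i|c|e|\0|
    names[1] --> |b|o|b|\0|
    names[2] --> |c|l|a|r|k|\0|

    char **namep;
    namep = names;   // same as namep = &names[0];

In an expression, names converts to a pointer to its first element; that element is a char *, so the pointer has type char **.

The most common array of pointers is argv:

    #include <stdio.h>

    int main(int argc, char *argv[])   // alternatively, char **argv
    {
        int i;
        for (i = 0; i < argc; i++) {
            printf("%s ", argv[i]);
        }
        printf("\n");
        return 0;
    }

    $ ./a.out 1 2 3
    ./a.out 1 2 3

argv[0] is the program name as typed, so it is printed first.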