Text Inspector
Breaking down strings into Unicode code points and their UTF-8 encodings
How it works
The JavaScript on this page is source-readable and is definitely not rocket science. Here is some help for beginners with basic JavaScript knowledge who wonder how to split a string into its characters and a character into its bytes.
-
Array.from(s)
returns an array from any iterable or array-like object. A string is iterable, and its iterator yields one whole character (code point) per step, so the result is an array of characters. Note that this is not an array of bytes, as it takes more than one byte to encode characters beyond ASCII.
-
codePointAt(0)
returns the code point at position 0. Applied to a string of one character (here: one individual element of the aforementioned array), it returns the code point of that single character.
-
enc = new TextEncoder('utf-8')
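creates an instance of TextEncoder (a class provided by the browser's Encoding API rather than by the JavaScript language itself, and supported by most browsers since 2014). TextEncoder always produces UTF-8, so the 'utf-8' argument is optional. The instance is needed for calling one of its methods as follows:
-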
Array.from(enc.encode(c))
returns, for one character c, an array of its bytes. For an ASCII character this is an array with a single element, the one byte needed to represent it. For a multibyte character it is an array of the individual bytes. Example: the UTF-8 encoding of the "Latin Small Letter Sharp S", in short "ß", consists of the two bytes 195 and 159, which read together as one number give 50079. But as we prefer to see hexadecimal numbers (if you do not prefer those yet, get used to them if you want to deal with Unicode), here comes a very standard method with a not so well-known parameter:
-
b.toString(16)
returns the hexadecimal representation of the value of b. In our case b is the sole byte or one of multiple bytes representing the character we are examining. The character "ß" from the example above, in UTF-8 encoded as the value 50079, results in the strings "c3" and "9f" (toString(16) produces lowercase digits; append .toUpperCase() to get "C3" and "9F"). The hexadecimal number C39F (usually written as 0xC39F) has exactly the same value as the decimal number 50079: 0xC3 × 256 + 0x9F = 195 × 256 + 159 = 50079.
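Put together, the steps above can be sketched roughly as follows. This is not the page's actual code: the function name inspectText and the shape of the returned objects are made up for illustration, and the padStart and toUpperCase calls are additions that only tidy up the hexadecimal output.

```javascript
// Hypothetical helper, not the page's real source: breaks a string into
// characters, their code points and their UTF-8 bytes.
function inspectText(s) {
  const enc = new TextEncoder(); // TextEncoder always encodes to UTF-8

  // Array.from(s) splits the string into whole characters (code points).
  return Array.from(s).map((c) => {
    // encode() returns a Uint8Array; Array.from turns it into a plain array.
    const bytes = Array.from(enc.encode(c));
    return {
      char: c,
      // Code point as hexadecimal, padded to at least four digits.
      codePoint: 'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'),
      // Each byte as two uppercase hexadecimal digits.
      bytes: bytes.map((b) => b.toString(16).toUpperCase().padStart(2, '0')),
    };
  });
}

for (const row of inspectText('aß€')) {
  console.log(row.char, row.codePoint, row.bytes.join(' '));
}
// a U+0061 61
// ß U+00DF C3 9F
// € U+20AC E2 82 AC

// The two bytes of "ß" read as one number:
console.log(0xC3 * 256 + 0x9F); // 50079
```

The ASCII letter needs one byte, "ß" needs two and "€" needs three, which is why the byte array has to be kept separate from the character array.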
The rest of the code is cosmetic, such as writing the results into a table.