This tutorial shows you how to perform UTF-8 encoding & decoding in Deno.
UTF-8 is a variable-width character encoding. It's the most common encoding for the World Wide Web. It can be used to encode all 1,112,064 valid character code points in Unicode.
UTF-8 works by encoding each character into one to four bytes depending on the code point of the character. Characters with lower code points, which tend to be the most frequently used, are encoded to fewer bytes. The table below shows the byte layout for each code point range. The x characters are replaced by the bits of the code point. For example, if a character's code point is in the U+0000 ~ U+007F range, it will be encoded to one byte. If the character's code point is in the U+0800 ~ U+FFFF range, it will be encoded to three bytes.
Number of Bytes | Code point range | Byte 1 | Byte 2 | Byte 3 | Byte 4 |
---|---|---|---|---|---|
1 | U+0000 ~ U+007F | 0xxxxxxx | | | |
2 | U+0080 ~ U+07FF | 110xxxxx | 10xxxxxx | | |
3 | U+0800 ~ U+FFFF | 1110xxxx | 10xxxxxx | 10xxxxxx | |
4 | U+10000 ~ U+10FFFF | 11110xxx | 10xxxxxx | 10xxxxxx | 10xxxxxx |
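To make the table concrete, here is a minimal sketch of how a single code point could be turned into bytes by hand. The function name encodeCodePoint is hypothetical and only for illustration; it is not part of Deno, and in real code you would normally rely on TextEncoder instead.

```typescript
// Hypothetical helper (not a Deno API): encodes one Unicode code point
// into UTF-8 bytes by following the bit patterns in the table above.
function encodeCodePoint(codePoint: number): number[] {
  const isSurrogate = codePoint >= 0xd800 && codePoint <= 0xdfff;
  if (codePoint < 0 || codePoint > 0x10ffff || isSurrogate) {
    throw new RangeError(`Not a valid Unicode code point: ${codePoint}`);
  }
  if (codePoint <= 0x7f) {
    // 1 byte: 0xxxxxxx
    return [codePoint];
  }
  if (codePoint <= 0x7ff) {
    // 2 bytes: 110xxxxx 10xxxxxx
    return [0xc0 | (codePoint >> 6), 0x80 | (codePoint & 0x3f)];
  }
  if (codePoint <= 0xffff) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [
      0xe0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3f),
      0x80 | (codePoint & 0x3f),
    ];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3f),
    0x80 | ((codePoint >> 6) & 0x3f),
    0x80 | (codePoint & 0x3f),
  ];
}

console.log(encodeCodePoint(0x77)); // [ 119 ]
console.log(encodeCodePoint(0xf0)); // [ 195, 176 ]
console.log(encodeCodePoint(0x10348)); // [ 240, 144, 141, 136 ]
```

The 0x3f mask keeps the lowest six bits of the code point, which is exactly the payload of each 10xxxxxx continuation byte.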
As an example, we are going to encode the string 'wð𐍈lhå' using UTF-8 encoding. For each character, you need to get the code point and match it with the above table to determine the result in binary. After that, read each byte of the binary result as a decimal value.
Character | Code Point | Binary | UTF-8 Binary | UTF-8 Bytes (decimal) |
---|---|---|---|---|
w | U+0077 | 1110111 | 01110111 | 119 |
ð | U+00F0 | 11110000 | 11000011 10110000 | 195,176 |
𐍈 | U+10348 | 000010000001101001000 | 11110000 10010000 10001101 10001000 | 240,144,141,136 |
l | U+006C | 1101100 | 01101100 | 108 |
h | U+0068 | 1101000 | 01101000 | 104 |
å | U+00E5 | 11100101 | 11000011 10100101 | 195,165 |
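If you want to check the table yourself, the following sketch rebuilds its main columns with built-in functions only. The variable names sample and rows are chosen for this illustration.

```typescript
// Rebuilds the code point, UTF-8 binary and decimal byte columns
// for each character of the sample string.
const sample = "wð𐍈lhå";
const rows: string[] = [];

// for...of iterates by code point, so the surrogate pair of 𐍈 stays together.
for (const char of sample) {
  const codePoint = char.codePointAt(0) ?? 0;
  const bytes = Array.from(new TextEncoder().encode(char));
  const hex = codePoint.toString(16).toUpperCase().padStart(4, "0");
  const bits = bytes.map((b) => b.toString(2).padStart(8, "0")).join(" ");
  rows.push(`${char} | U+${hex} | ${bits} | ${bytes.join(",")}`);
}

console.log(rows.join("\n"));
// First two lines of the output:
// w | U+0077 | 01110111 | 119
// ð | U+00F0 | 11000011 10110000 | 195,176
```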
To perform UTF-8 encoding and decoding in Deno, you don't have to implement the encode and decode functions yourself. Deno has TextEncoder and TextDecoder for that purpose. The usage examples are shown below.
Using TextEncoder and TextDecoder
Encode Using TextEncoder
TextEncoder has a function named encode, which returns the result of running the UTF-8 encoder on the input string.
encode(input?: string): Uint8Array;
Example:
// TextEncoder is a global in Deno, so no import is needed.
const textEncoder = new TextEncoder();
const encodedValue = textEncoder.encode('wð𐍈lhå');
console.log(`encodedValue: ${encodedValue}`);
Output:
encodedValue: 119,195,176,240,144,141,136,108,104,195,165
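Note that the number of bytes is not the same as the length of the string. The short sketch below compares three ways of counting the same text; the variable names are chosen for illustration.

```typescript
const text = "wð𐍈lhå";
const utf8Bytes = new TextEncoder().encode(text);

// String.length counts UTF-16 code units; 𐍈 takes two of them.
const codeUnitCount = text.length;
// Array.from splits the string by code point.
const codePointCount = Array.from(text).length;
// The Uint8Array length is the size of the UTF-8 encoded form.
const byteCount = utf8Bytes.length;

console.log({ codeUnitCount, codePointCount, byteCount });
// { codeUnitCount: 7, codePointCount: 6, byteCount: 11 }
```

This matters when you size a buffer or enforce a limit: a limit expressed in bytes should be checked against the encoded length, not against the string length.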
TextEncoder also has a function named encodeInto. It encodes the value passed as source and stores the result in destination. The function returns an object with two fields: read and written. read is the number of UTF-16 code units of source that were converted, while written is the number of bytes modified in destination.
encodeInto(source: string, destination: Uint8Array): TextEncoderEncodeIntoResult;
Example:
const textEncoder = new TextEncoder();
const bytes = new Uint8Array(64);
const result = textEncoder.encodeInto('wð𐍈lhå', bytes);
console.log(bytes);
console.log(result);
Output:
Uint8Array(64) [
119, 195, 176, 240, 144, 141, 136, 108, 104, 195, 165, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
]
{ read: 7, written: 11 }
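The read and written fields become more useful when destination is too small for the whole string. In that case encodeInto stops before the first character that does not fit entirely. The sketch below uses that behavior to encode a string through a small buffer; encodeInChunks is a hypothetical helper written for this illustration, not a Deno API.

```typescript
// Hypothetical helper: encodes `source` through buffers of `chunkSize` bytes.
function encodeInChunks(source: string, chunkSize: number): Uint8Array[] {
  const encoder = new TextEncoder();
  const chunks: Uint8Array[] = [];
  let remaining = source;
  while (remaining.length > 0) {
    const buffer = new Uint8Array(chunkSize);
    const result = encoder.encodeInto(remaining, buffer);
    const read = result.read ?? 0;
    const written = result.written ?? 0;
    if (read === 0) {
      // Nothing fit: the buffer cannot hold even one character.
      throw new RangeError("chunkSize is too small for the next character");
    }
    // Keep only the bytes that were actually written.
    chunks.push(buffer.subarray(0, written));
    // `read` counts UTF-16 code units, which is what slice() expects.
    remaining = remaining.slice(read);
  }
  return chunks;
}

// With 4 bytes of room, only 'w' (1 byte) and 'ð' (2 bytes) fit;
// '𐍈' needs 4 bytes, so encoding stops before it.
const firstPass = new TextEncoder().encodeInto("wð𐍈lhå", new Uint8Array(4));
console.log(firstPass); // { read: 2, written: 3 }

const chunkList = encodeInChunks("wð𐍈lhå", 4);
console.log(chunkList.map((chunk) => Array.from(chunk)));
// [ [ 119, 195, 176 ], [ 240, 144, 141, 136 ], [ 108, 104, 195, 165 ] ]
```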
Decode Using TextDecoder
TextDecoder's decode function can be used to decode a UTF-8 encoded value into a string.
decode(input?: BufferSource, options?: TextDecodeOptions): string;
Example:
// TextDecoder is a global in Deno, so no import is needed.
const encodedValue = new TextEncoder().encode('wð𐍈lhå');
const textDecoder = new TextDecoder();
const decodedValue = textDecoder.decode(encodedValue);
console.log(`decodedValue: ${decodedValue}`);
Output:
decodedValue: wð𐍈lhå
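The options parameter of decode and the options of the TextDecoder constructor matter when the input is incomplete or invalid. The sketch below shows two cases under simple assumptions: bytes that arrive in chunks which split a multi-byte character, and strict decoding with the fatal option. The variable names are chosen for illustration.

```typescript
const utf8 = new TextEncoder().encode("wð𐍈lhå");

// Split in the middle of the two-byte sequence for 'ð' (195, 176).
const firstChunk = utf8.subarray(0, 2); // 119, 195
const secondChunk = utf8.subarray(2); // 176, 240, ...

// Decoding each chunk on its own turns the broken halves into U+FFFD.
const naive = new TextDecoder().decode(firstChunk) +
  new TextDecoder().decode(secondChunk);

// With { stream: true } the decoder keeps the incomplete sequence
// and finishes it when the next chunk arrives.
const streamingDecoder = new TextDecoder();
const streamed = streamingDecoder.decode(firstChunk, { stream: true }) +
  streamingDecoder.decode(secondChunk, { stream: true }) +
  streamingDecoder.decode(); // flush any pending bytes

// With { fatal: true } invalid input throws instead of producing U+FFFD.
let rejected = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(new Uint8Array([195]));
} catch (error) {
  rejected = error instanceof TypeError;
}

console.log(naive === "w\uFFFD\uFFFD𐍈lhå"); // true
console.log(streamed); // wð𐍈lhå
console.log(rejected); // true
```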
Using Deno std UTF-8 Module
The above solution requires you to create new TextEncoder and TextDecoder instances in each function or file where you want to perform encoding or decoding. That can be wasteful if you need to perform the operations in many files. A better approach is to create the TextEncoder and TextDecoder instances once and reuse them in other files. The utf8 module of Deno std already implements that approach. To use the module, import and re-export its functions in your deps.ts file.
deps.ts
import {
decode as utf8Decode,
encode as utf8Encode,
} from 'https://deno.land/std@0.82.0/encoding/utf8.ts';
export { utf8Decode, utf8Encode };
Then, use it in another file.
import { utf8Decode, utf8Encode } from './deps.ts';
const encodedValue = utf8Encode('wð𐍈lhå');
console.log(`encodedValue: ${encodedValue}`);
const decodedValue = utf8Decode(encodedValue);
console.log(`decodedValue: ${decodedValue}`);
If you don't want to import the remote module, you can use it as a reference to implement a similar approach.
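A local version could look like the sketch below. The file name utf8.ts and the names sharedEncoder and sharedDecoder are assumptions made for this illustration; the function names mirror the ones re-exported in deps.ts above. This may also help if the std release you use does not ship encoding/utf8.ts.

```typescript
// utf8.ts (hypothetical local module)
// The instances are created once when the module is first loaded
// and shared by every file that imports these functions.
const sharedEncoder = new TextEncoder();
const sharedDecoder = new TextDecoder();

/** Encodes a string into UTF-8 bytes using the shared TextEncoder. */
export function utf8Encode(input?: string): Uint8Array {
  return sharedEncoder.encode(input);
}

/** Decodes UTF-8 bytes into a string using the shared TextDecoder. */
export function utf8Decode(input?: Uint8Array): string {
  return sharedDecoder.decode(input);
}

// Quick round-trip check; in a real project this would live in another file
// that imports utf8Encode and utf8Decode.
const roundTrip = utf8Decode(utf8Encode("wð𐍈lhå"));
console.log(roundTrip); // wð𐍈lhå
```

Because modules are evaluated only once, every importer shares the same two instances without any extra bookkeeping.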
Summary
That's how to perform UTF-8 encoding and decoding in Deno. You can use TextEncoder to encode a value to UTF-8 and TextDecoder to decode a UTF-8 encoded value. It is better to reuse the same instances of TextEncoder and TextDecoder across different files, for example by using the utf8 module of Deno std.