Deno - UTF-8 Encoding & Decoding Examples

This tutorial shows you how to perform UTF-8 encoding & decoding in Deno.

UTF-8 is a variable-width character encoding. It's the most common encoding on the World Wide Web. It can encode all 1,112,064 valid character code points in Unicode.

UTF-8 works by encoding each character into one to four bytes, depending on the character's code point. Characters with lower code points, which include the most frequently used ones such as ASCII, are encoded to fewer bytes. The table below shows the byte layout for each range; the x characters are replaced by the bits of the code point. For example, if a character's code point is in the U+0000 ~ U+007F range, it is encoded to one byte. If the code point is in the U+0800 ~ U+FFFF range, it is encoded to three bytes.

Number of Bytes	Code point range	Byte 1	Byte 2	Byte 3	Byte 4
1	U+0000 ~ U+007F	0xxxxxxx
2	U+0080 ~ U+07FF	110xxxxx	10xxxxxx
3	U+0800 ~ U+FFFF	1110xxxx	10xxxxxx	10xxxxxx
4	U+10000 ~ U+10FFFF	11110xxx	10xxxxxx	10xxxxxx	10xxxxxx
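The table can be read as a small algorithm: pick the row by code point range, then fill the x positions with the code point's bits. The following is a minimal sketch of that idea, not how Deno implements it internally. `encodeCodePoint` is a hypothetical helper name introduced only for illustration.

```typescript
// Hypothetical helper: encode one Unicode code point into UTF-8 bytes
// by following the byte patterns in the table above.
function encodeCodePoint(cp: number): number[] {
  // Surrogates (U+D800 ~ U+DFFF) and values above U+10FFFF are not valid scalar values.
  if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    throw new RangeError(`Invalid code point: ${cp}`);
  }
  if (cp <= 0x7f) {
    // 1 byte: 0xxxxxxx
    return [cp];
  }
  if (cp <= 0x7ff) {
    // 2 bytes: 110xxxxx 10xxxxxx
    return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  }
  if (cp <= 0xffff) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

console.log(encodeCodePoint(0x41));    // A  -> 65
console.log(encodeCodePoint(0xe9));    // é  -> 195, 169
console.log(encodeCodePoint(0x20ac));  // €  -> 226, 130, 172
console.log(encodeCodePoint(0x1f600)); // 😀 -> 240, 159, 152, 128
```

In real code you would not write this yourself; it only illustrates where the bits of the code point end up.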

As an example, we are going to encode the string 'wð𐍈lhå' using UTF-8 encoding. For each character, you need to get the code point and match it with the above table to determine the result in binary. After that, convert each binary byte to its numeric value (shown in decimal below).

Character	Code Point	Binary	UTF-8 Binary	UTF-8 Bytes (Decimal)
w	U+0077	1110111	01110111	119
ð	U+00F0	11110000	11000011 10110000	195,176
𐍈	U+10348	000010000001101001000	11110000 10010000 10001101 10001000	240,144,141,136
l	U+006C	1101100	01101100	108
h	U+0068	1101000	01101000	104
å	U+00E5	11100101	11000011 10100101	195,165
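The breakdown above can be reproduced programmatically. The sketch below assumes only the standard `TextEncoder` and `String.prototype.codePointAt`; the `rows` variable and the tab-separated output format are illustrative choices, not part of any API.

```typescript
const encoder = new TextEncoder();
const rows: string[] = [];

// for...of iterates a string by code point, so the surrogate pair
// that represents 𐍈 is handled as a single character.
for (const ch of 'wð𐍈lhå') {
  const codePoint = ch.codePointAt(0)!;
  const hex = codePoint.toString(16).toUpperCase().padStart(4, '0');
  const bytes = Array.from(encoder.encode(ch));
  // Render each byte as 8 binary digits.
  const bits = bytes.map((b) => b.toString(2).padStart(8, '0')).join(' ');
  rows.push(`${ch}\tU+${hex}\t${bits}\t${bytes.join(',')}`);
}

console.log(rows.join('\n'));
```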

To perform UTF-8 encoding and decoding in Deno, you don't have to implement the encode and decode functions yourself. Deno has TextEncoder and TextDecoder for that purpose. The usage examples are shown below.

Using `TextEncoder` and `TextDecoder`

Encode Using `TextEncoder`

TextEncoder has a method named encode, which runs the UTF-8 encoder on the input string and returns the result as a Uint8Array.

  encode(input?: string): Uint8Array;

Example:

  const textEncoder = new TextEncoder();
  const encodedValue = textEncoder.encode('wð𐍈lhå');
  console.log(`encodedValue: ${encodedValue}`);

Output:

  encodedValue: 119,195,176,240,144,141,136,108,104,195,165

TextEncoder also has a method named encodeInto. It encodes the value passed as source and stores the result in destination. The method returns an object with two fields: read and written. read is the number of UTF-16 code units of source that were converted (a character outside the Basic Multilingual Plane, such as 𐍈, counts as two), while written is the number of bytes modified in destination.

  encodeInto(source: string, destination: Uint8Array): TextEncoderEncodeIntoResult;

Example:

  const textEncoder = new TextEncoder();
  const bytes = new Uint8Array(64);
  const result = textEncoder.encodeInto('wð𐍈lhå', bytes);
  console.log(bytes);
  console.log(result);

Output:

  Uint8Array(64) [
    119, 195, 176, 240, 144, 141, 136, 108, 104, 195, 165, 0, 0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0, 0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0, 0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0, 0,
      0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 0
  ]
  { read: 7, written: 11 }
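The read and written fields matter most when destination is too small for the whole string. As a sketch of that case, the example below uses a 5-byte buffer (an arbitrary size chosen for illustration). encodeInto never writes a partial character, so it stops before 𐍈, which needs four bytes when only two are left.

```typescript
const textEncoder = new TextEncoder();
const source = 'wð𐍈lhå';
const small = new Uint8Array(5); // deliberately too small for the 11 encoded bytes

const { read, written } = textEncoder.encodeInto(source, small);

console.log(read);    // 2 (code units consumed: 'w' and 'ð')
console.log(written); // 3 (bytes written: 119, 195, 176)

// Only the first `written` bytes hold encoded data.
const filled = small.subarray(0, written);
console.log(filled);

// The part of the string that still has to be encoded.
const remaining = source.slice(read);
console.log(remaining);
```

Slicing source by read gives the part that did not fit, which can then be encoded into the next buffer.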

Decode Using `TextDecoder`

TextDecoder's decode function can be used to decode a UTF-8 encoded value into a string.

  decode(input?: BufferSource, options?: TextDecodeOptions): string;

Example:

  // The bytes produced by the TextEncoder example above.
  const encodedValue = new TextEncoder().encode('wð𐍈lhå');

  const textDecoder = new TextDecoder();
  const decodedValue = textDecoder.decode(encodedValue);
  console.log(`decodedValue: ${decodedValue}`);

Output:

  decodedValue: wð𐍈lhå
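The options parameter of decode is useful when the bytes arrive in chunks, for example from a network stream. The sketch below splits the encoded value in the middle of 𐍈 (the split position is an arbitrary choice for illustration) and uses the standard stream option so the decoder keeps the incomplete bytes until the next call. It also shows the fatal constructor option, which makes the decoder throw a TypeError on malformed input instead of emitting the U+FFFD replacement character.

```typescript
const bytes = new TextEncoder().encode('wð𐍈lhå');

// Split after 5 bytes, which cuts the 4-byte sequence of 𐍈 in half.
const firstChunk = bytes.subarray(0, 5);
const secondChunk = bytes.subarray(5);

// With { stream: true }, incomplete trailing bytes are buffered, not replaced.
const streamingDecoder = new TextDecoder();
let text = streamingDecoder.decode(firstChunk, { stream: true });
text += streamingDecoder.decode(secondChunk); // last call without stream flushes the decoder
console.log(text); // wð𐍈lhå

// With { fatal: true }, a truncated sequence is reported as an error.
const strictDecoder = new TextDecoder('utf-8', { fatal: true });
let rejected = false;
try {
  strictDecoder.decode(firstChunk);
} catch (error) {
  rejected = error instanceof TypeError;
}
console.log(rejected); // true
```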

Using Deno `std` UTF-8 Module

The above solution requires you to create new TextEncoder and TextDecoder instances in each function or file where you want to perform encoding or decoding. That can be wasteful if you need to perform the operations in many files. A better approach is to create the TextEncoder and TextDecoder instances only once and reuse them in other files. The utf8 module of Deno std already implements that approach. To use the module, import and re-export its functions in the deps.ts file.

deps.ts

  import {
    decode as utf8Decode,
    encode as utf8Encode,
  } from 'https://deno.land/std@0.82.0/encoding/utf8.ts';

  export { utf8Decode, utf8Encode };

Then, use it in another file.

  import { utf8Decode, utf8Encode } from './deps.ts';

  const encodedValue = utf8Encode('wð𐍈lhå');
  console.log(`encodedValue: ${encodedValue}`);

  const decodedValue = utf8Decode(encodedValue);
  console.log(`decodedValue: ${decodedValue}`);

If you don't want to import the remote module, you can use it as a reference to implement a similar approach.
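A local version of that approach could look like the sketch below. The file name utf8.ts and the function names are assumptions chosen to mirror the names used in deps.ts; this is not a copy of the std module's source.

```typescript
// utf8.ts (hypothetical local module)
// The instances are created once, when the module is first imported,
// and shared by every file that imports these functions.
const sharedEncoder = new TextEncoder();
const sharedDecoder = new TextDecoder();

/** Encodes a string into UTF-8 bytes using the shared TextEncoder. */
export function utf8Encode(input?: string): Uint8Array {
  return sharedEncoder.encode(input);
}

/** Decodes UTF-8 bytes into a string using the shared TextDecoder. */
export function utf8Decode(input?: Uint8Array): string {
  return sharedDecoder.decode(input);
}
```

Other files can then import utf8Encode and utf8Decode from this module (directly or through deps.ts) in the same way as the remote version.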

Summary

That's how to perform UTF-8 encoding and decoding in Deno. You can use TextEncoder to encode a value to UTF-8 and TextDecoder to decode a UTF-8 encoded value. It is better to share the same TextEncoder and TextDecoder instances across files, for example by using the utf8 module of Deno std.

Ivan Andrianto
