Introduction to unorm
The unorm library provides Unicode normalization for JavaScript. It implements the four standard normalization forms (NFC, NFD, NFKC, and NFKD), so that strings which look the same are also represented by the same sequence of code points. This is particularly useful in applications that compare, search, or store text in many languages.
Getting Started with unorm
First, install unorm using npm:
npm install unorm
Once installed, you can start using unorm in your JavaScript code. Here’s how to include it:
const unorm = require('unorm');
Useful API Methods in unorm
NFC (Normalization Form C)
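Canonical decomposition followed by canonical composition: combining sequences are put into canonical order and recomposed into precomposed characters where possible.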
const str = "e\u0301";
const normalized = unorm.nfc(str);
console.log(normalized); // outputs: é
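A common reason to normalize to NFC is comparison: two strings that render identically can still differ code point by code point. The following is a minimal sketch of that idea. The helper name sameText is hypothetical, and the fallback to the built-in String.prototype.normalize is an assumption added only so the snippet also runs where unorm is not installed.

```javascript
// Load unorm if available; otherwise fall back to the built-in
// String.prototype.normalize (an assumption so the sketch runs anywhere).
let nfc;
try {
  nfc = require('unorm').nfc;
} catch (err) {
  nfc = (s) => s.normalize('NFC');
}

// Hypothetical helper: compare two strings after NFC normalization.
function sameText(a, b) {
  return nfc(a) === nfc(b);
}

const precomposed = '\u00e9'; // é as a single code point
const decomposed = 'e\u0301'; // e followed by a combining acute accent

console.log(precomposed === decomposed);        // false
console.log(sameText(precomposed, decomposed)); // true
```

Normalizing both sides before comparing avoids having to know which representation the input arrived in.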
NFD (Normalization Form D)
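Decomposes precomposed characters into a base character followed by its combining marks.
const str = "\u00e9"; // é as a single precomposed code point
const normalized = unorm.nfd(str);
console.log(normalized); // outputs: é (now two code points: e + U+0301)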
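Because the composed and decomposed forms look the same when printed, it can help to inspect the code points directly. This is a small sketch: the helper codePoints is hypothetical, and the fallback to the native String.prototype.normalize is an assumption made so the snippet runs without unorm installed.

```javascript
// Load unorm if available; otherwise use the built-in normalize
// (an assumption so the sketch runs anywhere).
let nfd;
try {
  nfd = require('unorm').nfd;
} catch (err) {
  nfd = (s) => s.normalize('NFD');
}

// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(s) {
  return Array.from(s, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

const composed = '\u00e9'; // é as one code point

console.log(codePoints(composed));      // [ 'U+00E9' ]
console.log(codePoints(nfd(composed))); // [ 'U+0065', 'U+0301' ]
```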
NFKC (Normalization Form KC)
Compatibility decomposition followed by canonical composition.
const str = "ℌ";
const normalized = unorm.nfkc(str);
console.log(normalized); // outputs: H
NFKD (Normalization Form KD)
Compatibility decomposition.
const str = "ℌ";
const normalized = unorm.nfkd(str);
console.log(normalized); // outputs: H
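The difference between the canonical and compatibility forms is easiest to see on a string that contains both a compatibility character and an accented letter. The sketch below uses the word "ﬁancé" written with the ﬁ ligature (U+FB01). The variable names are illustrative, and the fallback to the native String.prototype.normalize is an assumption so the snippet runs without unorm installed.

```javascript
// Load unorm if available; otherwise wrap the built-in normalize
// (an assumption so the sketch runs anywhere).
let forms;
try {
  forms = require('unorm');
} catch (err) {
  forms = {
    nfc: (s) => s.normalize('NFC'),
    nfkc: (s) => s.normalize('NFKC'),
    nfkd: (s) => s.normalize('NFKD'),
  };
}

const word = '\ufb01anc\u00e9'; // "ﬁancé" with the ﬁ ligature and a precomposed é

// NFC keeps the ligature: it has only a compatibility decomposition.
console.log(forms.nfc(word) === word);             // true

// NFKC replaces the ligature with "fi" and keeps é composed.
console.log(forms.nfkc(word) === 'fianc\u00e9');   // true

// NFKD replaces the ligature and also splits é into e + U+0301.
console.log(forms.nfkd(word) === 'fiance\u0301');  // true
```

The compatibility forms discard formatting distinctions such as ligatures, so they suit searching and matching better than round-trip storage.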
Advanced API Usages
Checking if a String is in a Normal Form
// A string is in a given normal form if normalizing it leaves it unchanged.
const str = "e\u0301";
console.log(unorm.nfc(str) === str); // outputs: false
console.log(unorm.nfd(str) === str); // outputs: true
Building an Application with unorm
To demonstrate how to use unorm in an application, let’s build a simple normalization tool for user input.
const unorm = require('unorm');
const readline = require('readline');

const rl = readline.createInterface({
  input: process.stdin,
  output: process.stdout
});

rl.question('Enter a string to normalize: ', (input) => {
  console.log('NFC: ', unorm.nfc(input));
  console.log('NFD: ', unorm.nfd(input));
  console.log('NFKC: ', unorm.nfkc(input));
  console.log('NFKD: ', unorm.nfkd(input));
  rl.close();
});
This simple tool takes a user’s input and normalizes it in all four forms, displaying the results.
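In a terminal, several of the four results often look identical, so a variant that escapes non-ASCII characters can make the differences visible. The following is a sketch only: normalizeAll and escapeNonAscii are hypothetical helper names, and the fallback to the native String.prototype.normalize is an assumption so the snippet runs without unorm installed.

```javascript
// Load unorm if available; otherwise wrap the built-in normalize
// (an assumption so the sketch runs anywhere).
let unorm;
try {
  unorm = require('unorm');
} catch (err) {
  unorm = {
    nfc: (s) => s.normalize('NFC'),
    nfd: (s) => s.normalize('NFD'),
    nfkc: (s) => s.normalize('NFKC'),
    nfkd: (s) => s.normalize('NFKD'),
  };
}

// Hypothetical helper: return all four normalization forms of the input.
function normalizeAll(input) {
  return {
    NFC: unorm.nfc(input),
    NFD: unorm.nfd(input),
    NFKC: unorm.nfkc(input),
    NFKD: unorm.nfkd(input),
  };
}

// Hypothetical helper: escape each non-ASCII UTF-16 code unit as \uXXXX
// so that visually identical results can be told apart.
function escapeNonAscii(s) {
  return s.replace(/[^\x20-\x7e]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'));
}

for (const [form, value] of Object.entries(normalizeAll('\u210c\u00e9'))) {
  console.log(form + ': ' + escapeNonAscii(value));
}
// NFC: \u210c\u00e9
// NFD: \u210ce\u0301
// NFKC: H\u00e9
// NFKD: He\u0301
```

Keeping the normalization in a plain function, separate from the readline prompt, also makes it straightforward to test.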
With unorm, you can normalize Unicode text consistently, so that comparison, search, and storage behave predictably regardless of how the input was originally encoded.