String Object in JavaScript

Introduction

Strings are useful for storing data that can be represented as text. Some of the most commonly used operations on strings are checking their length, constructing and concatenating them using the string operators + and +=, checking for the existence or location of substrings with the indexOf() method, or extracting substrings with the substring() method.
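As a brief illustration of the operations named above, the following sketch uses a made-up greeting value. The variable names are placeholders chosen for this example only.

```javascript
// Hypothetical example values; only the string operations matter here.
let greeting = "Hello";
greeting += ", " + "world"; // concatenation with + and +=

const greetingLength = greeting.length; // 12
const commaIndex = greeting.indexOf(","); // 5
const missingIndex = greeting.indexOf("xyz"); // -1 when the substring is absent
const firstWord = greeting.substring(0, commaIndex); // "Hello"

console.log(greeting, greetingLength, commaIndex, missingIndex, firstWord);
```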

Creating strings

Strings can be created as primitives, from string literals, or as objects, using the String() constructor:

const string1 = "A string primitive";
const string2 = 'Also a string primitive';
const string3 = `Yet another string primitive`;

const string4 = new String("A String object");

String primitives and String objects share many behaviors, but they also have important differences and caveats. See "Primitive strings and string objects" below.

String literals can be specified using single or double quotes, which are treated identically, or using the backtick character `. This last form specifies a template literal: with this form you can interpolate expressions. For more information about string literal syntax, see Lexical grammar.
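A small sketch of interpolation with a template literal follows. The item, price, and quantity values are invented for illustration.

```javascript
// Hypothetical values used only to show interpolation.
const item = "coffee";
const price = 3.5;
const quantity = 2;

// Expressions inside ${...} are evaluated and converted to strings.
const summary = `${quantity} x ${item} = ${price * quantity} USD`;

// Template literals may also span several lines.
const multiline = `line one
line two`;

console.log(summary); // "2 x coffee = 7 USD"
console.log(multiline.split("\n").length); // 2
```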

Character access

There are two ways to access a character in a string. The first is the charAt() method:

"cat".charAt(1); // gives value "a"

The other way is to treat the string as an array-like object, where individual characters correspond to a numeric index:

"cat"[1]; // gives value "a"

When using bracket notation for character access, attempts to delete or assign a value to these properties will not succeed. The properties involved are neither writable nor configurable. (See Object.defineProperty() for more information.)
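The read-only nature of index properties can be observed directly. In the sketch below, tryAssign is a hypothetical helper written for this example; it uses a function-level "use strict" directive so that the failed assignment throws rather than being silently ignored.

```javascript
// Index properties of a String wrapper are read-only and non-configurable.
const descriptor = Object.getOwnPropertyDescriptor(new String("cat"), "1");
console.log(descriptor);
// { value: 'a', writable: false, enumerable: true, configurable: false }

// Hypothetical helper: in strict mode the failed assignment throws.
function tryAssign(str) {
  "use strict";
  try {
    str[0] = "b";
    return "no error";
  } catch (err) {
    return err.name;
  }
}

const word = "cat";
const outcome = tryAssign(word);
console.log(outcome); // "TypeError"
console.log(word); // "cat" (unchanged)
```

In non-strict code the same assignment is ignored without an error, which can hide bugs.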

Comparing strings

Use the less-than and greater-than operators to compare strings:

const a = "a";
const b = "b";
if (a < b) {
  // true
  console.log(`${a} is less than ${b}`);
} else if (a > b) {
  console.log(`${a} is greater than ${b}`);
} else {
  console.log(`${a} and ${b} are equal.`);
}

Note that all comparison operators, including === and ==, compare strings case-sensitively. A common way to compare strings case-insensitively is to convert both to the same letter case (upper or lower) before comparing them.

function areEqualCaseInsensitive(str1, str2) {
  return str1.toUpperCase() === str2.toUpperCase();
}

The choice between toUpperCase() and toLowerCase() is largely arbitrary, and neither is completely robust beyond the Latin alphabet. For example, the German lowercase letter ß and ss are both converted to SS by toUpperCase(), while the Turkish letter ı is falsely reported as unequal to I by toLowerCase() unless toLocaleLowerCase("tr") is specifically used.

const areEqualInUpperCase = (str1, str2) =>
  str1.toUpperCase() === str2.toUpperCase();
const areEqualInLowerCase = (str1, str2) =>
  str1.toLowerCase() === str2.toLowerCase();

areEqualInUpperCase("ß", "ss"); // true; should be false
areEqualInLowerCase("ı", "I"); // false; should be true

A locale-aware and robust solution for testing case-insensitive equality is to use the Intl.Collator API or the string's localeCompare() method (they share the same interface) with the sensitivity option set to "accent" or "base".

const areEqual = (str1, str2, locale = "en-US") =>
  str1.localeCompare(str2, locale, { sensitivity: "accent" }) === 0;

areEqual("ß", "ss", "de"); // false
areEqual("ı", "I", "tr"); // true

The localeCompare() method allows comparing strings in a similar way to strcmp() – this allows sorting strings in a locale-aware manner.
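To make the sorting use case concrete, here is a minimal sketch that contrasts the default sort with a localeCompare()-based comparator. The word list is invented, and the locale-aware result assumes a runtime with standard English collation data.

```javascript
// Hypothetical word list.
const words = ["Zebra", "apple", "Mango"];

// The default sort compares UTF-16 code units, so uppercase letters sort first.
const byCodeUnit = [...words].sort();

// localeCompare() returns a negative number, zero, or a positive number,
// which is the contract Array.prototype.sort() expects from a comparator.
const byLocale = [...words].sort((x, y) => x.localeCompare(y, "en"));

console.log(byCodeUnit); // [ 'Mango', 'Zebra', 'apple' ]
console.log(byLocale); // [ 'apple', 'Mango', 'Zebra' ]
```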

Primitive strings and string objects

Note that JavaScript distinguishes between string objects and primitive string values. (This is also true for Booleans and Numbers.)

String literals (delimited by double or single quotes) and strings returned from String calls in a non-constructor context (that is, called without using the new keyword) are primitive strings. In contexts where a method is to be invoked on a primitive string or a property lookup occurs, JavaScript automatically wraps the primitive string and calls the method or performs the property lookup on the wrapper object instead.

const strPrim = "foo"; // A literal is a string primitive
const strPrim2 = String(1); // Coerced into the string primitive "1"
const strPrim3 = String(true); // Coerced into the string primitive "true"
const strObj = new String(strPrim); // String with new returns a string wrapper object.

console.log(typeof strPrim); // "string"
console.log(typeof strPrim2); // "string"
console.log(typeof strPrim3); // "string"
console.log(typeof strObj); // "object"

Primitive strings and String objects also give different results when using eval(). Primitives passed to eval() are treated as source code; String objects are treated like any other object, which means the object itself is returned. For example:

const s1 = "2 + 2"; // creates a string primitive
const s2 = new String("2 + 2"); // creates a String object
console.log(eval(s1)); // returns the number 4
console.log(eval(s2)); // returns the String object wrapping "2 + 2"

For these reasons, code may break when it encounters string objects when it expects a primitive string instead, although in general, authors do not need to worry about the distinction.
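A few of the ways such breakage can show up are sketched below. The variable names are illustrative only.

```javascript
const primitive = "abc";
const wrapped = new String("abc");

// Strict equality fails because the operands have different types.
const strictlyEqual = primitive === wrapped; // false
// Loose equality coerces the object to a primitive first.
const looselyEqual = primitive == wrapped; // true
// Two wrappers are distinct objects, even with identical contents.
const wrappersEqual = new String("abc") === new String("abc"); // false

// Every object is truthy, including a wrapper around the empty string.
const emptyPrimitiveIsTruthy = Boolean(""); // false
const emptyWrapperIsTruthy = Boolean(new String("")); // true

console.log(strictlyEqual, looselyEqual, wrappersEqual);
console.log(emptyPrimitiveIsTruthy, emptyWrapperIsTruthy);
```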

A String object can always be converted to its primitive counterpart with the valueOf() method.

console.log(eval(s2.valueOf())); // returns the number 4

String coercion

Many built-in operations that expect strings first coerce their arguments to strings (which is largely why String objects behave similarly to string primitives). The operation can be summarized as follows:

Strings are returned as-is.
undefined turns into "undefined".
null turns into "null".
true turns into "true"; false turns into "false".
Numbers are converted with the same algorithm as toString(10).
BigInts are converted with the same algorithm as toString(10).
Symbols throw a TypeError.
Objects are first converted to a primitive by calling their [Symbol.toPrimitive]() (with "string" as the hint), toString(), and valueOf() methods, in that order. The resulting primitive is then converted to a string.

There are several ways to achieve nearly the same effect in JavaScript.

Template literal: `${x}` performs exactly the string coercion steps explained above for the embedded expression.
The String() function: String(x) uses the same algorithm to convert x, except that Symbols don't throw a TypeError, but return "Symbol(description)", where description is the description of the Symbol.
Using the + operator: "" + x coerces its operand to a primitive instead of a string and, for some objects, behaves entirely differently from normal string coercion. See its reference page for more details.

Depending on your use case, you may want to use `${x}` (to mimic built-in behavior) or String(x) (to handle symbol values without throwing an error), but you should not use "" + x.
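The difference between the three approaches becomes visible with an object whose toString() and valueOf() disagree, and with a Symbol. The object below is a hypothetical example constructed for this purpose.

```javascript
// Hypothetical object whose toString() and valueOf() return different things.
const thing = {
  toString() {
    return "from toString";
  },
  valueOf() {
    return 42;
  },
};

const viaTemplate = `${thing}`; // "from toString" (string coercion)
const viaString = String(thing); // "from toString" (string coercion)
const viaPlus = "" + thing; // "42" (valueOf() is tried first)

const sym = Symbol("id");
const symViaString = String(sym); // "Symbol(id)"

let symViaTemplate;
try {
  symViaTemplate = `${sym}`;
} catch (err) {
  symViaTemplate = err.name; // "TypeError"
}

console.log(viaTemplate, viaString, viaPlus, symViaString, symViaTemplate);
console.log(String(null), String(undefined), String(true), String(10n));
// null undefined true 10
```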

UTF-16 characters, Unicode code points, and grapheme clusters

Strings are represented fundamentally as sequences of UTF-16 code units. In UTF-16 encoding, every code unit is exactly 16 bits long. This means there are a maximum of 2^16, or 65,536, possible characters representable as single UTF-16 code units. This character set is called the Basic Multilingual Plane (BMP), and includes the most common characters such as the Latin, Greek, and Cyrillic alphabets, as well as many East Asian characters. Each code unit can be written in a string with \u followed by exactly four hex digits.

However, the entire Unicode character set is much larger than 65,536. The additional characters are stored in UTF-16 as surrogate pairs, which are pairs of 16-bit code units that represent a single character. To avoid ambiguity, both parts of the pair must be between 0xD800 and 0xDFFF, and these code units are not used to encode single-code-unit characters. (More precisely, leading surrogates, also called high-surrogate code units, have values between 0xD800 and 0xDBFF, inclusive, while trailing surrogates, also called low-surrogate code units, have values between 0xDC00 and 0xDFFF, inclusive.) Each Unicode character, composed of one or two UTF-16 code units, is also called a Unicode code point. Any Unicode code point can be written in a string with \u{xxxxxx}, where xxxxxx represents 1 to 6 hexadecimal digits.
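The relationship between code units and code points can be checked with a character outside the BMP. This sketch uses U+1F604 (the smiling face emoji) as its example.

```javascript
const smiley = "\u{1F604}"; // one code point outside the BMP

// The string occupies two UTF-16 code units (a surrogate pair).
const unitCount = smiley.length; // 2
const sameAsPair = smiley === "\uD83D\uDE04"; // true

// charCodeAt() reads individual code units; codePointAt() decodes the pair.
const leading = smiley.charCodeAt(0).toString(16); // "d83d"
const trailing = smiley.charCodeAt(1).toString(16); // "de04"
const codePoint = smiley.codePointAt(0).toString(16); // "1f604"

// A BMP character needs only one code unit.
const bmpUnitCount = "\u0041".length; // 1 ("A")

console.log(unitCount, sameAsPair, leading, trailing, codePoint, bmpUnitCount);
```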

A "lone surrogate" is a 16-bit code unit satisfying one of the descriptions below:

It is in the range 0xD800–0xDBFF, inclusive (that is, a leading surrogate), but it is the last code unit in the string, or the next code unit is not a trailing surrogate.
It is in the range 0xDC00–0xDFFF, inclusive (that is, a trailing surrogate), but it is the first code unit in the string, or the previous code unit is not a leading surrogate.

Lone surrogates do not represent any Unicode character. Although most JavaScript built-in methods handle them correctly because they all work on UTF-16 code units, lone surrogates are often not valid values when interacting with other systems. For example, encodeURI() throws a URIError for lone surrogates, because URI encoding uses UTF-8, which has no encoding for lone surrogates. Strings that contain no lone surrogates are called well-formed strings, and are safe to use with functions that do not deal with UTF-16 (such as encodeURI() or TextEncoder). You can check whether a string is well-formed with the isWellFormed() method, or sanitize lone surrogates with the toWellFormed() method.
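The following sketch shows a lone (unpaired) surrogate causing encodeURI() to fail. Because isWellFormed() and toWellFormed() are relatively recent additions, the example assumes they may be missing and checks for them before use.

```javascript
// A string that ends with a leading surrogate that has no partner.
const lone = "ab\uD800";

let encodeOutcome;
try {
  encodeOutcome = encodeURI(lone);
} catch (err) {
  encodeOutcome = err.name;
}
console.log(encodeOutcome); // "URIError"

// Feature check: older runtimes may not provide these methods.
if (typeof lone.isWellFormed === "function") {
  console.log(lone.isWellFormed()); // false
  console.log(lone.toWellFormed() === "ab\uFFFD"); // true
  console.log(encodeURI(lone.toWellFormed())); // "ab%EF%BF%BD"
}
```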

On top of Unicode characters, there are certain sequences of Unicode characters that should be treated as one visual unit, known as a grapheme cluster. The most common case is emojis: many emojis that have a range of variations are actually formed by multiple emojis, usually joined by the zero-width joiner (ZWJ, U+200D) character.

You must be careful about which level of characters you are iterating over. For example, split("") splits by UTF-16 code units and separates surrogate pairs. String indexes also refer to the index of each UTF-16 code unit. On the other hand, [Symbol.iterator]() iterates by Unicode code points. Iterating through grapheme clusters requires custom code.

"😄".split(""); // ['\ud83d', '\ude04']; splits into two lone surrogates
// "Backhand Index Pointing Right: Dark Skin Tone"
[..."👉🏿"]; // ['👉', '🏿']
// splits into the basic "Backhand Index Pointing Right" emoji and
// the "Dark skin tone" emoji
// "Family: Man, Boy"
[..."👨‍👦"]; // [ '👨', '‍', '👦' ]
// splits into the "Man" and "Boy" emoji, joined by a ZWJ
// The United Nations flag
[..."🇺🇳"]; // [ '🇺', '🇳' ]
// splits into two "regional indicator" letters "U" and "N".
// All flag emojis are formed by joining two regional indicator letters
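One possible way to iterate by grapheme cluster, assuming the runtime provides Intl.Segmenter, is sketched below. The graphemes helper is a hypothetical name introduced for this example, and exact segmentation depends on the Unicode data shipped with the runtime.

```javascript
// Hypothetical helper: splits text into grapheme clusters with Intl.Segmenter.
function graphemes(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// "Family: Man, Boy" written with escapes: man + ZWJ + boy.
const family = "\u{1F468}\u200D\u{1F466}";

const codeUnitCount = family.length; // 5
const codePointCount = [...family].length; // 3
const graphemeCount = graphemes(family).length; // 1

console.log(codeUnitCount, codePointCount, graphemeCount);
console.log(graphemes("cat")); // [ 'c', 'a', 't' ]
```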

Constructor

String()

Creates String objects. When called as a function, it returns primitive values of type String.

Static methods

String.fromCharCode()

Returns a string created from the specified sequence of UTF-16 code units.

String.fromCodePoint()

Returns a string created using the specified sequence of code points.

String.raw()

Returns a string created from a raw template string.
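The three static methods can be compared side by side. The values in this short sketch are chosen for illustration.

```javascript
// fromCharCode() takes UTF-16 code units.
const fromUnits = String.fromCharCode(0x48, 0x69); // "Hi"

// A character outside the BMP needs two code units with fromCharCode(),
// but a single code point with fromCodePoint().
const fromSurrogates = String.fromCharCode(0xd83d, 0xde04);
const fromPoint = String.fromCodePoint(0x1f604);
console.log(fromSurrogates === fromPoint); // true

// String.raw keeps escape sequences as written instead of interpreting them.
const raw = String.raw`a\nb`;
console.log(raw.length); // 4 (a, backslash, n, b)
console.log(raw === "a\\nb"); // true
```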

Instance properties

These properties are defined on String.prototype and shared by all String instances.

String.prototype.constructor

The constructor function that created the instance object. For String instances, the initial value is the String constructor.

These properties are own properties of each String instance.

length

Reflects the length of the string. Read-only.

Instance methods

String.prototype.at()

Returns the character (exactly one UTF-16 code unit) at the specified index. Accepts negative integers, which count back from the last character of the string.

String.prototype.charAt()

Returns the character (exactly one UTF-16 code unit) at the specified index.

String.prototype.charCodeAt()

Returns a number that is the value of the UTF-16 code unit at the given index.

String.prototype.codePointAt()

Returns a non-negative integer that is the code point value of the UTF-16 encoded code point starting at the specified pos.

String.prototype.concat()

Combines the text of two (or more) strings and returns a new string.

String.prototype.endsWith()

Determines whether a string ends with the characters of the string searchString.

String.prototype.includes()

Determines whether the calling string contains searchString.

String.prototype.indexOf()

Returns the index in this string of the first occurrence of searchValue, or -1 if not found.

String.prototype.isWellFormed()

Returns a boolean indicating whether this string is well-formed, that is, whether it contains no lone surrogates.

String.prototype.lastIndexOf()

Returns the index in this string of the last occurrence of searchValue, or -1 if not found.

String.prototype.localeCompare()

Returns a number indicating whether the reference string compareString comes before, after, or is equivalent to the given string in sort order.

String.prototype.match()

Matches the regular expression regexp against a string.

String.prototype.matchAll()

Returns an iterator of all regexp matches.

String.prototype.normalize()

Returns the Unicode Normalization Form of the calling string value.

String.prototype.padEnd()

Pads the current string from the end with a given string and returns a new string of length targetLength.

String.prototype.padStart()

Pads the current string from the beginning with a given string and returns a new string of length targetLength.

String.prototype.repeat()

Returns a string consisting of the contents of the calling string repeated count times.

String.prototype.replace()

Replaces occurrences of searchFor with replaceWith. searchFor may be a string or regular expression, and replaceWith may be a string or function.

String.prototype.replaceAll()

Replaces all occurrences of searchFor with replaceWith. searchFor may be a string or regular expression, and replaceWith may be a string or function.

String.prototype.search()

Searches for a match between a regular expression regexp and the calling string.

String.prototype.slice()

Extracts part of a string and returns a new string.

String.prototype.split()

Returns an array of strings populated by splitting the calling string at occurrences of the substring sep.

String.prototype.startsWith()

Determines whether the calling string begins with the characters of the string searchString.

String.prototype.substr() (deprecated)

Returns a portion of a string starting at the specified index and extending for a specified number of characters.

String.prototype.substring()

Returns a new string containing the characters of the calling string from the specified index, or between the specified indices.

String.prototype.toLocaleLowerCase()

The characters within a string are converted to lowercase while respecting the current locale.

For most languages, this returns the same result as toLowerCase().

String.prototype.toLocaleUpperCase()

The characters within a string are converted to uppercase while respecting the current locale.

For most languages, this returns the same result as toUpperCase().

String.prototype.toLowerCase()

Returns the value of the calling string converted to lowercase.

String.prototype.toString()

Returns a string representing the specified object. Overrides the Object.prototype.toString() method.

String.prototype.toUpperCase()

Returns the value of the calling string converted to uppercase.

String.prototype.toWellFormed()

Returns a string in which all lone surrogates of this string are replaced with the Unicode replacement character U+FFFD.

String.prototype.trim()

Trims whitespace from the beginning and end of the string.

String.prototype.trimEnd()

Trims whitespace from the end of the string.

String.prototype.trimStart()

Trims whitespace from the beginning of the string.

String.prototype.valueOf()

Returns the primitive value of the specified object. Overrides the Object.prototype.valueOf() method.

String.prototype[Symbol.iterator]()

Returns a new iterator object that iterates over the code points of a string value, returning each code point as a string value.
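To tie several of the instance methods together, here is a short sketch that runs a handful of them on one made-up title string. It assumes a runtime recent enough to provide at() and replaceAll().

```javascript
// Hypothetical input with surrounding whitespace.
const title = "  JavaScript Strings  ";

const trimmed = title.trim(); // "JavaScript Strings"
const lastChar = trimmed.at(-1); // "s"
const firstPart = trimmed.slice(0, 10); // "JavaScript"
const secondPart = trimmed.substring(11); // "Strings"
const lastCapitalS = trimmed.lastIndexOf("S"); // 11
const parts = trimmed.split(" "); // [ 'JavaScript', 'Strings' ]

const hasScript = trimmed.includes("Script"); // true
const startsWithJava = trimmed.startsWith("Java"); // true
const endsWithStrings = trimmed.endsWith("Strings"); // true

const padded = "7".padStart(3, "0"); // "007"
const repeated = "ab".repeat(3); // "ababab"
const replacedOnce = "a-b-c".replace("-", "+"); // "a+b-c"
const replacedAll = "a-b-c".replaceAll("-", "+"); // "a+b+c"

console.log(trimmed, lastChar, firstPart, secondPart, lastCapitalS, parts);
console.log(hasScript, startsWithJava, endsWithStrings);
console.log(padded, repeated, replacedOnce, replacedAll);
```

Note that none of these calls modify the original string; each returns a new value, because string primitives are immutable.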