2 Lexical conventions [lex]

2.14 Literals [lex.literal]

2.14.1 Kinds of literals [lex.literal.kinds]

The term “literal” generally designates, in this International Standard, those tokens that are called “constants” in ISO C.

2.14.2 Integer literals [lex.icon]

integer-literal:
    decimal-literal integer-suffixopt
    octal-literal integer-suffixopt
    hexadecimal-literal integer-suffixopt
decimal-literal:
    nonzero-digit
    decimal-literal digit
octal-literal:
    0
    octal-literal octal-digit
hexadecimal-literal:
    0x hexadecimal-digit
    0X hexadecimal-digit
    hexadecimal-literal hexadecimal-digit
nonzero-digit: one of
    1  2  3  4  5  6  7  8  9
octal-digit: one of
    0  1  2  3  4  5  6  7
hexadecimal-digit: one of
    0  1  2  3  4  5  6  7  8  9
    a  b  c  d  e  f
    A  B  C  D  E  F
integer-suffix:
    unsigned-suffix long-suffixopt 
    unsigned-suffix long-long-suffixopt 
    long-suffix unsigned-suffixopt 
    long-long-suffix unsigned-suffixopt
unsigned-suffix: one of
    u  U
long-suffix: one of
    l  L
long-long-suffix: one of
    ll  LL

An integer literal is a sequence of digits that has no period or exponent part. An integer literal may have a prefix that specifies its base and a suffix that specifies its type. The lexically first digit of the sequence of digits is the most significant. A decimal integer literal (base ten) begins with a digit other than 0 and consists of a sequence of decimal digits. An octal integer literal (base eight) begins with the digit 0 and consists of a sequence of octal digits.22 A hexadecimal integer literal (base sixteen) begins with 0x or 0X and consists of a sequence of hexadecimal digits, which include the decimal digits and the letters a through f and A through F with decimal values ten through fifteen. [ Example: the number twelve can be written 12, 014, or 0XC.  — end example ]

The type of an integer literal is the first of the corresponding list in Table [tab:lex.type.integer.constant] in which its value can be represented.

Table 6 — Types of integer constants
SuffixDecimal constantsOctal or hexadecimal constant
none int int
long int unsigned int
long long int long int
unsigned long int
long long int
unsigned long long int
u or U unsigned int unsigned int
unsigned long int unsigned long int
unsigned long long int unsigned long long int
l or L long int long int
long long int unsigned long int
long long int
unsigned long long int
Both u or U unsigned long int unsigned long int
and l or L unsigned long long int unsigned long long int
ll or LL long long int long long int
unsigned long long int
Both u or U unsigned long long int unsigned long long int
and ll or LL

If an integer literal cannot be represented by any type in its list and an extended integer type ([basic.fundamental]) can represent its value, it may have that extended integer type. If all of the types in the list for the literal are signed, the extended integer type shall be signed. If all of the types in the list for the literal are unsigned, the extended integer type shall be unsigned. If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned. A program is ill-formed if one of its translation units contains an integer literal that cannot be represented by any of the allowed types.

The digits 8 and 9 are not octal digits.

2.14.3 Character literals [lex.ccon]

character-literal:
    ' c-char-sequence '
    u' c-char-sequence '
    U' c-char-sequence '
    L' c-char-sequence '
c-char-sequence:
    c-char
    c-char-sequence c-char
c-char:
	any member of the source character set except
		the single-quote ', backslash \, or new-line character
	escape-sequence
	universal-character-name
escape-sequence:
    simple-escape-sequence
    octal-escape-sequence
    hexadecimal-escape-sequence
simple-escape-sequence: one of
    \'  \"  \?  \\
    \a  \b  \f  \n  \r  \t  \v
octal-escape-sequence:
    \ octal-digit
    \ octal-digit octal-digit
    \ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
    \x hexadecimal-digit
    hexadecimal-escape-sequence hexadecimal-digit

A character literal is one or more characters enclosed in single quotes, as in 'x', optionally preceded by one of the letters u, U, or L, as in u'y', U'z', or L'x', respectively. A character literal that does not begin with u, U, or L is an ordinary character literal, also referred to as a narrow-character literal. An ordinary character literal that contains a single c-char has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set. An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.

A character literal that begins with the letter u, such as u'y', is a character literal of type char16_t. The value of a char16_t literal containing a single c-char is equal to its ISO 10646 code point value, provided that the code point is representable with a single 16-bit code unit. (That is, provided it is a basic multi-lingual plane code point.) If the value is not representable within 16 bits, the program is ill-formed. A char16_t literal containing multiple c-chars is ill-formed. A character literal that begins with the letter U, such as U'z', is a character literal of type char32_t. The value of a char32_t literal containing a single c-char is equal to its ISO 10646 code point value. A char32_t literal containing multiple c-chars is ill-formed. A character literal that begins with the letter L, such as L'x', is a wide-character literal. A wide-character literal has type wchar_t.23 The value of a wide-character literal containing a single c-char has value equal to the numerical value of the encoding of the c-char in the execution wide-character set, unless the c-char has no representation in the execution wide-character set, in which case the value is implementation-defined. [ Note: The type wchar_t is able to represent all members of the execution wide-character set (see [basic.fundamental]).  — end note ]. The value of a wide-character literal containing multiple c-chars is implementation-defined.

Certain nongraphic characters, the single quote ', the double quote ", the question mark ?,24 and the backslash \, can be represented according to Table [tab:escape.sequences]. The double quote " and the question mark ?, can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. Escape sequences in which the character following the backslash is not listed in Table [tab:escape.sequences] are conditionally-supported, with implementation-defined semantics. An escape sequence specifies a single character.

Table 7 — Escape sequences
new-line NL(LF) \n
horizontal tab HT \t
vertical tab VT \v
backspace BS \b
carriage return CR \r
form feed FF \f
alert BEL \a
backslash \ \\
question mark ? \?
single quote ' \'
double quote " \"
octal number ooo \ooo
hex number hhh \xhhh

The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character. The escape \xhhh consists of the backslash followed by x followed by one or more hexadecimal digits that are taken to specify the value of the desired character. There is no limit to the number of digits in a hexadecimal sequence. A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively. The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for literals with no prefix), char16_t (for literals prefixed by 'u'), char32_t (for literals prefixed by 'U'), or wchar_t (for literals prefixed by 'L').

A universal-character-name is translated to the encoding, in the appropriate execution character set, of the character named. If there is no such encoding, the universal-character-name is translated to an implementation-defined encoding. [ Note: In translation phase 1, a universal-character-name is introduced whenever an actual extended character is encountered in the source text. Therefore, all extended characters are described in terms of universal-character-names. However, the actual compiler implementation may use its own native character set, so long as the same results are obtained.  — end note ]

They are intended for character sets where a character does not fit into a single byte.

Using an escape sequence for a question mark can avoid accidentally creating a trigraph.

2.14.4 Floating literals [lex.fcon]

floating-literal:
    fractional-constant exponent-partopt floating-suffixopt
    digit-sequence exponent-part floating-suffixopt
fractional-constant:
    digit-sequenceopt . digit-sequence
    digit-sequence .
exponent-part:
    e signopt digit-sequence
    E signopt digit-sequence
sign: one of
    +  -
digit-sequence:
    digit
    digit-sequence digit
floating-suffix: one of
    f  l  F  L

A floating literal consists of an integer part, a decimal point, a fraction part, an e or E, an optionally signed integer exponent, and an optional type suffix. The integer and fraction parts both consist of a sequence of decimal (base ten) digits. Either the integer part or the fraction part (not both) can be omitted; either the decimal point or the letter e (or E ) and the exponent (not both) can be omitted. The integer part, the optional decimal point and the optional fraction part form the significant part of the floating literal. The exponent, if present, indicates the power of 10 by which the significant part is to be scaled. If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner. The type of a floating literal is double unless explicitly specified by a suffix. The suffixes f and F specify float, the suffixes l and L specify long double. If the scaled value is not in the range of representable values for its type, the program is ill-formed.

2.14.5 String literals [lex.string]

string-literal:
    encoding-prefixopt " s-char-sequenceopt "
    encoding-prefixopt R raw-string
encoding-prefix:
  u8
  u
  U
  L
s-char-sequence:
    s-char
    s-char-sequence s-char
s-char:
	any member of the source character set except
		the double-quote ", backslash \, or new-line character
	escape-sequence
	universal-character-name
raw-string:
    " d-char-sequenceopt ( r-char-sequenceopt ) d-char-sequenceopt "
r-char-sequence:
    r-char
    r-char-sequence r-char
r-char:
	any member of the source character set, except
		a right parenthesis ) followed by the initial  d-char-sequence
		(which may be empty) followed by a double quote ".
d-char-sequence:
    d-char
    d-char-sequence d-char
d-char:
	any member of the basic source character set except:
		space, the left parenthesis (, the right parenthesis ), the backslash \,
		and the control characters representing horizontal tab,
		vertical tab, form feed, and newline.

A string literal is a sequence of characters (as defined in [lex.ccon]) surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR, U, UR, L, or LR, as in "...", R"(...)", u8"...", u8R"**(...)**", u"...", uR"*~(...)*~", U"...", UR"zzz(...)zzz", L"...", or LR"(...)", respectively.

A string literal that has an R in the prefix is a raw string literal. The d-char-sequence serves as a delimiter. The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence. A d-char-sequence shall consist of at most 16 characters.

Note: The characters '(' and ')' are permitted in a raw-string. Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)".  — end note ]

Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string-literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:

const char *p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);

 — end note ]

Example: The raw string

R"a(
)\
a"
)a"

is equivalent to "\n)\\\na\"\n". The raw string

R"(??)"

is equivalent to "\?\?". The raw string

R"#(
)??="
)#"

is equivalent to "\n)\?\?=\"\n".  — end example ]

After translation phase 6, a string literal that does not begin with an encoding-prefix is an ordinary string literal, and is initialized with the given characters.

A string literal that begins with u8, such as u8"asdf", is a UTF-8 string literal and is initialized with the given characters as encoded in UTF-8.

Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration ([basic.stc]).

A string literal that begins with u, such as u"asdf", is a char16_t string literal. A char16_t string literal has type “array of n const char16_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters. A single c-char may produce more than one char16_t character in the form of surrogate pairs.

A string literal that begins with U, such as U"asdf", is a char32_t string literal. A char32_t string literal has type “array of n const char32_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters.

A string literal that begins with L, such as L"asdf", is a wide string literal. A wide string literal has type “array of n const wchar_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters.

Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.

In translation phase 6 ([lex.phases]), adjacent string literals are concatenated. If both string literals have the same encoding-prefix, the resulting concatenated string literal has that encoding-prefix. If one string literal has no encoding-prefix, it is treated as a string literal of the same encoding-prefix as the other operand. If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed. Any other concatenations are conditionally supported with implementation-defined behavior. [ Note: This concatenation is an interpretation, not a conversion. Because the interpretation happens in translation phase 6 (after each character from a literal has been translated into a value from the appropriate character set), a string literal's initial rawness has no effect on the interpretation or well-formedness of the concatenation.  — end note ] Table [tab:lex.string.concat] has some examples of valid concatenations.

Table 8 — String literal concatenations
Source Means Source Means Source Means
u"a" u"b" u"ab" U"a" U"b" U"ab" L"a" L"b" L"ab"
u"a" "b" u"ab" U"a" "b" U"ab" L"a" "b" L"ab"
"a" u"b" u"ab" "a" U"b" U"ab" "a" L"b" L"ab"

Characters in concatenated strings are kept distinct.

Example:

"\xA" "B"

contains the two characters '\xA' and 'B' after concatenation (and not the single hexadecimal character '\xAB').  — end example ]

After any necessary concatenation, in translation phase 7 ([lex.phases]), '\0' is appended to every string literal so that programs that scan a string can find its end.

Escape sequences and universal-character-names in non-raw string literals have the same meaning as in character literals ([lex.ccon]), except that the single quote ' is representable either by itself or by the escape sequence \', and the double quote " shall be preceded by a \. In a narrow string literal, a universal-character-name may map to more than one char element due to multibyte encoding. The size of a char32_t or wide string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for the terminating U'\0' or L'\0'. The size of a char16_t string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for each character requiring a surrogate pair, plus one for the terminating u'\0'. [ Note: The size of a char16_t string literal is the number of code units, not the number of characters.  — end note ] Within char32_t and char16_t literals, any universal-character-names shall be within the range 0x0 to 0x10FFFF. The size of a narrow string literal is the total number of escape sequences and other characters, plus at least one for the multibyte encoding of each universal-character-name, plus one for the terminating '\0'.

2.14.6 Boolean literals [lex.bool]

boolean-literal:
    false
    true

The Boolean literals are the keywords false and true. Such literals are prvalues and have type bool.

2.14.7 Pointer literals [lex.nullptr]

pointer-literal:
    nullptr

The pointer literal is the keyword nullptr. It is a prvalue of type std::nullptr_t. [ Note: std::nullptr_t is a distinct type that is neither a pointer type nor a pointer to member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value. See [conv.ptr] and [conv.mem].  — end note ]

2.14.8 User-defined literals [lex.ext]

user-defined-literal:
    user-defined-integer-literal
    user-defined-floating-literal
    user-defined-string-literal
    user-defined-character-literal
user-defined-integer-literal:
    decimal-literal ud-suffix
    octal-literal ud-suffix
    hexadecimal-literal ud-suffix
user-defined-floating-literal:
    fractional-constant exponent-partopt ud-suffix
    digit-sequence exponent-part ud-suffix
user-defined-string-literal:
    string-literal ud-suffix
user-defined-character-literal:
    character-literal ud-suffix
ud-suffix:
    identifier

If a token matches both user-defined-literal and another literal kind, it is treated as the latter. [ Example: 123_km is a user-defined-literal, but 12LL is an integer-literal.  — end example ] The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.

A user-defined-literal is treated as a call to a literal operator or literal operator template ([over.literal]). To determine the form of this call for a given user-defined-literal L with ud-suffix X, the literal-operator-id whose literal suffix identifier is X is looked up in the context of L using the rules for unqualified name lookup ([basic.lookup.unqual]). Let S be the set of declarations found by this lookup. S shall not be empty.

If L is a user-defined-integer-literal, let n be the literal without its ud-suffix. If S contains a literal operator with parameter type unsigned long long, the literal L is treated as a call of the form

operator "" X(nULL)

Otherwise, S shall contain a raw literal operator or a literal operator template ([over.literal]) but not both. If S contains a raw literal operator, the literal L is treated as a call of the form

operator "" X("n")

Otherwise (S contains a literal operator template), L is treated as a call of the form

operator "" X<'c1', 'c2', ... 'ck'>()

where n is the source character sequence c1c2...ck. [ Note: The sequence c1c2...ck can only contain characters from the basic source character set.  — end note ]

If L is a user-defined-floating-literal, let f be the literal without its ud-suffix. If S contains a literal operator with parameter type long double, the literal L is treated as a call of the form

operator "" X(fL)

Otherwise, S shall contain a raw literal operator or a literal operator template ([over.literal]) but not both. If S contains a raw literal operator, the literal L is treated as a call of the form

operator "" X("f")

Otherwise (S contains a literal operator template), L is treated as a call of the form

operator "" X<'c1', 'c2', ... 'ck'>()

where f is the source character sequence c1c2...ck. [ Note: The sequence c1c2...ck can only contain characters from the basic source character set.  — end note ]

If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character). The literal L is treated as a call of the form

operator "" X(str, len)

If L is a user-defined-character-literal, let ch be the literal without its ud-suffix. S shall contain a literal operator ([over.literal]) whose only parameter has the type of ch and the literal L is treated as a call of the form

operator "" X(ch)

Example:

long double operator "" _w(long double);
std::string operator "" _w(const char16_t*, size_t);
unsigned operator "" _w(const char*);
int main() {
  1.2_w;      // calls operator "" _w(1.2L)
  u"one"_w;   // calls operator "" _w(u"one", 3)
  12_w;       // calls operator "" _w("12")
  "two"_w;    // error: no applicable literal operator
}

 — end example ]

In translation phase 6 ([lex.phases]), adjacent string literals are concatenated and user-defined-string-literals are considered string literals for that purpose. During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in [lex.string]. At the end of phase 6, if a string literal is the result of a concatenation involving at least one user-defined-string-literal, all the participating user-defined-string-literals shall have the same ud-suffix and that suffix is applied to the result of the concatenation.

Example:

int main() {
  L"A" "B" "C"_x; // OK: same as L"ABC"_x
  "P"_x "Q" "R"_y;// error: two different ud-suffixes
}

 — end example ]

Some identifiers appearing as ud-suffixes are reserved for future standardization ([usrlit.suffix]). A program containing such a ud-suffix is ill-formed, no diagnostic required.