string-literal: encoding-prefixopt " s-char-sequenceopt " encoding-prefixopt R raw-string
encoding-prefix: u8 u U L
s-char-sequence: s-char s-char-sequence s-char
s-char: any member of the source character set except the double-quote ", backslash \, or new-line character escape-sequence universal-character-name
raw-string: " d-char-sequenceopt ( r-char-sequenceopt ) d-char-sequenceopt "
r-char-sequence: r-char r-char-sequence r-char
r-char: any member of the source character set, except a right parenthesis ) followed by the initial d-char-sequence (which may be empty) followed by a double quote ".
d-char-sequence: d-char d-char-sequence d-char
d-char: any member of the basic source character set except: space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters representing horizontal tab, vertical tab, form feed, and newline.
A string literal is a sequence of characters (as defined in [lex.ccon]) surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR, U, UR, L, or LR, as in "...", R"(...)", u8"...", u8R"**(...)**", u"...", uR"*~(...)*~", U"...", UR"zzz(...)zzz", L"...", or LR"(...)", respectively.
A string literal that has an R in the prefix is a raw string literal. The d-char-sequence serves as a delimiter. The terminating d-char-sequence of a raw-string is the same sequence of characters as the initial d-char-sequence. A d-char-sequence shall consist of at most 16 characters.
[ Note: The characters '(' and ')' are permitted in a raw-string. Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)". — end note ]
[ Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string-literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char *p = R"(a\ b c)"; assert(std::strcmp(p, "a\\\nb\nc") == 0);
— end note ]
[ Example: The raw string
R"a( )\ a" )a"
is equivalent to "\n)\\\na\"\n". The raw string
R"(??)"
is equivalent to "\?\?". The raw string
R"#( )??=" )#"
is equivalent to "\n)\?\?=\"\n". — end example ]
After translation phase 6, a string literal that does not begin with an encoding-prefix is an ordinary string literal, and is initialized with the given characters.
A string literal that begins with u8, such as u8"asdf", is a UTF-8 string literal and is initialized with the given characters as encoded in UTF-8.
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration ([basic.stc]).
A string literal that begins with u, such as u"asdf", is a char16_t string literal. A char16_t string literal has type “array of n const char16_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters. A single c-char may produce more than one char16_t character in the form of surrogate pairs.
A string literal that begins with U, such as U"asdf", is a char32_t string literal. A char32_t string literal has type “array of n const char32_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters.
A string literal that begins with L, such as L"asdf", is a wide string literal. A wide string literal has type “array of n const wchar_t”, where n is the size of the string as defined below; it has static storage duration and is initialized with the given characters.
Whether all string literals are distinct (that is, are stored in nonoverlapping objects) is implementation-defined. The effect of attempting to modify a string literal is undefined.
In translation phase 6 ([lex.phases]), adjacent string literals are concatenated. If both string literals have the same encoding-prefix, the resulting concatenated string literal has that encoding-prefix. If one string literal has no encoding-prefix, it is treated as a string literal of the same encoding-prefix as the other operand. If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed. Any other concatenations are conditionally supported with implementation-defined behavior. [ Note: This concatenation is an interpretation, not a conversion. Because the interpretation happens in translation phase 6 (after each character from a literal has been translated into a value from the appropriate character set), a string literal's initial rawness has no effect on the interpretation or well-formedness of the concatenation. — end note ] Table [tab:lex.string.concat] has some examples of valid concatenations.
Source | Means | Source | Means | Source | Means | |||
u"a" | u"b" | u"ab" | U"a" | U"b" | U"ab" | L"a" | L"b" | L"ab" |
u"a" | "b" | u"ab" | U"a" | "b" | U"ab" | L"a" | "b" | L"ab" |
"a" | u"b" | u"ab" | "a" | U"b" | U"ab" | "a" | L"b" | L"ab" |
Characters in concatenated strings are kept distinct.
[ Example:
"\xA" "B"
contains the two characters '\xA' and 'B' after concatenation (and not the single hexadecimal character '\xAB'). — end example ]
After any necessary concatenation, in translation phase 7 ([lex.phases]), '\0' is appended to every string literal so that programs that scan a string can find its end.
Escape sequences and universal-character-names in non-raw string literals have the same meaning as in character literals ([lex.ccon]), except that the single quote ' is representable either by itself or by the escape sequence \', and the double quote " shall be preceded by a \. In a narrow string literal, a universal-character-name may map to more than one char element due to multibyte encoding. The size of a char32_t or wide string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for the terminating U'\0' or L'\0'. The size of a char16_t string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for each character requiring a surrogate pair, plus one for the terminating u'\0'. [ Note: The size of a char16_t string literal is the number of code units, not the number of characters. — end note ] Within char32_t and char16_t literals, any universal-character-names shall be within the range 0x0 to 0x10FFFF. The size of a narrow string literal is the total number of escape sequences and other characters, plus at least one for the multibyte encoding of each universal-character-name, plus one for the terminating '\0'.