5 Lexical conventions [lex]
integer-literal:
binary-literal integer-suffix_opt
octal-literal integer-suffix_opt
decimal-literal integer-suffix_opt
hexadecimal-literal integer-suffix_opt
binary-literal:
0b binary-digit
0B binary-digit
binary-literal '_opt binary-digit
octal-literal:
0
octal-literal '_opt octal-digit
decimal-literal:
nonzero-digit
decimal-literal '_opt digit
hexadecimal-literal:
hexadecimal-prefix hexadecimal-digit-sequence
binary-digit: one of
0 1
octal-digit: one of
0 1 2 3 4 5 6 7
nonzero-digit: one of
1 2 3 4 5 6 7 8 9
hexadecimal-prefix: one of
0x 0X
hexadecimal-digit-sequence:
hexadecimal-digit
hexadecimal-digit-sequence '_opt hexadecimal-digit
hexadecimal-digit: one of
0 1 2 3 4 5 6 7 8 9
a b c d e f
A B C D E F
integer-suffix:
unsigned-suffix long-suffix_opt
unsigned-suffix long-long-suffix_opt
long-suffix unsigned-suffix_opt
long-long-suffix unsigned-suffix_opt
unsigned-suffix: one of
u U
long-suffix: one of
l L
long-long-suffix: one of
ll LL
An integer literal is a sequence of digits that has no period or exponent part, with optional separating single quotes that are ignored when determining its value. An integer literal may have a prefix that specifies its base and a suffix that specifies its type. The lexically first digit of the sequence of digits is the most significant. A hexadecimal integer literal (base sixteen) begins with 0x or 0X and consists of a sequence of hexadecimal digits, which include the decimal digits and the letters a through f and A through F with decimal values ten through fifteen.
[ Example: The number twelve can be written 12, 014, 0XC, or 0b1100. The integer literals 1048576, 1'048'576, 0X100000, 0x10'0000, and 0'004'000'000 all have the same value. — end example ]
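The equalities in the example above can be checked at compile time. The following is a minimal sketch; the helper name twelve_in_binary is hypothetical and exists only for illustration.

```cpp
// All four spellings denote twelve; the prefix selects the base.
static_assert(12 == 014, "decimal vs octal");
static_assert(12 == 0XC, "decimal vs hexadecimal");
static_assert(12 == 0b1100, "decimal vs binary");

// Single quotes are digit separators and do not affect the value.
static_assert(1048576 == 1'048'576, "decimal with separators");
static_assert(1048576 == 0X100000, "hexadecimal");
static_assert(1048576 == 0x10'0000, "hexadecimal with a separator");
static_assert(1048576 == 0'004'000'000, "octal with separators");

// Hypothetical helper used only by this sketch.
constexpr int twelve_in_binary() { return 0b1100; }
```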
The type of an integer literal is the first of the corresponding list in Table 7 in which its value can be represented.
Table 7 — Types of integer literals
Suffix | Decimal literal | Binary, octal, or hexadecimal literal |
none | int, long int, long long int | int, unsigned int, long int, unsigned long int, long long int, unsigned long long int |
u or U | unsigned int, unsigned long int, unsigned long long int | unsigned int, unsigned long int, unsigned long long int |
l or L | long int, long long int | long int, unsigned long int, long long int, unsigned long long int |
Both u or U and l or L | unsigned long int, unsigned long long int | unsigned long int, unsigned long long int |
ll or LL | long long int | long long int, unsigned long long int |
Both u or U and ll or LL | unsigned long long int | unsigned long long int |
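The selection rule of Table 7 can be observed with decltype. The sketch below assumes nothing about type widths except in the one check that is explicitly guarded; has_type is a hypothetical helper, not part of any library.

```cpp
#include <type_traits>

// Small values: the first type in each Table 7 list is chosen.
static_assert(std::is_same<decltype(1), int>::value, "no suffix");
static_assert(std::is_same<decltype(1u), unsigned int>::value, "u");
static_assert(std::is_same<decltype(1L), long>::value, "L");
static_assert(std::is_same<decltype(1uL), unsigned long>::value, "u and L");
static_assert(std::is_same<decltype(1LL), long long>::value, "LL");
static_assert(std::is_same<decltype(1uLL), unsigned long long>::value, "u and LL");

// The decimal list without a u suffix contains only signed types.
static_assert(std::is_signed<decltype(2147483648)>::value, "decimal stays signed");

// Assumption: only meaningful where int is 32 bits wide. The hexadecimal
// list continues with unsigned int when the value does not fit in int.
static_assert(sizeof(int) != 4 ||
              std::is_same<decltype(0x80000000), unsigned int>::value,
              "hexadecimal literal falls through to unsigned int");

// Hypothetical helper for this sketch: true if the argument has type T.
template <class T, class U>
constexpr bool has_type(U) { return std::is_same<T, U>::value; }
```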
If an integer literal cannot be represented by any type in its list and an extended integer type can represent its value, it may have that extended integer type. If all of the types in the list for the integer literal are signed, the extended integer type shall be signed. If all of the types in the list for the integer literal are unsigned, the extended integer type shall be unsigned. If the list contains both signed and unsigned types, the extended integer type may be signed or unsigned. A program is ill-formed if one of its translation units contains an integer literal that cannot be represented by any of the allowed types.
character-literal:
encoding-prefix_opt ' c-char-sequence '
encoding-prefix: one of
u8 u U L
c-char-sequence:
c-char
c-char-sequence c-char
c-char:
any member of the source character set except the single-quote ', backslash \, or new-line character
escape-sequence
universal-character-name
escape-sequence:
simple-escape-sequence
octal-escape-sequence
hexadecimal-escape-sequence
simple-escape-sequence: one of
\' \" \? \\
\a \b \f \n \r \t \v
octal-escape-sequence:
\ octal-digit
\ octal-digit octal-digit
\ octal-digit octal-digit octal-digit
hexadecimal-escape-sequence:
\x hexadecimal-digit
hexadecimal-escape-sequence hexadecimal-digit
A character literal is one or more characters enclosed in single quotes, as in 'x', optionally preceded by u8, u, U, or L, as in u8'w', u'x', U'y', or L'z', respectively. An ordinary character literal that contains a single c-char representable in the execution character set has type char, with value equal to the numerical value of the encoding of the c-char in the execution character set. A multicharacter literal, or an ordinary character literal containing a single c-char not representable in the execution character set, is conditionally-supported, has type int, and has an implementation-defined value.
A character literal that begins with u8, such as u8'w', is a character literal of type char, known as a UTF-8 character literal. The value of a UTF-8 character literal is equal to its ISO/IEC 10646 code point value, provided that the code point value is representable with a single UTF-8 code unit (that is, provided it is in the C0 Controls and Basic Latin Unicode block). If the value is not representable with a single UTF-8 code unit, the program is ill-formed. A UTF-8 character literal containing multiple c-chars is ill-formed.
A character literal that begins with the letter u, such as u'x', is a character literal of type char16_t. The value of a char16_t character literal containing a single c-char is equal to its ISO/IEC 10646 code point value, provided that the code point value is representable with a single 16-bit code unit (that is, provided it is in the basic multi-lingual plane). If the value is not representable with a single 16-bit code unit, the program is ill-formed. A char16_t character literal containing multiple c-chars is ill-formed.
A character literal that begins with the letter U, such as U'y', is a character literal of type char32_t. The value of a char32_t character literal containing a single c-char is equal to its ISO/IEC 10646 code point value. A char32_t character literal containing multiple c-chars is ill-formed.
A wide-character literal has type wchar_t.
The value of a wide-character literal containing a single c-char is equal to the numerical value of the encoding of the c-char in the execution wide-character set, unless the c-char has no representation in the execution wide-character set, in which case the value is implementation-defined. [ Note: The type wchar_t is able to represent all members of the execution wide-character set (see [basic.fundamental]). — end note ]
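The prefix-to-type mapping described above can be sketched with decltype. UTF-8 character literals are left out on purpose, and is_single_utf16_unit is a hypothetical helper that only tests whether a code point lies in the basic multi-lingual plane.

```cpp
#include <type_traits>

static_assert(std::is_same<decltype('x'), char>::value, "ordinary");
static_assert(std::is_same<decltype(u'x'), char16_t>::value, "u prefix");
static_assert(std::is_same<decltype(U'y'), char32_t>::value, "U prefix");
static_assert(std::is_same<decltype(L'z'), wchar_t>::value, "L prefix");

// char16_t and char32_t literals hold the ISO/IEC 10646 code point value.
static_assert(u'\u00E9' == 0x00E9, "U+00E9 fits one 16-bit code unit");
static_assert(U'\U0001F600' == 0x1F600, "char32_t holds the code point");

// Hypothetical helper for this sketch: a code point above 0xFFFF cannot be
// written as a char16_t character literal.
constexpr bool is_single_utf16_unit(char32_t cp) { return cp <= 0xFFFF; }
```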
The value of a wide-character literal containing multiple c-chars is implementation-defined.
Certain non-graphic characters, the single quote ', the double quote ", the question mark ?, and the backslash \, can be represented according to Table 8. The double quote " and the question mark ? can be represented as themselves or by the escape sequences \" and \? respectively, but the single quote ' and the backslash \ shall be represented by the escape sequences \' and \\ respectively. Escape sequences in which the character following the backslash is not listed in Table 8 are conditionally-supported, with implementation-defined semantics. An escape sequence specifies a single character.
Table 8 — Escape sequences
new-line | NL(LF) | \n |
horizontal tab | HT | \t |
vertical tab | VT | \v |
backspace | BS | \b |
carriage return | CR | \r |
form feed | FF | \f |
alert | BEL | \a |
backslash | \ | \\ |
question mark | ? | \? |
single quote | ' | \' |
double quote | " | \" |
octal number | ooo | \ooo |
hex number | hhh | \xhhh |
The escape \ooo consists of the backslash followed by one, two, or three octal digits that are taken to specify the value of the desired character. The escape \xhhh consists of the backslash followed by x followed by one or more hexadecimal digits that are taken to specify the value of the desired character. There is no limit to the number of digits in a hexadecimal sequence. A sequence of octal or hexadecimal digits is terminated by the first character that is not an octal digit or a hexadecimal digit, respectively. The value of a character literal is implementation-defined if it falls outside of the implementation-defined range defined for char (for character literals with no prefix) or wchar_t (for character literals prefixed by L). [ Note: If the value of a character literal prefixed by u, u8, or U is outside the range defined for its type, the program is ill-formed. — end note ]
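The termination rules for octal and hexadecimal escapes can be illustrated with sizeof on narrow string literals. This is a sketch only; octal_101 is a hypothetical helper.

```cpp
// Octal and hexadecimal escapes specify the character value directly.
static_assert('\101' == '\x41', "both specify the value 65");
static_assert('\0' == 0, "a single octal digit is enough");

// An octal escape takes at most three digits: "\1234" is '\123', '4', '\0'.
static_assert(sizeof("\1234") == 3, "octal escape stops after three digits");

// A hexadecimal escape ends at the first non-hexadecimal character ('g').
static_assert(sizeof("\x41g") == 3, "hex escape stops at 'g'");

// Each simple escape sequence specifies a single character.
static_assert(sizeof("\n\t\\\'\"\?") == 7, "six characters plus terminator");

// Hypothetical helper used only by this sketch.
constexpr int octal_101() { return '\101'; }
```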
A universal-character-name is translated to the encoding, in the appropriate execution character set, of the character named. [ Note: In translation phase 1, a universal-character-name is introduced whenever an actual extended character is encountered in the source text. However, the actual compiler implementation may use its own native character set, so long as the same results are obtained. — end note ]
floating-literal:
decimal-floating-literal
hexadecimal-floating-literal
decimal-floating-literal:
fractional-constant exponent-part_opt floating-suffix_opt
digit-sequence exponent-part floating-suffix_opt
hexadecimal-floating-literal:
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-suffix_opt
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-suffix_opt
fractional-constant:
digit-sequence_opt . digit-sequence
digit-sequence .
hexadecimal-fractional-constant:
hexadecimal-digit-sequence_opt . hexadecimal-digit-sequence
hexadecimal-digit-sequence .
exponent-part:
e sign_opt digit-sequence
E sign_opt digit-sequence
binary-exponent-part:
p sign_opt digit-sequence
P sign_opt digit-sequence
sign: one of
+ -
digit-sequence:
digit
digit-sequence '_opt digit
floating-suffix: one of
f l F L
A floating literal consists of an optional prefix specifying a base, an integer part, a radix point, a fraction part, an e, E, p or P, an optionally signed integer exponent, and an optional type suffix. The integer and fraction parts both consist of a sequence of decimal (base ten) digits if there is no prefix, or hexadecimal (base sixteen) digits if the prefix is 0x or 0X. [ Example: The floating literals 1.602'176'565e-19 and 1.602176565e-19 have the same value. — end example ]
Either the integer part or the fraction part (not both) can be omitted. Either the radix point or the letter e or E and the exponent (not both) can be omitted from a decimal floating literal. The radix point (but not the exponent) can be omitted from a hexadecimal floating literal. The integer part, the optional radix point, and the optional fraction part form the significand of the floating literal. In a decimal floating literal, the exponent, if present, indicates the power of 10 by which the significand is to be scaled. In a hexadecimal floating literal, the exponent indicates the power of 2 by which the significand is to be scaled. [ Example: The floating literals 49.625 and 0xC.68p+2 have the same value. — end example ]
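The scaling rule and the suffix-to-type mapping stated in this subclause can be checked at compile time. The sketch below assumes a compiler that accepts hexadecimal floating literals; scaled_hex is a hypothetical helper.

```cpp
#include <type_traits>

// 0xC.68 is 12 + 6/16 + 8/256 = 12.40625; scaled by 2^2 it is 49.625.
static_assert(49.625 == 0xC.68p+2, "hexadecimal significand scaled by 2^2");
static_assert(1.602'176'565e-19 == 1.602176565e-19, "separators are ignored");

// Either the integer part or the fraction part may be omitted.
static_assert(1. == 1.0 && .5 == 0.5, "omitted fraction or integer part");
// A decimal floating literal may omit the radix point if it has an exponent.
static_assert(1e3 == 1000.0, "exponent without radix point");
// A hexadecimal floating literal may omit the radix point, never the exponent.
static_assert(0x10p0 == 16.0, "hexadecimal digits with binary exponent");

// The suffix selects the type; without a suffix the type is double.
static_assert(std::is_same<decltype(1.0), double>::value, "no suffix");
static_assert(std::is_same<decltype(1.0f), float>::value, "f or F");
static_assert(std::is_same<decltype(1.0L), long double>::value, "l or L");

// Hypothetical helper used only by this sketch.
constexpr double scaled_hex() { return 0xC.68p+2; }
```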
If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner. The type of a floating literal is double unless explicitly specified by a suffix. The suffixes f and F specify float; the suffixes l and L specify long double. If the scaled value is not in the range of representable values for its type, the program is ill-formed.
string-literal:
encoding-prefix_opt " s-char-sequence_opt "
encoding-prefix_opt R raw-string
s-char-sequence:
s-char
s-char-sequence s-char
s-char:
any member of the source character set except the double-quote ", backslash \, or new-line character
escape-sequence
universal-character-name
raw-string:
" d-char-sequence_opt ( r-char-sequence_opt ) d-char-sequence_opt "
r-char-sequence:
r-char
r-char-sequence r-char
r-char:
any member of the source character set, except a right parenthesis ) followed by
the initial d-char-sequence (which may be empty) followed by a double quote ".
d-char-sequence:
d-char
d-char-sequence d-char
d-char:
any member of the basic source character set except:
space, the left parenthesis (, the right parenthesis ), the backslash \, and the control characters
representing horizontal tab, vertical tab, form feed, and newline.
A string-literal is a sequence of characters (as defined in [lex.ccon]) surrounded by double quotes, optionally prefixed by R, u8, u8R, u, uR, U, UR, L, or LR, as in "...", R"(...)", u8"...", u8R"**(...)**", u"...", uR"*~(...)*~", U"...", UR"zzz(...)zzz", L"...", or LR"(...)", respectively.
[ Note: The characters '(' and ')' are permitted in a raw-string. Thus, R"delimiter((a|b))delimiter" is equivalent to "(a|b)". — end note ]
[ Note: A source-file new-line in a raw string literal results in a new-line in the resulting execution string literal. Assuming no whitespace at the beginning of lines in the following example, the assert will succeed:
const char* p = R"(a\
b
c)";
assert(std::strcmp(p, "a\\\nb\nc") == 0);
— end note ]
[ Example: The raw string
R"a(
)\
a"
)a"
is equivalent to "\n)\\\na\"\n". The raw string
R"(x = "\"y\"")"
is equivalent to "x = \"\\\"y\\\"\"". — end example ]
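The single-line equivalences from the note and example above can be verified at compile time. This sketch assumes std::string_view is available for the comparison; length is a hypothetical helper.

```cpp
#include <cstddef>
#include <string_view>

using std::string_view;

// The delimiter lets the raw string contain parentheses.
static_assert(string_view(R"delimiter((a|b))delimiter") == "(a|b)", "delimiter");

// Backslashes and double quotes are ordinary characters in a raw string.
static_assert(string_view(R"(x = "\"y\"")") == "x = \"\\\"y\\\"\"", "no escapes");

// Hypothetical helper: length of a literal without its terminating '\0'.
template <std::size_t N>
constexpr std::size_t length(const char (&)[N]) { return N - 1; }

// In a raw string, \n is a backslash followed by n, not a new-line.
static_assert(length(R"(\n)") == 2, "two characters");
```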
Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has static storage duration.
For a UTF-8 string literal, each successive element of the object representation has the value of the corresponding code unit of the UTF-8 encoding of the string.
A string-literal that begins with u, such as u"asdf", is a char16_t string literal. A char16_t string literal has type “array of n const char16_t”, where n is the size of the string as defined below; it is initialized with the given characters. A single c-char may produce more than one char16_t character in the form of surrogate pairs.
A string-literal that begins with U, such as U"asdf", is a char32_t string literal. A char32_t string literal has type “array of n const char32_t”, where n is the size of the string as defined below; it is initialized with the given characters.
A wide string literal has type “array of n const wchar_t”, where n is the size of the string as defined below; it is initialized with the given characters.
If a UTF-8 string literal token is adjacent to a wide string literal token, the program is ill-formed. Any other concatenations are conditionally-supported with implementation-defined behavior. [ Note: This concatenation is an interpretation, not a conversion. Because the interpretation happens in translation phase 6 (after each character from a string literal has been translated into a value from the appropriate character set), a string-literal's initial rawness has no effect on the interpretation or well-formedness of the concatenation. — end note ]
Table 9 has some examples of valid concatenations.
Table 9 — String literal concatenations
Source | Means | Source | Means | Source | Means |
u"a" u"b" | u"ab" | U"a" U"b" | U"ab" | L"a" L"b" | L"ab" |
u"a" "b" | u"ab" | U"a" "b" | U"ab" | L"a" "b" | L"ab" |
"a" u"b" | u"ab" | "a" U"b" | U"ab" | "a" L"b" | L"ab" |
Characters in concatenated strings are kept distinct.
[ Example: "\xA" "B" contains the two characters '\xA' and 'B' after concatenation (and not the single hexadecimal character '\xAB'). — end example ]
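Both the example above and a few rows of Table 9 can be checked with sizeof and decltype. This is a minimal sketch; units is a hypothetical helper that returns the number of array elements.

```cpp
#include <cstddef>
#include <type_traits>

// "\xA" "B" stays two characters plus the terminator; "\xAB" is one.
static_assert(sizeof("\xA" "B") == 3, "characters are kept distinct");
static_assert(sizeof("\xAB") == 2, "single hexadecimal escape");

// An unprefixed literal takes on the prefix of its neighbour.
static_assert(std::is_same<decltype(u"a" "b"), const char16_t (&)[3]>::value,
              "u\"a\" \"b\" means u\"ab\"");
static_assert(std::is_same<decltype("a" U"b"), const char32_t (&)[3]>::value,
              "\"a\" U\"b\" means U\"ab\"");
static_assert(std::is_same<decltype(L"a" L"b"), const wchar_t (&)[3]>::value,
              "L\"a\" L\"b\" means L\"ab\"");

// Hypothetical helper: number of elements in a string literal's array.
template <class T, std::size_t N>
constexpr std::size_t units(const T (&)[N]) { return N; }
```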
After any necessary concatenation, in translation phase 7, '\0' is appended to every string literal so that programs that scan a string can find its end.
Escape sequences and universal-character-names in non-raw string literals have the same meaning as in character literals, except that the single quote ' is representable either by itself or by the escape sequence \', and the double quote " shall be preceded by a \, and except that a universal-character-name in a char16_t string literal may yield a surrogate pair. The size of a char32_t or wide string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for the terminating U'\0' or L'\0'. The size of a char16_t string literal is the total number of escape sequences, universal-character-names, and other characters, plus one for each character requiring a surrogate pair, plus one for the terminating u'\0'. [ Note: The size of a char16_t string literal is the number of code units, not the number of characters. — end note ]
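The size rules above can be sketched by counting array elements. The helper size_of_literal is hypothetical, and U+1F600 is chosen only because it lies outside the basic multi-lingual plane.

```cpp
#include <cstddef>

// Hypothetical helper: number of array elements in a string literal.
template <class T, std::size_t N>
constexpr std::size_t size_of_literal(const T (&)[N]) { return N; }

// Three characters plus the terminating '\0'.
static_assert(size_of_literal("abc") == 4, "narrow");
// One universal-character-name plus the terminating U'\0'.
static_assert(size_of_literal(U"\U0001F600") == 2, "char32_t");
// The same character needs a surrogate pair in a char16_t string literal.
static_assert(size_of_literal(u"\U0001F600") == 3, "char16_t surrogate pair");
static_assert(u"\U0001F600"[0] == 0xD83D && u"\U0001F600"[1] == 0xDE00,
              "high and low surrogate of U+1F600");
// U+00E9 needs a single 16-bit code unit.
static_assert(size_of_literal(u"\u00E9") == 2, "char16_t single unit");
```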
The size of a narrow string literal is the total number of escape sequences and other characters, plus at least one for the multibyte encoding of each universal-character-name, plus one for the terminating '\0'.
Evaluating a string-literal results in a string literal object with static storage duration, initialized from the given characters as specified above. Whether all string literals are distinct (that is, are stored in nonoverlapping objects) and whether successive evaluations of a string-literal yield the same or a different object is unspecified. [ Note: The effect of attempting to modify a string literal is undefined. — end note ]
boolean-literal:
false
true
The Boolean literals are the keywords false and true. Such literals are prvalues and have type bool.
pointer-literal:
nullptr
The pointer literal is the keyword nullptr. It is a prvalue of type std::nullptr_t. [ Note: std::nullptr_t is a distinct type that is neither a pointer type nor a pointer-to-member type; rather, a prvalue of this type is a null pointer constant and can be converted to a null pointer value or null member pointer value. — end note ]
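The statements about the Boolean and pointer literals can be checked with standard type traits. In this sketch the struct S and the variables p and pm are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <type_traits>

static_assert(std::is_same<decltype(true), bool>::value, "true has type bool");
static_assert(std::is_same<decltype(false), bool>::value, "false has type bool");
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr has type std::nullptr_t");

// std::nullptr_t is neither a pointer type nor a pointer-to-member type...
static_assert(!std::is_pointer<std::nullptr_t>::value, "not a pointer type");
static_assert(!std::is_member_pointer<std::nullptr_t>::value,
              "not a pointer-to-member type");

// ...but nullptr converts to a null pointer value or null member pointer value.
struct S { int m; };  // hypothetical type for this sketch
constexpr int* p = nullptr;
constexpr int S::* pm = nullptr;
static_assert(p == nullptr && pm == nullptr, "both conversions yield null values");
```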
user-defined-literal:
user-defined-integer-literal
user-defined-floating-literal
user-defined-string-literal
user-defined-character-literal
user-defined-integer-literal:
decimal-literal ud-suffix
octal-literal ud-suffix
hexadecimal-literal ud-suffix
binary-literal ud-suffix
user-defined-floating-literal:
fractional-constant exponent-part_opt ud-suffix
digit-sequence exponent-part ud-suffix
hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part ud-suffix
hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part ud-suffix
user-defined-string-literal:
string-literal ud-suffix
user-defined-character-literal:
character-literal ud-suffix
ud-suffix:
identifier
The syntactic non-terminal preceding the ud-suffix in a user-defined-literal is taken to be the longest sequence of characters that could match that non-terminal.
A user-defined-literal is treated as a call to a literal operator or literal operator template ([over.literal]). To determine the form of this call for a given user-defined-literal L with ud-suffix X, the literal-operator-id whose literal suffix identifier is X is looked up in the context of L using the rules for unqualified name lookup. Let S be the set of declarations found by this lookup.
If L is a user-defined-integer-literal, let n be the literal without its ud-suffix. If S contains a literal operator with parameter type unsigned long long, the literal L is treated as a call of the form
operator "" X(nULL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both. If S contains a raw literal operator, the literal L is treated as a call of the form
operator "" X("n")
Otherwise (S contains a numeric literal operator template), L is treated as a call of the form
operator "" X<'c1', 'c2', ... 'ck'>()
where n is the source character sequence c1c2...ck. [ Note: The sequence c1c2...ck can only contain characters from the basic source character set. — end note ]
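Two of the three call forms for integer literals can be sketched as follows. The suffixes _len and _twice are hypothetical and chosen only for this illustration; they use different names so that each lookup set contains a single kind of operator.

```cpp
#include <cstddef>

// Numeric literal operator template: the spelling arrives as template
// arguments, one char per source character.
template <char... Cs>
constexpr std::size_t operator""_len() { return sizeof...(Cs); }

// Literal operator with parameter type unsigned long long: the value
// arrives as an argument.
constexpr unsigned long long operator""_twice(unsigned long long n) {
    return 2 * n;
}

static_assert(123_len == 3, "called as operator\"\"_len<'1', '2', '3'>()");
static_assert(0x1F_len == 4, "the prefix is part of the spelling");
static_assert(21_twice == 42, "called as operator\"\"_twice(21ULL)");
```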
If L is a user-defined-floating-literal, let f be the literal without its ud-suffix. If S contains a literal operator with parameter type long double, the literal L is treated as a call of the form
operator "" X(fL)
Otherwise, S shall contain a raw literal operator or a numeric literal operator template ([over.literal]) but not both. If S contains a raw literal operator, the literal L is treated as a call of the form
operator "" X("f")
Otherwise (S contains a numeric literal operator template), L is treated as a call of the form
operator "" X<'c1', 'c2', ... 'ck'>()
where f is the source character sequence c1c2...ck. [ Note: The sequence c1c2...ck can only contain characters from the basic source character set. — end note ]
If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character). If S contains a literal operator template with a non-type template parameter for which str is a well-formed template-argument, the literal L is treated as a call of the form
operator "" X<str>()
Otherwise, the literal L is treated as a call of the form
operator "" X(str, len)
If L is a user-defined-character-literal, let ch be the literal without its ud-suffix. S shall contain a literal operator whose only parameter has the type of ch and the literal L is treated as a call of the form
operator "" X(ch)
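[ Example:
long double operator "" _w(long double);
std::string operator "" _w(const char16_t*, std::size_t);
unsigned operator "" _w(const char*);
int main() {
  1.2_w;      // calls operator "" _w(1.2L)
  u"one"_w;   // calls operator "" _w(u"one", 3)
  12_w;       // calls operator "" _w("12")
  "two"_w;    // error: no applicable literal operator
}
— end example ]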
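The example above only declares its operators. A compilable variant might look like the sketch below, where the function bodies and the changed return type of the string form are assumptions made for illustration and the ill-formed "two"_w is left as a comment.

```cpp
#include <cstddef>

// Literal operator taking long double: 1.5_w calls operator""_w(1.5L).
constexpr long double operator""_w(long double x) { return x; }

// String form for char16_t: u"one"_w calls operator""_w(u"one", 3).
constexpr std::size_t operator""_w(const char16_t*, std::size_t len) {
    return len;
}

// Raw literal operator: 12_w calls operator""_w("12").
constexpr unsigned operator""_w(const char* spelling) {
    unsigned count = 0;
    while (spelling[count] != '\0') ++count;  // count characters in the spelling
    return count;
}

static_assert(1.5_w == 1.5L, "floating literal uses the long double operator");
static_assert(u"one"_w == 3, "len excludes the terminating null character");
static_assert(12_w == 2, "integer literal falls back to the raw operator");
// "two"_w would be ill-formed: no operator accepts (const char*, std::size_t).
```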
During concatenation, ud-suffixes are removed and ignored and the concatenation process occurs as described in [lex.string]. [ Example:
int main() {
  L"A" "B" "C"_x;   // OK: same as L"ABC"_x
  "P"_x "Q" "R"_y;  // error: two different ud-suffixes
}
— end example ]
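The first line of the example can be made checkable by supplying an operator for the suffix. In this sketch the operator body is an assumption: it simply returns the length it receives.

```cpp
#include <cstddef>

// Hypothetical operator for this sketch: returns the length it is given.
constexpr std::size_t operator""_x(const wchar_t*, std::size_t len) {
    return len;
}

// Same as L"ABC"_x: the suffix applies to the concatenated wide literal.
static_assert(L"A" "B" "C"_x == 3, "three characters after concatenation");
// "P"_x "Q" "R"_y would be ill-formed because the ud-suffixes differ.
```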