3.2 Lexical Translations

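A raw Unicode character stream is translated into a sequence of Java tokens, using the following three lexical translation steps, which are applied in turn:

1. A translation of Unicode escapes (§3.3) in the raw stream of Unicode characters to the corresponding Unicode character.
2. A translation of the Unicode stream resulting from step 1 into a stream of input characters and line terminators (§3.4).
3. A translation of the stream of input characters and line terminators resulting from step 2 into a sequence of input elements (§3.5) which, after white space (§3.6) and comments (§3.7) are discarded, comprise the tokens (§3.5).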
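
Because Unicode escapes (§3.3) are translated before tokens are recognized, an escape may appear anywhere in a program, including inside an identifier. The following is a minimal sketch of that ordering; the class and method names are illustrative only and are not part of this specification.

```java
public class UnicodeEscapeDemo {
    static int value() {
        // The escape on the next line is translated to the letter 'a'
        // before tokenization, so the declaration introduces a variable named a.
        int \u0061 = 42;
        return a;
    }

    public static void main(String[] args) {
        System.out.println(value()); // prints 42
    }
}
```

The compiler never sees the escape as a separate token: by the time identifiers are recognized, only the character a remains.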

At each step, Java uses the longest possible translation, even if the result does not ultimately make a correct Java program while another lexical translation would. Thus the input characters a--b are tokenized (§3.5) as a, --, b, which is not part of any grammatically correct Java program, even though the tokenization a, -, -, b could be part of a grammatically correct Java program.
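
The longest-match rule can be illustrated with a toy tokenizer. This is a sketch under stated assumptions, not how any Java compiler is implemented: the class name LongestMatchDemo, the method tokenize, and the four-entry operator table are hypothetical, and only identifiers, white space, and those operators are handled.

```java
import java.util.ArrayList;
import java.util.List;

public class LongestMatchDemo {
    // Hypothetical, deliberately tiny operator table; Java defines many more.
    private static final String[] OPERATORS = {"-", "--", "+", "++"};

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // white space separates tokens and is discarded
                continue;
            }
            if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < input.length()
                        && Character.isJavaIdentifierPart(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
                continue;
            }
            // Longest match: among all operators that match here, keep the longest.
            String best = null;
            for (String op : OPERATORS) {
                if (input.startsWith(op, i)
                        && (best == null || op.length() > best.length())) {
                    best = op;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("Unexpected character at index " + i);
            }
            tokens.add(best);
            i += best.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a--b"));  // prints [a, --, b]
        System.out.println(tokenize("a- -b")); // prints [a, -, -, b]
    }
}
```

The second call shows that white space between the two minus signs is enough to obtain the tokenization a, -, -, b, since the longest match at each position is then a single -.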