Introduction
While working on some code in which I wanted to hide part of the source by using Unicode escapes instead of the actual characters, I accidentally discovered an undocumented feature that has been available since Java 16.
I like to call this new feature the end-of-file comment.
Java is known to have three types of comments:
- A single-line comment, which starts with // and finishes at the end of the line.
- A block comment, which starts with /* and ends with */. This comment can also span multiple lines.
- A documentation comment, which starts with /** and ends with */. This is a special case of a block comment that begins with /**. It is used by the javadoc tool to pick up descriptions for various elements in Java code such as classes, fields, methods, and constructors. It must be placed immediately before the element it describes.
End-of-file comment
From Java 16 onward there is effectively another kind of comment, which I would like to call the end-of-file comment. This comment starts with the end-of-file character (the ASCII SUB character, also known as Control-Z).
In a Java file, we can use the end-of-file character to place comments at the end of the file. The Unicode escape for the end-of-file character is \u001a.
For example, consider a sample Hello.java file with the following content:
class HelloWorld {
    public static void main(String[] args) {
        System.out.printf("Hello world\n");
    }
}
interface HelloInterface {
    public static void main(String[] args) {
        System.out.println("Hello from interface");
    }
}
enum HelloEnum {
    Hello,
    HI,
    ;
    public static void main(String[] args) {
        System.out.println("Hello from enum");
    }
}
record HelloRecord() {
    public static void main(String[] args) {
        System.out.println("Hello from record");
    }
}
@interface HelloAnnotation {
}
In the above Hello.java file, if we want to comment out the last two definitions (i.e. the
HelloRecord and the HelloAnnotation), then we can use the end-of-file character before the
HelloRecord definition as shown below:
class HelloWorld {
    public static void main(String[] args) {
        System.out.printf("Hello world\n");
    }
}
interface HelloInterface {
    public static void main(String[] args) {
        System.out.println("Hello from interface");
    }
}
enum HelloEnum {
    Hello,
    HI,
    ;
    public static void main(String[] args) {
        System.out.println("Hello from enum");
    }
}
\u001a The rest of the file content gets commented
record HelloRecord() {
    public static void main(String[] args) {
        System.out.println("Hello from record");
    }
}
@interface HelloAnnotation {
}
Since Java 16, this end-of-file character works as the start of a comment that runs up to the end of the file. Prior to Java 16, this character could only be used as the last character in a Java file; nothing was accepted after it.
That is, the code above produces a compilation error after the end-of-file character when compiled with a Java compiler older than Java 16.
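The behaviour described above can be modelled with a minimal sketch. It is only an approximation under stated assumptions: the class name EofCommentSketch and the method visibleToCompiler are hypothetical, and the sketch simply cuts the text at the first end-of-file character without tokenizing anything, so it should not be read as how javac is implemented. The character is built from its code point instead of being written as a Unicode escape, so the sketch itself contains no end-of-file character.

```java
// Hypothetical sketch: models the observation that, since Java 16, everything
// after the first end-of-file character (U+001A) is ignored by the compiler.
class EofCommentSketch {

    // The end-of-file (SUB, Control-Z) character, built from its code point.
    static final char EOF_CHAR = (char) 0x1A;

    // Returns the part of the source that precedes the first end-of-file
    // character, or the whole source if there is none.
    static String visibleToCompiler(String source) {
        int cut = source.indexOf(EOF_CHAR);
        return cut < 0 ? source : source.substring(0, cut);
    }

    public static void main(String[] args) {
        String source = "enum HelloEnum { HELLO, HI }\n"
                + EOF_CHAR + " The rest of the file content gets commented\n"
                + "record HelloRecord() { }\n";
        // Only the enum declaration survives the cut.
        System.out.print(visibleToCompiler(source));
    }
}
```

To try the real behaviour, the same string could be written to a Hello.java file and compiled with different javac versions.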
About keywords and identifiers
A few interesting observations about Java keywords.
Prior to Java 9, all Java keywords were restricted identifiers (i.e. they follow the rules of identifiers, but are restricted from being used as identifiers).
From Java 9, when module definitions in a module-info.java file were introduced, Java restricted the use of some more identifiers in certain contexts, e.g. module in a module definition. Java preferred to call these restricted keywords.
Then, up to Java 15, some more identifiers were restricted in certain contexts. Java preferred to call these restricted identifiers, even though all keywords are also restricted identifiers (restricted as identifiers in all contexts).
Then an interesting thing happened in Java 16.
non-sealed, the first non-identifier keyword, in Java 16
In Java 16, keywords were categorized as reserved keywords and contextual keywords. The reserved keywords are restricted as identifiers everywhere, whereas the contextual keywords are restricted as identifiers only in certain contexts.
There is an interesting addition to the contextual keywords in Java 16: non-sealed. Unlike all other keywords, it is not a restricted identifier, because a hyphen cannot appear in an identifier; rather, it looks like an expression.
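The difference between the two categories can be seen in a small sketch. The class name ContextualKeywordSketch and the method sum are hypothetical; the sketch assumes a compiler that accepts these contextual keywords as local variable names, since they act as keywords only in specific contexts, whereas a reserved keyword in the same position is rejected.

```java
// Hypothetical sketch: contextual keywords used as ordinary local variable names.
class ContextualKeywordSketch {

    static int sum() {
        int module = 1;   // keyword only inside a module declaration
        int record = 2;   // keyword only where a record declaration may start
        int sealed = 3;   // keyword only as a class or interface modifier
        int permits = 4;  // keyword only in a permits clause
        // int class = 5; // would not compile: class is a reserved keyword
        return module + record + sealed + permits;
    }

    public static void main(String[] args) {
        System.out.println(sum());
    }
}
```

Note that non-sealed cannot be used this way at all: with its hyphen, it never had the shape of an identifier in the first place.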
Processing of identifier-ignorable characters
Another point to note here is that, while processing identifier-ignorable characters, the compiler treats all keywords as identifiers, and non-sealed is treated as being made up of two identifiers.
That is, identifier-ignorable characters are allowed in all keywords.
For example, instance\u00adof (\u00ad is the Unicode escape for the soft hyphen, which is one of the identifier-ignorable characters) is equivalent to the instanceof keyword.
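This equivalence can be illustrated without involving the compiler at all. The sketch below is a model under stated assumptions: the class IgnorableSketch and the method stripIgnorable are hypothetical names, and the method merely drops every code point for which Character.isIdentifierIgnorable returns true. It mirrors the comparison described in the text and is not taken from javac.

```java
// Hypothetical sketch: compare identifiers after dropping ignorable characters.
class IgnorableSketch {

    // Removes every identifier-ignorable code point from the given text.
    static String stripIgnorable(String text) {
        StringBuilder kept = new StringBuilder();
        text.codePoints()
            .filter(cp -> !Character.isIdentifierIgnorable(cp))
            .forEach(kept::appendCodePoint);
        return kept.toString();
    }

    public static void main(String[] args) {
        char softHyphen = (char) 0x00AD; // the soft hyphen, built from its code point
        String spelled = "instance" + softHyphen + "of";
        // The two spellings differ as strings but match once ignorables are dropped.
        System.out.println(spelled.equals("instanceof"));
        System.out.println(stripIgnorable(spelled).equals("instanceof"));
    }
}
```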
The identifier-ignorable characters are discussed and listed in the article 'Charsets and unicode identifiers in Java'.
All these identifier-ignorable characters are valid as a Java identifier part but not as a Java identifier start. This can be checked with the following code segments:
IntStream.rangeClosed(0, 0x10ffff)
    .filter(Character::isIdentifierIgnorable)
    .allMatch(Character::isJavaIdentifierPart)
The above code segment returns true.
IntStream.rangeClosed(0, 0x10ffff)
    .filter(Character::isIdentifierIgnorable)
    .anyMatch(Character::isJavaIdentifierStart)
The above code segment returns false.
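For readers who want to run these checks directly, here is a self-contained sketch. The class name IgnorableChecks and its method names are hypothetical. Besides the two range checks, it probes two individual code points through the same Character API: the soft hyphen (U+00AD) and the end-of-file character (U+001A).

```java
import java.util.stream.IntStream;

// Hypothetical sketch: runnable form of the two checks, plus two single probes.
class IgnorableChecks {

    // True if every identifier-ignorable code point may appear inside an identifier.
    static boolean allIgnorablesAreParts() {
        return IntStream.rangeClosed(Character.MIN_CODE_POINT, Character.MAX_CODE_POINT)
                .filter(Character::isIdentifierIgnorable)
                .allMatch(Character::isJavaIdentifierPart);
    }

    // True if some identifier-ignorable code point may start an identifier.
    static boolean anyIgnorableIsStart() {
        return IntStream.rangeClosed(Character.MIN_CODE_POINT, Character.MAX_CODE_POINT)
                .filter(Character::isIdentifierIgnorable)
                .anyMatch(Character::isJavaIdentifierStart);
    }

    public static void main(String[] args) {
        System.out.println(allIgnorablesAreParts());
        System.out.println(anyIgnorableIsStart());
        System.out.println(Character.isIdentifierIgnorable(0x00AD)); // soft hyphen
        System.out.println(Character.isIdentifierIgnorable(0x001A)); // end-of-file character
    }
}
```

The code points are given as integer literals so that the sketch contains no Unicode escapes of its own.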
So, in the case of non-sealed, an identifier-ignorable character is not allowed in two places: at the very beginning, and at the beginning of sealed within non-sealed.
That is, the following use of an identifier-ignorable character in non-sealed is acceptable:
no\u00adn-sealed
whereas the following two uses of an identifier-ignorable character in non-sealed are not acceptable:
\u00adnon-sealed and non-\u00adsealed
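The placement rule can be modelled with a short sketch. Everything in it is an assumption made for illustration: the class NonSealedSketch and its methods are hypothetical names, and the logic only encodes the description in the text (two identifiers joined by a hyphen, each needing a valid start character), not javac's actual handling of non-sealed.

```java
// Hypothetical sketch: models where an ignorable character may sit in non-sealed.
class NonSealedSketch {

    // True if the text has the shape of an identifier: a valid start character
    // followed only by valid part characters.
    static boolean looksLikeIdentifier(String text) {
        if (text.isEmpty() || !Character.isJavaIdentifierStart(text.codePointAt(0))) {
            return false;
        }
        return text.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }

    // Removes every identifier-ignorable code point from the given text.
    static String stripIgnorable(String text) {
        StringBuilder kept = new StringBuilder();
        text.codePoints()
            .filter(cp -> !Character.isIdentifierIgnorable(cp))
            .forEach(kept::appendCodePoint);
        return kept.toString();
    }

    // Models the rule from the text: two identifiers joined by a hyphen that
    // read as "non" and "sealed" once ignorable characters are dropped.
    static boolean acceptedAsNonSealed(String text) {
        int hyphen = text.indexOf('-');
        if (hyphen < 0) {
            return false;
        }
        String left = text.substring(0, hyphen);
        String right = text.substring(hyphen + 1);
        return looksLikeIdentifier(left) && looksLikeIdentifier(right)
                && stripIgnorable(left).equals("non")
                && stripIgnorable(right).equals("sealed");
    }

    public static void main(String[] args) {
        String shy = String.valueOf((char) 0x00AD); // soft hyphen
        System.out.println(acceptedAsNonSealed("no" + shy + "n-sealed"));
        System.out.println(acceptedAsNonSealed(shy + "non-sealed"));
        System.out.println(acceptedAsNonSealed("non-" + shy + "sealed"));
    }
}
```

The two rejected spellings fail for the same reason: an identifier-ignorable character is a valid identifier part but never a valid identifier start.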
It seems that this new undocumented feature (the end-of-file comment) was introduced unintentionally while handling the contextual keyword non-sealed (the only keyword that is not a restricted identifier).
Conclusion
So, we can see that since Java 16, the end-of-file comment is a new feature in Java. This feature has not been documented and is not compatible with earlier versions of Java. Java therefore needs to decide whether to keep this new feature and make the appropriate changes in the JLS, or to revert to the handling of the end-of-file character as it was prior to Java 16.