About Identifiers in Java


Identifiers in Java

In Java source code we declare many kinds of entities, and each of them is identified by a name. Identifiers are those names. They name classes, interfaces, enums, annotations and records, as well as the members of these types, namely fields and methods. They also name variables, the type parameters of a generic class or method, and packages and modules. A package or module name may consist of several identifiers separated by the '.' character. Finally, identifiers name labels within a method or a block; these labels are the targets of break and continue statements.
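To make these roles concrete, here is a small sketch in which each comment names the kind of entity the identifier refers to. The class Inventory and all of its members are made up purely for illustration.

```java
import java.util.List;

class Inventory<T> {                      // class: Inventory, type parameter: T
    private final List<T> items;          // field: items

    Inventory(List<T> items) {            // constructor parameter: items
        this.items = items;
    }

    int indexOf(T wanted) {               // method: indexOf, parameter: wanted
        int found = -1;                   // local variable: found
        search:                           // label: search
        for (int i = 0; i < items.size(); i++) {
            if (items.get(i).equals(wanted)) {
                found = i;
                break search;             // the label is the target of break
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Inventory<String> inventory = new Inventory<>(List.of("pen", "ink"));
        System.out.println(inventory.indexOf("ink"));
    }
}
```

A package or module declaration is left out so that the sketch stays in a single file.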

What are the rules for defining an identifier in Java? An identifier is a sequence of any number of “Java letters” and “Java digits” that starts with a “Java letter”. The sequence cannot be the same as any keyword of the Java language, the boolean literals true and false, or the literal null. A “Java letter” is not just one of the letters A – Z and a – z from the ASCII character set; it also includes the letters of the other languages available in the Unicode character set. “Java letters” also include connecting punctuation characters such as ‘_’, currency symbols such as ‘$’, ‘£’ and ‘¥’, and letter-like numeric characters such as the Roman numeral ‘Ⅹ’ (U+2169). A “Java letter or digit” also includes the digits of the various scripts in Unicode, not just the ASCII digits 0 – 9, as well as combining marks and non-spacing marks, which attach to a preceding character. The following is a valid declaration using a Java identifier:

char अ = 'अ';

Here अ has been used as an identifier; since it is a letter in Hindi, the declaration is valid. But how do we use such characters in a Java source file created with a text editor in which only ASCII characters may be available? Before the compiler splits the source into lines and tokens, it looks for Unicode escapes. The Java compiler works on Unicode characters, whereas a source file is often encoded in ASCII or some extension of ASCII. While decoding the file, the compiler first replaces each Unicode escape with the actual Unicode character. Using Unicode escapes, the above declaration can be written in an ASCII-encoded source file as shown below:

char \u0905 = '\u0905'; // 0905 is the hex value for the Hindi letter A
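As a quick check that the escaped and the literal spellings name the same variable, the following minimal sketch declares the variable with escapes and reads it back with the literal letter. The class and method names are made up for illustration, and the file is assumed to be compiled as UTF-8.

```java
class UnicodeEscapeDemo {
    // Declares the variable with escapes, then reads it with the literal letter.
    static boolean sameVariable() {
        char \u0905 = '\u0905';
        return अ == 'अ';
    }

    public static void main(String[] args) {
        System.out.println(sameVariable()); // prints true
    }
}
```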

A Unicode escape is written as \u followed by four hexadecimal digits, which give the value of the character (a UTF-16 code unit) in the Unicode character set. Because escapes are replaced before the source is split into lines, the following code segment would not compile:

char ch = '\u000A'; // 000A is value for line feed

since this will be seen by the Java compiler as:

char ch = '
';

Instead, char ch = '\n'; should be used to write a character literal for newline. Continuing with examples of valid and invalid identifiers in Java, the declaration

String नमस्ते = "नमस्ते";

is valid in Java since the identifier starts with a letter and contains only letters and combining marks, but

String ९नमस्ते = "नमस्ते";

is not valid since it starts with a digit (९ is the Devanagari digit nine). But

String ₹९नमस्ते = "नमस्ते";

is valid, since it now starts with a currency symbol rather than a digit.
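These rules can also be checked at run time. The sketch below is one possible way to do it, not the compiler's own code: it uses Character.isJavaIdentifierStart and Character.isJavaIdentifierPart for the letter and digit rules and SourceVersion.isKeyword (from javax.lang.model) to reject keywords and the literals true, false and null. The class name IdentifierCheck and the method isValid are hypothetical.

```java
import javax.lang.model.SourceVersion;

class IdentifierCheck {
    // Returns true if s follows the identifier rules described above.
    static boolean isValid(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;                     // must start with a "Java letter"
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;                 // the rest: "Java letter or digit"
            }
            i += Character.charCount(cp);
        }
        // Keywords and the literals true, false and null are not identifiers.
        return !SourceVersion.isKeyword(s);
    }

    public static void main(String[] args) {
        String[] candidates = {"नमस्ते", "९नमस्ते", "₹९नमस्ते", "null"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + isValid(candidate));
        }
    }
}
```

Working with code points rather than char values keeps the check correct for characters outside the Basic Multilingual Plane.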

Let us now define a class called नमस्तेदुनिया with a main method similar to the main method of the class HelloWorld. Let us name the parameter आर्ग instead of args, and print “नमस्तेदुनिया” on the standard output instead of “Hello world”. This can be done as given below:

Listing of class नमस्तेदुनिया

class नमस्तेदुनिया {
    public static void main(String[] आर्ग) {
        System.out.println("नमस्तेदुनिया");
    }
}

It is not a good idea to use non-ASCII text directly in Java source code, since its interpretation depends on the encoding the compiler assumes, which has traditionally been the default encoding of the native OS. So, if we want non-ASCII characters in a Java source file, we may be better off writing all of them as Unicode escapes. This may seem tedious, but the native2ascii utility, which shipped with the JDK, takes care of it. The utility converts a text file in any of the standard encodings to ASCII by replacing every non-ASCII character with its Unicode escape, and it can also reverse the conversion. For example, if the code in the above listing is saved in a file named HelloWorldHindi.java using the UTF-8 encoding, the file can be converted to ASCII only with the following command:

native2ascii -encoding utf-8 HelloWorldHindi.java HelloWorldHindi.java

The file can be converted back to the UTF-8 encoding with the following command:

native2ascii -encoding utf-8 -reverse HelloWorldHindi.java HelloWorldHindi.java

Note: Since JDK 18 (JEP 400), UTF-8 is the default charset, so javac reads source files as UTF-8 unless told otherwise; on earlier releases the default is the platform encoding, and -encoding UTF-8 should be passed explicitly. The native2ascii utility was removed in JDK 9. If we still want to use it, we can install JDK 8 and run it from the bin folder of that installation.
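On a JDK without native2ascii, the forward conversion is simple enough to approximate by hand. The sketch below is not the JDK tool: AsciiEscaper and toAscii are hypothetical names, it works on a String rather than on files, and it does not implement the reverse direction.

```java
class AsciiEscaper {
    // Replaces every non-ASCII UTF-16 code unit with a backslash-u escape
    // made of four hex digits; ASCII characters are copied unchanged.
    static String toAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("char अ = 'अ';"));
    }
}
```

Iterating over char values rather than code points is deliberate here: a supplementary character then comes out as two escapes, one per surrogate, which is the form a Unicode escape in Java source requires.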

Identifier ignorable characters

There are also some non-printable characters (most of them control characters) which the Java compiler ignores in an identifier. These characters are known as Java identifier-ignorable characters: if any such character is used in an identifier, it is ignored. It is therefore possible for two different sequences of Unicode characters to denote the same identifier, e.g.

class TestIdentifierIgnorable {
    public static void main(String[] args) {
        String str = "Hello world!";
        System.out.println(s\u0001tr); // \u0001 is a Java identifier-ignorable character
    }
}

In the above code listing, the variable name str in line 3 is the same as s\u0001tr used in line 4. So the above code is legal, compiles successfully and, when run, prints “Hello world!” on the standard output.

The following are the numeric values, in hex, of the Java identifier-ignorable characters. The exact set depends on the Unicode version supported by the JDK in use.

0, 1, 2, 3, 4, 5, 6, 7, 8, e, f, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1a, 1b, 7f, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 8a, 8b, 8c, 8d, 8e, 8f, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 9a, 9b, 9c, 9d, 9e, 9f, ad

600, 601, 602, 603, 604, 605, 61c, 6dd, 70f, 8e2, 180e, 200b, 200c, 200d, 200e, 200f, 202a, 202b, 202c, 202d, 202e, 2060, 2061, 2062, 2063, 2064, 2066, 2067, 2068, 2069, 206a, 206b, 206c, 206d, 206e, 206f, feff, fff9, fffa, fffb

110bd, 110cd, 13430, 13431, 13432, 13433, 13434, 13435, 13436, 13437, 13438, 1bca0, 1bca1, 1bca2, 1bca3, 1d173, 1d174, 1d175, 1d176, 1d177, 1d178, 1d179, 1d17a, e0001, e0020, e0021, e0022, e0023, e0024, e0025, e0026, e0027, e0028, e0029, e002a, e002b, e002c, e002d, e002e, e002f, e0030, e0031, e0032, e0033, e0034, e0035, e0036, e0037, e0038, e0039, e003a, e003b, e003c, e003d, e003e, e003f, e0040, e0041, e0042, e0043, e0044, e0045, e0046, e0047, e0048, e0049, e004a, e004b, e004c, e004d, e004e, e004f, e0050, e0051, e0052, e0053, e0054, e0055, e0056, e0057, e0058, e0059, e005a, e005b, e005c, e005d, e005e, e005f, e0060, e0061, e0062, e0063, e0064, e0065, e0066, e0067, e0068, e0069, e006a, e006b, e006c, e006d, e006e, e006f, e0070, e0071, e0072, e0073, e0074, e0075, e0076, e0077, e0078, e0079, e007a, e007b, e007c, e007d, e007e, e007f.
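A list like this can be regenerated for whichever JDK is in use by asking Character.isIdentifierIgnorable about every code point. The following is a minimal sketch; the class name IgnorableLister and its method are hypothetical, and the output may differ from the list above on a JDK that supports a different Unicode version.

```java
import java.util.ArrayList;
import java.util.List;

class IgnorableLister {
    // Collects every code point that the running JDK treats as
    // ignorable inside an identifier.
    static List<Integer> ignorableCodePoints() {
        List<Integer> result = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (Character.isIdentifierIgnorable(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        StringBuilder line = new StringBuilder();
        for (int cp : ignorableCodePoints()) {
            if (line.length() > 0) {
                line.append(", ");
            }
            line.append(Integer.toHexString(cp));
        }
        System.out.println(line);
    }
}
```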

Note: To use any of the values above hex FFFF (supplementary characters) in a Unicode escape, the value has to be encoded in UTF-16, which requires two char values. For example, the value hex 110BD is encoded as the two 16-bit values hex D804 and DCBD. (Such a pair of values is known as a surrogate pair; the first value is the high surrogate and the second is the low surrogate.) So the pair of characters \uD804\uDCBD appearing in any identifier is ignored by the compiler,

i.e. the identifier str is also equivalent to s\uD804\uDCBDtr.
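The surrogate pair for a supplementary value does not have to be worked out by hand. The sketch below derives it with Character.highSurrogate and Character.lowSurrogate; the class name SurrogateDemo and the method escapeFor are hypothetical.

```java
class SurrogateDemo {
    // Returns the escape text for a code point: one escape for values up to
    // hex FFFF, a high-surrogate and low-surrogate pair for anything above.
    static String escapeFor(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            return String.format("\\u%04X", codePoint);
        }
        char high = Character.highSurrogate(codePoint);
        char low = Character.lowSurrogate(codePoint);
        return String.format("\\u%04X\\u%04X", (int) high, (int) low);
    }

    public static void main(String[] args) {
        System.out.println(escapeFor(0x110BD));
        System.out.println(escapeFor(0x0905));
    }
}
```

For hex 110BD this yields the same D804 and DCBD pair given in the note above.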

Misconception Note:

Misconception - Letters used for forming an identifier are only from the ASCII character set, i.e. A – Z and a – z.

Fact - Java uses the Unicode character set, and so identifiers can use letters from any of the languages available in Unicode.

Misconception - Digits used for forming an identifier are only from the ASCII character set, i.e. 0 – 9.

Fact - Java uses the Unicode character set, and so identifiers can use digits from any of the languages available in Unicode, though never as the first character.

Misconception - $ is the only currency symbol available for use in an identifier.

Fact - In Java, an identifier can use any currency symbol available in the Unicode character set.
