Data Types in Java
The data types in Java fall into two categories: primitive data types and
reference data types.
The primitive types, also known as built-in types, are byte, short, int, long,
float, double, char and boolean.
The reference data types are arrays, classes, interfaces, enums, annotations
and records.
Enums and annotations were introduced into the Java programming
language in Java 5, and records were introduced in Java 16.
Since Java 8, the compiler also recognizes functional interfaces. A functional interface is an interface that declares exactly one abstract method, and instances of it may be created using lambda expressions (also introduced into the Java programming language in Java 8).
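As a minimal sketch of the idea, the listing below declares a functional interface and creates an instance of it with a lambda expression. The names LambdaDemo, Greeter and greetWith are hypothetical and exist only for this illustration; IntBinaryOperator is one of the functional interfaces shipped in java.util.function.

```java
import java.util.function.IntBinaryOperator;

class LambdaDemo {
    // Hypothetical functional interface for illustration: exactly one abstract method.
    @FunctionalInterface
    interface Greeter {
        String greet(String name);
    }

    // Accepts any Greeter, including one created from a lambda expression.
    static String greetWith(Greeter greeter, String name) {
        return greeter.greet(name);
    }

    public static void main(String[] args) {
        // The lambda supplies the body of the single abstract method.
        Greeter polite = name -> "Hello, " + name;
        System.out.println(greetWith(polite, "Java"));

        // A functional interface that ships with the JDK.
        IntBinaryOperator add = (x, y) -> x + y;
        System.out.println(add.applyAsInt(2, 3));
    }
}
```

The @FunctionalInterface annotation is optional. It only asks the compiler to reject the interface if it does not have exactly one abstract method.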
Primitive Data Types
The following table lists the primitive types in Java:

The primitive types may be further categorized into numeric types and boolean
type.
boolean Data Type
A variable of the boolean type can have only two possible values, true or false. In Java, boolean is a separate data type. This is unlike the C language, where a non-zero or zero numeric value is treated as true or false; in Java a numeric type cannot be used as a boolean. For example, we cannot use a numeric expression as the condition of an if statement, as is allowed in C. Many operators on other data types produce a value of type boolean, and all the logical operators available for the boolean type also produce a boolean value. The various operators will be discussed in a separate tutorial.
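The short sketch below illustrates the point. The class and method names (BooleanDemo, isNonZero, inRange) are made up for this example. A relational operator turns numeric operands into a boolean, and only a boolean is accepted as the condition of an if statement.

```java
class BooleanDemo {
    // Writing "if (n)" would not compile in Java; an explicit comparison is required.
    static boolean isNonZero(int n) {
        return n != 0;
    }

    // Relational operators yield boolean values, which logical operators then combine.
    static boolean inRange(int n, int low, int high) {
        return n >= low && n <= high;
    }

    public static void main(String[] args) {
        int count = 5;
        if (isNonZero(count)) {
            System.out.println("count is non-zero");
        }
        System.out.println(inRange(count, 1, 10));
    }
}
```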
Numeric Data Types
In the Java programming language, we do not have the signed or the unsigned prefix for the numeric data types. Note that signed and unsigned are not keywords in Java.
Among the numeric types, byte, short, int and long are the signed integral types, char is an unsigned integral type and the float and double are single-precision and double-precision signed floating-point data types.
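One way to see the difference between the signed integral types and the unsigned char type is to narrow the same bit pattern to each type and widen it back to int. The sketch below does this; SignednessDemo and its methods are hypothetical names used only here.

```java
class SignednessDemo {
    // Narrowing to byte keeps the low 8 bits; widening back to int sign-extends.
    static int asByte(int bits) {
        return (byte) bits;
    }

    // Narrowing to short keeps the low 16 bits; widening back to int sign-extends.
    static int asShort(int bits) {
        return (short) bits;
    }

    // Narrowing to char keeps the low 16 bits; widening back to int zero-extends.
    static int asChar(int bits) {
        return (char) bits;
    }

    public static void main(String[] args) {
        System.out.println(asByte(0xFF));                    // -1
        System.out.println(asShort(0xFFFF));                 // -1
        System.out.println(asChar(0xFFFF));                  // 65535
        // Since Java 8 the wrapper classes offer helpers for unsigned interpretation.
        System.out.println(Byte.toUnsignedInt((byte) 0xFF)); // 255
    }
}
```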
byte order
For all the numeric types whose size is more than 1 byte, there are two ways of ordering the bytes. The byte order for a machine could be big-endian or little-endian, and it normally depends on the architecture of the machine. All x86 and x64 machines use little-endian byte ordering for the numeric types, while some other architectures use, or can be configured to use, big-endian byte ordering. The Java platform uses big-endian byte ordering wherever it defines a byte layout, for example in the class file format and in the java.io data streams, independent of the underlying hardware. The difference between the two orderings is that in little-endian byte ordering the least significant byte is at the lowest location (comes first) and the most significant byte is at the highest location (comes at the end), whereas in big-endian byte ordering the most significant byte is at the lowest location (comes first) and the least significant byte is at the highest location (comes at the end). The following figure shows the difference in byte ordering.

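The two byte orders can also be observed from Java code. The sketch below uses java.nio.ByteBuffer to write the same int value in both orders; the class ByteOrderDemo and the sample value 0x0A0B0C0D are chosen only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Writes a 4-byte int into a buffer using the requested byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Formats the bytes as two-digit hex values in storage order.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int value = 0x0A0B0C0D;
        System.out.println(hex(encode(value, ByteOrder.BIG_ENDIAN)));    // 0A 0B 0C 0D
        System.out.println(hex(encode(value, ByteOrder.LITTLE_ENDIAN))); // 0D 0C 0B 0A
        // The order used by the hardware the JVM is running on.
        System.out.println(ByteOrder.nativeOrder());
    }
}
```

A newly allocated ByteBuffer is big-endian by default, whatever the native order of the machine.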
integral data types
What is the size of int in C? Is it 2 bytes or 4 bytes? It depends on the platform. C compilers for 16-bit x86 targets treat int as 2 bytes, whereas on most 32-bit and 64-bit platforms it is 4 bytes, and in some cases it may even be 8 bytes. Java runs on a single platform, the JVM, so on this platform the sizes of all the data types are fixed. The size of byte is 1 byte, short is 2 bytes, int is 4 bytes and long is 8 bytes. The integral values are stored using 2's complement. The range of values for the various integral data
types is given in the table below:

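The sizes and ranges can be read directly from the constants in the wrapper classes. The sketch below prints them; IntegralRanges, range and incrementWithWrap are hypothetical names for this example. The last line shows the 2's complement wrap-around when int overflows.

```java
class IntegralRanges {
    // Formats a range as "min to max".
    static String range(long min, long max) {
        return min + " to " + max;
    }

    // Integer arithmetic wraps around silently on overflow.
    static int incrementWithWrap(int value) {
        return value + 1;
    }

    public static void main(String[] args) {
        System.out.println("byte  " + Byte.BYTES + " byte   " + range(Byte.MIN_VALUE, Byte.MAX_VALUE));
        System.out.println("short " + Short.BYTES + " bytes  " + range(Short.MIN_VALUE, Short.MAX_VALUE));
        System.out.println("int   " + Integer.BYTES + " bytes  " + range(Integer.MIN_VALUE, Integer.MAX_VALUE));
        System.out.println("long  " + Long.BYTES + " bytes  " + range(Long.MIN_VALUE, Long.MAX_VALUE));
        // Adding 1 to the largest int yields the smallest int.
        System.out.println(incrementWithWrap(Integer.MAX_VALUE) == Integer.MIN_VALUE);
    }
}
```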
floating-point data types
The types float and double represent the single-precision and double-precision floating-point values and their sizes are 4 and 8 bytes, respectively. These floating-point data type representations are according to the IEEE-754 standard for single-precision (32-bit) and double-precision (64-bit) floating-point values. This standard uses the most significant bit (MSB) as the sign bit, and the rest of the bits are divided among two fields, an exponent and a significand. In case of the single-precision value (float of Java), bit-31 is the sign bit, bits 30–23 are used for the exponent and bits 22–0 are used for the significand. In case of the double-precision value (double of Java), bit-63 is the sign bit, bits 62–52 are used for the exponent and bits 51–0 are used for the significand. The range of values for the two floating-point data types is given in Table below:

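The three fields of a float can be extracted with Float.floatToIntBits and a few shifts and masks. The following is a minimal sketch; FloatFields and its method names are assumptions made for this example.

```java
class FloatFields {
    // Bit 31: the sign bit.
    static int sign(float f) {
        return Float.floatToIntBits(f) >>> 31;
    }

    // Bits 30 to 23: the 8-bit biased exponent.
    static int exponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    // Bits 22 to 0: the 23-bit significand field.
    static int significand(float f) {
        return Float.floatToIntBits(f) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float[] samples = { 1.0f, -2.5f };
        for (float f : samples) {
            System.out.printf("%5.1f sign=%d exponent=%d significand=0x%06X%n",
                    f, sign(f), exponent(f), significand(f));
        }
    }
}
```

For 1.0f the exponent field is 127, which is the bias for single precision, and the significand field is 0. The same approach works for double with Double.doubleToLongBits, an 11-bit exponent and a 52-bit significand.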
Infinities and NaNs for floating-point types
Let us consider the code given below:
class TestDivision {
    public static void main(String[] args) {
        double a = 75;
        double b = 0;
        double c = a / b;   // floating-point division by zero
        System.out.println(c);
    }
}
What would be the output of executing the above code? An error at the division (divide by zero)? No, this does not give any error at runtime and executes perfectly. What, then, is the value of dividing 75 by 0? The above code prints Infinity. If we use int instead of the double data type in the above code, then the division throws an ArithmeticException at runtime, but for the floating-point data types, division by zero is not an error at runtime. The floating-point data types can represent infinities (positive and negative) as well as not-a-number (the result of dividing 0 by 0).
According to the IEEE 754, the floating-point numbers have a representation for positive infinity and negative infinity. There are also representations for the values that are Not-a-Number (NaN).
When all bits of the exponent field are one, the value represents either an infinity or a NaN. The figures below show the bit representations for the infinities and the NaNs.


When all exponent bits are one and all significand bits are zero, they represent infinity. Depending on the sign bit, the infinity is either a positive infinity or a negative infinity. When all exponent bits are one and the significand is a non-zero value, then they represent NaN.
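These bit patterns can be checked from code. The sketch below prints the raw bits of the special float values in hexadecimal and contrasts floating-point and integer division by zero. SpecialValues, hexBits and intDivisionByZeroThrows are hypothetical names for this illustration.

```java
class SpecialValues {
    // Returns the 32 bits of a float as eight hex digits.
    static String hexBits(float f) {
        return String.format("%08x", Float.floatToIntBits(f));
    }

    // Integer division by zero throws instead of producing a special value.
    static boolean intDivisionByZeroThrows() {
        int zero = 0;
        try {
            int unused = 75 / zero;
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        float nan = 0.0f / 0.0f;
        System.out.println(hexBits(1.0f / 0.0f));   // 7f800000: exponent all ones, significand zero
        System.out.println(hexBits(-1.0f / 0.0f));  // ff800000: same, with the sign bit set
        System.out.println(hexBits(nan));           // 7fc00000: exponent all ones, significand non-zero
        System.out.println(nan == nan);             // false: NaN is not equal to anything, even itself
        System.out.println(Float.isNaN(nan));       // true
        System.out.println(intDivisionByZeroThrows());
    }
}
```

Because NaN compares unequal to itself, Float.isNaN or Double.isNaN is the reliable way to test for it.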
floating-point types and precision
The use of the floating-point data types float and double is discouraged where exact results are required, since these data types have limited precision. They cannot represent every value exactly, and the gap between adjacent representable values grows as the magnitude of the value increases (moving away from zero). For example, the float data type is incapable of representing 123456789: the nearest values a float can represent are 123456784 and 123456792. We can try the code in the listing below to test this.
class TestFloatingPoint {
    public static void main(String[] args) {
        // Walk through consecutive int values and show what float stores for each.
        for (int i = 123456784; i < 123456793; i++) {
            float f = i;
            System.out.printf("int:%d, float:%10.1f\n", i, f);
        }
    }
}
At the least, do not use the float and double data types for large values that must be exact. For such values we have the option of using classes like java.math.BigInteger (arbitrarily large integers) and java.math.BigDecimal (arbitrary-precision decimal values); we will discuss them in another tutorial.
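As a brief preview, and only as a sketch, the listing below contrasts double arithmetic with BigDecimal and computes a factorial that does not fit in a long using BigInteger. The class ExactArithmetic and its methods are names invented for this example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class ExactArithmetic {
    // Adds two decimal values given as strings, without binary rounding.
    static BigDecimal addExact(String a, String b) {
        return new BigDecimal(a).add(new BigDecimal(b));
    }

    // Computes n! with arbitrary precision.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);              // 0.30000000000000004
        System.out.println(addExact("0.1", "0.2")); // 0.3
        System.out.println(factorial(21));          // larger than Long.MAX_VALUE
    }
}
```

Constructing a BigDecimal from a String rather than from a double matters here: new BigDecimal(0.1) would carry over the rounding error already present in the double.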
char data type
The char data type in a programming language is used to represent a unit of text. How is text data represented? Text is represented as a sequence of characters, and a char value is simply the numeric value assigned to a character by a character set.
Character sets
What is a character set? A character set is a collection of units of text (characters), each of which is assigned a unique numeric value. There are various character sets available. The most commonly known character set is ASCII (American Standard Code for Information Interchange), which assigns only 128 characters (including the control characters) to the numeric values in the range from 0 to 127. Not all characters in a character set are printable. Character sets also include control characters, e.g. carriage-return, line-feed, form-feed, tab and bell. These do not occupy a print position, but they affect the position of the next character or perform some similar function.
In the early days of computing, each OS supported a particular character set, and the most commonly supported character sets were ASCII and EBCDIC (Extended Binary Coded Decimal Interchange Code). The ASCII character set includes only the commonly used Latin characters and control characters. Later, ASCII and its extensions were adopted by most of the OSs, and most new character sets were extensions of the ASCII character set. Each character set catered to the text representation requirements of a particular region or culture. Most of these extensions of ASCII used the numeric values from 128 to 255 for the additional characters.
Some common examples of character sets which are extensions of ASCII are the ISO 8859 series: 8859-1 or Latin-1 caters to Western European, 8859-2 or Latin-2 caters to Eastern European, 8859-3 or Latin-3 caters to Southern European, 8859-4 or Latin-4 caters to Northern European, 8859-5 or Cyrillic caters to Russian and Bulgarian, 8859-6 or Arabic caters to the Arabic characters, 8859-7 or Greek caters to the Greek characters, 8859-8 or Hebrew caters to the Hebrew characters, 8859-9 or Latin-5 caters to the Turkish characters, 8859-10 or Latin-6 caters to Northern European, 8859-11 or Thai caters to the Thai characters, 8859-13 or Latin-7 caters to Baltic, 8859-14 or Latin-8 caters to Celtic, 8859-15 or Latin-9 caters to Western European and 8859-16 or Latin-10 caters to South-Eastern European.
There are other extensions of ASCII as well. We have ISCII (Indian Script Code for Information Interchange), which caters to the Indian scripts. ISO 8859-12 was reserved for the Devanagari script, but this was abandoned. The ISO 8859 series of character sets is summarized in the table below:

All the character sets mentioned above are extensions of ASCII, i.e. they have the same characters as ASCII in the range from 0 to 127. A quick summary of the ASCII characters is given in the table below:

The char data type in many programming languages simply relies on the OS's interpretation of the numeric value of the char data, i.e. if the OS uses a different character set, the same numeric value is interpreted differently. For example, according to Latin-1 (8859-1) the value EB (Hex) represents the character ë, whereas according to Greek (8859-7) the value EB (Hex) represents the character λ.
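The effect can be reproduced in Java by decoding the same byte with two different charsets. The sketch below assumes that the running JVM provides the ISO-8859-7 charset, which common JDK builds do but which the Java specification does not guarantee. CharsetDemo and decode are hypothetical names.

```java
import java.nio.charset.Charset;

class CharsetDemo {
    // Decodes a single byte using the named charset.
    // Assumption: the charset is available in the running JVM.
    static String decode(int byteValue, String charsetName) {
        byte[] data = { (byte) byteValue };
        return new String(data, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String latin = decode(0xEB, "ISO-8859-1");
        String greek = decode(0xEB, "ISO-8859-7");
        // Print the Unicode values so the result does not depend on the console encoding.
        System.out.printf("ISO-8859-1: U+%04X%n", (int) latin.charAt(0)); // U+00EB
        System.out.printf("ISO-8859-7: U+%04X%n", (int) greek.charAt(0)); // U+03BB
    }
}
```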
To solve this problem, a universal character set was designed in the form of Unicode. The first version of Unicode was introduced in 1991. The Unicode character set was designed to include all the characters of all the languages and scripts of the world. It is revised regularly to add new characters and to cover languages and scripts which were not included in earlier versions. This character set has been designed to use numeric values from 0 to 10FFFF (Hex). It is also an extension of ASCII, so the values from 0 to 127 are the same as in ASCII. Most of the Indian scripts have been given a block of 128 characters each, starting from 0x0900. The blocks for the Central and South East Asian scripts in Unicode are summarized in the tables below:



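The block that a codepoint belongs to can also be looked up at runtime with Character.UnicodeBlock. The following sketch checks a few sample codepoints; UnicodeBlockDemo and blockOf are names assumed for this example.

```java
class UnicodeBlockDemo {
    // Returns the Unicode block containing the given codepoint.
    static Character.UnicodeBlock blockOf(int codePoint) {
        return Character.UnicodeBlock.of(codePoint);
    }

    public static void main(String[] args) {
        // One sample codepoint from each of a few blocks.
        int[] samples = { 0x0041, 0x0905, 0x0985, 0x0B85 };
        for (int cp : samples) {
            System.out.printf("U+%04X -> %s%n", cp, blockOf(cp));
        }
    }
}
```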
What is the size of char in C? In C, sizeof(char) is 1 by definition, and on most platforms that is one 8-bit byte. The size of char in Java is 2 bytes. It is an unsigned, 16-bit integral value, used for representing UTF-16 code units.
Why is the size of a char 2 bytes in Java? In C, char represents a character from the platform's local character set, which in most cases is some extension of ASCII. Most of these character sets have at most 256 characters, so they require only 1 byte. In the case of Java, the char type is used to represent characters from the Unicode character set using the UTF-16 encoding, whose code units are 16 bits wide. The details of the UTF encodings will be given in another article.
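The size and the unsigned range of char can be confirmed from the constants in java.lang.Character. This is a small sketch; CharSizeDemo, codeUnitValue and next are hypothetical names.

```java
class CharSizeDemo {
    // A char widens to int without sign extension.
    static int codeUnitValue(char c) {
        return c;
    }

    // char takes part in integer arithmetic; the result must be cast back to char.
    static char next(char c) {
        return (char) (c + 1);
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES);                    // 2
        System.out.println(Character.SIZE);                     // 16
        System.out.println(codeUnitValue(Character.MIN_VALUE)); // 0
        System.out.println(codeUnitValue(Character.MAX_VALUE)); // 65535
        System.out.println(next('A'));                          // B
    }
}
```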
Unicode
Let us understand Unicode. Unicode is a character set that aims to include the characters of all the languages of the world. There are various versions of the Unicode character set; at the time of writing this tutorial, the version of Unicode was 16. The Unicode standard maps each character to a unique codepoint value. The codepoint values are in the range 0 to 10FFFF (Hex). This codepoint range has been divided into 17 planes of 65536 values each, i.e. 2^16. The zeroth plane, i.e. the values from 0 to FFFF (Hex), is known as the BMP (Basic Multilingual Plane), and the other planes define the supplementary characters.
To represent the complete range of characters using only 16-bit units, the Unicode standard defines an encoding called UTF-16. In this encoding, supplementary characters are represented as a pair of 16-bit code units, the first code unit from the high-surrogates range (D800 to DBFF (Hex)) and the second code unit from the low-surrogates range (DC00 to DFFF (Hex)). In the Unicode standard, the range of codepoint values from D800 to DFFF (Hex) is not assigned to any character and is reserved for surrogates. For all other codepoints in the range 0000 to FFFF (Hex), the codepoint value and the UTF-16 code unit are the same.
The Java programming language represents text as sequences of 16-bit code units using the UTF-16 encoding. The char type in the Java programming language represents one such 16-bit code unit.
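The surrogate arithmetic can be sketched in a few lines. The listing below computes the surrogate pair for a supplementary codepoint by hand and compares the result with what java.lang.Character produces. SurrogateDemo and its method names are made up for this example, and U+1F600 is just a sample supplementary codepoint.

```java
class SurrogateDemo {
    // High surrogate: D800 plus the top 10 bits of (codePoint - 0x10000).
    static char highSurrogate(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Low surrogate: DC00 plus the bottom 10 bits of (codePoint - 0x10000).
    static char lowSurrogate(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    // Number of 16-bit code units needed for a codepoint (1 for BMP, 2 otherwise).
    static int codeUnitCount(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int supplementary = 0x1F600;
        System.out.printf("%04X %04X%n",
                (int) highSurrogate(supplementary), (int) lowSurrogate(supplementary)); // D83D DE00

        String s = new String(Character.toChars(supplementary));
        System.out.println(s.length());                      // 2 code units
        System.out.println(s.codePointCount(0, s.length())); // 1 codepoint
        System.out.println(codeUnitCount(0x0915));           // 1: a BMP codepoint
    }
}
```

String.length() counts code units, not characters, so code that must handle supplementary characters should iterate over codepoints rather than over char values.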
Unicode Escapes in Java Source Code
Java source code is a sequence of Unicode characters. It can contain characters from any language and not just characters from the ASCII character set. Most of the time the source code is encoded in some native character set, which is an extension of ASCII. Even in these cases the Java source code can include characters that are not part of the native character set. This is done by using the Unicode escape. In the source code we can specify any UTF-16 code unit by writing \u followed by four hexadecimal digits.
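A minimal sketch of Unicode escapes in practice follows. The source file contains only ASCII characters, yet the constants hold non-ASCII text. UnicodeEscapeDemo and its constant names are hypothetical.

```java
class UnicodeEscapeDemo {
    // Each constant is written with Unicode escapes only, so this file stays pure ASCII.
    static final char E_DIAERESIS = '\u00EB';       // Latin small letter e with diaeresis
    static final String KA = "\u0915";              // Devanagari letter KA
    static final String GRINNING = "\uD83D\uDE00";  // surrogate pair for U+1F600

    public static void main(String[] args) {
        System.out.printf("%04X%n", (int) E_DIAERESIS);      // 00EB
        System.out.printf("%04X%n", (int) KA.charAt(0));     // 0915
        System.out.printf("%X%n", GRINNING.codePointAt(0));  // 1F600
        System.out.println(GRINNING.length());               // 2
    }
}
```

A supplementary character needs two escapes, one for each code unit of its surrogate pair. The compiler translates Unicode escapes before any other processing, including inside comments, so a malformed escape in a comment is still a compile-time error.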
This tutorial will be continued with a discussion of literals and reference data types in Java.