Students Learn About: Data types
Students Learn To:
Introduction
Data items are the raw materials on which computer programs operate. These data items must be stored in binary if they are to be manipulated by the instructions that form software programs.
Each data item must be assigned a data type.
The data type determines how each of the data items will be represented in binary and what kind of processing the software will be able to perform on them.
There are a number of data types that are used so frequently that most programming languages include them as predefined parts of the language. These data types are the ones we use in everyday life. For example:
- whole numbers (integers) for counting and performing arithmetic
- numbers with decimal points, or real numbers (floating point), for fractional and very large computations
- words and sentences (strings) for all forms of writing
- yes/no or true/false (Boolean) data for answering questions and making decisions
- dates and times for scheduling our lives
- currency for purchasing.
Digital Data
- The computer is a two-state device that uses only two digits: 0 and 1.
- Two digits are easily represented electronically by circuits in the computer being either on or off.
- The digit 1 is used to represent the electronic state of ‘on’ and the digit 0 is used to represent the electronic state of ‘off’. Each on or off digit is called a bit (binary digit).
- A bit is the smallest unit of data stored in a computer.
- A group of eight bits is called a byte. When used to represent text, a byte stands for a single character, such as a letter, a number, a punctuation mark or a space.
- Because a byte is such a small unit, the prefixes ‘kilo’, ‘mega’, ‘giga’ and ‘tera’ are added to create more useful units for measuring data storage (see Table 1.2 and the short sketch that follows it).
Unit | Symbol | Meaning | Approximate value (bytes) | Exact value (bytes) |
byte | B | | 1 | 1 (2⁰) |
kilobyte | KB | thousand bytes | 1 000 | 1 024 (2¹⁰) |
megabyte | MB | million bytes | 1 000 000 | 1 048 576 (2²⁰) |
gigabyte | GB | billion bytes | 1 000 000 000 | 1 073 741 824 (2³⁰) |
terabyte | TB | trillion bytes | 1 000 000 000 000 | 1 099 511 627 776 (2⁴⁰) |
Table 1.2 Units of measurement of digital data
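The relationship between the approximate and exact values in Table 1.2 can be checked with a few lines of code. A minimal Python sketch (the unit table here is illustrative, not part of any library):

```python
# Exact sizes of the storage units in Table 1.2, computed as powers of two.
UNITS = {"kilobyte": 10, "megabyte": 20, "gigabyte": 30, "terabyte": 40}

for name, power in UNITS.items():
    exact = 2 ** power                # exact value in bytes
    approx = 10 ** (power // 10 * 3)  # decimal approximation (10^3, 10^6, ...)
    print(f"1 {name} = {exact:,} bytes (approximately {approx:,})")
```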
The Binary System
The normal system we use for counting is called the decimal system. It is an arithmetic system using a base of 10 (the digits 0 to 9). The system of counting used by computers is called the binary system (or binary code). It is an arithmetic system using a base of two (the digits 0 and 1). Like the decimal system, the binary system uses place value to determine the worth of a digit. However, whereas the decimal system uses powers of ten (10, 100, 1000, etc.), the binary system uses powers of two (2, 4, 8, etc.) for its place values. A subscript is used to distinguish between numbers with different bases. For example, 10₂ is the number ‘one zero’ in the base two (binary) system.
Binary To Decimal
To change a binary number into a decimal number, we add the appropriate place values, as shown in the example below.
Example
Convert the binary number 1001110 into a decimal number.
Powers of 2 | 2⁶ | 2⁵ | 2⁴ | 2³ | 2² | 2¹ | 2⁰ |
Value | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
Binary number | 1 | 0 | 0 | 1 | 1 | 1 | 0 |

1001110₂ = (1 × 64) + (0 × 32) + (0 × 16) + (1 × 8) + (1 × 4) + (1 × 2) + (0 × 1)
= 64 + 8 + 4 + 2
= 78₁₀

So, binary number 1001110 equals decimal number 78.
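The place-value addition used above translates directly into code. A minimal Python sketch of the same method, with Python's built-in int(string, 2) as a check (binary_to_decimal is an illustrative name, not a standard function):

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a binary string to decimal by adding place values."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position  # digit times its place value
    return total

print(binary_to_decimal("1001110"))  # 78
print(int("1001110", 2))             # 78, using the built-in conversion
```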
Decimal To Binary
To change a decimal number into a binary number, we divide the binary place values into the decimal number, starting with the largest place value that fits. The quotient of each division is the binary digit, and the remainder is divided by the next place value. This process is repeated for all place values.
Example
Convert 109₁₀ into binary.
Powers of 2 | 2⁶ | 2⁵ | 2⁴ | 2³ | 2² | 2¹ | 2⁰ |
Value | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
Binary number | 1 | 1 | 0 | 1 | 1 | 0 | 1 |

109₁₀ = 64 + 32 + 8 + 4 + 1
= (1 × 64) + (1 × 32) + (0 × 16) + (1 × 8) + (1 × 4) + (0 × 2) + (1 × 1)
= 1101101₂

So, decimal number 109 equals binary number 1101101.
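The same process can be sketched in code: divide each place value into what remains of the number, keep the quotient as the binary digit and carry the remainder on. Python's built-in bin() is shown as a check (decimal_to_binary is an illustrative name):

```python
def decimal_to_binary(n: int) -> str:
    """Convert a decimal number to binary by dividing by place values."""
    if n == 0:
        return "0"
    power = 1
    while power * 2 <= n:  # find the largest place value that fits
        power *= 2
    bits = ""
    while power >= 1:
        digit, n = divmod(n, power)  # quotient is the bit, remainder carries on
        bits += str(digit)
        power //= 2
    return bits

print(decimal_to_binary(109))  # 1101101
print(bin(109))                # 0b1101101, using the built-in conversion
```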
The hexadecimal system
- Because binary numbers use only two digits, they result in very long strings of 1s and 0s.
- For this reason, many computers represent binary numbers in hexadecimal. The hexadecimal number system, or hex, is to the base 16, and uses the sixteen digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F.
- The numbers are often preceded by the $ (dollar) sign or, more commonly now, the & (ampersand) sign to indicate that they are in hexadecimal code.
- So &A = 10₁₀, &B = 11₁₀, and so on.
- Because 16 is 2⁴, it is very easy to convert binary numbers to hexadecimal and vice versa, as the sketch below shows.
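Each hexadecimal digit corresponds to exactly four binary digits (a ‘nibble’), so converting between the two is just a matter of grouping bits in fours. A minimal Python sketch, reusing the value 78 from the earlier example:

```python
# Group the bits of binary 1001110 (decimal 78) into nibbles of four,
# padding on the left so the groups come out even.
bits = "01001110"

nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
hex_digits = "".join(f"{int(nibble, 2):X}" for nibble in nibbles)

print(nibbles)     # ['0100', '1110']
print(hex_digits)  # 4E, and indeed (4 x 16) + 14 = 78
```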
Basics of the Hexadecimal number system
Hex To Decimal
To change a hexadecimal number into a decimal number, we add the appropriate place values, as shown in the example below.
Powers of 16 | 16³ | 16² | 16¹ | 16⁰ |
Value | 4096 | 256 | 16 | 1 |
Hex number | 1 | B | 0 | 5 |

1B05₁₆ = (1 × 4096) + (11 × 256) + (0 × 16) + (5 × 1)
= 4096 + 2816 + 5
= 6917₁₀

So, hexadecimal 1B05 equals the decimal number 6917.
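The same place-value method can be sketched in code, with Python's built-in int(string, 16) as a check (hex_to_decimal is an illustrative name):

```python
def hex_to_decimal(hex_string: str) -> int:
    """Convert a hexadecimal string to decimal by adding place values."""
    digits = "0123456789ABCDEF"
    total = 0
    for position, digit in enumerate(reversed(hex_string.upper())):
        total += digits.index(digit) * 16 ** position
    return total

print(hex_to_decimal("1B05"))  # 6917
print(int("1B05", 16))         # 6917, using the built-in conversion
```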
Decimal To Hex
To change a decimal number into a hexadecimal number, we find the hexadecimal place values and digits that add up to it, as shown in the example below.
Powers of 16 | 16³ | 16² | 16¹ | 16⁰ |
Value | 4096 | 256 | 16 | 1 |

423₁₀ = 256 + 160 + 7
= (1 × 256) + (10 × 16) + (7 × 1)
= 1A7₁₆

So, decimal 423 equals the hexadecimal number 1A7.
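A minimal Python sketch of this conversion; note that it uses repeated division by 16, an equivalent alternative to the place-value addition shown above, with the built-in hex() as a check:

```python
def decimal_to_hex(n: int) -> str:
    """Convert a decimal number to hexadecimal by repeated division by 16."""
    digits = "0123456789ABCDEF"
    result = ""
    while n > 0:
        n, remainder = divmod(n, 16)
        result = digits[remainder] + result  # remainders build up right to left
    return result or "0"

print(decimal_to_hex(423))  # 1A7
print(hex(423))             # 0x1a7, using the built-in conversion
```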
ASCII and EBCDIC
- To be used in a computer, all data needs to be converted into a binary number.
- To ensure data from one computer can be used on another, there needs to be a standard method of converting letters, numbers, characters and instructions into binary code.
- Two commonly used coding methods are ASCII and EBCDIC.
ASCII
The standard coding method used on personal computers is called ASCII (pronounced ‘ass-kee’), which stands for the American Standard Code for Information Interchange. ASCII is a system for changing letters, numbers and symbols into a 7-bit code.
- For example, the letter ‘K’ is converted to the decimal number 75 using the ASCII code, and this number is then converted to the binary number 1001011, which can be stored by the computer.
- Seven-bit ASCII allows for 128 different characters (2⁷), including 96 standard keyboard characters and 32 control characters.
- The keyboard characters include 26 upper case letters, 26 lower case letters, 10 digits and 34 symbols (the complete code is given in the Appendix).
- The control characters are used for computer functions such as ‘carriage return’ and ‘form feed’.
- The standard seven-bit ASCII was designed when computers were not extensively used outside the US and UK.
- As a result, it is a problem for many languages other than English.
- Many European languages include accent marks and special characters that cannot be represented by standard ASCII.
- Several larger character sets, such as extended ASCII, use eight bits, which gives 128 additional characters.
- The extra characters are used to represent non-English characters, graphic symbols and mathematical symbols.
- Because there are a number of different extended character sets, they are not always interchangeable between different computer systems.

In summary, the limitations of ASCII (demonstrated in the sketch after this list) are:
- It supports only the English alphabet.
- It is limited to 7 bits, so it can only represent 128 distinct characters.
- It is not usable for non-Latin languages, such as Chinese.
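Python's built-in ord() and chr() functions expose the same code values that ASCII defines for its first 128 characters, so the ‘K’ example above can be checked directly:

```python
# ord() gives the code value of a character; chr() is the reverse lookup.
code = ord("K")
print(code)                 # 75
print(format(code, "07b"))  # 1001011, the 7-bit pattern the computer stores
print(chr(75))              # K
```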
EBCDIC
- A coding method used on large IBM computers is called EBCDIC (pronounced ‘ebb-see-dick’).
- It stands for Extended Binary Coded Decimal Interchange Code and was adapted by IBM from punched card code in the 1960s.
- There exist at least six different versions, with one version of EBCDIC containing all the characters of ASCII.
- This allows data to be translated between the two codes.
- EBCDIC is a system that changes letters, numbers and symbols into an 8-bit code.
- This allows for 256 (2⁸) different characters (the complete code is given in the Appendix).
- For example, the letter ‘A’ is converted to the decimal number 193 using the EBCDIC code, and this number is then converted to the binary number 11000001, which can be stored by the computer.
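Python ships codecs for several EBCDIC code pages (cp037, EBCDIC US/Canada, among them), so the ‘A’ example can be checked directly. A minimal sketch, assuming cp037 matches the EBCDIC version being quoted:

```python
# Encode 'A' using cp037, one of Python's built-in EBCDIC code pages.
ebcdic = "A".encode("cp037")
print(ebcdic[0])                 # 193
print(format(ebcdic[0], "08b"))  # 11000001
print(ebcdic.decode("cp037"))    # A
```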
Unicode
- The latest version of Unicode contains a repertoire of more than 120,000 characters covering 129 modern and historic scripts, as well as multiple symbol sets.
- ASCII character encoding is a subset of Unicode.
- Unicode can be implemented by different character encodings. The Unicode standard defines UTF-8, UTF-16 and UTF-32.
- These encodings use between 8 and 32 bits per character, and have the advantage of representing many more unique characters than ASCII because of the larger number of bits available to store each character code.
- Unicode uses the same codes as ASCII up to 127.
- UTF-8, the dominant encoding on the World Wide Web (used in over 92% of websites), uses one byte for the first 128 code points, and up to 4 bytes for other characters. The first 128 Unicode code points are the ASCII characters, which means that any ASCII text is also a UTF-8 text.
- UTF-16 uses 16-bit units to represent characters; a single 16-bit unit can represent 65,536 different characters.
- UTF-32 uses 32 bits to represent each character, meaning it can represent a character set of 4,294,967,296 possible characters, enough for all known languages.
- Its major advantage is that it provides a unique standard for all the World's writing systems. It allows for multilingual text in any language.
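The different storage costs of the three encodings can be seen by encoding the same text under each. A minimal Python sketch (note that Python's utf-16 and utf-32 codecs prepend a byte-order mark):

```python
# The same text occupies different numbers of bytes under each encoding.
text = "Héllo"  # five characters, one of them outside ASCII

for encoding in ("utf-8", "utf-16", "utf-32"):
    print(encoding, len(text.encode(encoding)), "bytes")

# Plain ASCII text is unchanged under UTF-8:
print("Hello".encode("utf-8"))  # b'Hello'
```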
Data Types
The developer needs to define data types in a problem solution. The data type decides how data will be stored and manipulated by the computer. The two major groups of data are the simple data types, and the data structures in which simple data types are organised in more complex ways.
Data Type | Characteristics | Example/s |
Integer | Positive or negative whole number | −32768 to +32767 |
Long integer | Positive or negative whole number larger than an integer | +2147483647 |
Floating point | Real or decimal number | 455.999 |
Character | Any letter, number, command, punctuation mark or symbol | &^*$ |
String | Sequence of characters with a single identity | Hello world |
Boolean | Variable with one of two possible values | True or false |
The character data type is stored as a sequence of bits, with each character represented by one byte. Character is a general term used to refer to any number, letter, symbol or command. The character is the smallest item of meaningful data, as a bit has no meaning in itself and seven or eight bits are required in ASCII or EBCDIC to represent a character.
The integer data type is typically stored as two bytes, with the most significant bit used as a sign bit (0 for positive, 1 for negative). The word size of the machine determines how many integers can be stored; for example, a 16-bit machine can store 2¹⁶ integers in two's complement form, that is, the range of integers from −32768 to +32767 (zero is treated as positive).
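A minimal Python sketch of these limits, using the standard struct module to show the actual two-byte pattern of a negative value:

```python
import struct

# Range of a 16-bit two's complement integer.
bits = 16
print(-2 ** (bits - 1), "to", 2 ** (bits - 1) - 1)  # -32768 to 32767

# Pack -1 into exactly two bytes (big-endian signed short): every bit is
# set, including the sign bit.
print(struct.pack(">h", -1).hex())  # ffff
```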
Floating point, or real, numbers are represented by a mantissa and an exponent. The number is actually handled as a fraction: the mantissa holds the significant digits and the exponent is the power that positions the point. Each part requires a number of bytes for storage.
To understand how this works, look at the place values on either side of the binary point (the middle column marks the point):

2⁷ | 2⁶ | 2⁵ | 2⁴ | 2³ | 2² | 2¹ | 2⁰ | . | 2⁻¹ | 2⁻² | 2⁻³ | 2⁻⁴ | 2⁻⁵ | 2⁻⁶ | 2⁻⁷ |
128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | . | ½ | ¼ | 1/8 | 1/16 | 1/32 | 1/64 | 1/128 |
Note the negative powers of two to the right of the point: they supply the fractional part of a number. The computer needs to be sent instructions to handle a mantissa and exponent. If the instructions are sent to calculate a number such as 1.5, this can be represented in a single byte:
2⁰ | 2⁻¹ | 2⁻² | 2⁻³ | 2⁻⁴ | 2⁻⁵ | 2⁻⁶ | 2⁻⁷ |
1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
This example works because the value 1.5 (1 + ½) can be handled exactly. Many real numbers cannot be translated as accurately; for example, 0.1 has no exact finite binary representation and must be rounded.
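A minimal Python sketch of the difference, using float.hex() to expose the mantissa and exponent of the stored value:

```python
print((1.5).hex())       # 0x1.8000000000000p+0, 1 + 1/2 stored exactly
print((0.1).hex())       # 0x1.999999999999ap-4, a rounded repeating fraction
print(0.1 + 0.2 == 0.3)  # False, because the rounding errors accumulate
```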
The string data type is a sequence of characters that keeps its identity as a single data element, either by recording its size in bytes at the beginning of the string or by using an end-of-text character to mark the end of the string. Strings can contain any character that can be produced on the keyboard. Numbers stored as strings cannot have mathematical calculations performed on them, as the sketch below shows.
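A minimal Python sketch of the difference between a number stored as a string and the same value stored as an integer:

```python
print("10" + "9")            # 109, concatenation of characters, not addition
print(int("10") + int("9"))  # 19, converted to integers first
print(len("Hello world"))    # 11 characters, one byte each in ASCII
```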
The Boolean data type is a variable with one of two possible values, and can be used for a variety of purposes in software. Usually 0 represents false and 1 represents true. Only one bit is ever needed to store a Boolean value, and it can only ever be a 0 or a 1. Theoretically, anything that has only two possible choices could be stored as a Boolean data type.
