Students Learn About: Data types
Students Learn To:
Introduction
Data items are the raw materials on which computer programs operate. These data items must be stored in binary if they are to be manipulated by the instructions that form software programs.
Each data item must be assigned a data type.
The data type determines how each of the data items will be represented in binary and what kind of processing the software will be able to perform on them.
There are a number of data types that are used so frequently that most programming languages include them as predefined parts of the language. These data types are the ones we use in everyday life. For example:
- whole numbers (integers) for counting and performing arithmetic
- numbers with decimal points, or real numbers (floating point), for fractional and very large computations
- words and sentences (strings) for all forms of writing
- yes/no or true/false (Boolean) data for answering questions and making decisions
- dates and times for scheduling our lives
- currency for purchasing.
Digital Data
- The computer is a two-state device that uses only two digits: 0 and 1.
- Two digits are easily represented electronically by circuits in the computer being either on or off.
- The digit 1 is used to represent the electronic state of ‘on’ and the digit 0 is used to represent the electronic state of ‘off’. Each on or off digit is called a bit (binary digit).
- A bit is the smallest unit of data stored in a computer.
- A group of eight bits is called a byte. When used to represent text, a byte stands for a single character, such as a letter, a number, a punctuation mark or a space.
- Because a byte is such a small unit, the prefixes ‘kilo’, ‘mega’, ‘giga’ and ‘tera’ are added to create more useful units for measuring data storage (see Table 1.2 and the short sketch that follows it).
Unit | Symbol | Meaning | Approximate value (bytes) | Exact value (bytes) |
byte | B | | 1 | 1 (2⁰) |
kilobyte | KB | thousand bytes | 1 000 | 1 024 (2¹⁰) |
megabyte | MB | million bytes | 1 000 000 | 1 048 576 (2²⁰) |
gigabyte | GB | billion bytes | 1 000 000 000 | 1 073 741 824 (2³⁰) |
terabyte | TB | trillion bytes | 1 000 000 000 000 | 1 099 511 627 776 (2⁴⁰) |
Table 1.2 Units of measurement of digital data
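The relationship between the approximate and exact values in Table 1.2 can be checked with a few lines of code. A minimal Python sketch (the unit table here is illustrative, not part of any library):

```python
# Exact sizes of the storage units in Table 1.2, computed as powers of two.
UNITS = {"kilobyte": 10, "megabyte": 20, "gigabyte": 30, "terabyte": 40}

for name, power in UNITS.items():
    exact = 2 ** power                # exact value in bytes
    approx = 10 ** (power // 10 * 3)  # decimal approximation (10^3, 10^6, ...)
    print(f"1 {name} = {exact:,} bytes (approximately {approx:,})")
```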
The Binary System
The normal system we use for counting is called the decimal system. It is an arithmetic system using a base of 10 (the digits 0 to 9). The system of counting used by computers is called the binary system (or binary code). It is an arithmetic system using a base of two (the digits 0 and 1). Like the decimal system, the binary system uses place value to determine the worth of a digit. However, whereas the decimal system uses powers of ten (10, 100, 1000, etc.), the binary system uses powers of two (2, 4, 8, etc.) for its place values. A subscript is used to distinguish between numbers with different bases. For example, 10₂ is the number ‘one zero’ in the base two (binary) system.
Binary To Decimal
To change a binary number into a decimal number, we add the appropriate place values, as shown in the example below.
Example
Convert the binary number 1001110 into a decimal number.
Powers of 2 | 2⁶ | 2⁵ | 2⁴ | 2³ | 2² | 2¹ | 2⁰ |
Value | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
Binary number | 1 | 0 | 0 | 1 | 1 | 1 | 0 |

1001110₂ = (1 × 64) + (0 × 32) + (0 × 16) + (1 × 8) + (1 × 4) + (1 × 2) + (0 × 1)
= 64 + 8 + 4 + 2
= 78₁₀

So, binary number 1001110 equals decimal number 78.
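The place-value addition used above translates directly into code. A minimal Python sketch of the same method, with Python's built-in int(string, 2) as a check (binary_to_decimal is an illustrative name, not a standard function):

```python
def binary_to_decimal(bits: str) -> int:
    """Convert a binary string to decimal by adding place values."""
    total = 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2 ** position  # digit times its place value
    return total

print(binary_to_decimal("1001110"))  # 78
print(int("1001110", 2))             # 78, using the built-in conversion
```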
Decimal To Binary
To change a decimal number into a binary number, we divide the binary place values into the decimal number, starting with the largest place value that fits. The quotient of each division is the binary digit, and the remainder is divided by the next place value. This process is repeated for all place values.
Example
Convert 109₁₀ into binary.
Powers of 2 | 2⁶ | 2⁵ | 2⁴ | 2³ | 2² | 2¹ | 2⁰ |
Value | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
Binary number | 1 | 1 | 0 | 1 | 1 | 0 | 1 |

109₁₀ = 64 + 32 + 8 + 4 + 1
= (1 × 64) + (1 × 32) + (0 × 16) + (1 × 8) + (1 × 4) + (0 × 2) + (1 × 1)
= 1101101₂

So, decimal number 109 equals binary number 1101101.
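The same process can be sketched in code: divide each place value into what remains of the number, keep the quotient as the binary digit and carry the remainder on. Python's built-in bin() is shown as a check (decimal_to_binary is an illustrative name):

```python
def decimal_to_binary(n: int) -> str:
    """Convert a decimal number to binary by dividing by place values."""
    if n == 0:
        return "0"
    power = 1
    while power * 2 <= n:  # find the largest place value that fits
        power *= 2
    bits = ""
    while power >= 1:
        digit, n = divmod(n, power)  # quotient is the bit, remainder carries on
        bits += str(digit)
        power //= 2
    return bits

print(decimal_to_binary(109))  # 1101101
print(bin(109))                # 0b1101101, using the built-in conversion
```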
The hexadecimal system
- Because binary numbers use only two digits, they result in very long strings of 1s and 0s.
- For this reason, many computers represent binary numbers in hexadecimal. The hexadecimal number system, or hex, is to the base 16, and uses the sixteen digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F.
- The numbers are often preceded by the $ (dollar) sign or, more commonly now, the & (ampersand) sign to indicate that they are in hexadecimal code.
- So &A = 10₁₀, &B = 11₁₀, and so on.
- Because 16 is 2⁴, it is very easy to convert binary numbers to hexadecimal and vice versa, as the sketch below shows.
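Each hexadecimal digit corresponds to exactly four binary digits (a ‘nibble’), so converting between the two is just a matter of grouping bits in fours. A minimal Python sketch, reusing the value 78 from the earlier example:

```python
# Group the bits of binary 1001110 (decimal 78) into nibbles of four,
# padding on the left so the groups come out even.
bits = "01001110"

nibbles = [bits[i:i + 4] for i in range(0, len(bits), 4)]
hex_digits = "".join(f"{int(nibble, 2):X}" for nibble in nibbles)

print(nibbles)     # ['0100', '1110']
print(hex_digits)  # 4E, and indeed (4 x 16) + 14 = 78
```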
Basics of the Hexadecimal number system
Hex To Decimal
To change a hexadecimal number into a decimal number, we add the appropriate place values, as shown in the example below.
Powers of 16 | 16³ | 16² | 16¹ | 16⁰ |
Value | 4096 | 256 | 16 | 1 |
Hex number | 1 | B | 0 | 5 |

1B05₁₆ = (1 × 4096) + (11 × 256) + (0 × 16) + (5 × 1)
= 4096 + 2816 + 5
= 6917₁₀

So, hexadecimal 1B05 equals the decimal number 6917.
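The same place-value method can be sketched in code, with Python's built-in int(string, 16) as a check (hex_to_decimal is an illustrative name):

```python
def hex_to_decimal(hex_string: str) -> int:
    """Convert a hexadecimal string to decimal by adding place values."""
    digits = "0123456789ABCDEF"
    total = 0
    for position, digit in enumerate(reversed(hex_string.upper())):
        total += digits.index(digit) * 16 ** position
    return total

print(hex_to_decimal("1B05"))  # 6917
print(int("1B05", 16))         # 6917, using the built-in conversion
```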
Decimal To Hex
To change a decimal number into a hexadecimal number, we find the hexadecimal place values and digits that add up to it, as shown in the example below.
Powers of 16 | 16³ | 16² | 16¹ | 16⁰ |
Value | 4096 | 256 | 16 | 1 |

423₁₀ = 256 + 160 + 7
= (1 × 256) + (10 × 16) + (7 × 1)
= 1A7₁₆

So, decimal 423 equals the hexadecimal number 1A7.
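A minimal Python sketch of this conversion; note that it uses repeated division by 16, an equivalent alternative to the place-value addition shown above, with the built-in hex() as a check:

```python
def decimal_to_hex(n: int) -> str:
    """Convert a decimal number to hexadecimal by repeated division by 16."""
    digits = "0123456789ABCDEF"
    result = ""
    while n > 0:
        n, remainder = divmod(n, 16)
        result = digits[remainder] + result  # remainders build up right to left
    return result or "0"

print(decimal_to_hex(423))  # 1A7
print(hex(423))             # 0x1a7, using the built-in conversion
```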
ASCII and EBCDIC
- To be used in a computer, all data needs to be converted into a binary number.
- To ensure data from one computer can be used on another, there needs to be a standard method of converting letters, numbers, characters and instructions into binary code.
- Two commonly used coding methods are ASCII and EBCDIC.
ASCII
The standard coding method used on personal computers is called ASCII (pronounced ‘ass-kee’), which stands for the American Standard Code for Information Interchange. ASCII is a system for changing letters, numbers and symbols into a 7-bit code.
- For example, the letter ‘K’ is converted to the decimal number 75 using the ASCII code, and this number is then converted to the binary number 1001011, which can be stored by the computer.
- Seven-bit ASCII allows for 128 different characters (2⁷), including 96 standard keyboard characters and 32 control characters.
- The keyboard characters include 26 upper case letters, 26 lower case letters, 10 digits and 34 symbols (the complete code is given in the Appendix).
- The control characters are used for computer functions such as ‘carriage return’ and ‘form feed’.
- The standard seven-bit ASCII was designed when computers were not extensively used outside the US and UK.
- As a result, it is a problem for many languages other than English.
- Many European languages include accent marks and special characters that cannot be represented by standard ASCII.
- Several larger character sets, such as extended ASCII, use eight bits, which gives 128 additional characters.
- The extra characters are used to represent non-English characters, graphic symbols and mathematical symbols.
- Because there are a number of different extended character sets, they are not always interchangeable between different computer systems.

In summary, the limitations of ASCII (demonstrated in the sketch after this list) are:
- It supports only the English alphabet.
- It is limited to 7 bits, so it can only represent 128 distinct characters.
- It is not usable for non-Latin languages, such as Chinese.
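Python's built-in ord() and chr() functions expose the same code values that ASCII defines for its first 128 characters, so the ‘K’ example above can be checked directly:

```python
# ord() gives the code value of a character; chr() is the reverse lookup.
code = ord("K")
print(code)                 # 75
print(format(code, "07b"))  # 1001011, the 7-bit pattern the computer stores
print(chr(75))              # K
```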
EBCDIC
- A coding method used on large IBM computers is called EBCDIC (pronounced ‘ebb-see-dick’).
- It stands for Extended Binary Coded Decimal Interchange Code and was adapted by IBM from punched card code in the 1960s.
- There exist at least six different versions, with one version of EBCDIC containing all the characters of ASCII.
- This allows data to be translated between the two codes.
- EBCDIC is a system that changes letters, numbers and symbols into an 8-bit code.
- This allows for 256 (2⁸) different characters (the complete code is given in the Appendix).
- For example, the letter ‘A’ is converted to the decimal number 193 using the EBCDIC code, and this number is then converted to the binary number 11000001, which can be stored by the computer.
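Python ships codecs for several EBCDIC code pages (cp037, EBCDIC US/Canada, among them), so the ‘A’ example can be checked directly. A minimal sketch, assuming cp037 matches the EBCDIC version being quoted:

```python
# Encode 'A' using cp037, one of Python's built-in EBCDIC code pages.
ebcdic = "A".encode("cp037")
print(ebcdic[0])                 # 193
print(format(ebcdic[0], "08b"))  # 11000001
print(ebcdic.decode("cp037"))    # A
```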
Unicode
- The latest version of Unicode contains a repertoire of more than 120,000 characters covering 129 modern and historic scripts, as well as multiple symbol sets.
- ASCII character encoding is a subset of Unicode.
- Unicode can be implemented by different character encodings. The Unicode standard defines UTF-8, UTF-16 and UTF-32.
- These encodings use between 8 and 32 bits per character, and have the advantage of representing many more unique characters than ASCII because of the larger number of bits available to store each character code.
- Unicode uses the same codes as ASCII up to 127.
- UTF-8, the dominant encoding on the World Wide Web (used in over 92% of websites), uses one byte for the first 128 code points, and up to 4 bytes for other characters. The first 128 Unicode code points are the ASCII characters, which means that any ASCII text is also a UTF-8 text.
- UTF-16 uses 16-bit units to represent characters; a single 16-bit unit can represent 65,536 different characters.
- UTF-32 uses 32 bits to represent each character, meaning it can represent a character set of 4,294,967,296 possible characters, enough for all known languages.
- Its major advantage is that it provides a unique standard for all the World's writing systems. It allows for multilingual text in any language.
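The different storage costs of the three encodings can be seen by encoding the same text under each. A minimal Python sketch (note that Python's utf-16 and utf-32 codecs prepend a byte-order mark):

```python
# The same text occupies different numbers of bytes under each encoding.
text = "Héllo"  # five characters, one of them outside ASCII

for encoding in ("utf-8", "utf-16", "utf-32"):
    print(encoding, len(text.encode(encoding)), "bytes")

# Plain ASCII text is unchanged under UTF-8:
print("Hello".encode("utf-8"))  # b'Hello'
```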
Data Types
The developer needs to define data types in a problem solution. The data type decides how data will be stored and manipulated by the computer. The two major groups of data are the simple data types, and the data structures in which simple data types are organised in more complex ways.
Data Type | Characteristics | Example/s |
Integer | Positive or negative whole number | −32768 to +32767 |
Long integer | Positive or negative whole number larger than an integer | +2147483647 |
Floating point | Real or decimal number | 455.999 |
Character | Any letter, number, command, punctuation mark or symbol | &^*$ |
String | Sequence of characters with a single identity | Hello world |
Boolean | Variable with one of two possible values | True or false |
The character data type is stored as a sequence of bits, with each character represented by one byte. Character is a general term used to refer to any number, letter, symbol or command. The character is the smallest item of meaningful data, as a bit has no meaning in itself and seven or eight bits are required in ASCII or EBCDIC to represent a character.
The integer data type is typically stored as two bytes, with the most significant bit used as a sign bit (0 for positive, 1 for negative). The word size of the machine determines how many integers can be stored; for example, a 16-bit machine can store 2¹⁶ integers in two's complement form, that is, the range of integers from −32768 to +32767 (zero is treated as positive).
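A minimal Python sketch of these limits, using the standard struct module to show the actual two-byte pattern of a negative value:

```python
import struct

# Range of a 16-bit two's complement integer.
bits = 16
print(-2 ** (bits - 1), "to", 2 ** (bits - 1) - 1)  # -32768 to 32767

# Pack -1 into exactly two bytes (big-endian signed short): every bit is
# set, including the sign bit.
print(struct.pack(">h", -1).hex())  # ffff
```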
Floating point, or real, numbers are represented by a mantissa and an exponent. The number is actually handled as a fraction: the mantissa holds the significant digits and the exponent is the power that positions the point. Each part requires a number of bytes for storage.
To understand how this works, look at the place values on either side of the binary point (the middle column marks the point):

2⁷ | 2⁶ | 2⁵ | 2⁴ | 2³ | 2² | 2¹ | 2⁰ | . | 2⁻¹ | 2⁻² | 2⁻³ | 2⁻⁴ | 2⁻⁵ | 2⁻⁶ | 2⁻⁷ |
128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | . | ½ | ¼ | 1/8 | 1/16 | 1/32 | 1/64 | 1/128 |
Note the negative powers of two to the right of the point: they supply the fractional part of a number. The computer needs to be sent instructions to handle a mantissa and exponent. If the instructions are sent to calculate a number such as 1.5, this can be represented in a single byte:
2⁰ | 2⁻¹ | 2⁻² | 2⁻³ | 2⁻⁴ | 2⁻⁵ | 2⁻⁶ | 2⁻⁷ |
1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
This example works because the value 1.5 (1 + ½) can be handled exactly. Many real numbers cannot be translated as accurately; for example, 0.1 has no exact finite binary representation and must be rounded.
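A minimal Python sketch of the difference, using float.hex() to expose the mantissa and exponent of the stored value:

```python
print((1.5).hex())       # 0x1.8000000000000p+0, 1 + 1/2 stored exactly
print((0.1).hex())       # 0x1.999999999999ap-4, a rounded repeating fraction
print(0.1 + 0.2 == 0.3)  # False, because the rounding errors accumulate
```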
The string data type is a sequence of characters that keeps its identity as a single data element, either by recording its size in bytes at the beginning of the string or by using an end-of-text character to mark the end of the string. Strings can contain any character that can be produced on the keyboard. Numbers stored as strings cannot have mathematical calculations performed on them, as the sketch below shows.
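A minimal Python sketch of the difference between a number stored as a string and the same value stored as an integer:

```python
print("10" + "9")            # 109, concatenation of characters, not addition
print(int("10") + int("9"))  # 19, converted to integers first
print(len("Hello world"))    # 11 characters, one byte each in ASCII
```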
The Boolean data type is a variable with one of two possible values, and can be used for a variety of purposes in software. Usually 0 represents false and 1 represents true. Only one bit is ever needed to store a Boolean value, and it can only ever be a 0 or a 1. Theoretically, anything that has only two possible choices could be stored as a Boolean data type.
