3.4.1 Database and Files

Database Design

The information processes in any large organisation are often unique and complex. 
The storage and retrieval of data must occur accurately many times each day.
Storage and retrieval depend on databases and files.

A database is an organised collection of data. An organisation stores all accounting and operational records in a database, often called an operational database. 

This database is a model of the organisation’s operational aspects. The data in an operational database is defined in a schema. The TPS usually has a restricted view of the operational database. For example, there is no need for an
order entry system to have access to the accounts payable data. This restricted view, or subset, of the data is called a subschema.



When designing a database for real-time transaction processing, the following features are important:

  • Good data placement
    • A large number of users are simultaneously performing transactions to change data. 
    • The database should be designed to access patterns of data use and to place frequently accessed data together.
  • Short transactions. 
    • Keeping transactions short enables the entire transaction to be processed quickly, which improves concurrency. 
    • User interaction during transaction processing should be avoided as this slows down the system.
  • Real-time backup
    • Real-time processing is characterised by continuous operations with downtime kept to an absolute minimum. Backing up the database needs to be scheduled during times of low activity to minimise effects on users.
  • High normalisation
    • Redundant information is kept to a minimum whenever possible to increase the speed of updates and improve concurrency. 
    • Reducing data also improves the speed of backups because less data needs to be backed up.
  • Archiving of historical data
    • Data that is rarely referenced should be archived into separate databases or moved out of the heavily updated tables. 
    • This keeps tables as small as possible, improving backup times and query performance.
  • Good hardware configuration. The hardware needs to be able to handle a large number of concurrent users and to provide quick response times.

Files used  in TPS

A file is a block of data. In a database, a file is divided into a set of related records. The records contain the specific information, such as details about a customer or a product. 

Every TPS uses files to store and organise its transaction data. 

Batch transaction processing and real-time transaction processing require different types of files and different methods of storage and retrieval. 

In a TPS, there are five basic types of files:

  • A master file contains information about an organisation’s business situation. 
    • The master file stores the operational database.
    • Transaction data is stored in the master file.
  • A transaction file is a collection of transaction records. 
    • The data in the transaction file is used to update the master file. 
    • Transaction files also serve as audit trails and history for the organisation.
  • A report file contains data that has been formatted for presentation to a user.
  • A work file is a temporary file in the system used during the processing.
  • A program file contains instructions for the processing of data. 
    • It is created from a high-level programming language, such as Cobol, Fortran,Visual Basic or C++.

Transaction Logs

In the field of databases in computer science, a transaction log (also transaction journaldatabase logbinary log or audit trail) is a history of actions executed by a database management system used to guarantee ACID properties over crashes or hardware failures. Physically, a log is a file listing changes to the database, stored in a stable storage format.

If, after a start, the database is found in an inconsistent state or not been shut down properly, the database management system reviews the database logs for uncommitted transactions and rolls back the changes made by these transactions. Additionally, all transactions that are already committed but whose changes were not yet materialized in the database are re-applied. Both are done to ensure atomicity and durability of transactions.

This term is not to be confused with other, human-readable logs that a database management system usually provides.

In database management systems, a journal is the record of data altered by a given process.



Database Mirroring vs Replication

Mirroring and Replication are the terms somehow related to copying of data in a DBMS. The prior difference between mirroring and replication is that mirroring refers to copy a database to another location whereas replication includes the copy of data and database objects from one database to another database.

Both mirroring and replication are advantageous and increases the availability and performance of the data or database.

Content: Mirroring Vs Replication

  1. Comparison Chart
  2. Definition
  3. Key Differences
  4. Conclusion

Comparison Chart

BASIS FOR COMPARISONMIRRORINGREPLICATION
BasicCreation of a database copy at a different location (machine).Creation of data and database object to enhance the distribution operations.
Performed onDatabaseOnly data and database objects
Located onDifferent machineDifferent database
CostVery expensiveInexpensive
Distributed databaseNo provision for distributed database supportSupports distributed database

Definition of Mirroring

Mirroring is the process of generating multiple copies of databases and it is also known as shadowing. These database copy usually located on different machine. If any primary server crashes or engaged for maintenance, at that instant, the system can automatically failover to the mirrored database. At any given time, only one copy can be accessed.

The tight coupling between the primary database and mirrored database is established with the help of sending blocks of the transaction log to the mirrored database. In case of any failure, it is also capable of restoring the data by copying it from one database to another. When any failover takes place, the mirror database becomes principal database.

Mirroring incorporate the redoing of the update, insert and delete operation that are committed in a primary database on to mirrored database without any delay. In a fully secure mode, a transaction cannot commit until the log records for the transaction have made it to disk on the mirror. Mirroring doesn’t support distributed database.

Definition of Replication

Replication is the process of creating distributing redundant data and database objects at different databases to enhance the availability of the data. It is capable of rolling up the corporate data from geographically dispersed sites and spreading the data to remote users on a local area networks or internet. It increases the execution of the parallel commands.

In Microsoft SQL server, the publisher is an entity that provides the data for replication to other servers. The subscriber is a server that usually receives the replicated data from a publisher.

There are three types of options available in the Microsoft SQL server for replication operation: Snapshot replication, Transaction replication and Merge replication.

  • Snapshot replication copies the data and database objects same as they appear in an instant.
  • Transaction replication generates an initial snapshot of the data to the subscriber then its incremental modification is sent to the subscriber as a discrete process. There are two replication processes defined under transactional replication namely log reader agent and distribution agent.The former process log reader agent reads the transactions from the database transaction log, applies an optional filter and stores them in a distribution database, which resembles a queue supporting store and forward mechanism of transactional replication. The latter process distribution agent forwards the changes to each subscriber.
  • Merge replication enables each replica to be self-governing whether it is online or offline. Whenever any change is committed on metadata of published object, it is tracked back to the subscriber’s and publisher’s end. It does not replicate the instance of data and database objects.

The data can be replicated by using database triggers with the help of some specific tools. The specialized stored procedure in the database signals the replicating agent to capture and transport the changes. Replication can also be employed with database mirroring, which should have a distributor.

Key Differences Between Mirroring and Replication

  1. Mirroring involves the duplication of a database stored at different machines where original database is known as primary database and copied database is known as a mirror. On the other hand, replication is the duplication of data and database objects stored at the different location to improve the performance of the distribution database.
  2. Mirroring is performed on the database while replication is implemented on data and database objects.
  3. The mirror database can usually be found in the different machine from its primary database. As against, the replicated data and database objects are stored in another database.
  4. The mirroring of database costs higher than replication.
  5. Mirroring doesn’t support distributed environment whereas replication was devised for the distributed database.

Conclusion

Mirroring and replication are the methods that help in improvising the data availability, reliability, and performance. But, mirroring involves redundant copies of a database whereas replication includes the duplication of data and database objects such as tables, stored procedures, user-defined functions, views, materialized views, etc.








Comments