Websites grows in regular basics. As the website data increases the DBMS is needs to be scaled for backup and performance. In this post I will explain how MySQL looks in large websites.
Backup prevents loss of data due to failure in hard disk. Some techniques used by small websites for database backup is mysqldump command, backing up table files(if tables are stored in files), text-file backups(represent tables in text files), logs(incremental backups), full backups, RAID(hard disk replication) and server backup software.
Large websites usually use replication and RAID for keeping multiple copies of the same data to prevent loss of data due to hardware failure or disasters.
Replication is a technique involving many computers to enable fault-tolerance(preventing loss of data due to hardware failure) and data accessibility(no down time). In a replication environment many computers share the same data with each other so that even if few computers go down, all the data will be available.
All the slaves contain exactly same data as master. There can be as many as slaves per master server. When a new slave is inserted to the environment, the master automatically syncs all data to the slave.
All the queries are redirected to master server, master server then executes the operations. When a write operation occurs, master replicates the newly written data to all slaves. When a large number sort or read operation are made, master distributes them to the slaves so that a large number of read and sort operations can be executed at a time.
If a slave fails, then also the environment continues working. when the slave again starts working, the master sends updated data to the slave.
If there is a crash in master server and it looses all data then you should convert a slave to master instead of bringing a new computer as a master. If we make a new computer as master then all data in the environment will be lost because new master will have no data and will makes the slaves also to have zero data(new master does resync). If master fails but data is persistent(disk not crashed) then starting up the same master server again will bring up the whole environment to running mode.
The nature of data sync between the master and slave actually depends on the type of replication configured. MySQL 5.6 onward there are three types of replication: synchronous, asynchronous and semi-synchronous.
- With asynchronous replication, the master writes events to its log and slaves request them when they are ready. There is no guarantee that any event will ever reach any slave.By default the nature of data synchronization is asynchronous in MySQL.
- With fully synchronous replication, when a master commits a transaction, all slaves also will have committed the transaction before the master returns to the session that performed the transaction. The drawback of this is that there might be a lot of delay to complete a transaction.
- Semisynchronous replication falls between asynchronous and fully synchronous replication. The master waits after commit only until at least one slave has received and logged the events. It does not wait for all slaves to acknowledge receipt, and it requires only receipt, not that the events have been fully executed and committed on the slave side.
More on MySQL replication.
Clustering is a technique by which data can be sharded(divided) into many computers. The main advantage is that more data can be stored in a cluster because its a combination of computers.
Suppose we have one MySQL server with 64GB of disk space i.e., we can have only 64GB of data. Now if we have 10 clustered computers with each 64GB of disk space then we can store 640GB of data.
MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL. Therefore we need to use NDB APIs to access the data, traditional MySQL clients(or APIs) will not work on accessing clustered data. NDB is actually a mixture of standard MySQL server(default engine) and an in-memory storage engine.
A MySQL clustered environment has replication and sharding by default. Cluster is formed by three different kinds of MySQL nodes(mysql programs):
- Data node (ndbd/ndbmtd process): These nodes store the actual data. Tables are automatically sharded across the data nodes which also transparently handle load balancing and replication.
- Management node (ndb_mgmd process): Used for configuration and monitoring of the cluster. They are required only to start or restart a cluster node.
- Application node or SQL node (mysqld process): A MySQL server (mysqld) that connects to all of the data nodes in order to perform data storage and retrieval. This node type is optional; it is possible to query data nodes directly via the NDB API. But if you want to insert data then you have to use application node.
Before writing this article I was trying to find articles on MySQL architecture for beginners. But I didn’t find any. Therefore I wrote a article about it. I tried to make it very simple so that anyone who is completely new to MySQL can understand it. I hope I made everything clear. Leave comments for questions on it. Thanks for reading.