What is etcd and How Does it Work with Kubernetes?


etcd in Kubernetes is a distributed key-value store that serves as the primary data store for cluster configuration and state. It provides consistent, reliable, and highly available storage for critical cluster information such as API objects, service discovery data, and configuration. etcd plays a crucial role in maintaining the stability and resilience of Kubernetes clusters.

What is etcd?

etcd is an open-source distributed key-value store that provides reliable and highly available storage. It is often used as the primary data store in distributed systems and is a fundamental building block for distributed applications. etcd uses the Raft consensus algorithm to ensure data consistency and fault tolerance: one node is elected as the leader, while the remaining nodes act as followers. It is known for its simplicity, performance, and scalability, making it a popular choice for managing critical data in distributed systems.
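As a quick illustration of the key-value model, here is how a client might write and read keys with the `etcdctl` v3 CLI. This is a sketch, not runnable on its own: it assumes `etcdctl` is installed and an etcd server is listening on the default `localhost:2379`, and the key names are hypothetical.

```shell
# Store a key-value pair (v3 API)
etcdctl put /config/app/replicas "3"

# Read it back
etcdctl get /config/app/replicas

# List every key under a prefix, without values
etcdctl get /config/app --prefix --keys-only
```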

Why do we use it?

There are several reasons why etcd is widely used in Kubernetes, including:

  • Distributed Data Storage: Etcd offers a distributed key-value store that allows applications and services to store and retrieve data in a fault-tolerant and highly available manner. It provides a reliable storage solution for critical configuration data, state information, and coordination between components in a distributed system.
  • Consistency and Coordination: Etcd uses the Raft consensus algorithm to ensure strong consistency among distributed nodes. This means that all nodes within a cluster agree on the current state of the data, providing a reliable source of truth. etcd enables coordination between distributed components, allowing them to synchronize and operate coherently.
  • Fault Tolerance and High Availability: Etcd is designed to be resilient and fault-tolerant. It replicates data across multiple nodes, ensuring that even if a node fails, the data remains accessible and the system continues to operate without disruption. This fault tolerance and high availability make etcd suitable for critical applications that require continuous operation.
  • Dynamic Configuration Updates: Distributed systems often require dynamic configuration updates to adapt to changing requirements and scale resources. etcd provides efficient and real-time configuration updates, allowing distributed components to coordinate and synchronize their configurations seamlessly. This flexibility enables the system to respond dynamically to changes without downtime or manual intervention. Let's understand this with an example:
    • Initial Configuration: Suppose initially, you deploy your application with a specific configuration in etcd. For instance, you set the number of replicas to 3, the maximum concurrent connections to 100, and the timeout value to 30 seconds. All microservice instances are running with this configuration.
    • Dynamic Update: Now, let's say you need to scale up your application due to increased demand. You want to change the number of replicas to 5, increase the maximum concurrent connections to 150, and set the timeout value to 45 seconds.
    • With etcd, you can easily update the configuration dynamically. You make the changes directly to the relevant keys in etcd using the etcd API or Kubernetes APIs that internally interact with etcd. For example, you might use kubectl to update the number of replicas in a Deployment object: kubectl scale deployment my-app --replicas=5
    • In our example, the Deployment controller is notified of the updated replica count and creates two additional microservice instances, scaling the application to 5 replicas. Meanwhile, the running instances pick up the new maximum-connections and timeout values (for example, by watching a ConfigMap, whose contents Kubernetes ultimately stores in etcd) and adjust their settings accordingly.
    • This process happens in real-time, enabling your distributed application to adapt to changing requirements without downtime or manual intervention.
  • Service Discovery and Coordination: Etcd supports service discovery by providing a distributed key-value store where services can dynamically register their information and discover other services. This simplifies locating and communicating with services within a distributed system, enabling efficient coordination and integration between different components.
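To make the dynamic-update flow above concrete, components can subscribe to configuration keys and react the moment they change. A sketch with the `etcdctl` v3 CLI (the key names are hypothetical, and a running etcd server is assumed):

```shell
# Terminal 1: a component watches its configuration prefix
etcdctl watch /config/my-app --prefix

# Terminal 2: an operator updates the configuration
etcdctl put /config/my-app/replicas "5"
etcdctl put /config/my-app/max-connections "150"
etcdctl put /config/my-app/timeout-seconds "45"

# Terminal 1 immediately prints each PUT event, so the
# component can reload its settings without a restart.
```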

How to use etcd with Kubernetes?

To use etcd with Kubernetes, you must understand its role in the ecosystem. etcd is a consistent and highly available key-value store that acts as Kubernetes' backing store for all cluster data. It stores critical information, such as configuration data, state data, and metadata, for Kubernetes operations. To use etcd in Kubernetes, follow these steps:

1. Set Up Prerequisites:

  • Set up a functional Kubernetes cluster with kubectl configured to communicate with it.
  • Ensure that etcd is installed and running as a consistent and highly available key-value store for all cluster data.
  • Use one of the recommended etcd versions for production: 3.4.22+ or 3.5.6+.

2. Run etcd as a Cluster:

  • It's essential to run etcd as a multi-node cluster for production environments to ensure durability and high availability.
  • For testing purposes, you can start a single-node etcd cluster simply by running the etcd binary with its default configuration.
  • For a multi-node cluster, run etcd on several machines and configure each member with the full cluster membership so the nodes can reach one another.
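Both invocations can be sketched as follows, assuming the etcd v3 binary is installed. The member names and IP addresses are placeholders; in the multi-node case, the same command is repeated on each machine with its own `--name` and URLs.

```shell
# Single-node cluster for testing (client endpoint defaults to localhost:2379)
etcd

# One member of a 3-node cluster; placeholder names/IPs, repeat per machine
etcd --name infra0 \
  --initial-advertise-peer-urls http://10.0.1.10:2380 \
  --listen-peer-urls http://10.0.1.10:2380 \
  --listen-client-urls http://10.0.1.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://10.0.1.10:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster infra0=http://10.0.1.10:2380,infra1=http://10.0.1.11:2380,infra2=http://10.0.1.12:2380 \
  --initial-cluster-state new
```

The `--initial-cluster` flag lists every member's peer URL, which is how the nodes discover and communicate with each other during bootstrap.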

3. Data Consistency and Reliability:

  • Etcd maintains data consistency using the Raft consensus algorithm. A leader replicates data to other nodes (followers) in a multi-node cluster to ensure consistency.
  • It's crucial to avoid resource starvation in etcd clusters. Ensure dedicated machines or isolated environments for etcd to meet its resource requirements and maintain stability.

4. Backup and Resource Requirements:

  • It's vital to have a backup plan for etcd data for production deployments. Regularly back up etcd data to ensure data integrity and disaster recovery.
  • Deploying etcd in production requires suitably provisioned hardware: fast disks, and sufficient memory and network bandwidth. Review etcd's resource requirements before deploying in production.

How does it work?

Etcd utilizes the Raft algorithm to balance strong consistency with high availability in a distributed system. Raft solves the distributed consensus problem: getting multiple independent processes to agree on a single value.

Here's how etcd works with the Raft algorithm:

1. Raft Protocol and Leader Election:

  • When an etcd cluster forms, all nodes start in the follower state.
  • Raft elects a leader among the nodes, and all write requests are directed to the leader.
  • If the leader node goes offline, a new leader is elected through an election process.
  • A majority of nodes must agree to elect a new leader, ensuring the consistency and availability of the cluster.

2. Data Writes:

  • Clients can send write requests to any etcd node in the cluster.
  • If the client contacts the leader node, the leader performs the write and replicates it to the followers.
  • If the client contacts a non-leader node, the request is forwarded to the leader, and the same process follows.
  • Once the write succeeds, an acknowledgment is sent back to the client.

3. Data Reads:

  • Read requests follow a similar path as write requests but can be optimized to be performed by replica nodes at the expense of linearizability (strong consistency).
  • Read requests can be served by followers, which might have slightly stale data compared to the leader.
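etcdctl exposes this trade-off directly: reads default to linearizable (routed through the leader), and can be relaxed to serializable reads that any member may answer from possibly stale data. A sketch assuming a running cluster and a hypothetical key:

```shell
# Linearizable read (the default): strongly consistent, goes through the leader
etcdctl get /config/my-app/replicas --consistency="l"

# Serializable read: any member may answer; lower latency, possibly stale
etcdctl get /config/my-app/replicas --consistency="s"
```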

4. Achieving Availability:

  • Etcd clusters remain available as long as a majority (quorum) of nodes is online.
  • For example, in a 3-node cluster, at least two nodes must be online for the cluster to stay available.
  • An odd number of cluster members is preferred: adding a member to an odd-sized cluster raises the quorum size without increasing the number of failures the cluster can tolerate, so even-sized clusters gain no extra fault tolerance.
  • Typical production etcd clusters often have 3 or 5 nodes, which balances availability and performance.
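The quorum arithmetic behind these numbers is simple: a cluster of n members needs floor(n/2) + 1 members online, so it tolerates n minus that many failures. A small shell sketch:

```shell
# Print the quorum size and tolerated failures for common cluster sizes
for n in 1 2 3 4 5; do
  quorum=$(( n / 2 + 1 ))
  echo "nodes=$n quorum=$quorum tolerated_failures=$(( n - quorum ))"
done
# nodes=1 quorum=1 tolerated_failures=0
# nodes=2 quorum=2 tolerated_failures=0
# nodes=3 quorum=2 tolerated_failures=1
# nodes=4 quorum=3 tolerated_failures=1
# nodes=5 quorum=3 tolerated_failures=2
```

Note that 3-node and 4-node clusters both tolerate only one failure, which is why even-sized clusters add cost without adding resilience.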

FAQs

Q. Does Kubernetes still use etcd?

A. Yes. Kubernetes still uses etcd as its default storage backend for cluster configuration data, state, and metadata. etcd provides reliable and highly available storage for Kubernetes operations.

Q. How is data stored in etcd?

A. Data in etcd is stored as key-value pairs. Keys are conventionally given slash-separated names, so the keyspace reads like a filesystem hierarchy (in the etcd v3 API the keyspace is actually flat, with prefixes providing the directory-like structure). etcd is a consistent, distributed key-value store accessible by every machine in a cluster, and it uses the Raft consensus algorithm to replicate data across all nodes, ensuring consistency and high availability and making it suitable for critical data storage in distributed systems.
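In a Kubernetes cluster, for example, API objects live under the `/registry` prefix, with key names that mirror resource types and namespaces. A sketch assuming direct `etcdctl` access to the cluster's etcd (production clusters usually also require TLS client flags, omitted here):

```shell
# List the keys Kubernetes stores, without their (binary) values
etcdctl get /registry --prefix --keys-only

# Typical key shapes (illustrative):
#   /registry/pods/default/my-app-6d4cf56db6-abcde
#   /registry/deployments/default/my-app
```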

Q. Why do we use etcd?

A. We use etcd in Kubernetes for reliable and highly available data storage. It ensures strong consistency and coordination between components using the Raft consensus algorithm. etcd is a fundamental building block, providing fault tolerance and dynamic configuration updates for critical data in distributed applications.

Conclusion

  • Etcd is an open-source, distributed key-value store used in Kubernetes for reliable and highly available storage. It uses the Raft consensus algorithm to ensure strong data consistency and fault tolerance.
  • etcd in Kubernetes is crucial for storing critical information like configuration data, state data, and metadata for Kubernetes operations.
  • Reasons for using etcd in Kubernetes include distributed data storage, consistency, coordination, fault tolerance, and dynamic configuration updates.
  • To use etcd in Kubernetes, set up a multi-node etcd cluster for production environments, and ensure data consistency and reliability.
  • In k3s, etcd can be replaced with an SQL database by using the Kine project, a shim that translates etcd API calls into SQL queries.