---
title: How to troubleshoot errors in etcd
sidebar_label: Troubleshoot errors in etcd
---

This guide describes several error conditions that can be encountered with etcd and provides mitigations for them.

## `etcdctl` command overview

The standard command for interacting with etcd is [`etcdctl`](https://github.com/etcd-io/etcd/blob/main/etcdctl/README.md). This command requires several settings or options to find and authenticate with the etcd nodes.

In a Deephaven environment, the `etcdctl.sh` script, located in `/usr/illumon/latest/bin`, simplifies the process by automatically setting most required options. To connect, you need access to the appropriate [configuration files](../../sys-admin/configuration/configuration-file-locations-overview.md#configuration-in-etcd), which are stored in `/etc/sysconfig/illumon.d/etcd/client`. While this document uses the `root` configuration for examples, non-root configurations are also available for users without root permissions.

> [!IMPORTANT]
> `/etc/sysconfig/illumon.d/etcd/client/root` holds the client configuration for the etcd `root` user, which is distinct from the operating system `root` user. If these files are owned by another account, use `sudo -u irisadmin` (or whichever user owns the files) instead of `sudo` in the commands below.

```bash
sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root /usr/illumon/latest/bin/etcdctl.sh <etcdctl command options>
```
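
Because the full command is long, you may find it convenient to wrap it in a shell function. The sketch below is an assumption, not part of the Deephaven scripts: `dh_etcdctl`, `DH_ETCD_CLIENT_DIR`, and `DH_SUDO` are hypothetical names, and the function simply reproduces the command above with your arguments appended.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the Deephaven etcdctl.sh script.
#   DH_ETCD_CLIENT_DIR: client configuration directory (defaults to the root config)
#   DH_SUDO: how to elevate, e.g. "sudo -u irisadmin" (defaults to "sudo")
dh_etcdctl() {
  local dir=${DH_ETCD_CLIENT_DIR:-/etc/sysconfig/illumon.d/etcd/client/root}
  local -a elevate
  # Split DH_SUDO into words so "sudo -u irisadmin" becomes three arguments.
  read -r -a elevate <<< "${DH_SUDO:-sudo}"
  "${elevate[@]}" DH_ETCD_DIR="$dir" /usr/illumon/latest/bin/etcdctl.sh "$@"
}

# Example: dh_etcdctl endpoint status -w table
```

With this function defined, `DH_SUDO="sudo -u irisadmin" dh_etcdctl alarm list` runs the same command as the long form, elevated as `irisadmin`.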

For example, the following command prints the status of each etcd endpoint as a table:

```bash
sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root /usr/illumon/latest/bin/etcdctl.sh endpoint status -w table
```

```
+---------------------------+------------------+---------+---------+-----------+-----------+------------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+---------------------------+------------------+---------+---------+-----------+-----------+------------+
| https://10.128.13.53:2379 | 20f1fe672cdca01d |  3.5.12 |   17 MB |      true |     23854 |      48441 |
| https://10.128.13.54:2379 | 8cdf5ce8a296848f |  3.5.12 |   17 MB |     false |     23854 |      48442 |
| https://10.128.13.55:2379 | 4ea3e72f6e028887 |  3.5.12 |   17 MB |     false |     23854 |      48443 |
+---------------------------+------------------+---------+---------+-----------+-----------+------------+
```

> [!NOTE]
> The output format selected with the `-w` flag determines how node IDs are rendered, so choose the format that best suits your needs.
>
> For instance, the table format shows IDs in hexadecimal, while `-w json` returns the ID fields as decimal integers:
>
> ```bash
> sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root /usr/illumon/latest/bin/etcdctl.sh endpoint status -w json
> ```
>
> Output:
>
> ```
> [{"Endpoint":"https://10.128.13.53:2379","Status":{"header":{"cluster_id":9262865042670673362,"member_id":2373958197688705053,"revision":13864,"raft_term":23854},"version":"3.5.12","dbSize":17051648,"leader":2373958197688705053,"raftIndex":48449,"raftTerm":23854}},{"Endpoint":"https://10.128.13.54:2379","Status":{"header":{"cluster_id":9262865042670673362,"member_id":10150934239346328719,"revision":13864,"raft_term":23854},"version":"3.5.12","dbSize":17022976,"leader":2373958197688705053,"raftIndex":48450,"raftTerm":23854}},{"Endpoint":"https://10.128.13.55:2379","Status":{"header":{"cluster_id":9262865042670673362,"member_id":5666626947057354887,"revision":13864,"raft_term":23854},"version":"3.5.12","dbSize":17031168,"leader":2373958197688705053,"raftIndex":48451,"raftTerm":23854}}]
> ```
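
When comparing IDs across formats (for instance, matching a decimal `member_id` against the hexadecimal `ID` column of the table), a conversion like the following sketch may help. `hex_id_to_dec` and `dec_id_to_hex` are hypothetical helpers, not part of `etcdctl`; they rely on bash's `printf` with `%u` and `%x`, which treat the value as an unsigned 64-bit integer so that IDs at or above 2^63 also convert correctly.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for converting etcd member IDs between the hexadecimal
# form used by `-w table` and the decimal form used by `-w json`.

# Hex (table format) -> decimal (JSON format)
hex_id_to_dec() {
  printf '%u\n' "0x$1"
}

# Decimal (JSON format) -> hex (table format)
dec_id_to_hex() {
  printf '%x\n' "$1"
}
```

For example, `hex_id_to_dec 20f1fe672cdca01d` prints `2373958197688705053`, which is the leader's `member_id` in the JSON output above.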

## Running out of space in a node

When etcd exceeds its configured storage space, the problem can be difficult to identify and resolve. The tools in this section are useful for a variety of issues, but the examples here walk through troubleshooting the following error:

```text
io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: etcdserver: mvcc: database space exceeded
```

### Investigate the condition

If one or more nodes in the etcd cluster are out of space, the `alarm list` command produces output similar to the following:

```bash
sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root /usr/illumon/latest/bin/etcdctl.sh alarm list
```

Output:

```
memberID:3254910096547498518 alarm:NOSPACE
memberID:13807399138998277405 alarm:NOSPACE
memberID:13160873893432754734 alarm:NOSPACE
```
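
To check for this condition from a script, you can filter the `alarm list` output. `nospace_members` is a hypothetical helper that assumes the `memberID:<id> alarm:<type>` line format shown above; it prints the ID of each member with a `NOSPACE` alarm and returns a non-zero status when there are none.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `alarm list` output on stdin and print the member ID
# of every member with a NOSPACE alarm. Returns non-zero if none are found.
nospace_members() {
  awk '
    $2 == "alarm:NOSPACE" {
      sub(/^memberID:/, "", $1)   # keep only the numeric ID
      print $1
      found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Example:
# sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root \
#   /usr/illumon/latest/bin/etcdctl.sh alarm list | nospace_members
```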

### Clear the error condition

You can clear the alarms with `alarm disarm`:

```bash
sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root /usr/illumon/latest/bin/etcdctl.sh alarm disarm
```

Note that this clears the current alarms but does not address their root cause. The alarm will return unless the underlying issue is resolved.
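
A minimal sketch that disarms the alarms and then lists them again to confirm none remain is shown below. `disarm_and_verify` is a hypothetical helper; it takes the full `etcdctl.sh` command line (including `sudo` and `DH_ETCD_DIR`) as its arguments. An empty list right after disarming only shows that the alarms were cleared, not that the space problem is solved.

```bash
#!/usr/bin/env bash
# Hypothetical helper: disarm all alarms, then list them again to confirm.
# Pass the full etcdctl command (wrapper plus options) as the arguments.
disarm_and_verify() {
  "$@" alarm disarm || return 1
  local remaining
  remaining=$("$@" alarm list) || return 1
  if [ -n "$remaining" ]; then
    echo "Alarms still active:" >&2
    echo "$remaining" >&2
    return 1
  fi
  echo "No active alarms."
}

# Example:
# disarm_and_verify sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root \
#   /usr/illumon/latest/bin/etcdctl.sh
```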

### Check compaction settings

Every change in etcd creates a new revision, which can be used to retrieve previous key values. To prevent storage space from increasing indefinitely, this history needs to be compacted periodically.

> [!NOTE]
> Periodically [defragmenting etcd nodes](../management/defragment-etcd-node.md) is also recommended, especially if running out of storage space is an issue.

The default configuration file includes these lines:

```text
auto-compaction-mode: periodic
auto-compaction-retention: "168"
```

This means that etcd automatically compacts every hour (implied by the periodic mode) and discards all revisions older than 168 hours (one week). If your system frequently runs out of database space, you can shorten the retention period or change the compaction mode.
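
To confirm which compaction settings a node is actually using, you can print them from its configuration file. `show_compaction_settings` is a hypothetical helper; it defaults to the Deephaven configuration path `/etc/etcd/dh/latest/config.yaml` and may need `sudo` to read the file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the auto-compaction lines from an etcd config file.
show_compaction_settings() {
  local config=${1:-/etc/etcd/dh/latest/config.yaml}
  grep -E '^[[:space:]]*auto-compaction-(mode|retention):' "$config"
}

# Example: show_compaction_settings
```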

The default configuration file is `/etc/etcd/dh/latest/config.yaml` on the nodes running etcd. Note that this is a symbolic link to one of several configuration files. You will need to edit all the configuration files, distribute them to all etcd nodes, and restart the etcd processes to apply the changes.
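
The distribution step might look like the following sketch. It is an assumption about your environment, not a Deephaven tool: `rollout_etcd_config` and `run` are hypothetical helpers, the host names are placeholders, and it assumes you can `ssh` and `scp` to each etcd node as a user with `sudo` rights there. It restarts one node at a time, and setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Sketch: copy an edited etcd config file to each node and restart etcd,
# one node at a time. Assumes ssh/scp access with sudo rights on every node
# and the same file path on all nodes.

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: rollout_etcd_config <config-file> <host>...
rollout_etcd_config() {
  local config=$1 host
  shift
  for host in "$@"; do
    run scp "$config" "$host:/tmp/etcd-config.yaml" || return 1
    run ssh "$host" "sudo cp /tmp/etcd-config.yaml '$config' && sudo systemctl restart dh-etcd" || return 1
  done
}

# Example (placeholder host names):
# DRY_RUN=1 rollout_etcd_config /etc/etcd/dh/latest/config.yaml etcd1 etcd2 etcd3
```

Restarting nodes one at a time lets the others keep quorum, provided each restarted node rejoins before the next restart; checking `endpoint status` between nodes is a prudent addition.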

### Compact now

You can compact history immediately instead of waiting for the periodic compaction.

**Find the current revision:**

```bash
sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root /usr/illumon/latest/bin/etcdctl.sh endpoint status -w fields
```

Output:

```
"ClusterID" : 9262865042670673362
"MemberID" : 2373958197688705053
"Revision" : 13864
"RaftTerm" : 23854
"Version" : "3.5.12"
"DBSize" : 17051648
"Leader" : 2373958197688705053
"RaftIndex" : 48477
"RaftTerm" : 23854
"Endpoint" : "https://10.128.13.53:2379"
...
```

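Subtract one from the value of the `Revision` field and use the result in the `compact` command below. In this example, the revision is 13864, so the command compacts to revision 13863.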
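
If you prefer not to copy the value by hand, a small filter can compute it. `compact_target` is a hypothetical helper that assumes the `"Revision" : <number>` line format shown above; it prints the first `Revision` value minus one and returns a non-zero status if no such line is found.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `endpoint status -w fields` output on stdin and
# print the first "Revision" value minus one, ready to pass to `compact`.
compact_target() {
  awk -F' : ' '$1 == "\"Revision\"" { print $2 - 1; found = 1; exit }
               END { exit(found ? 0 : 1) }'
}

# Example:
# sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root \
#   /usr/illumon/latest/bin/etcdctl.sh endpoint status -w fields | compact_target
```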

#### Compact away all old revisions

```bash
sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root /usr/illumon/latest/bin/etcdctl.sh compact 13863
```

Output:

```
compacted revision 13863
```

#### Verify the system accepts changes again

```bash
sudo DH_ETCD_DIR=/etc/sysconfig/illumon.d/etcd/client/root /usr/illumon/latest/bin/etcdctl.sh check perf
```

`check perf` writes test keys to etcd, so the revision number should increase if the cluster is accepting writes again. Because it generates write load while it runs, you can instead verify by running any other command that modifies etcd.

### Increase the maximum database size

The default maximum size is 2 GB. You can increase this by adding the following setting to the configuration file (typically in `/etc/etcd/dh/latest/config.yaml`):

```yaml
quota-backend-bytes: 8589934592
```

An 8 GB limit is recommended, but larger values are supported.
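
`quota-backend-bytes` is specified in bytes, and the sizes in this section are binary gigabytes: 2 GB is 2 × 1024³ = 2,147,483,648 bytes and 8 GB is 8 × 1024³ = 8,589,934,592 bytes, the value shown above. The hypothetical `gib_to_bytes` helper below does this arithmetic for other sizes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a whole number of GiB to bytes for quota-backend-bytes.
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# gib_to_bytes 2   # prints 2147483648 (the default quota)
# gib_to_bytes 8   # prints 8589934592 (the recommended value)
```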

After updating the setting on each etcd node, restart etcd with `systemctl restart dh-etcd` for the change to take effect.

The current value of this setting is published in the metrics that etcd exposes. Use the `etcdctl` commands shown above to find your endpoint addresses, then use `curl` to retrieve the metrics:

```bash
curl -sk https://10.200.46.148:2379/metrics | grep etcd_server_quota_backend_bytes
```
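
The metric value is reported in bytes and may be printed in scientific notation (for example, `8.589934592e+09`). As a sketch, the hypothetical `quota_gib` filter below reads the metrics on standard input, skips any `# HELP` or `# TYPE` comment lines, and prints the quota in GiB.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read Prometheus-format metrics on stdin and print the
# backend quota in GiB. Only lines whose first field is exactly the metric
# name are used, so comment lines are ignored.
quota_gib() {
  awk '$1 == "etcd_server_quota_backend_bytes" { printf "%.2f GiB\n", $2 / (1024 * 1024 * 1024); found = 1 }
       END { exit(found ? 0 : 1) }'
}

# Example (substitute one of your own endpoint addresses):
# curl -sk https://10.128.13.53:2379/metrics | quota_gib
```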

## Related documentation

- [Configuration files](../../sys-admin/configuration/configuration-file-locations-overview.md#configuration-in-etcd)
- [Defragment etcd nodes](../../sys-admin/management/defragment-etcd-node.md)
- [Introduction to etcd](../../sys-admin/core-components/etcd.md)
- [Replace an etcd node](../../sys-admin/management/replace-etcd-node.md)
- [etcdctl](https://github.com/etcd-io/etcd/blob/main/etcdctl/README.md)
