---
id: runbook-monit
title: monit runbook
---

[monit](https://mmonit.com/monit/) is a third-party process supervision tool used to manage Deephaven services on traditional installations and Podman deployments. It monitors processes, automatically restarts failed services, and provides a control interface for starting, stopping, and restarting Deephaven processes. monit is not used in Kubernetes deployments, where pod management is handled by Kubernetes itself.

<!--TODO: remove in sanluis-->

## Impact of monit failure

| Level            | Impact                                                                                                                                                                                                   |
| :--------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Sev 2 - Moderate | Process supervision and automatic restart capabilities are lost. Services continue running but will not automatically recover from failures. Manual intervention required to start or restart processes. |

> [!NOTE]
> monit failure does not stop running Deephaven processes. They continue functioning until manually stopped or until they fail, at which point they will not automatically restart.

## monit dependencies

monit has no dependencies on other services. It is one of the first processes to start on a Deephaven host.

**monit manages:**

- All Deephaven Java processes (Configuration Server, Authentication Server, etc.).
- etcd (if installed as part of Deephaven).
- Other custom services defined in monit configuration.

## Checking monit status

Check that the monit service is running:

```bash
sudo systemctl status monit
```

The output should show `active (running)`.

View a summary of all monitored processes:

```bash
dh_monit summary
```

View detailed status:

```bash
dh_monit status
```

Check a specific process:

```bash
dh_monit status configuration_server
```
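
The two checks above can be combined into one helper. This is a minimal sketch, not part of the Deephaven tooling: the function name `check_monit` is hypothetical, and it assumes `dh_monit summary` exits non-zero when it cannot reach the monit daemon.

```bash
# Sketch: report whether monit is active in systemd and answering dh_monit.
# Assumption: dh_monit exits non-zero when the daemon cannot be reached.
check_monit() {
    if ! systemctl is-active --quiet monit; then
        echo "monit service is not active" >&2
        return 1
    fi
    if ! dh_monit summary >/dev/null 2>&1; then
        echo "monit is active but 'dh_monit summary' failed" >&2
        return 2
    fi
    echo "monit is active and responding"
}
```

Calling `check_monit` returns 0 only when both checks pass, so it can gate other scripts, for example `check_monit && dh_monit status`.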

## Viewing monit logs

View the monit log:

```bash
cat /var/log/deephaven/monit/monit.log
```

Tail the log to follow it in real time:

```bash
tail -f /var/log/deephaven/monit/monit.log
```

Follow the systemd journal for monit:

```bash
sudo journalctl -u monit -f
```
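
When the log is long, filtering it is often quicker than reading it in full. The sketch below assumes that problem entries contain the word `error` or `failed`; the exact wording depends on the monit version, so adjust the pattern to match your log. The function name `recent_monit_errors` is hypothetical.

```bash
# Sketch: print the most recent log lines that look like problems.
# Assumption: problem entries contain "error" or "failed" (case-insensitive).
recent_monit_errors() {
    local log="${1:-/var/log/deephaven/monit/monit.log}"
    local count="${2:-20}"
    grep -iE 'error|failed' "$log" | tail -n "$count"
}
```

For example, `recent_monit_errors` shows up to 20 matching lines from the default log, and `recent_monit_errors /path/to/other.log 5` limits the output to the last five matches in another file.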

## Restart procedure

Restart monit:

```bash
sudo systemctl restart monit
```

> [!CAUTION]
> Restarting monit does NOT restart monitored processes. They continue running but are temporarily unmonitored during the restart.

Verify the restart was successful:

```bash
sudo systemctl status monit
```

Check that all monitored processes are still tracked:

```bash
dh_monit summary
```
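
The restart and verification steps can be scripted together. This is a sketch under stated assumptions: the function name `restart_monit` and its retry defaults (10 checks, 1 second apart) are hypothetical, and monit may need a few more seconds after becoming active before `dh_monit summary` responds.

```bash
# Sketch: restart monit, wait for systemd to report it active, then
# print the process summary. Retry count and delay are illustrative.
restart_monit() {
    local attempts="${1:-10}" delay="${2:-1}" i=0
    sudo systemctl restart monit || return 1
    while [ "$i" -lt "$attempts" ]; do
        if systemctl is-active --quiet monit; then
            dh_monit summary
            return
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "monit not active after $attempts checks" >&2
    return 1
}
```

Run `restart_monit` with no arguments to use the defaults, or `restart_monit 30 2` to allow up to 30 checks at 2-second intervals.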

## Reload monit configuration

After modifying monit configuration files, reload the configuration:

```bash
dh_monit reload
```

This reloads configuration without restarting monit or any monitored processes.

## Managing processes with monit

### Start a process

```bash
dh_monit start configuration_server
```

### Stop a process

```bash
dh_monit stop configuration_server
```

### Restart a process

```bash
dh_monit restart configuration_server
```

### Start all processes

```bash
dh_monit start all
```

### Stop monitoring a process (without stopping it)

```bash
dh_monit unmonitor configuration_server
```

### Resume monitoring

```bash
dh_monit monitor configuration_server
```
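
The unmonitor and monitor commands are typically used as a pair around maintenance work. The sketch below wraps an arbitrary command so that monitoring is resumed even if the command fails. The function name `with_unmonitored` is hypothetical and not part of `dh_monit`.

```bash
# Sketch: run a command while a process is unmonitored, then resume
# monitoring regardless of the command's outcome.
with_unmonitored() {
    local name="$1" rc
    shift
    dh_monit unmonitor "$name" || return 1
    "$@"
    rc=$?
    # Resume monitoring even if the wrapped command failed.
    dh_monit monitor "$name" || return 1
    return "$rc"
}
```

For example, `with_unmonitored configuration_server ./my-maintenance.sh` (where `my-maintenance.sh` is a placeholder for your own script) returns the script's exit status after monitoring has been restored.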

## monit configuration

monit configuration consists of:

**Main configuration:** `/etc/monitrc` (overridable via `DH_MONIT_RC` environment variable)

**Deephaven-specific configuration:** `/etc/sysconfig/illumon.d/monit/`

### Main monit configuration

Key settings in the main configuration file:

```bash
# Check interval
set daemon 60  # Check every 60 seconds

# Log file
set log /var/log/deephaven/monit/monit.log

# State file
set statefile /var/lib/deephaven/monit/monit.state

# Include Deephaven process configurations
include /etc/sysconfig/illumon.d/monit/*.conf
```
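
To see which of these settings are active on a given host, the `set` and `include` lines can be extracted from the control file. This is a minimal sketch; the function name `show_monit_settings` is hypothetical, and because the control file is often readable only by root, you may need to run it with elevated privileges.

```bash
# Sketch: list the active "set" and "include" directives in the monit
# control file, skipping comments and blank lines.
show_monit_settings() {
    local rc_file="${1:-${DH_MONIT_RC:-/etc/monitrc}}"
    grep -E '^[[:space:]]*(set|include)[[:space:]]' "$rc_file"
}
```

With no argument the function reads the file named by `DH_MONIT_RC`, falling back to `/etc/monitrc`.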

### Process configuration files

Each Deephaven process has its own configuration file:

**Example:** `/etc/sysconfig/illumon.d/monit/configuration_server.conf`

```bash
check process configuration_server
    with pidfile /var/run/deephaven/configuration_server.pid
    start program = "/usr/illumon/latest/bin/start-service configuration_server"
        as uid irisadmin gid dbquery
    stop program = "/usr/illumon/latest/bin/stop-service configuration_server"
        as uid irisadmin gid dbquery
    if failed port 22023 then restart
    if 5 restarts within 5 cycles then timeout
```

Key elements:

- **pidfile** — Location of process PID file.
- **start program** — Command to start process.
- **stop program** — Command to stop process.
- **health checks** — Port checks, resource limits.
- **restart policy** — When and how to restart.

## Process startup order

monit can enforce process startup order through dependencies:

**Example dependency chain:**

1. etcd (no dependencies).
2. Configuration Server (depends on etcd).
3. Authentication Server (depends on Configuration Server).
4. Other services (depend on Configuration Server, Authentication Server).

**Configuration:**

```bash
check process authentication_server
    depends on configuration_server
    start program = "..."
```

monit will not start a process until its dependencies are running.
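
To review the dependency chain configured on a host, the `depends on` lines can be extracted from the process configuration files. This is a minimal sketch; the function name `list_dependencies` is hypothetical, and it assumes each `depends on` statement sits on its own line below the matching `check process` line, as in the example above.

```bash
# Sketch: print "<process> depends on <dependency>" for every dependency
# declared in the process configuration files.
list_dependencies() {
    local conf_dir="${1:-/etc/sysconfig/illumon.d/monit}"
    awk '
        $1 == "check" && $2 == "process" { name = $3 }
        $1 == "depends" && $2 == "on" {
            # A statement may list several comma-separated dependencies.
            for (i = 3; i <= NF; i++) {
                dep = $i
                gsub(/,/, "", dep)
                if (dep != "") print name " depends on " dep
            }
        }
    ' "$conf_dir"/*.conf
}
```

For the example configuration above, `list_dependencies` would print `authentication_server depends on configuration_server`.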

## Disabling and enabling processes

### Temporarily disable a service

Prevent monit from starting a process:

```bash
# Stop and unmonitor
dh_monit stop configuration_server
dh_monit unmonitor configuration_server
```

### Permanently disable a service

Rename the configuration file:

```bash
cd /etc/sysconfig/illumon.d/monit
mv configuration_server.conf configuration_server.conf.disabled
dh_monit reload
```

### Re-enable a service

Rename the configuration file back:

```bash
cd /etc/sysconfig/illumon.d/monit
mv configuration_server.conf.disabled configuration_server.conf
dh_monit reload
dh_monit start configuration_server
```
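
The rename steps can be wrapped in helpers that refuse to act when the expected file is missing, which guards against typos in the process name. This is a sketch: the function names `disable_service` and `enable_service` are hypothetical, and the caller needs write permission on the configuration directory.

```bash
# Sketch: disable or re-enable a process by renaming its configuration
# file, mirroring the manual steps above.
disable_service() {
    local name="$1" conf_dir="${2:-/etc/sysconfig/illumon.d/monit}"
    local conf="$conf_dir/$name.conf"
    if [ ! -f "$conf" ]; then
        echo "not found: $conf" >&2
        return 1
    fi
    mv "$conf" "$conf.disabled" && dh_monit reload
}

enable_service() {
    local name="$1" conf_dir="${2:-/etc/sysconfig/illumon.d/monit}"
    local conf="$conf_dir/$name.conf"
    if [ ! -f "$conf.disabled" ]; then
        echo "not found: $conf.disabled" >&2
        return 1
    fi
    mv "$conf.disabled" "$conf" && dh_monit reload && dh_monit start "$name"
}
```

For example, `disable_service configuration_server` renames the file and reloads monit, and `enable_service configuration_server` reverses the change and starts the process.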

## monit user and permissions

monit runs as root but executes process commands as specified users:

**monit daemon:** Runs as `root` (required for process supervision).

**Process management:** Commands run as the `irisadmin` user.

**Deephaven processes:** Run as various users (`irisadmin`, `dbquery`, `dbmerge`).

**Control commands:** Should be run as `irisadmin`:

```bash
dh_monit status
```

## Testing monit configuration

Before reloading or restarting, test the configuration:

```bash
# Test configuration syntax
sudo monit -t

# Verbose test
sudo monit -t -v
```

Expected output: `Control file syntax OK`
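
The syntax test and the reload can be chained so that a broken configuration is never loaded. In this sketch the function name `safe_reload` is hypothetical. It passes the control file explicitly with monit's `-c` option so that the test covers the file named by `DH_MONIT_RC` when that variable is set.

```bash
# Sketch: reload monit only if the control file passes the syntax check.
safe_reload() {
    local rc_file="${DH_MONIT_RC:-/etc/monitrc}"
    if ! sudo monit -c "$rc_file" -t; then
        echo "syntax check failed for $rc_file; not reloading" >&2
        return 1
    fi
    dh_monit reload
}
```

Call `safe_reload` in place of `dh_monit reload` after editing any file under the monit configuration directory.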

## Monitoring monit itself

Ensure monit is always running:

### systemd management

Enable monit to start on boot:

```bash
sudo systemctl enable monit
```

Check if enabled:

```bash
sudo systemctl is-enabled monit
```
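
The two commands above can be combined so that monit is enabled only when it is not already. This is a minimal sketch, and the function name `ensure_monit_enabled` is hypothetical.

```bash
# Sketch: enable monit at boot only if systemd reports it is not enabled.
ensure_monit_enabled() {
    if systemctl is-enabled --quiet monit; then
        echo "monit is already enabled at boot"
    else
        sudo systemctl enable monit
    fi
}
```

Because the function makes no change when monit is already enabled, it is safe to run repeatedly, for example from a host provisioning script.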

## Configuration files and locations

**systemd service:** `monit` (managed by `systemd`)

**Service control:** `systemctl {start|stop|restart|status} monit`

**Main configuration:** `/etc/monitrc` (overridable via `DH_MONIT_RC` environment variable)

**Process configurations:** `/etc/sysconfig/illumon.d/monit/*.conf`

**Log file:** `/var/log/deephaven/monit/monit.log`

**State file:** `/var/lib/deephaven/monit/monit.state`

**PID file:** `/var/run/monit.pid`

## Common monit commands

```bash
# View all processes
dh_monit summary

# Detailed status
dh_monit status

# Start all processes
dh_monit start all

# Stop all processes
dh_monit stop all

# Restart specific process
dh_monit restart configuration_server

# Reload configuration
dh_monit reload

# Test configuration
sudo monit -t

# Unmonitor all (stop supervision without stopping processes)
dh_monit unmonitor all

# Monitor all (resume supervision)
dh_monit monitor all

# Start all processes and wait for them to be online
dh_monit up

# Stop all processes
dh_monit down

# Report counts of running vs. total monitored processes
dh_monit report
```

## Related documentation

- [Process restart guide](../architecture/resilience-planning/process-restart-guide.md)
- [System processes overview](../architecture/architecture-overview.md)
- [Process runbooks](runbooks.md)
- [Official monit documentation](https://mmonit.com/monit/documentation/)
