...

The Swarm Log Level Management Script (castor-change-log-level.sh) manages and dynamically adjusts log levels on a DataCore Swarm cluster. It can change the log level temporarily and automatically revert it after a specified duration. It also supports background execution via screen or tmux, which suits long-running operations that must keep running after the terminal is detached.

...

  • Set and Revert Log Levels: Temporarily change the log level and revert after a specified duration.

  • Flexible JSON Parsing: Uses jq for JSON parsing if available; falls back to grep otherwise.

  • Background Execution: Optionally runs in the background using screen or tmux.

  • Log Size Monitoring: Reports the log size generated during the temporary log level change.

  • Countdown Display: Shows a countdown for the specified duration.
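The jq-or-grep fallback in the feature list can be sketched as follows. The `.value` filter and the grep pattern are the ones the script itself uses; the helper name `extract_value` and the sample payload are assumptions made only for illustration, and the grep branch needs a grep built with `-P` (PCRE) support.

```shell
#!/bin/bash
# Hypothetical helper: read a JSON settings response on stdin and print its
# numeric "value" field, preferring jq and falling back to grep.
extract_value() {
    if command -v jq >/dev/null 2>&1; then
        jq -r '.value'
    else
        # Same pattern the script uses; requires grep with PCRE (-P) support.
        grep -oP '"value":\s*\K[0-9]+'
    fi
}

# Assumed sample payload, not captured from a real cluster.
sample='{"name": "log.level", "value": 30}'
printf '%s\n' "$sample" | extract_value
```

Both branches print the same number for this payload, so the rest of a script does not need to know which tool did the parsing.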

...

  • jq (optional): Used for parsing JSON responses; falls back to grep if unavailable.

  • screen or tmux (optional): Required for background execution.

  • Permissions: Ensure sufficient permissions to execute on the DataCore Swarm server and access required files.
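Before running the script, the optional tools listed above can be checked with a short pre-flight sketch. The function names `have` and `preflight` are hypothetical and not part of the script; only `command -v` is relied on.

```shell
#!/bin/bash
# Hypothetical pre-flight check for the optional helper tools.
have() { command -v "$1" >/dev/null 2>&1; }

preflight() {
    if have jq; then
        echo "json parser: jq"
    else
        echo "json parser: grep (fallback)"
    fi
    if have screen; then
        echo "session tool: screen"
    elif have tmux; then
        echo "session tool: tmux"
    else
        echo "session tool: none (detached mode unavailable)"
    fi
}

preflight
```

The check prefers screen over tmux, mirroring the order in which the script tries them.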

Script Usage

Code Block
./castor-change-log-level.sh -d <node_ip> -p <admin:password> -i <new_log_level> [-t <duration_in_seconds>] [--persistent] [-v]

Parameter            Description
-d, --swarm_ip       IP address of the Swarm API endpoint (or set the SCSP_HOST environment variable).
-p, --credentials    Admin credentials in the format admin:password.
-i, --log.level      New log level to set (values: 0, 5, 10, 15, 20, 30, 40, 50).
-t, --time           Duration in seconds to keep the new log level (optional).
--persistent         Runs the script in a detached session using screen or tmux, allowing continued operation if the terminal session ends.
-v, --verbose        Enables verbose mode to display debug information.
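The script passes the `-i` and `-t` values straight to the API, so a wrapper may want to validate them first. The sketch below is one possible check against the values listed in the table; `is_valid_log_level` and `is_valid_duration` are hypothetical helpers, not functions from the script.

```shell
#!/bin/bash
# Hypothetical helper: accept only the log levels documented in the table.
is_valid_log_level() {
    case "$1" in
        0|5|10|15|20|30|40|50) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: accept only a non-empty string of digits as a duration.
is_valid_duration() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

if is_valid_log_level 20 && is_valid_duration 300; then
    echo "arguments look valid"
fi
```

Rejecting bad input before the first request avoids changing the cluster setting and then failing halfway through the run.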

...

Example 1: Set log level to 20 and keep it for 10 seconds

Code Block
./castor-change-log-level.sh -d 192.168.8.84 -p admin:datacore -i 20 -t 10

Example 2: Run in background mode with verbose logging

Code Block
./castor-change-log-level.sh -d 192.168.8.84 -p admin:datacore -i 20 -t 30 --persistent -v

...

  1. Log Level Change: Sets the log level to the specified value. If the current log level matches the requested level, the script skips the update.

  2. Countdown: During the specified duration, the script displays a countdown every second.

  3. Revert Log Level: After the countdown, the log level reverts to the initial value.

  4. Log Size Report: Reports the approximate size of the logs generated during the temporary log level change.

  5. Debug Mode: When -v is specified, debug messages display the script's internal operations.
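The change, wait, and revert sequence above can be sketched without a cluster by replacing the API call with a stub. Here `set_level` is a stand-in for the PUT request and `temporary_level` is a hypothetical name; the real script also prints a countdown and reports log growth, which this sketch omits.

```shell
#!/bin/bash
# Stand-in for the cluster setting; a real run would query and update the API.
current_level=30
set_level() { current_level="$1"; }

temporary_level() {
    local new="$1" seconds="${2:-0}" original="$current_level"

    # Step 1: skip the update when the requested level is already active.
    if [ "$new" = "$original" ]; then
        echo "Log level is already $new; nothing to do."
        return 0
    fi
    set_level "$new"
    echo "Changed $original -> $new"

    # Steps 2 and 3: wait for the requested time, then restore the old level.
    if [ "$seconds" -gt 0 ]; then
        local remaining="$seconds"
        while [ "$remaining" -gt 0 ]; do
            sleep 1
            remaining=$((remaining - 1))
        done
        set_level "$original"
        echo "Reverted to $original"
    else
        echo "Change stays in place until modified manually"
    fi
}

temporary_level 10 1
```

Saving the original level before the update is what makes the revert possible, which is why the script reads the current value first.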

...

Code Block
[root@scs dist]# ./castor-change-log-level.sh -p admin:datacore -i 10 -t 300
Swarm IP: 192.168.1.84
Credentials: [hidden for security]
Cluster Name: gatewayadmindomain

New log level: 10
Current log level is 30.
Updating log level to 10...
Log level changed successfully from 30 → 10.
Keeping log level at 10 for 300 second(s)...
Countdown: 00:00:01 remaining...

Time's up! Reverting log level back to 30...
Approximate 69.4MB new logs were generated at log level 10. Current castor.log size is 371.3MB after 00:05:00.
Log level reverted successfully back to 30.

[root@scs dist]#
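The size report can be turned into a rough hourly estimate when deciding how long a verbose level is safe to leave on. The sketch below uses the figures from the sample run (69.4MB in 300 seconds) and assumes log volume stays steady, which a real cluster does not guarantee. `growth_per_hour_mb` is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: extrapolate MB per hour from a measured window.
# $1 = MB written during the window, $2 = window length in seconds
growth_per_hour_mb() {
    if [ "$2" -le 0 ]; then
        echo "window must be longer than 0 seconds" >&2
        return 1
    fi
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s * 3600 }'
}

growth_per_hour_mb 69.4 300   # prints 832.8
```

At that rate the log would grow by roughly 0.8GB per hour, which is worth comparing against free space on the log volume.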

...

This script provides administrators with an effective way to adjust and monitor Swarm logging, supporting both temporary and permanent log level changes for troubleshooting and performance monitoring.

Script Source Code

Latest version: castor-change-log-level.sh

Code Block
#!/bin/bash
# Written by Milton Suen (milton.suen@datacore.com) Oct 31, 2024
# Revision: Update to support running the script in a persistent session using screen or tmux.

# Function to display usage information
usage() {
    echo "Usage: $0 -d swarm_ip -p admin:password -i new_log_level [-t duration_in_seconds] [--persistent] [-v]"
    echo "  -d, --swarm_ip           IP address of the Swarm API endpoint (or set SCSP_HOST environment variable)"
    echo "  -p, --credentials        Credentials in the format admin:password"
    echo "  -i, --log.level          New log level to set"
    echo "  -t, --time               Duration in seconds to keep the new log level (optional)"
    echo "  --persistent             Run the script in a detached session using screen or tmux"
    echo "  -v, --verbose            Enable verbose mode for debug messages"
    exit 1
}

# Default options
persistent=false
verbose=false
output_log="script_output.log"  # Log file for capturing persistent session output

# Function to display debug messages if verbose mode is enabled
debug() {
    if $verbose; then
        echo "[DEBUG] $1"
    fi
}

# Function to check if either 'screen' or 'tmux' is installed
check_screen_or_tmux() {
    if ! command -v screen &>/dev/null && ! command -v tmux &>/dev/null; then
        echo "Error: Neither 'screen' nor 'tmux' is installed. Cannot run in persistent mode."
        persistent=false  # Disable persistent session
    fi
}

# Function to format file size
format_size() {
    local size=$1
    if (( size >= 1073741824 )); then
        echo "$(awk "BEGIN {printf \"%.1fGB\", $size/1073741824}")"
    elif (( size >= 1048576 )); then
        echo "$(awk "BEGIN {printf \"%.1fMB\", $size/1048576}")"
    elif (( size >= 1024 )); then
        echo "$(awk "BEGIN {printf \"%.1fKB\", $size/1024}")"
    else
        echo "${size}B"
    fi
}

# Function to format duration
format_duration() {
    local duration=$1
    local hours=$((duration / 3600))
    local minutes=$(( (duration % 3600) / 60 ))
    local seconds=$((duration % 60))
    printf "%02d:%02d:%02d" $hours $minutes $seconds
}

# Function to check if jq is available and set up JSON parsing method
check_jq() {
    if [[ -x "/usr/local/bin/jq" ]]; then
        echo "/usr/local/bin/jq"
    elif [[ -x "$(pwd)/jq" ]]; then
        echo "$(pwd)/jq"
    elif command -v jq &>/dev/null; then
        echo "jq"
    else
        echo "grep"
    fi
}

jq_or_grep=$(check_jq)

# Parse input arguments
while [[ "$#" -gt 0 ]]; do
    case $1 in
        -d|--swarm_ip) swarm_ip="$2"; shift 2 ;;
        -p|--credentials) credentials="$2"; shift 2 ;;
        -i|--log.level) new_log_level="$2"; shift 2 ;;
        -t|--time)
            if [[ -n "$2" && "$2" != -* ]]; then
                duration="$2"
                shift 2
            else
                read -r -p "Enter duration in seconds: " duration
                shift
            fi
            ;;
        --persistent) persistent=true; shift ;;
        -v|--verbose) verbose=true; shift ;;
        *) usage ;;
    esac
done

# Check for 'screen' or 'tmux' only when a persistent session was requested,
# so a plain run does not print a misleading error
if $persistent; then
    check_screen_or_tmux
fi

# If swarm_ip is not provided, try using SCSP_HOST environment variable
if [[ -z "$swarm_ip" ]]; then
    if [[ -n "$SCSP_HOST" ]]; then
        swarm_ip="$SCSP_HOST"
        debug "Using Swarm IP from SCSP_HOST: $swarm_ip"
    else
        echo "Error: swarm_ip not provided and SCSP_HOST is not set."
        usage
    fi
fi

# Check if required arguments are provided
if [[ -z "$credentials" || -z "$new_log_level" ]]; then
    usage
fi

# Retrieve cluster name and handle JSON parsing
debug "Retrieving the cluster name from Swarm API."
if [[ "$jq_or_grep" == "grep" ]]; then
    clusterName=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters" | grep -oP '"name":\s*"\K[^"]+')
else
    clusterName=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters" | "$jq_or_grep" -r '._embedded.clusters[0].name')
fi

if [[ -z "$clusterName" ]]; then
    echo "Failed to retrieve the cluster name. Please check your inputs."
    exit 1
fi
debug "Cluster Name: $clusterName"

# Main logic function to run the script tasks
main_script() {
    local swarm_ip="$1"
    local credentials="$2"
    local new_log_level="$3"
    local duration="$4"
    local clusterName="$5"
    local log_file="/var/log/datacore/castor.log"
    local initial_size=$(stat -c%s "$log_file" 2>/dev/null || echo 0)
    local current_log_level
    local jq_or_grep="$6"

    # Display initial information
    echo "Swarm IP: $swarm_ip"
    echo "Credentials: [hidden for security]"
    echo "Cluster Name: $clusterName"

    debug "Starting main_script function..."

    # Retrieve current log level
    if [[ "$jq_or_grep" == "grep" ]]; then
        current_log_level=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" | grep -oP '"value":\s*\K[0-9]+')
    else
        current_log_level=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" | "$jq_or_grep" -r '.value')
    fi
    echo ""
    echo "New log level: $new_log_level"
    echo "Current log level is $current_log_level."

    # Skip update if new level matches the current level
    if [[ "$current_log_level" -eq "$new_log_level" ]]; then
        echo ""
        echo "Log level is already set to $new_log_level. No changes made."
        return
    fi

    # Update the log level using PUT
    echo "Updating log level to $new_log_level..."
    response=$(curl --user "$credentials" -sS -X PUT -H "Content-Type: application/json" \
        "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" \
        -d "{\"value\": $new_log_level}")

    if [[ "$jq_or_grep" == "grep" ]]; then
        updated_log_level=$(echo "$response" | grep -oP '"value":\s*\K[0-9]+')
    else
        updated_log_level=$(echo "$response" | "$jq_or_grep" -r '.value')
    fi

    if [[ "$updated_log_level" -eq "$new_log_level" ]]; then
        echo "Log level changed successfully from $current_log_level → $new_log_level."
    else
        echo "Failed to update log level. Response: $response"
        exit 1
    fi

    # Countdown and revert log level
    if [[ -n "$duration" && "$duration" -gt 0 ]]; then
        echo "Keeping log level at $new_log_level for $duration second(s)..."
        for ((i=duration; i>0; i--)); do
            printf -v countdown "%02d:%02d:%02d" $((i/3600)) $(( (i%3600) / 60 )) $((i%60))
            echo -ne "Countdown: $countdown remaining...\r"
            sleep 1
        done
        echo -e "\n\nTime's up! Reverting log level back to $current_log_level..."

        response=$(curl --user "$credentials" -sS -X PUT -H "Content-Type: application/json" \
            "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" \
            -d "{\"value\": $current_log_level}")

        if [[ "$jq_or_grep" == "grep" ]]; then
            reverted_log_level=$(echo "$response" | grep -oP '"value":\s*\K[0-9]+')
        else
            reverted_log_level=$(echo "$response" | "$jq_or_grep" -r '.value')
        fi

        final_size=$(stat -c%s "$log_file" 2>/dev/null || echo 0)
        size_diff=$(( final_size - initial_size ))
        size_diff_formatted=$(format_size "$size_diff")
        duration_formatted=$(format_duration "$duration")
        echo "Approximate $size_diff_formatted new logs were generated at log level $new_log_level. Current castor.log size is $(format_size "$final_size") after $duration_formatted."

        if [[ "$reverted_log_level" -eq "$current_log_level" ]]; then
            echo "Log level reverted successfully back to $current_log_level."
        else
            echo "Failed to revert log level. Response: $response"
            exit 1
        fi
    else
        echo "Log level change is permanent until manually modified."
    fi
}

# Run in a persistent session or directly
if $persistent; then
    # Command handed to the detached session: the function definitions, the verbose
    # flag (so debug() behaves the same there), and the main_script call with its
    # output also stored in $output_log
    session_cmd="$(declare -f main_script format_size format_duration check_jq debug); verbose=$verbose; main_script \"$swarm_ip\" \"$credentials\" \"$new_log_level\" \"$duration\" \"$clusterName\" \"$jq_or_grep\" | tee \"$output_log\""

    if command -v screen &>/dev/null; then
        screen -dmS indexer_script bash -c "$session_cmd"
        screen -r indexer_script
    elif command -v tmux &>/dev/null; then
        tmux new-session -d -s indexer_script "$session_cmd"
        tmux attach-session -t indexer_script
    else
        echo "Error: Neither screen nor tmux available. Run without --persistent."
        exit 1
    fi

    # Wait for the screen or tmux session to complete and then display the output log
    sleep 1
    while screen -list 2>/dev/null | grep -q "indexer_script" || tmux has-session -t indexer_script 2>/dev/null; do
        sleep 1
    done

    echo ""
    cat "$output_log"
else
    main_script "$swarm_ip" "$credentials" "$new_log_level" "$duration" "$clusterName" "$jq_or_grep" | tee "$output_log"
fi

...