Managing Log Levels in Swarm Cluster Using a Shell Script

The Swarm Log Level Management Script (castor-change-log-level.sh) manages and dynamically adjusts log levels on a DataCore Swarm cluster. It can change the log level permanently, or temporarily for a specified duration, after which it reverts to the original log level. It supports background execution via screen or tmux, making it well suited to long-running operations that must survive detachment from the terminal.


Features

  • Set and Revert Log Levels: Temporarily change the log level of a Swarm cluster and revert after a specified duration.


  • Flexible JSON Parsing: Uses jq for JSON parsing if available; falls back to grep otherwise.

  • Background Execution: Optionally runs in the background using screen or tmux.

  • Log Size Monitoring: Reports the log size generated during the temporary log level change.

  • Countdown Display: Shows a countdown for the specified duration.

Requirements

  • jq (optional): Used for parsing JSON responses; falls back to grep if unavailable.

  • screen or tmux (optional): Required for background execution.

  • Permissions: Ensure sufficient permissions to execute on the DataCore Swarm server and access required files.
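The jq-or-grep fallback described above can be sketched as follows. This is a minimal illustration, not the script's exact code: the function name parse_value and the sample payload {"value": 30} are assumptions made for the example, and the grep branch needs GNU grep built with PCRE support (-P).

```shell
#!/bin/bash
# Extract the numeric "value" field from a JSON document passed as $1.
# parse_value is a hypothetical helper name used only in this sketch.
parse_value() {
    local json=$1
    if command -v jq >/dev/null 2>&1; then
        # Preferred path: let jq parse the JSON properly.
        printf '%s' "$json" | jq -r '.value'
    else
        # Fallback path: pattern-match the field with GNU grep (PCRE).
        printf '%s' "$json" | grep -oP '"value":\s*\K[0-9]+'
    fi
}
```

Calling parse_value '{"value": 30}' prints the level from the sample payload on either path. The grep fallback only works for simple, flat responses, which is why jq is preferred when it is installed.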

Script Usage

Code Block
./castor-change-log-level.sh -d <node_ip> -p <admin:password> -i <new_log_level> [-t <duration_in_seconds>] [--persistent] [-v]

Parameter            Description
-d, --swarm_ip       IP address of the Swarm API endpoint (or set the SCSP_HOST environment variable).
-p, --credentials    Admin credentials in the format admin:password.
-i, --log.level      New log level to set (values: 10, 15, 20, 30, 40, 50).
-t, --time           Duration in seconds to keep the new log level (optional).
--persistent         Runs the script in a detached session using screen or tmux, so it keeps running if the terminal session ends.
-v, --verbose        Enables verbose mode to display debug information.
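The precedence between -d and the SCSP_HOST environment variable can be illustrated with a small sketch. The helper name resolve_swarm_ip is hypothetical; only the fallback order (explicit flag first, then SCSP_HOST, otherwise an error) is taken from the parameter description.

```shell
#!/bin/bash
# resolve_swarm_ip <value-of--d-or-empty>
# Prints the Swarm IP to use, or returns 1 if none can be determined.
# (Hypothetical helper for illustration only.)
resolve_swarm_ip() {
    local from_flag=$1
    if [[ -n "$from_flag" ]]; then
        echo "$from_flag"            # an explicit -d value wins
    elif [[ -n "$SCSP_HOST" ]]; then
        echo "$SCSP_HOST"            # otherwise fall back to the environment
    else
        echo "Error: swarm_ip not provided and SCSP_HOST is not set." >&2
        return 1
    fi
}
```

With SCSP_HOST exported in the shell, the -d option can be omitted on the command line.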

Instructions for Use

Example 1: Set log level to 20 and keep it for 10 seconds

Code Block
./castor-change-log-level.sh -d 192.168.8.84 -p admin:datacore -i 20 -t 10


Example 2: Run in background mode with verbose logging

Code Block
./castor-change-log-level.sh -d 192.168.8.84 -p admin:datacore -i 20 -t 30 --persistent -v

Behavior

  1. Log Level Change: Sets the log level to the specified value. If the current log level matches the requested level, the script skips the update.

  2. Countdown: During the specified duration, the script displays a countdown every second.

  3. Revert Log Level: After the countdown, the log level reverts to the initial value.

  4. Log Size Report: Reports the approximate amount of log data generated during the temporary log level change.

  5. Debug Mode: When -v is specified, debug messages display the script's internal operations.
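For reference, the read and update steps above boil down to two HTTP calls against the cluster settings endpoint. The sketch below wraps them in small functions; the function names are hypothetical, while the port (91), the path, and the curl options mirror the script listing on this page. The functions are only defined here, not invoked, because they need a reachable Swarm node and valid credentials.

```shell
#!/bin/bash
# Build the settings URL used for both reading and writing log.level.
log_level_url() {   # log_level_url <swarm_ip> <cluster_name>
    echo "http://$1:91/api/storage/clusters/$2/settings/log.level"
}

# Read the current log level (prints the raw JSON response).
get_log_level() {   # get_log_level <swarm_ip> <cluster_name> <admin:password>
    curl --user "$3" -sS "$(log_level_url "$1" "$2")"
}

# Set a new log level with an HTTP PUT (prints the raw JSON response).
set_log_level() {   # set_log_level <swarm_ip> <cluster_name> <admin:password> <level>
    curl --user "$3" -sS -X PUT -H "Content-Type: application/json" \
        "$(log_level_url "$1" "$2")" -d "{\"value\": $4}"
}
```

A temporary change is then a set_log_level call with the new level, a wait, and a second set_log_level call with the level that get_log_level returned beforehand.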

Output Messages

Message                                Description
Swarm IP:                              Displays the specified Swarm IP address.
Credentials:                           Credentials are masked for security.
Cluster Name:                          Displays the cluster name retrieved from the Swarm API.
New log level:                         Shows the new log level requested.
Current log level:                     Displays the current log level.
Updating log level to X...             Indicates the beginning of the log level update process.
Log level changed successfully...      Confirms that the log level was successfully updated.
Keeping log level at X for Y...        Shows the temporary period for which the new log level is retained, with a countdown.
Time's up! Reverting log level...      Indicates that the temporary period has ended and the script is reverting the log level.
Approximate X new logs generated...    Reports the amount of log data generated during the temporary log level.

Example Output

Code Block
[root@scs dist]# ./castor-change-log-level.sh -p admin:datacore -i 10 -t 300
Swarm IP: 192.168.1.84
Credentials: [hidden for security]
Cluster Name: gatewayadmindomain

New log level: 10
Current log level is 30.
Updating log level to 10...
Log level changed successfully from 30 → 10.
Keeping log level at 10 for 300 second(s)...
Countdown: 00:00:01 remaining...

Time's up! Reverting log level back to 30...
Approximate 69.4MB new logs were generated at log level 10. Current castor.log size is 371.3MB after 00:05:00.
Log level reverted successfully back to 30.

[root@scs dist]#

Error Handling

  • Missing Parameters: Missing parameters prompt a usage message.

  • Invalid Duration: If a non-numeric duration is provided, you’re prompted to enter a valid duration in seconds.

  • Connection Issues: If unable to connect to the Swarm API, check the IP, credentials, and network access.
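One way a duration check of this kind could be written is shown below. This is an illustrative sketch under the assumption that a valid duration is a positive whole number of seconds; the function name validate_duration is hypothetical and the check is not copied from the current script.

```shell
#!/bin/bash
# validate_duration <value>
# Succeeds only for a positive whole number of seconds.
# (Hypothetical helper for illustration only.)
validate_duration() {
    # Digits only: rejects empty strings, signs, decimals, and letters.
    [[ "$1" =~ ^[0-9]+$ ]] || return 1
    # Force base 10 so values such as "08" are not parsed as octal.
    (( 10#$1 > 0 ))
}
```

A caller could use it as: validate_duration "$duration" || { echo "Error: duration must be a positive integer (seconds)."; exit 1; }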

Notes

  • Credentials are masked in the output for security.

  • Log file sizes are shown in human-readable format (GB, MB, KB, B).
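The size report can be reproduced by hand with the sketch below, which measures how much a file grew and prints the result in the same GB/MB/KB/B style. The function names human_size and file_growth are hypothetical, and wc -c is used here for portability; this is a simplified illustration rather than the script's exact code.

```shell
#!/bin/bash
# human_size <bytes>: print a byte count as GB, MB, KB, or B (one decimal).
human_size() {
    local bytes=$1
    if (( bytes >= 1073741824 )); then
        awk -v b="$bytes" 'BEGIN { printf "%.1fGB\n", b / 1073741824 }'
    elif (( bytes >= 1048576 )); then
        awk -v b="$bytes" 'BEGIN { printf "%.1fMB\n", b / 1048576 }'
    elif (( bytes >= 1024 )); then
        awk -v b="$bytes" 'BEGIN { printf "%.1fKB\n", b / 1024 }'
    else
        echo "${bytes}B"
    fi
}

# file_growth <size_before_in_bytes> <file>: print bytes added since the
# earlier measurement.
file_growth() {
    local before=$1 file=$2 after
    after=$(wc -c < "$file")
    echo $(( after - before ))
}
```

Record the size before raising the log level (for example with wc -c < /var/log/datacore/castor.log), then pass that number and the file path to file_growth once the period ends and feed the result to human_size.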

This script provides administrators with an effective way to adjust and monitor Swarm logging, supporting both temporary and permanent log level changes for troubleshooting and performance monitoring.

Script Source Code

Latest version: castor-change-log-level.sh

Code Block
#!/bin/bash
# Written by Milton Suen (milton.suen@datacore.com) Oct 31, 2024
# Revision: Update to support running the script in a persistent session using screen or tmux.

# Function to display usage information
usage() {
    echo "Usage: $0 -d swarm_ip -p admin:password -i new_log_level [-t duration_in_seconds] [--persistent] [-v]"
    echo "  -d, --swarm_ip           IP address of the Swarm API endpoint (or set SCSP_HOST environment variable)"
    echo "  -p, --credentials        Credentials in the format admin:password"
    echo "  -i, --log.level          New log level to set"
    echo "  -t, --time               Duration in seconds to keep the new log level (optional)"
    echo "  --persistent             Run the script in a detached session using screen or tmux"
    echo "  -v, --verbose            Enable verbose mode for debug messages"
    exit 1
}

# Default options
persistent=false
verbose=false
output_log="script_output.log"  # Log file for capturing persistent session output

# Function to display debug messages if verbose mode is enabled
debug() {
    if $verbose; then
        echo "[DEBUG] $1"
    fi
}

# Function to check if either 'screen' or 'tmux' is installed
check_screen_or_tmux() {
    if ! command -v screen &>/dev/null && ! command -v tmux &>/dev/null; then
        echo "Error: Neither 'screen' nor 'tmux' is installed. Cannot run in persistent mode."
        persistent=false  # Disable persistent session
    fi
}

# Function to format file size
format_size() {
    local size=$1
    if (( size >= 1073741824 )); then
        echo "$(awk "BEGIN {printf \"%.1fGB\", $size/1073741824}")"
    elif (( size >= 1048576 )); then
        echo "$(awk "BEGIN {printf \"%.1fMB\", $size/1048576}")"
    elif (( size >= 1024 )); then
        echo "$(awk "BEGIN {printf \"%.1fKB\", $size/1024}")"
    else
        echo "${size}B"
    fi
}

# Function to format duration
format_duration() {
    local duration=$1
    local hours=$((duration / 3600))
    local minutes=$(( (duration % 3600) / 60 ))
    local seconds=$((duration % 60))
    printf "%02d:%02d:%02d" $hours $minutes $seconds
}

# Function to check if jq is available and set up JSON parsing method
check_jq() {
    if [[ -x "/usr/local/bin/jq" ]]; then
        echo "/usr/local/bin/jq"
    elif [[ -x "$(pwd)/jq" ]]; then
        echo "$(pwd)/jq"
    elif command -v jq &>/dev/null; then
        echo "jq"
    else
        echo "grep"
    fi
}

jq_or_grep=$(check_jq)

# Parse input arguments
while [[ "$#" -gt 0 ]]; do
    case $1 in
        -d|--swarm_ip) swarm_ip="$2"; shift 2 ;;
        -p|--credentials) credentials="$2"; shift 2 ;;
        -i|--log.level) new_log_level="$2"; shift 2 ;;
        -t|--time)
            if [[ -n "$2" && "$2" != -* ]]; then
                duration="$2"
                shift 2
            else
                read -p "Enter duration in seconds: " duration
                shift
            fi
            ;;
        --persistent) persistent=true; shift ;;
        -v|--verbose) verbose=true; shift ;;
        *) usage ;;
    esac
done

# Check if 'screen' or 'tmux' is installed
check_screen_or_tmux

# If swarm_ip is not provided, try using SCSP_HOST environment variable
if [[ -z "$swarm_ip" ]]; then
    if [[ -n "$SCSP_HOST" ]]; then
        swarm_ip="$SCSP_HOST"
        debug "Using Swarm IP from SCSP_HOST: $swarm_ip"
    else
        echo "Error: swarm_ip not provided and SCSP_HOST is not set."
        usage
    fi
fi

# Check if required arguments are provided
if [[ -z "$credentials" || -z "$new_log_level" ]]; then
    usage
fi

# Retrieve cluster name and handle JSON parsing
debug "Retrieving the cluster name from Swarm API."
if [[ "$jq_or_grep" == "grep" ]]; then
    clusterName=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters" | grep -oP '"name":\s*"\K[^"]+')
else
    clusterName=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters" | "$jq_or_grep" -r '._embedded.clusters[0].name')
fi

if [[ -z "$clusterName" ]]; then
    echo "Failed to retrieve the cluster name. Please check your inputs."
    exit 1
fi
debug "Cluster Name: $clusterName"

# Main logic function to run the script tasks
main_script() {
    local swarm_ip="$1"
    local credentials="$2"
    local new_log_level="$3"
    local duration="$4"
    local clusterName="$5"
    local log_file="/var/log/datacore/castor.log"
    local initial_size=$(stat -c%s "$log_file" 2>/dev/null || echo 0)
    local current_log_level
    local jq_or_grep="$6"

    # Display initial information
    echo "Swarm IP: $swarm_ip"
    echo "Credentials: [hidden for security]"
    echo "Cluster Name: $clusterName"

    debug "Starting main_script function..."

    # Retrieve current log level
    if [[ "$jq_or_grep" == "grep" ]]; then
        current_log_level=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" | grep -oP '"value":\s*\K[0-9]+')
    else
        current_log_level=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" | "$jq_or_grep" -r '.value')
    fi
    echo ""
    echo "New log level: $new_log_level"
    echo "Current log level is $current_log_level."

    # Skip update if new level matches the current level
    if [[ "$current_log_level" -eq "$new_log_level" ]]; then
        echo "Log level is already set to $new_log_level. No changes made."
        return
    fi

    # Update the log level using PUT
    echo "Updating log level to $new_log_level..."
    response=$(curl --user "$credentials" -sS -X PUT -H "Content-Type: application/json" \
        "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" \
        -d "{\"value\": $new_log_level}")

    if [[ "$jq_or_grep" == "grep" ]]; then
        updated_log_level=$(echo "$response" | grep -oP '"value":\s*\K[0-9]+')
    else
        updated_log_level=$(echo "$response" | "$jq_or_grep" -r '.value')
    fi

    if [[ "$updated_log_level" -eq "$new_log_level" ]]; then
        echo "Log level changed successfully from $current_log_level → $new_log_level."
    else
        echo "Failed to update log level. Response: $response"
        exit 1
    fi

    # Countdown and revert log level
    if [[ -n "$duration" && "$duration" -gt 0 ]]; then
        echo "Keeping log level at $new_log_level for $duration second(s)..."
        for ((i=duration; i>0; i--)); do
            printf -v countdown "%02d:%02d:%02d" $((i/3600)) $(( (i%3600) / 60 )) $((i%60))
            echo -ne "Countdown: $countdown remaining...\r"
            sleep 1
        done
        echo -e "\n\nTime's up! Reverting log level back to $current_log_level..."

        response=$(curl --user "$credentials" -sS -X PUT -H "Content-Type: application/json" \
            "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" \
            -d "{\"value\": $current_log_level}")

        if [[ "$jq_or_grep" == "grep" ]]; then
            reverted_log_level=$(echo "$response" | grep -oP '"value":\s*\K[0-9]+')
        else
            reverted_log_level=$(echo "$response" | "$jq_or_grep" -r '.value')
        fi

        final_size=$(stat -c%s "$log_file" 2>/dev/null || echo 0)
        size_diff=$(( final_size - initial_size ))
        size_diff_formatted=$(format_size "$size_diff")
        duration_formatted=$(format_duration "$duration")
        echo "Approximate $size_diff_formatted new logs were generated at log level $new_log_level. Current castor.log size is $(format_size "$final_size") after $duration_formatted."

        if [[ "$reverted_log_level" -eq "$current_log_level" ]]; then
            echo "Log level reverted successfully back to $current_log_level."
        else
            echo "Failed to revert log level. Response: $response"
            exit 1
        fi
    else
        echo "Log level change is permanent until manually modified."
    fi
}

# Run in persistent or directly
if $persistent; then
    # Pass the main_script function to the screen session and store the output in a file
    if command -v screen &>/dev/null; then
        screen -dmS indexer_script bash -c "$(declare -f main_script format_size format_duration check_jq debug); main_script \"$swarm_ip\" \"$credentials\" \"$new_log_level\" \"$duration\" \"$clusterName\" \"$jq_or_grep\" | tee \"$output_log\""
        screen -r indexer_script
    elif command -v tmux &>/dev/null; then
        tmux new-session -d -s indexer_script "$(declare -f main_script format_size format_duration check_jq debug); main_script \"$swarm_ip\" \"$credentials\" \"$new_log_level\" \"$duration\" \"$clusterName\" \"$jq_or_grep\" | tee \"$output_log\""
        tmux attach-session -t indexer_script
    else
        echo "Error: Neither screen nor tmux available. Run without --persistent."
        exit 1
    fi

    # Wait for the screen session to complete and then display the output log
    sleep 1
    while screen -list | grep -q "indexer_script"; do
        sleep 1
    done

    echo ""
    cat "$output_log"
else
    main_script "$swarm_ip" "$credentials" "$new_log_level" "$duration" "$clusterName" "$jq_or_grep" | tee "$output_log"
fi