Managing Log Levels in Swarm Cluster Using a Shell Script

This script adjusts the log level (the log.level cluster setting) of a Swarm cluster through the Swarm management API on port 91. It can set the log level permanently, or temporarily for a specified duration, after which it reverts to the original log level.

Script Overview

This script enables administrators to:

  • Adjust the log level of a Swarm cluster permanently or temporarily.

  • Monitor log file growth after switching to a more verbose log level for diagnostics.

  • Revert the log level to its original setting automatically after a defined period.

Script Usage

./log_level_script.sh -d SWARM_IP -p ADMIN:PASSWORD -i LOG_LEVEL [-t DURATION]
  • Parameters:

    • -d / --swarm_ip: IP address of the Swarm API endpoint; the script connects to it on port 91. (Required)

    • -p / --credentials: Administrator credentials in admin:password format. (Required)

    • -i / --log.level: Desired log level to apply. Must be one of 0, 5, 10, 15, 20, 30, 40, or 50. (Required)

    • -t / --time: Duration in seconds to keep the new log level before reverting to the previous setting. If omitted, the change is permanent. (Optional)
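The script accepts only the log levels listed above and a -t value made of digits only. If you wrap the script in other tooling, a minimal sketch of the same checks as a standalone function may help; the function name validate_args is hypothetical and not part of the script:

```bash
#!/bin/bash
# Hypothetical helper mirroring the script's input checks.
# Returns 0 if the log level is allowed and the optional duration is a
# whole number of seconds; prints an error and returns 1 otherwise.
validate_args() {
    local level="$1" duration="$2"

    # Allowed values taken from the script's allowed_log_levels array
    case "$level" in
        0|5|10|15|20|30|40|50) ;;
        *) echo "Invalid log level: '$level'" >&2; return 1 ;;
    esac

    # Duration is optional; when present it must contain digits only
    if [[ -n "$duration" && ! "$duration" =~ ^[0-9]+$ ]]; then
        echo "Invalid duration: '$duration'" >&2
        return 1
    fi
    return 0
}

validate_args 15 600 && echo "ok"
```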

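Prerequisites

  • Valid Swarm administrator credentials.

  • Network access from the machine where the script runs to the Swarm API endpoint (port 91).

  • bash, curl, GNU grep (the script uses grep -P), GNU stat (stat -c), and awk on the machine where the script runs.

  • For log file size monitoring, the script must run on a host where castor.log exists under /var/log/caringo/ or /var/log/datacore/; otherwise it prints a warning and cannot report log file sizes.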
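Because the script depends on GNU-specific options (grep -P and stat -c), it can fail in confusing ways on systems with BSD tools, such as macOS. A minimal pre-flight sketch that checks for these before a run; the function names are hypothetical:

```bash
#!/bin/bash
# Sketch of a pre-flight check for the tools the script calls:
# curl (API calls), grep -P (JSON parsing), stat -c (file size), awk (size formatting).

# Return 1 and name each command that is not on the PATH
have_commands() {
    local cmd missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "Missing required command: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# grep -P (Perl-compatible regex) is a GNU extension; BSD/macOS grep lacks it
grep_has_pcre() {
    echo '"value": 30' | grep -qP '"value":\s*[0-9]+' 2>/dev/null
}

# "stat -c%s" is GNU syntax; BSD/macOS stat uses -f instead
stat_has_c() {
    stat -c%s / >/dev/null 2>&1
}

if have_commands curl grep stat awk && grep_has_pcre && stat_has_c; then
    echo "All prerequisites found."
else
    echo "Install the missing tools before running the script." >&2
fi
```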

Instructions for Use

  • Setting the Log Level Permanently: To make a permanent change, omit the -t parameter:

    ./log_level_script.sh -d 192.168.1.100 -p admin:password -i 15

    This sets the log level to 15 and keeps it there until you change it manually.

  • Setting the Log Level Temporarily: Specify a duration (in seconds) with the -t parameter to revert automatically after a defined period:

    ./log_level_script.sh -d 192.168.1.100 -p admin:password -i 15 -t 600

    In this example, the log level is set to 15 and reverts to the original level after 600 seconds (10 minutes).

  • Monitoring Log File Size

    • The script looks for the log file (castor.log) in /var/log/caringo/ or /var/log/datacore/ on the machine where it runs.

    • The initial file size is shown before the new log level is set.

    • For temporary changes, the final file size and the size difference are displayed when the duration ends, indicating how much was logged during that period.

  • Output Details

    • The script displays the Swarm IP, the cluster name, the log file location, and the initial and final log file sizes.

    • For temporary log levels, a countdown timer shows the time remaining before the revert.

    • When the duration ends, it reports the approximate amount of log data generated and the elapsed time, and confirms that the original log level was restored.
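The -t value is always given in seconds, so the 10-minute window in the temporary example is -t 600. If you prefer to think in minutes or hours, a small helper can do the conversion first. This is a sketch only: to_seconds is hypothetical, and the s/m/h suffixes are an assumption of the helper, not something the script itself understands:

```bash
#!/bin/bash
# Hypothetical helper: convert "90", "90s", "10m" or "2h" into plain seconds
# for the script's -t option (the script itself only accepts seconds).
to_seconds() {
    local input="$1"
    if [[ "$input" =~ ^([0-9]+)([smh]?)$ ]]; then
        local value="${BASH_REMATCH[1]}" unit="${BASH_REMATCH[2]}"
        # 10# forces base 10 so inputs such as "09" are not read as octal
        case "$unit" in
            h) echo $(( 10#$value * 3600 )) ;;
            m) echo $(( 10#$value * 60 )) ;;
            *) echo $(( 10#$value )) ;;   # no suffix or "s": already seconds
        esac
    else
        echo "Unrecognized duration: '$input'" >&2
        return 1
    fi
}

to_seconds 10m   # prints 600
```

For example: ./log_level_script.sh -d 192.168.1.100 -p admin:password -i 15 -t "$(to_seconds 10m)"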

Example Output

Swarm IP: 192.168.1.100
Credentials: [hidden for security]
Cluster Name: Cluster_01
Log file located at: /var/log/datacore/castor.log
Initial log file size: 10.5MB

Retrieving the current log level...
New log level: 15
Current log level is 30.
Updating log level to 15...
Log level changed successfully from 30 → 15.
Keeping log level at 15 for 600 second(s)...

Time's up! Reverting log level back to 30...
Approximate 1.2MB new logs were generated at log level 15. Current castor.log size is 11.7MB after 00:10:00.

Log level reverted successfully back to 30.
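The current log level in this output is parsed from the API's JSON response with grep -oP, which requires GNU grep. On systems without it, a sed expression can extract the same number. A minimal sketch; extract_value is a hypothetical name, and the sample response is a hypothetical, trimmed body containing only the value field the script relies on:

```bash
#!/bin/bash
# Sketch: extract the numeric "value" field from a JSON response without grep -P.
# Like the script's grep pattern, this is a text match, not a full JSON parser;
# if several "value" fields appear on one line, the greedy match returns the last.
extract_value() {
    sed -n 's/.*"value":[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Hypothetical response body; real Swarm responses contain more fields
response='{"name": "log.level", "value": 30}'
echo "$response" | extract_value   # prints 30
```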

Error Handling

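  • Missing Parameters: If a required parameter is missing or an unrecognized option is passed, the script prints a usage message and exits.

  • Invalid Log Level: If the log level is not one of 0, 5, 10, 15, 20, 30, 40, or 50, the script exits with an error.

  • Invalid Duration: If -t is given without a value, the script prompts for a duration in seconds. If the duration is not a whole number of seconds, the script exits with an error.

  • Connection Issues: If the script cannot reach the Swarm API, it reports that it failed to retrieve the cluster name or the current log level and exits. Check the IP address, credentials, and network access to port 91.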
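When the script fails at the first API call, it cannot tell you whether the host was unreachable or the credentials were rejected. A minimal probe sketch that separates the two using curl's HTTP status output; the function name is hypothetical, port 91 and the /api/storage/clusters path are the ones the script uses, and the 401/403 meanings are standard HTTP behavior assumed to apply to Swarm:

```bash
#!/bin/bash
# Sketch: distinguish connection failures from authentication failures.
# curl prints "000" as the status code when no HTTP response was received.
probe_swarm_api() {
    local swarm_ip="$1" credentials="$2" code
    code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 5 \
        -u "$credentials" "http://$swarm_ip:91/api/storage/clusters")
    case "$code" in
        200)     echo "OK: API reachable and credentials accepted" ;;
        401|403) echo "Reachable, but credentials were rejected (HTTP $code)"; return 2 ;;
        000)     echo "Cannot connect to $swarm_ip:91 (check IP, network, firewall)"; return 1 ;;
        *)       echo "Unexpected HTTP status: '$code'"; return 3 ;;
    esac
}
```

For example, probe_swarm_api 192.168.1.100 admin:password returns 0 when the script should be able to connect, 1 for network problems, and 2 for rejected credentials.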

Notes

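  • Credentials are masked in the output for security. They are still passed on the command line, so they may appear in shell history and in the process list.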

  • Log file sizes are shown in human-readable format (GB, MB, KB, B).
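The size report is a subtraction of two byte counts taken before and after the diagnostic window. A minimal sketch of that calculation, assuming castor.log is a local file; it uses wc -c instead of the script's GNU-only stat -c%s, and the helper names are hypothetical. Because logs may rotate during a long window, a negative difference is reported as rotation rather than as a size:

```bash
#!/bin/bash
# Hypothetical helpers for measuring log growth over a time window.

# Size of a file in bytes (portable alternative to GNU "stat -c%s")
file_size() {
    wc -c < "$1" | tr -d '[:space:]'
}

# Print the growth in bytes between two sizes, or a note if the file shrank
# (which usually means the log was rotated during the window)
log_growth() {
    local initial="$1" final="$2"
    local diff=$(( final - initial ))
    if (( diff < 0 )); then
        echo "log rotated; growth unknown"
    else
        echo "$diff"
    fi
}

# Example on a temporary file instead of castor.log
tmp=$(mktemp)
printf '%0100d' 0 > "$tmp"        # write 100 bytes
before=$(file_size "$tmp")
printf '%050d' 0 >> "$tmp"        # append 50 more bytes
after=$(file_size "$tmp")
echo "Growth: $(log_growth "$before" "$after") bytes"   # Growth: 50 bytes
rm -f "$tmp"
```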

This script gives administrators a straightforward way to adjust and monitor Swarm logging, supporting both temporary and permanent log level changes for troubleshooting and performance monitoring.

Script Source Code

#!/bin/bash

# Function to display usage information
usage() {
    echo "Usage: $0 -d swarm_ip -p admin:password -i new_log_level [-t duration_in_seconds]"
    echo "  -d, --swarm_ip           IP address of the Swarm API endpoint"
    echo "  -p, --credentials        Credentials in the format admin:password"
    echo "  -i, --log.level          New log level to set"
    echo "  -t, --time               Duration in seconds to keep the new log level (optional)"
    exit 1
}

# Function to format file size
format_size() {
    local size=$1
    if (( size >= 1073741824 )); then
        echo "$(awk "BEGIN {printf \"%.1fGB\", $size/1073741824}")"
    elif (( size >= 1048576 )); then
        echo "$(awk "BEGIN {printf \"%.1fMB\", $size/1048576}")"
    elif (( size >= 1024 )); then
        echo "$(awk "BEGIN {printf \"%.1fKB\", $size/1024}")"
    else
        echo "${size}B"
    fi
}

# Function to format duration
format_duration() {
    local duration=$1
    local hours=$((duration / 3600))
    local minutes=$(( (duration % 3600) / 60 ))
    local seconds=$((duration % 60))
    printf "%02d:%02d:%02d" $hours $minutes $seconds
}

# Parse input arguments
while [[ "$#" -gt 0 ]]; do
    case $1 in
        -d|--swarm_ip) swarm_ip="$2"; shift ;;
        -p|--credentials) credentials="$2"; shift ;;
        -i|--log.level) new_log_level="$2"; shift ;;
        -t|--time)
            if [[ -n "$2" && "$2" != -* ]]; then
                duration="$2"
                shift
            else
                read -p "Enter duration in seconds: " duration
            fi
            ;;
        *) usage ;;
    esac
    shift
done

# Check if required arguments are provided
if [[ -z "$swarm_ip" || -z "$credentials" || -z "$new_log_level" ]]; then
    usage
fi

# Validate log level
allowed_log_levels=(0 5 10 15 20 30 40 50)
if [[ ! " ${allowed_log_levels[*]} " =~ " ${new_log_level} " ]]; then
    echo "Error: Invalid log level. Must be one of: ${allowed_log_levels[*]}"
    exit 1
fi

# Validate the duration (if set) before contacting the cluster
if [[ -n "$duration" ]]; then
    if ! [[ "$duration" =~ ^[0-9]+$ ]]; then
        echo "Error: Duration must be a whole number of seconds."
        exit 1
    fi
fi

# Retrieve the cluster name using the supplied credentials
# (take the first name if the response contains several)
clusterName=$(curl -u "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters" | grep -oP '"name":\s*"\K[^"]+' | head -n 1)
if [[ -z "$clusterName" ]]; then
    echo "Failed to retrieve the cluster name. Please check the IP address, credentials, and network access."
    exit 1
fi

# Display input parameters
echo "Swarm IP: $swarm_ip"
echo "Credentials: [hidden for security]"
echo "Cluster Name: $clusterName"

# Identify the log file location
log_file=""
if [[ -f "/var/log/caringo/castor.log" ]]; then
    log_file="/var/log/caringo/castor.log"
elif [[ -f "/var/log/datacore/castor.log" ]]; then
    log_file="/var/log/datacore/castor.log"
fi

# Display log file information
if [[ -n "$log_file" ]]; then
    echo "Log file located at: $log_file"
    initial_size=$(stat -c%s "$log_file")
    initial_size_formatted=$(format_size "$initial_size")
    echo "Initial log file size: $initial_size_formatted"
else
    echo "Warning: Log file not found in expected directories."
fi

# Get the current log level
echo ""
echo "Retrieving the current log level..."
current_log_level=$(curl -u "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" | grep -oP '"value":\s*\K[0-9]+')

# Check if the current log level was retrieved successfully
if [[ -z "$current_log_level" ]]; then
    echo "Failed to retrieve the current log level. Please check your inputs."
    exit 1
fi
echo "New log level: $new_log_level"
echo "Current log level is $current_log_level."

# Check if the new log level is the same as the current log level
if [[ "$current_log_level" -eq "$new_log_level" ]]; then
    echo ""
    echo "Log level is already set to $new_log_level. No changes made."
    exit 0
fi

# Update the log level using PUT
echo "Updating log level to $new_log_level..."
response=$(curl -u "$credentials" -sS -X PUT -H "Content-Type: application/json" \
    "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" \
    -d "{\"value\": $new_log_level}")

updated_log_level=$(echo "$response" | grep -oP '"value":\s*\K[0-9]+')
# Compare as strings: an empty (failed) response must not count as level 0
if [[ "$updated_log_level" == "$new_log_level" ]]; then
    echo "Log level changed successfully from $current_log_level → $new_log_level."
else
    echo "Failed to update log level. Response: $response"
    exit 1
fi

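# If duration is specified, wait and revert after the specified time
if [[ -n "$duration" && "$duration" -gt 0 ]]; then
    echo "Keeping log level at $new_log_level for $duration second(s)..."
    echo ""

    # If the countdown is interrupted (Ctrl+C or kill), restore the original
    # log level before exiting so the cluster is not left at the new level
    trap 'echo -e "\n\nInterrupted! Reverting log level back to $current_log_level..."
          curl -u "$credentials" -sS -X PUT -H "Content-Type: application/json" \
              "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" \
              -d "{\"value\": $current_log_level}" > /dev/null
          exit 130' INT TERM

    for ((i=duration; i>0; i--)); do
        countdown=$(format_duration "$i")
        echo -ne "Countdown: $countdown remaining...\r"
        sleep 1
    done
    echo -e "\n\nTime's up! Reverting log level back to $current_log_level..."

    # Report how much the log file grew while the new log level was active
    if [[ -n "$log_file" ]]; then
        final_size=$(stat -c%s "$log_file")
        size_diff=$((final_size - initial_size))
        if (( size_diff >= 0 )); then
            echo "Approximate $(format_size "$size_diff") new logs were generated at log level $new_log_level. Current castor.log size is $(format_size "$final_size") after $(format_duration "$duration")."
        else
            echo "castor.log shrank during the period (likely rotated); current size is $(format_size "$final_size")."
        fi
        echo ""
    fi

    # Revert to original log level
    response=$(curl -u "$credentials" -sS -X PUT -H "Content-Type: application/json" \
        "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" \
        -d "{\"value\": $current_log_level}")
    trap - INT TERM

    # Compare as strings: an empty (failed) response must not count as a match
    reverted_log_level=$(echo "$response" | grep -oP '"value":\s*\K[0-9]+')
    if [[ "$reverted_log_level" == "$current_log_level" ]]; then
        echo "Log level reverted successfully back to $current_log_level."
    else
        echo "Failed to revert log level. Response: $response"
        exit 1
    fi
else
    echo "Log level change is permanent until manually modified."
fi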
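The countdown loop in the script runs sleep 1 once per iteration, so the real wait is slightly longer than the requested duration: each pass also spends time formatting and printing. If closer timing matters, one option is to compute the remaining time from bash's built-in SECONDS counter. This is a sketch, not the script's current behavior; the countdown function name is hypothetical, and because SECONDS has whole-second resolution the wait is accurate to about one second:

```bash
#!/bin/bash
# Sketch of a countdown that derives the remaining time from bash's SECONDS
# counter, so time spent printing does not accumulate as drift.
countdown() {
    local duration="$1"
    local end=$(( SECONDS + duration ))
    local remaining
    while (( (remaining = end - SECONDS) > 0 )); do
        printf 'Countdown: %02d:%02d:%02d remaining...\r' \
            $(( remaining / 3600 )) $(( remaining % 3600 / 60 )) $(( remaining % 60 ))
        sleep 1
    done
    printf '\n'
}

countdown 2
echo "Time's up!"
```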
