Managing Log Levels in Swarm Cluster Using a Shell Script
The Swarm Log Level Management Script (castor-change-log-level.sh) manages and dynamically adjusts log levels on a DataCore Swarm cluster. It can change the log level permanently, or temporarily for a specified duration, after which it reverts to the original log level. It supports background execution via screen or tmux, which suits long-running operations that need to detach from the terminal.
Features
Set and Revert Log Levels: Changes the log level of a Swarm cluster permanently, or temporarily with an automatic revert after a specified duration.
Flexible JSON Parsing: Uses jq for JSON parsing if available; falls back to grep otherwise.
Background Execution: Optionally runs in the background using screen or tmux.
Log Size Monitoring: Reports the log size generated during the temporary log level change.
Countdown Display: Shows a countdown for the specified duration.
Requirements
jq (optional): Used for parsing JSON responses; the script falls back to grep if jq is unavailable.
screen or tmux (optional): Required for background execution.
Permissions: Ensure sufficient permissions to execute on the DataCore Swarm server and access required files.
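The jq-or-grep fallback described above can be sketched as a small helper. This is a minimal illustration, not the script's exact code: extract_value is a hypothetical name, and the {"value": N} response shape is an assumption inferred from the patterns the script greps for.

```bash
#!/bin/bash
# Sketch: pull the numeric "value" field out of a JSON response,
# preferring jq and falling back to grep when jq is not installed.
# extract_value is a hypothetical helper name used only for this example.
extract_value() {
    local json="$1"
    if command -v jq &>/dev/null; then
        printf '%s' "$json" | jq -r '.value'
    else
        # Needs GNU grep with PCRE support (-P); \K drops the matched prefix
        printf '%s' "$json" | grep -oP '"value":\s*\K[0-9]+'
    fi
}

# Assumed response shape for the log.level setting
extract_value '{"name": "log.level", "value": 30}'
```

Both branches print the same number for this input, so the rest of a script does not need to know which parser was used.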
Script Usage
Code Block |
---|
./castor-change-log-level.sh -d <node_ip> -p <admin:password> -i <new_log_level> [-t <duration_in_seconds>] [--persistent] [-v] |
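Parameter | Description |
---|---|
-d, --swarm_ip | IP address of the Swarm API endpoint (or set the SCSP_HOST environment variable). |
-p, --credentials | Admin credentials in the format admin:password. |
-i, --log.level | New log level to set (values: …). |
-t, --time | Duration in seconds to keep the new log level (optional). |
--persistent | Runs the script in a detached session using screen or tmux. |
-v, --verbose | Enables verbose mode to display debug information. |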
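The -d parameter can be omitted when SCSP_HOST is set. A minimal sketch of that resolution order follows; resolve_swarm_ip is a hypothetical helper name, and the error text mirrors the message used in the script source on this page.

```bash
#!/bin/bash
# Sketch: choose the Swarm endpoint from the -d value if given,
# otherwise from the SCSP_HOST environment variable.
# resolve_swarm_ip is a hypothetical helper name used only for this example.
resolve_swarm_ip() {
    local from_flag="$1"
    if [[ -n "$from_flag" ]]; then
        echo "$from_flag"
    elif [[ -n "$SCSP_HOST" ]]; then
        echo "$SCSP_HOST"
    else
        echo "Error: swarm_ip not provided and SCSP_HOST is not set." >&2
        return 1
    fi
}

# An explicit value always wins over the environment variable
resolve_swarm_ip "192.168.8.84"
```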
Instructions for Use
Example 1: Set log level to 20 and keep it for 10 seconds
Code Block |
---|
./castor-change-log-level.sh -d 192.168.8.84 -p admin:datacore -i 20 -t 10 |
Example 2: Run in background mode with verbose logging
Code Block |
---|
./castor-change-log-level.sh -d 192.168.8.84 -p admin:datacore -i 20 -t 30 --persistent -v |
Behavior
Log Level Change: Sets the log level to the specified value. If the current log level matches the requested level, the script skips the update.
Countdown: During the specified duration, the script displays a countdown every second.
Revert Log Level: After the countdown, the log level reverts to the initial value.
Log Size Report: Reports the approximate size of the logs generated during the temporary log level change.
Debug Mode: When -v is specified, debug messages display the script's internal operations.
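The countdown step can be illustrated in isolation. This is a minimal sketch of the hh:mm:ss arithmetic with a two-second demo loop; format_countdown is a hypothetical name, not a function from the script.

```bash
#!/bin/bash
# Sketch: format a number of seconds as hh:mm:ss and redraw it once per second.
# format_countdown is a hypothetical helper name used only for this example.
format_countdown() {
    local s=$1
    printf "%02d:%02d:%02d" $((s / 3600)) $(( (s % 3600) / 60 )) $((s % 60))
}

# Two-second demo; \r returns the cursor to the start of the line so
# each update overwrites the previous one instead of scrolling.
for ((i = 2; i > 0; i--)); do
    echo -ne "Countdown: $(format_countdown "$i") remaining...\r"
    sleep 1
done
echo
```

Because the line is overwritten in place, a captured log usually shows only the last countdown value.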
Output Messages
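Message | Description |
---|---|
Swarm IP: <ip> | Displays the specified Swarm IP address. |
Credentials: [hidden for security] | Credentials are masked for security. |
Cluster Name: <name> | Displays the cluster name retrieved from the Swarm API. |
New log level: <level> | Shows the new log level requested. |
Current log level is <level>. | Displays the current log level. |
Updating log level to <level>... | Indicates the beginning of the log level update process. |
Log level changed successfully from <old> → <new>. | Confirms that the log level was successfully updated. |
Keeping log level at <level> for <n> second(s)... | Shows the temporary period for which the new log level is retained, with a countdown. |
Time's up! Reverting log level back to <level>... | Indicates that the temporary period has ended and the script is reverting the log level. |
Approximate <size> new logs were generated at log level <level>. | Reports the amount of logging activity generated during the temporary log level. |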
Example Output
Code Block |
---|
[root@scs dist]# ./castor-change-log-level.sh -p admin:datacore -i 10 -t 300
Swarm IP: 192.168.1.84
Credentials: [hidden for security]
Cluster Name: gatewayadmindomain
New log level: 10
Current log level is 30.
Updating log level to 10...
Log level changed successfully from 30 → 10.
Keeping log level at 10 for 300 second(s)...
Countdown: 00:00:01 remaining...
Time's up! Reverting log level back to 30...
Approximate 69.4MB new logs were generated at log level 10. Current castor.log size is 371.3MB after 00:05:00.
Log level reverted successfully back to 30.
[root@scs dist]# |
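The "Approximate ... new logs" figure in the output above is a size difference taken before and after the waiting period. A minimal sketch of that measurement, using a temporary file in place of castor.log, might look like this; log_growth is a hypothetical name and stat -c%s assumes GNU coreutils.

```bash
#!/bin/bash
# Sketch: measure how many bytes a file grew between two points in time.
# log_growth is a hypothetical helper; stat -c%s is the GNU coreutils form.
log_growth() {
    local file="$1" initial="$2" final
    # Treat a missing file as size 0 rather than failing
    final=$(stat -c%s "$file" 2>/dev/null || echo 0)
    echo $(( final - initial ))
}

# Demo with a temporary file standing in for castor.log
tmp=$(mktemp)
printf '0123456789' > "$tmp"          # 10 bytes before the change
initial=$(stat -c%s "$tmp")
printf 'abcde' >> "$tmp"              # 5 more bytes written afterwards
echo "Grew by $(log_growth "$tmp" "$initial") bytes"
rm -f "$tmp"
```

The result is only approximate on a live system: if the log rotates during the interval, the difference can be too small or even negative.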
Error Handling
Missing Parameters: Missing parameters prompt a usage message.
Invalid Duration: If -t is given without a value, you're prompted to enter a duration in seconds. A non-numeric duration produces an error and the script exits.
Connection Issues: If unable to connect to the Swarm API, check the IP, credentials, and network access.
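The duration check can be sketched on its own. This is a minimal example of the digits-only test; is_valid_duration is a hypothetical name, and note that the pattern also accepts 0, which the script's main logic treats as "no temporary period".

```bash
#!/bin/bash
# Sketch: accept a duration only if it consists entirely of digits.
# is_valid_duration is a hypothetical helper name used only for this example.
is_valid_duration() {
    [[ "$1" =~ ^[0-9]+$ ]]
}

for candidate in 300 10s abc; do
    if is_valid_duration "$candidate"; then
        echo "$candidate: valid"
    else
        echo "$candidate: Duration must be a positive integer value in seconds."
    fi
done
```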
Notes
Credentials are masked in the output for security.
Log file sizes are shown in human-readable format (GB, MB, KB, B).
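The human-readable size formatting mentioned above can be sketched as follows. It follows the same 1024-based thresholds as the script source, passing the size to awk with -v rather than interpolating it into the awk program, which is a small variation on the original.

```bash
#!/bin/bash
# Sketch: convert a byte count to GB, MB, KB, or B with one decimal place.
# Thresholds are powers of 1024, matching the units listed in the note above.
format_size() {
    local size=$1
    if (( size >= 1073741824 )); then
        awk -v s="$size" 'BEGIN {printf "%.1fGB\n", s / 1073741824}'
    elif (( size >= 1048576 )); then
        awk -v s="$size" 'BEGIN {printf "%.1fMB\n", s / 1048576}'
    elif (( size >= 1024 )); then
        awk -v s="$size" 'BEGIN {printf "%.1fKB\n", s / 1024}'
    else
        echo "${size}B"
    fi
}

format_size 1536
```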
This script provides administrators with an effective way to adjust and monitor Swarm logging, supporting both temporary and permanent log level changes for troubleshooting and performance monitoring.
Script Source Code
Latest version: castor-change-log-level.sh
Code Block | ||
---|---|---|
| ||
#!/bin/bash
# Written by Milton Suen (milton.suen@datacore.com) Oct 31, 2024
# Revision: Update to support running the script in a persistent session using screen or tmux.

# Function to display usage information
usage() {
    echo "Usage: $0 -d swarm_ip -p admin:password -i new_log_level [-t duration_in_seconds] [--persistent] [-v]"
    echo "  -d, --swarm_ip      IP address of the Swarm API endpoint (or set SCSP_HOST environment variable)"
    echo "  -p, --credentials   Credentials in the format admin:password"
    echo "  -i, --log.level     New log level to set"
    echo "  -t, --time          Duration in seconds to keep the new log level (optional)"
    echo "  --persistent        Run the script in a detached session using screen or tmux"
    echo "  -v, --verbose       Enable verbose mode for debug messages"
    exit 1
}

# Default options
persistent=false
verbose=false
output_log="script_output.log"  # Log file for capturing persistent session output

# Function to display debug messages if verbose mode is enabled
debug() {
    if $verbose; then
        echo "[DEBUG] $1"
    fi
}

# Function to check if either 'screen' or 'tmux' is installed
check_screen_or_tmux() {
    if ! command -v screen &>/dev/null && ! command -v tmux &>/dev/null; then
        echo "Error: Neither 'screen' nor 'tmux' is installed. Cannot run in persistent mode."
        persistent=false  # Disable persistent session
    fi
}

# Function to format file size
format_size() {
    local size=$1
    if (( size >= 1073741824 )); then
        echo "$(awk "BEGIN {printf \"%.1fGB\", $size/1073741824}")"
    elif (( size >= 1048576 )); then
        echo "$(awk "BEGIN {printf \"%.1fMB\", $size/1048576}")"
    elif (( size >= 1024 )); then
        echo "$(awk "BEGIN {printf \"%.1fKB\", $size/1024}")"
    else
        echo "${size}B"
    fi
}

# Function to format duration
format_duration() {
    local duration=$1
    local hours=$((duration / 3600))
    local minutes=$(( (duration % 3600) / 60 ))
    local seconds=$((duration % 60))
    printf "%02d:%02d:%02d" $hours $minutes $seconds
}

# Function to check if jq is available and set up JSON parsing method
check_jq() {
    if [[ -x "/usr/local/bin/jq" ]]; then
        echo "/usr/local/bin/jq"
    elif [[ -x "$(pwd)/jq" ]]; then
        echo "$(pwd)/jq"
    elif command -v jq &>/dev/null; then
        echo "jq"
    else
        echo "grep"
    fi
}

jq_or_grep=$(check_jq)

# Parse input arguments
while [[ "$#" -gt 0 ]]; do
    case $1 in
        -d|--swarm_ip) swarm_ip="$2"; shift 2 ;;
        -p|--credentials) credentials="$2"; shift 2 ;;
        -i|--log.level) new_log_level="$2"; shift 2 ;;
        -t|--time)
            if [[ -n "$2" && "$2" != -* ]]; then
                duration="$2"
                shift 2
            else
                read -p "Enter duration in seconds: " duration
                shift
            fi
            ;;
        --persistent) persistent=true; shift ;;
        -v|--verbose) verbose=true; shift ;;
        *) usage ;;
    esac
done

# Check if 'screen' or 'tmux' is installed (only relevant in persistent mode)
if $persistent; then
    check_screen_or_tmux
fi

# If swarm_ip is not provided, try using SCSP_HOST environment variable
if [[ -z "$swarm_ip" ]]; then
    if [[ -n "$SCSP_HOST" ]]; then
        swarm_ip="$SCSP_HOST"
        debug "Using Swarm IP from SCSP_HOST: $swarm_ip"
    else
        echo "Error: swarm_ip not provided and SCSP_HOST is not set."
        usage
    fi
fi

# Check if required arguments are provided
if [[ -z "$credentials" || -z "$new_log_level" ]]; then
    usage
fi

# Validate the duration if it is set
if [[ -n "$duration" ]]; then
    if ! [[ "$duration" =~ ^[0-9]+$ ]]; then
        echo "Error: Duration must be a positive integer value in seconds."
        exit 1
    fi
fi

# Retrieve cluster name and handle JSON parsing
debug "Retrieving the cluster name from Swarm API."
if [[ "$jq_or_grep" == "grep" ]]; then
    clusterName=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters" | grep -oP '"name":\s*"\K[^"]+')
else
    clusterName=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters" | "$jq_or_grep" -r '._embedded.clusters[0].name')
fi

if [[ -z "$clusterName" ]]; then
    echo "Failed to retrieve the cluster name. Please check your inputs."
    exit 1
fi

debug "Cluster Name: $clusterName"

# Main logic function to run the script tasks
main_script() {
    local swarm_ip="$1"
    local credentials="$2"
    local new_log_level="$3"
    local duration="$4"
    local clusterName="$5"
    local log_file="/var/log/datacore/castor.log"
    local initial_size=$(stat -c%s "$log_file" 2>/dev/null || echo 0)
    local current_log_level
    local jq_or_grep="$6"

    # Display information
    echo "Swarm IP: $swarm_ip"
    echo "Credentials: [hidden for security]"
    echo "Cluster Name: $clusterName"
    debug "Starting main_script function..."

    # Retrieve current log level
    if [[ "$jq_or_grep" == "grep" ]]; then
        current_log_level=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" | grep -oP '"value":\s*\K[0-9]+')
    else
        current_log_level=$(curl --user "$credentials" -sS "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" | "$jq_or_grep" -r '.value')
    fi

    # Stop if the current log level could not be retrieved
    if [[ -z "$current_log_level" ]]; then
        echo "Failed to retrieve the current log level. Please check your inputs."
        exit 1
    fi

    echo "New log level: $new_log_level"
    echo "Current log level is $current_log_level."

    # Skip update if the new log level matches the current level
    if [[ "$current_log_level" -eq "$new_log_level" ]]; then
        echo "Log level is already set to $new_log_level. No changes made."
        return 0
    fi

    # Update the log level using PUT
    echo "Updating log level to $new_log_level..."
    response=$(curl --user "$credentials" -sS -X PUT -H "Content-Type: application/json" \
        "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" \
        -d "{\"value\": $new_log_level}")

    # Verify if the log level was updated
    if [[ "$jq_or_grep" == "grep" ]]; then
        updated_log_level=$(echo "$response" | grep -oP '"value":\s*\K[0-9]+')
    else
        updated_log_level=$(echo "$response" | "$jq_or_grep" -r '.value')
    fi

    if [[ "$updated_log_level" -eq "$new_log_level" ]]; then
        echo "Log level changed successfully from $current_log_level → $new_log_level."
    else
        echo "Failed to update log level. Response: $response"
        exit 1
    fi

    # Countdown and revert log level
    if [[ -n "$duration" && "$duration" -gt 0 ]]; then
        echo "Keeping log level at $new_log_level for $duration second(s)..."
        for ((i=duration; i>0; i--)); do
            printf -v countdown "%02d:%02d:%02d" $((i/3600)) $(( (i%3600) / 60 )) $((i%60))
            echo -ne "Countdown: $countdown remaining...\r"
            sleep 1
        done
        echo -e "\n\nTime's up! Reverting log level back to $current_log_level..."

        # Revert to the original log level
        response=$(curl --user "$credentials" -sS -X PUT -H "Content-Type: application/json" \
            "http://$swarm_ip:91/api/storage/clusters/$clusterName/settings/log.level" \
            -d "{\"value\": $current_log_level}")

        if [[ "$jq_or_grep" == "grep" ]]; then
            reverted_log_level=$(echo "$response" | grep -oP '"value":\s*\K[0-9]+')
        else
            reverted_log_level=$(echo "$response" | "$jq_or_grep" -r '.value')
        fi

        # Report how much the log file grew while the new level was active
        final_size=$(stat -c%s "$log_file" 2>/dev/null || echo 0)
        size_diff=$(( final_size - initial_size ))
        size_diff_formatted=$(format_size "$size_diff")
        duration_formatted=$(format_duration "$duration")
        echo "Approximate $size_diff_formatted new logs were generated at log level $new_log_level. Current castor.log size is $(format_size "$final_size") after $duration_formatted."

        if [[ "$reverted_log_level" -eq "$current_log_level" ]]; then
            echo "Log level reverted successfully back to $current_log_level."
        else
            echo "Failed to revert log level. Response: $response"
            exit 1
        fi
    else
        echo "Log level change is permanent until manually modified."
    fi
}

# Run in persistent or directly
if $persistent; then
    # Pass the main_script function to the detached session and store the output in a file.
    # verbose is forwarded explicitly because the new shell does not inherit it.
    if command -v screen &>/dev/null; then
        screen -dmS indexer_script bash -c "$(declare -f main_script format_size format_duration check_jq debug); verbose=$verbose; main_script \"$swarm_ip\" \"$credentials\" \"$new_log_level\" \"$duration\" \"$clusterName\" \"$jq_or_grep\" | tee \"$output_log\""
        screen -r indexer_script
    elif command -v tmux &>/dev/null; then
        tmux new-session -d -s indexer_script "$(declare -f main_script format_size format_duration check_jq debug); verbose=$verbose; main_script \"$swarm_ip\" \"$credentials\" \"$new_log_level\" \"$duration\" \"$clusterName\" \"$jq_or_grep\" | tee \"$output_log\""
        tmux attach-session -t indexer_script
    else
        echo "Error: Neither screen nor tmux available. Run without --persistent."
        exit 1
    fi
    # Wait for the screen session to complete and then display the output log
    sleep 1
    while screen -list | grep -q "indexer_script"; do
        sleep 1
    done
    echo ""
    cat "$output_log"
else
    main_script "$swarm_ip" "$credentials" "$new_log_level" "$duration" "$clusterName" "$jq_or_grep" | tee "$output_log"
fi |