Advanced Configuration#
This section describes how to implement health checks and automated Vector restarts to improve application reliability and observability. It also shows how to retrieve a streaming session identifier that ties a user to their application invocation logs.
Vector Health Check#
Create a new Health Check script:
nano vector_health_monitor.sh
Script overview#
PID-Based Monitoring: Uses process ID tracking to monitor Vector’s health status.
Automatic Restart: Restarts failed Vector processes automatically.
Health Check Loop: Continuously monitors Vector every 30 seconds using kill -0.
Startup Validation: Ensures Vector stabilizes for 15 seconds after restart before marking healthy.
Dual Logging: Outputs to both stdout and /tmp/kit_structured_logs.log.
Configuration variables (defaults shown):
VECTOR_HEALTH_CHECK_INTERVAL=30 (seconds) - health check frequency
VECTOR_MAX_RESTART_ATTEMPTS=3 - max restart attempts before giving up
VECTOR_RESTART_COOLDOWN=60 (seconds) - wait time between restart attempts
VECTOR_CONFIG_PATH="/tmp/vector.toml" - path to Vector config
VECTOR_BINARY_PATH="/opt/vector/bin/vector" - Vector executable path
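Before reading the full script, here is a minimal, self-contained sketch of the liveness probe it relies on: `kill -0` tests whether a PID is alive without sending any signal, which is exactly how the monitor distinguishes a running Vector process from a dead one. The `sleep` process here is just a stand-in target.

```shell
# Start a throwaway background process to probe.
sleep 60 &
pid=$!

# kill -0 delivers no signal; it only checks that the PID exists and is
# signalable by the current user. Exit status 0 means the process is alive.
if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid is alive"
fi

# Clean up the stand-in process.
kill "$pid" 2>/dev/null
```

In the monitor, the PID comes from `pgrep -f "vector --config ..."` rather than `$!`, since the monitor does not start Vector itself on the first pass.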
Health monitor script (copy into vector_health_monitor.sh):
#!/bin/bash
# Vector Health Monitor - PID-based health check service
# This script runs independently and monitors Vector's health via process checking

# Health check configuration
VECTOR_HEALTH_CHECK_INTERVAL=30   # Check every 30 seconds
VECTOR_MAX_RESTART_ATTEMPTS=3     # Maximum restarts before giving up
VECTOR_RESTART_COOLDOWN=60        # Wait 60 seconds between restarts
VECTOR_CONFIG_PATH="/tmp/vector.toml"
VECTOR_BINARY_PATH="/opt/vector/bin/vector"

# Logging function - outputs to both stdout and the log file
log() {
    local message="[$(date '+%Y-%m-%d %H:%M:%S')] [HealthMonitor] $1"
    echo "$message"
    echo "$message" >> /tmp/kit_structured_logs.log
}

# Vector health check function - PID-based
vector_health_check() {
    local vector_pid
    vector_pid=$(get_vector_pid)
    if [ -n "$vector_pid" ]; then
        # Check if the process is actually running and responding
        if kill -0 "$vector_pid" 2>/dev/null; then
            log "Vector health check PASSED - PID $vector_pid is running"
            return 0
        else
            log "Vector health check FAILED - PID $vector_pid is not responding"
            return 1
        fi
    else
        log "Vector health check FAILED - No Vector process found"
        return 1
    fi
}

# Get Vector PID
get_vector_pid() {
    pgrep -f "vector --config $VECTOR_CONFIG_PATH" | head -1
}

# Stop Vector process
stop_vector() {
    local vector_pid
    vector_pid=$(get_vector_pid)
    if [ -n "$vector_pid" ]; then
        log "Stopping Vector process (PID: $vector_pid)"
        kill "$vector_pid" 2>/dev/null
        sleep 5
        # Force kill if still running
        if kill -0 "$vector_pid" 2>/dev/null; then
            log "Force killing Vector process"
            kill -9 "$vector_pid" 2>/dev/null
        fi
        log "Vector process stopped"
    else
        log "No Vector process found to stop"
    fi
}

# Start Vector process
start_vector() {
    if [ ! -f "$VECTOR_CONFIG_PATH" ]; then
        log "ERROR: Vector config file not found at $VECTOR_CONFIG_PATH"
        return 1
    fi
    if [ ! -x "$VECTOR_BINARY_PATH" ]; then
        log "ERROR: Vector binary not found at $VECTOR_BINARY_PATH"
        return 1
    fi
    log "Starting Vector process..."
    "$VECTOR_BINARY_PATH" --config "$VECTOR_CONFIG_PATH" &
    local new_pid=$!
    log "Vector started with PID: $new_pid"

    # Wait for Vector to stabilize: the process must stay alive for the full
    # startup window before the start is considered successful
    log "Waiting for Vector to stabilize..."
    local startup_timeout=15
    local elapsed=0
    while [ "$elapsed" -lt "$startup_timeout" ]; do
        if ! kill -0 "$new_pid" 2>/dev/null; then
            log "Vector startup failed or process died during startup"
            return 1
        fi
        sleep 2
        elapsed=$((elapsed + 2))
    done
    log "Vector startup successful and stable"
    return 0
}

# Vector restart function
restart_vector() {
    local restart_count="$1"
    log "Attempting to restart Vector (attempt $restart_count/$VECTOR_MAX_RESTART_ATTEMPTS)"
    # Stop Vector, then start it again
    stop_vector
    if start_vector; then
        log "Vector restart successful"
        return 0
    else
        log "Vector restart failed"
        return 1
    fi
}

# Wait for Vector to be initially available
wait_for_vector() {
    log "Waiting for Vector to be available..."
    local wait_timeout=60
    local elapsed=0
    while [ "$elapsed" -lt "$wait_timeout" ]; do
        local vector_pid
        vector_pid=$(get_vector_pid)
        if [ -n "$vector_pid" ] && kill -0 "$vector_pid" 2>/dev/null; then
            log "Vector is available (PID: $vector_pid), starting health monitoring"
            return 0
        fi
        sleep 2
        elapsed=$((elapsed + 2))
    done
    log "Vector is not available after $wait_timeout seconds"
    return 1
}

# Main monitoring loop
monitor_vector_health() {
    local restart_count=0
    local last_restart_time=0
    log "Starting Vector health monitoring - PID-based (checking every ${VECTOR_HEALTH_CHECK_INTERVAL}s)"
    while true; do
        sleep "$VECTOR_HEALTH_CHECK_INTERVAL"
        # Skip the health check if Vector was just restarted
        local current_time
        current_time=$(date +%s)
        if [ $((current_time - last_restart_time)) -lt "$VECTOR_RESTART_COOLDOWN" ]; then
            log "Skipping health check due to recent restart cooldown"
            continue
        fi
        if ! vector_health_check; then
            log "Vector health check failed!"
            # Check whether we have exceeded the maximum restart attempts
            if [ "$restart_count" -ge "$VECTOR_MAX_RESTART_ATTEMPTS" ]; then
                log "CRITICAL: Maximum restart attempts ($VECTOR_MAX_RESTART_ATTEMPTS) exceeded!"
                log "Vector health monitoring disabled. Manual intervention required."
                # Continue monitoring but do not restart
                sleep 300   # Wait 5 minutes before the next check
                continue
            fi
            restart_count=$((restart_count + 1))
            last_restart_time=$(date +%s)
            if restart_vector "$restart_count"; then
                log "Vector restart successful"
                # Reset the restart count after a successful restart
                restart_count=0
            else
                log "Vector restart failed"
            fi
        else
            # Reset the restart count after a successful health check
            if [ "$restart_count" -gt 0 ]; then
                log "Vector health restored, resetting restart counter"
                restart_count=0
            fi
        fi
    done
}

# Signal handlers
cleanup() {
    log "Health monitor shutting down..."
    exit 0
}
trap cleanup EXIT INT TERM

# Main execution
log "Vector Health Monitor starting (PID-based monitoring)..."

# Wait for Vector to be available initially
if ! wait_for_vector; then
    log "ERROR: Vector is not available for initial health check"
    exit 1
fi

# Start health monitoring
monitor_vector_health
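The `stop_vector` function above follows a common graceful-shutdown pattern: send SIGTERM, give the process a grace period, and escalate to SIGKILL only if it is still alive. A standalone sketch of that pattern, using a `sleep` process as a stand-in target:

```shell
# Stand-in for the Vector process.
sleep 300 &
pid=$!

# Graceful stop: SIGTERM first, then a short grace period.
kill "$pid" 2>/dev/null
sleep 1

# Escalate only if the process survived the grace period.
if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid" 2>/dev/null
    echo "force killed $pid"
else
    echo "stopped cleanly"
fi
```

SIGTERM gives Vector a chance to flush buffered events to its sinks; SIGKILL cannot be handled and may drop in-flight data, which is why the script reserves it for the unresponsive case.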
Include the Health Check Monitor in the Entrypoint#
Update entrypoint_vector_dev.sh to start Vector independently and launch the health monitor (example below).
Edit the entrypoint:
nano entrypoint_vector_dev.sh
Replace or update with the following (copy into entrypoint_vector_dev.sh):
#!/bin/bash
echo "[entrypoint_vector_dev.sh] Starting container..."

# Set required environment variables for Kit
export USER="ubuntu"
export LOGNAME="ubuntu"

# Check if Vector OTEL processing is enabled
if [ "$VECTOR_OTEL_ACTIVE" = "TRUE" ]; then
    echo "[Vector] Vector OTEL processing is ENABLED (VECTOR_OTEL_ACTIVE=TRUE)"
    echo "[Vector] Health check setting: VECTOR_HEALTH_CHECK=${VECTOR_HEALTH_CHECK:-"(not set)"}"

    # Create log files if they do not exist
    echo "[Vector] Setting up log files..."
    touch /tmp/kit_structured_logs.log
    chmod 666 /tmp/kit_structured_logs.log

    # Validate the OTEL endpoint
    if [ -z "$OTEL_EXPORTER_OTLP_LOGS_ENDPOINT" ]; then
        echo "[Vector] ERROR: OTEL_EXPORTER_OTLP_LOGS_ENDPOINT is not set!"
        exit 1
    fi
    if [[ ! "$OTEL_EXPORTER_OTLP_LOGS_ENDPOINT" =~ ^https?:// ]]; then
        echo "[Vector] ERROR: Invalid OTEL endpoint format. Must start with http:// or https://"
        exit 1
    fi
    echo "[Vector] Using OTEL endpoint: $OTEL_EXPORTER_OTLP_LOGS_ENDPOINT"

    # Determine which Vector configuration to use
    if [ -n "$VECTOR_CONF_B64" ]; then
        echo "[Vector] Custom Vector configuration provided via VECTOR_CONF_B64"
        echo "[Vector] Decoding and using customer-provided configuration..."
        # Decode the Vector config
        echo "$VECTOR_CONF_B64" | base64 -d > /tmp/vector_raw.toml
        # Replace the OTEL endpoint placeholder
        sed "s|PLACEHOLDER_OTEL_ENDPOINT|$OTEL_EXPORTER_OTLP_LOGS_ENDPOINT|g" /tmp/vector_raw.toml > /tmp/vector.toml
        echo "[Vector] Using CUSTOM Vector configuration (from VECTOR_CONF_B64)"
    else
        echo "[Vector] No custom configuration provided. Using static/default Vector configuration..."
        # Copy the static configuration and replace the OTEL endpoint placeholder
        cp /opt/vector/static_config.toml /tmp/vector_raw.toml
        sed "s|PLACEHOLDER_OTEL_ENDPOINT|$OTEL_EXPORTER_OTLP_LOGS_ENDPOINT|g" /tmp/vector_raw.toml > /tmp/vector.toml
        echo "[Vector] Using STATIC Vector configuration (from /opt/vector/static_config.toml)"
    fi

    # Show the first few lines of the config for debugging
    echo "[Vector] First 10 lines of /tmp/vector.toml:" && head -n 10 /tmp/vector.toml

    # Validate the Vector config
    echo "[Vector] Verifying Vector configuration..."
    if [ -x "/opt/vector/bin/vector" ]; then
        if ! /opt/vector/bin/vector validate /tmp/vector.toml; then
            echo "[Vector] ERROR: Vector configuration validation failed!"
            exit 1
        fi
    fi

    # Test OTEL endpoint connectivity
    OTEL_HOST=$(echo "$OTEL_EXPORTER_OTLP_LOGS_ENDPOINT" | sed 's|https\?://||' | cut -d':' -f1)
    OTEL_PORT=$(echo "$OTEL_EXPORTER_OTLP_LOGS_ENDPOINT" | sed 's|.*:||' | cut -d'/' -f1)
    echo "[Vector] Testing connectivity to $OTEL_HOST:$OTEL_PORT"
    if command -v nc >/dev/null 2>&1; then
        timeout 5 nc -zv "$OTEL_HOST" "$OTEL_PORT" && echo "[Vector] Network connectivity: SUCCESS" || echo "[Vector] Network connectivity: FAILED"
    fi

    # Start Vector as a completely independent background process
    echo "[Vector] Starting Vector as independent background process..."
    # Run Vector in the background, keep stdout for transformed logs, redirect stderr
    /opt/vector/bin/vector --config /tmp/vector.toml 2>/dev/null &
    VECTOR_PID=$!
    echo "[Vector] Vector started independently with PID: $VECTOR_PID"

    # Start the health monitor as an independent background process (if enabled)
    if [ "$VECTOR_HEALTH_CHECK" = "TRUE" ]; then
        echo "[Vector] Health check is ENABLED (VECTOR_HEALTH_CHECK=TRUE)"
        if [ -x "/vector_health_monitor.sh" ]; then
            echo "[Vector] Starting health monitor as independent background process..."
            nohup /vector_health_monitor.sh 2>&1 &
            HEALTH_MONITOR_PID=$!
            echo "[Vector] Health monitor started independently with PID: $HEALTH_MONITOR_PID"
        else
            echo "[Vector] Health monitor not available at /vector_health_monitor.sh"
        fi
    else
        echo "[Vector] Health check is DISABLED (VECTOR_HEALTH_CHECK not set to TRUE)"
        echo "[Vector] Vector will run without health monitoring"
    fi

    # Give Vector a moment to start up
    sleep 2

    echo "[Vector] Starting Kit application - completely independent of Vector..."
    echo "[Vector] Kit app success/failure will not be affected by Vector issues"
    # Run the Kit app completely independently - pipe logs to a file for Vector to pick up
    # The Kit app exit code is what matters for the container
    stdbuf -oL /entrypoint.sh 2>&1 | stdbuf -oL tee -a /tmp/kit_structured_logs.log
    # Capture the Kit app's exit code. Note: $? would report tee's status (the
    # last pipeline stage), so read the first stage from bash's PIPESTATUS array.
    KIT_EXIT_CODE=${PIPESTATUS[0]}
    echo "[Vector] Kit application completed with exit code: $KIT_EXIT_CODE"
    echo "[Vector] Vector and health monitor continue running independently"
    # Exit with Kit's exit code - Vector issues do not affect this
    exit $KIT_EXIT_CODE
else
    echo "[Vector] Vector OTEL processing is DISABLED (VECTOR_OTEL_ACTIVE=FALSE or not set)"
    echo "[Vector] Running Kit without log processing."
    exec /entrypoint.sh
fi
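One subtlety worth calling out: the Kit application's output is piped through `tee`, and in a shell pipeline `$?` reports the exit status of the *last* stage (`tee`), not the application's. Bash preserves every stage's exit code in the `PIPESTATUS` array, which must be read immediately after the pipeline. A minimal demonstration:

```shell
# A pipeline whose first stage fails with exit code 3.
( exit 3 ) | tee /dev/null
echo "last stage (tee) status: $?"

# Re-run the pipeline and read the FIRST stage's code via PIPESTATUS.
# PIPESTATUS is reset by the next command, so capture it right away.
( exit 3 ) | tee /dev/null
echo "first stage status: ${PIPESTATUS[0]}"
```

The first `echo` prints 0 (tee succeeded) while the second prints 3. Without this, a failed Kit app piped through a healthy `tee` would make the container report success.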
Dockerfile Updates (include health check script)#
Update your Dockerfile to copy the new health monitor and install netcat for connectivity tests. Here is an example Dockerfile snippet:
FROM kit_app_template:latest

USER root

RUN apt-get update && \
    apt-get install -y curl netcat-openbsd && \
    mkdir -p /opt/vector && \
    curl -L https://packages.timber.io/vector/0.46.1/vector-0.46.1-x86_64-unknown-linux-gnu.tar.gz -o /tmp/vector.tar.gz && \
    tar -xzf /tmp/vector.tar.gz -C /opt/vector --strip-components=2 && \
    rm -rf /tmp/vector*

RUN mkdir -p /logs

# Ensure the ubuntu home directory exists for NVCF compatibility (user already exists in the base image)
RUN mkdir -p /home/ubuntu && \
    chown -R ubuntu:ubuntu /home/ubuntu

# Create the Vector data directory and give the ubuntu user access
RUN mkdir -p /var/lib/vector && \
    chown -R ubuntu:ubuntu /var/lib/vector

COPY entrypoint_vector_dev.sh /entrypoint_vector_dev.sh
COPY vector_health_monitor.sh /vector_health_monitor.sh
COPY vector.toml /opt/vector/static_config.toml
RUN chmod +x /entrypoint.sh /entrypoint_vector_dev.sh /vector_health_monitor.sh

# Switch back to the ubuntu user for runtime
USER ubuntu
ENTRYPOINT ["/entrypoint_vector_dev.sh"]
Build the Kit Vector Container#
Verify the required files exist in the working directory:
ls -la
You should see the following files (example):
total 28
drwxrwxr-x 2 horde horde 4096 Jan 15 14:32 .
drwxrwxr-x 8 horde horde 4096 Jan 15 14:15 ..
-rw-rw-r-- 1 horde horde 1284 Jan 15 14:28 Dockerfile
-rwxrwxr-x 1 horde horde 4856 Jan 15 14:22 entrypoint_vector_dev.sh
-rw-rw-r-- 1 horde horde 6042 Jan 15 14:24 vector_health_monitor.sh
-rw-rw-r-- 1 horde horde 2847 Jan 15 14:25 vector.toml
Build your enhanced Kit container with Vector integration:
docker build -t byoo_kit_vector:latest .
Push the container and create the function following the Container to Function (DRAFT) instructions.
Environment Variables#
The health check behavior is controlled by the following environment variables:
VECTOR_OTEL_ACTIVE - TRUE|FALSE or not set. When TRUE, the container uses Vector for log processing and forwarding to the NVCF collector. When FALSE or unset, the container bypasses Vector and runs Kit directly via /entrypoint.sh.
VECTOR_CONF_B64 - base64-encoded string. Provides a custom Vector configuration. If provided, the entrypoint decodes and uses it; otherwise the default static vector.toml is used.
VECTOR_HEALTH_CHECK - TRUE|FALSE or not set. When TRUE, enables Vector health monitoring with automatic restart capabilities. When FALSE or unset, Vector runs without health monitoring (no automatic recovery).
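The variable most likely to trip you up is VECTOR_CONF_B64: the entrypoint reverses the encoding with `base64 -d`, so the round trip must be lossless. A sketch of preparing the value, using a stand-in config file (the TOML content and file path here are illustrative, not from the guide):

```shell
# Stand-in for your real custom Vector config; the entrypoint later replaces
# PLACEHOLDER_OTEL_ENDPOINT with the value of OTEL_EXPORTER_OTLP_LOGS_ENDPOINT.
printf '[sinks.otel]\nendpoint = "PLACEHOLDER_OTEL_ENDPOINT"\n' > /tmp/vector_demo.toml

# -w0 disables line wrapping (GNU coreutils), keeping the value a single line
# that is safe to pass through -e / environment injection.
VECTOR_CONF_B64="$(base64 -w0 /tmp/vector_demo.toml)"

# Verify the round trip decodes back to the original config.
echo "$VECTOR_CONF_B64" | base64 -d
```

You would then pass the value at launch, for example with `docker run -e VECTOR_CONF_B64="$VECTOR_CONF_B64" ...` alongside the other variables above.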
Get Session UUID#
To tie a user to their application invocation logs, retrieve a session identifier: use the Portal Sample or an application-provided session identifier (for example, session.id) and include it as an attribute for correlation.
Sample KQL query for troubleshooting:
AppTraces
| where Properties.function_id == "xxxxxxxxxxxx"