Platform service monitoring
The method for monitoring platform processes depends on the monitoring system used in your organization.
This page lists all the processes that must be running for the platform to operate fully.
We recommend checking that each process exists by name, using either the tools built into the monitoring system or the pidof utility.
The following information must be collected for each host (or virtual machine):
- RAM usage and available memory;
- CPU usage and available CPU time;
- Available disk space and disk utilization information (speed, latency, SMART);
- Number of open files and connections.
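
The memory, CPU, and disk items in the list above can be collected with the Python standard library alone. The following is a minimal sketch, assuming a Linux host; the function names `parse_meminfo` and `host_snapshot` are illustrative and not part of the platform. Disk latency, SMART data, and open file counts are not covered and would normally come from the monitoring agent.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse the content of /proc/meminfo into {field: integer value}.

    Most fields are reported in kB (for example MemTotal, MemAvailable).
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def host_snapshot(meminfo_text, disk_path="/"):
    """Return a small dict of host-level metrics (hypothetical helper)."""
    mem = parse_meminfo(meminfo_text)
    disk = shutil.disk_usage(disk_path)
    return {
        "mem_total_kb": mem.get("MemTotal"),
        "mem_available_kb": mem.get("MemAvailable"),
        "load_avg_1m": os.getloadavg()[0],  # 1-minute load average
        "cpu_count": os.cpu_count(),
        "disk_free_bytes": disk.free,
        "disk_used_pct": round(100 * disk.used / disk.total, 1),
    }
```

A typical call would read `/proc/meminfo` and pass its text to `host_snapshot`, then forward the resulting dict to the monitoring system.
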
Databases and services
| Service | Default IP | Default port | Importance | Alert |
|---|---|---|---|---|
| MongoDB | 127.0.0.1 | 27017 | Critical | Unable to establish TCP connection |
| ClickHouse | 127.0.0.1 | 9000 | Critical | Unable to establish TCP connection |
| SSDB (hb) | 127.0.0.1 | 4420 | Critical | Unable to establish TCP connection |
| SSDB (notify) | 127.0.0.1 | 4430 | Critical | Unable to establish TCP connection |
| RabbitMQ | 0.0.0.0 | 5672 | Critical | Unable to establish TCP connection |
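
The "Unable to establish TCP connection" alert in the table above can be approximated with a plain TCP connect probe. This is a minimal sketch: the names `tcp_port_open`, `check_endpoints`, and `DEFAULT_ENDPOINTS` are illustrative, and probing RabbitMQ through 127.0.0.1 is an assumption (the table lists 0.0.0.0, which is a bind address rather than a destination).

```python
import socket

# Default endpoints from the table above. RabbitMQ binds to 0.0.0.0,
# so it is probed through the loopback address here (assumption).
DEFAULT_ENDPOINTS = {
    "MongoDB": ("127.0.0.1", 27017),
    "ClickHouse": ("127.0.0.1", 9000),
    "SSDB (hb)": ("127.0.0.1", 4420),
    "SSDB (notify)": ("127.0.0.1", 4430),
    "RabbitMQ": ("127.0.0.1", 5672),
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoints(endpoints=DEFAULT_ENDPOINTS, timeout=3.0):
    """Return {service name: reachable?} for every endpoint."""
    return {
        name: tcp_port_open(host, port, timeout)
        for name, (host, port) in endpoints.items()
    }
```

Any service mapped to `False` by `check_endpoints()` would correspond to a Critical alert. A connect probe only shows that the port accepts connections; it does not confirm that the service answers queries.
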
RabbitMQ queues
| Virtual host | Queue name | Producer | Consumer | Description | Importance | Alert |
|---|---|---|---|---|---|---|
| altcraft | oxy_triggers | * | AK:proctrigger | Triggers (mailings) | High | Minimum number of messages in queue >= 10000 over 10 minutes |
| altcraft | oxy_triggers_prior | * | AK:proctrigger | Priority triggers (mailings) | High | Minimum number of messages in queue >= 5000 over 10 minutes |
| altcraft | trk_* | AK:trk* AK:cookie_saver | procactions procpixel | Events processed by trackings | High | Minimum number of messages in queue >= 5000 over 10 minutes |
| altcraft | akmta_* | AK:proctrigger AK:webcontrol campaign AK:procworkflow | AK:akmtad | Messages for sending | High | Number of messages does not decrease over 50 minutes |
| altcraft | geo_akmta_* | AK:proctrigger AK:webcontrol campaign AK:procworkflow | AK:akmtad | Messages for sending by time zones | High | Number of messages does not decrease over 50 minutes |
| altcraft | prior_akmta_* | AK:proctrigger AK:webcontrol campaign AK:procworkflow | AK:akmtad | Priority messages for sending | High | Number of messages does not decrease over 50 minutes |
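
The two alert conditions used in the table above can be expressed as small functions over a window of queue-depth samples. This is a sketch under assumptions: the function names are hypothetical, the samples are taken at regular intervals over the stated window (10 or 50 minutes), and "does not decrease" is interpreted as "the queue is non-empty and no sample is lower than the previous one". How the samples are obtained (for example, from the RabbitMQ management plugin) is left out.

```python
def backlog_alert(samples, threshold):
    """True if the minimum queue depth over the window is >= threshold.

    samples: message counts collected over the window, oldest first.
    """
    return bool(samples) and min(samples) >= threshold


def stalled_alert(samples):
    """True if the queue holds messages and its depth never decreased.

    An empty queue that stays empty is not treated as stalled.
    """
    if len(samples) < 2 or samples[-1] == 0:
        return False
    return all(later >= earlier for earlier, later in zip(samples, samples[1:]))
```

For example, `backlog_alert(depths, 10000)` matches the oxy_triggers rule when `depths` covers 10 minutes, and `stalled_alert(depths)` matches the akmta_* rules when `depths` covers 50 minutes.
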
AKD processes
To verify platform operability, it is sufficient to check that each process is present in the /proc virtual filesystem.
To view the full list of processes, run: <BASEDIR>/akd --processes
The default PID file location is: <BASEDIR>/pids/<Executable file name>.pid
| Executable file name | Process name | Description | Importance | Alert |
|---|---|---|---|---|
| webadmin | AK:webadmin | Admin panel | Medium | Process not found in the virtual filesystem /proc |
| akmtad | AK:akmtad | Mail transfer agent, AKMTA | High | Process not found in the virtual filesystem /proc |
| api | AK:api | API | High | Process not found in the virtual filesystem /proc |
| cookie_saver | AK:cookie_saver | User cookie processing | High | Process not found in the virtual filesystem /proc |
| proctask | AK:proctask | Task execution, mailing launch | High | Process not found in the virtual filesystem /proc |
| procactions | AK:procactions | Event processing for statistics (clicks, opens, subscriptions, etc.) | High | Process not found in the virtual filesystem /proc |
| procevent | AK:procevent | Event processing and writing to ClickHouse | High | Process not found in the virtual filesystem /proc |
| prochook | AK:prochook | Capturing various events from the platform | High | Process not found in the virtual filesystem /proc |
| procpixel | AK:procpixel | Pixel events processing | High | Process not found in the virtual filesystem /proc |
| procintegras | AK:procintegras | Integration with external systems (AppMetrica, etc.) | High | Process not found in the virtual filesystem /proc |
| procleadsaver | AK:procleadsaver | Import statistics processing | Medium | Process not found in the virtual filesystem /proc |
| procnotify | AK:procnotify | Notification processing. The process was removed in versions v2024.2.68.2.2206 and later; its functionality was migrated to proctask. | Medium | Process not found in the virtual filesystem /proc |
| procpush | AK:procpush | Push events processing | High | Process not found in the virtual filesystem /proc |
| procresume | AK:procresume | Processing and database scanning to restore profile status | Medium | Process not found in the virtual filesystem /proc |
| procrpc | AK:procrpc | RPC client for processing RPC connections with processes | High | Process not found in the virtual filesystem /proc |
| procsenderev | AK:procsenderev | Event processing | High | Process not found in the virtual filesystem /proc |
| procsmsev | AK:procsmsev | Requesting information on sent SMS messages | High | Process not found in the virtual filesystem /proc |
| procsmslisten | AK:procsmslisten | Processing responses from SMS gateways | High | Process not found in the virtual filesystem /proc |
| proctrigger | AK:proctrigger | Trigger mailing processing | High | Process not found in the virtual filesystem /proc |
| procwebver | AK:procwebver | Web versions processing | Medium | Process not found in the virtual filesystem /proc |
| procworkflow | AK:procworkflow | Scenarios processing | High | Process not found in the virtual filesystem /proc |
| tariffcontroller | AK:tariffcontroller | Sending limit control (tariffs). The process was removed in versions v2024.2.68.2.2206 and later; its functionality was migrated to proctask. | Medium | Process not found in the virtual filesystem /proc |
| trkaction | AK:trkaction | Tracking event registration | High | Process not found in the virtual filesystem /proc |
| trk_amazon_sns | AK:trk_amazon_sns | Amazon sender event registration | High | Process not found in the virtual filesystem /proc |
| trkmandrill | AK:trkmandrill | Mandrill sender event registration | High | Process not found in the virtual filesystem /proc |
| trkimage | AK:trkimage | Pixel events registration | High | Process not found in the virtual filesystem /proc |
| trkpush | AK:trkpush | Push tracking events registration | High | Process not found in the virtual filesystem /proc |
| trkread | AK:trkread | Email message read events registration | High | Process not found in the virtual filesystem /proc |
| trksms | AK:trksms | SMS message read events registration | High | Process not found in the virtual filesystem /proc |
| trkwebversion | AK:trkwebversion | Web version read events registration | High | Process not found in the virtual filesystem /proc |
| webcontrol | AK:webcontrol | User web interface | High | Process not found in the virtual filesystem /proc |
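
The /proc presence check can be combined with the default PID file location described above. The following is a minimal sketch, not the platform's own tooling: the helper names are hypothetical, `base_dir` stands in for <BASEDIR>, and the `proc_root` parameter exists only so the check can be pointed at a test directory.

```python
import os


def pid_from_file(pid_file):
    """Read an integer PID from a PID file; return None if unreadable."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def process_running(pid_file, proc_root="/proc"):
    """True if the PID stored in pid_file has a directory under proc_root."""
    pid = pid_from_file(pid_file)
    return pid is not None and os.path.isdir(os.path.join(proc_root, str(pid)))


def missing_processes(base_dir, executables, proc_root="/proc"):
    """Return executable names whose process is not found in /proc.

    PID files are expected at <base_dir>/pids/<executable>.pid.
    """
    return [
        name
        for name in executables
        if not process_running(
            os.path.join(base_dir, "pids", name + ".pid"), proc_root
        )
    ]
```

Each name returned by `missing_processes` would raise the alert listed for that executable in the table. Because PIDs can be reused after a crash, a stricter check might also compare the process name in `/proc/<pid>/cmdline` with the expected AK:* name; that comparison is omitted here because the exact format of the name in /proc is not described on this page.
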
CPU time
https://en.wikipedia.org/wiki/CPU_time
| Metric type | Description | Importance | Alert |
|---|---|---|---|
| CPU system time | CPU usage by process in percent (system) | Info | - |
| CPU iowait time | CPU usage by process in percent (iowait) | High | > 15% * number of vCPU cores |
| CPU user time | CPU usage by process in percent (user) | Info | - |
| CPU utilization | CPU usage by process in percent (total) | High | > 50% * number of vCPU cores |
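
The thresholds in the table above scale with the number of vCPU cores. The sketch below shows the arithmetic, assuming that the percentages are measured on a scale where 100% equals one fully loaded core (as tools such as top report per-process usage); the function and constant names are illustrative.

```python
import os

IOWAIT_PCT_PER_CORE = 15  # "CPU iowait time" limit per vCPU core
TOTAL_PCT_PER_CORE = 50   # "CPU utilization" limit per vCPU core


def cpu_alerts(iowait_pct, total_pct, vcpus=None):
    """Return the names of the CPU metrics that exceed their limits.

    vcpus defaults to the number of cores visible to this host.
    """
    vcpus = vcpus or os.cpu_count() or 1
    alerts = []
    if iowait_pct > IOWAIT_PCT_PER_CORE * vcpus:
        alerts.append("CPU iowait time")
    if total_pct > TOTAL_PCT_PER_CORE * vcpus:
        alerts.append("CPU utilization")
    return alerts
```

On a host with 4 vCPU cores, the limits work out to 60% for iowait and 200% for total utilization.
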
Memory
| Metric type | Description | Importance | Alert |
|---|---|---|---|
| RSS (resident set size) | https://en.wikipedia.org/wiki/Resident_set_size | High | > 20% of total memory |
| SWAP | https://en.wikipedia.org/wiki/Paging#Unix_and_Unix-like_systems | High | > 5% of total SWAP |
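
On Linux, the RSS and swap usage of a process are exposed as the VmRSS and VmSwap fields of /proc/<pid>/status. The sketch below applies the two limits from the table to that file's content. It assumes the limits apply per process; the function names are hypothetical, and the totals would come from MemTotal and SwapTotal in /proc/meminfo.

```python
def parse_status_kb(status_text, field):
    """Return the value (in kB) of a field such as VmRSS, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    return 0


def memory_alerts(status_text, mem_total_kb, swap_total_kb):
    """Return the names of the memory metrics that exceed their limits."""
    rss_kb = parse_status_kb(status_text, "VmRSS")
    swap_kb = parse_status_kb(status_text, "VmSwap")
    alerts = []
    if mem_total_kb and rss_kb > 0.20 * mem_total_kb:
        alerts.append("RSS")
    if swap_total_kb and swap_kb > 0.05 * swap_total_kb:
        alerts.append("SWAP")
    return alerts
```

Kernel threads and some containers do not report these fields, in which case the sketch treats the value as 0 and raises no alert.
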
Network
To identify network issues, monitor the number of TCP connections in each state (see https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Protocol_operation).
| Metric type | Description | Importance | Alert |
|---|---|---|---|
| CLOSE | Closed. Socket is not in use. | Info | - |
| CLOSE_WAIT | Remote side has disconnected; waiting for socket closure. | Medium | Number of connections > 5000 |
| CLOSING | Socket closed, then remote side disconnected; waiting for acknowledgment. | Info | - |
| ESTABLISHED | Connection established. | Medium | Number of connections > 25000 |
| FIN_WAIT1 | Socket closed; connection disconnecting. | Info | - |
| FIN_WAIT2 | Socket closed; waiting for remote side disconnect. | Info | - |
| LAST_ACK | Remote side disconnected, then socket closed; waiting for acknowledgment. | Info | - |
| LISTEN | Waiting for incoming connections. | Info | - |
| SYN_RECV | Initial connection synchronization in progress. | Info | - |
| SYN_SENT | Actively attempting to establish connection. | High | Number of connections > 5000 |
| TIME_WAIT | Socket closed, but waiting for packets still in the network for processing. | High | Number of connections > 5000 |
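
On Linux, per-state connection counts can be derived from /proc/net/tcp (and /proc/net/tcp6, which has the same layout), where the fourth column holds the state as a hexadecimal code. The following is a minimal sketch; the function names and the `LIMITS` mapping are illustrative, with the limits copied from the table above.

```python
from collections import Counter

# Kernel TCP state codes as they appear in the "st" column.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}

# Alert limits from the table above (states without a limit are Info only).
LIMITS = {
    "CLOSE_WAIT": 5000,
    "ESTABLISHED": 25000,
    "SYN_SENT": 5000,
    "TIME_WAIT": 5000,
}


def count_tcp_states(proc_net_tcp_text):
    """Count connections per state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) > 3:
            counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


def state_alerts(counts, limits=LIMITS):
    """Return the states whose connection count exceeds its limit."""
    return sorted(s for s, limit in limits.items() if counts.get(s, 0) > limit)
```

To cover IPv6 as well, the counters from both files can be added together before calling `state_alerts`.
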