Preventive maintenance
Calendar of scheduled interventions: cleaning, thermal paste renewal, health check, connector verification, preventive replacements. A complete cycle typically every 24-36 months.
An enterprise server rarely dies suddenly: it almost always announces the failure in its logs days or weeks in advance. Preventive maintenance is the discipline of reading those signals in time: hardware refresh, health checks, thermal optimization, firmware updates and targeted replacements, carried out before a blocking failure occurs.
Deep cleaning, CPU thermal paste replacement, BMC/CMOS battery replacement, capacitor inspection. Health check based on the SEL (System Event Log) read via IPMI and on SMART data, with a written report of findings ranked by priority.
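To illustrate what turning a SEL dump into a priority report involves, here is a minimal sketch. It assumes the pipe-separated layout that `ipmitool sel list` commonly prints (ID | date | time | sensor | description | state); the exact layout and wording vary by vendor and firmware, and the keyword-to-priority rules (`PRIORITY_RULES`) are hypothetical examples, not a vendor classification.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical keyword -> priority rules (1 = most urgent). Order matters:
# "correctable ecc" is a substring of "uncorrectable ecc", so check the latter first.
PRIORITY_RULES = [
    ("uncorrectable ecc", 1),
    ("failure detected", 1),
    ("upper critical", 1),
    ("correctable ecc", 2),
    ("upper non-critical", 2),
    ("predictive failure", 2),
]
DEFAULT_PRIORITY = 3  # unrecognised events are treated as informational


@dataclass
class SelEvent:
    record_id: str
    timestamp: str
    sensor: str
    description: str
    priority: int


def parse_sel_line(line: str) -> Optional[SelEvent]:
    """Parse one pipe-separated SEL line; return None if the layout is unexpected."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 5:
        return None  # blank line, header or unknown format
    record_id, day, time, sensor, description = fields[:5]
    text = f"{sensor} {description}".lower()
    priority = next((p for kw, p in PRIORITY_RULES if kw in text), DEFAULT_PRIORITY)
    return SelEvent(record_id, f"{day} {time}", sensor, description, priority)


def priority_report(lines):
    """Return events ordered by priority (log order kept within a priority) and counts per sensor."""
    events = [e for e in (parse_sel_line(line) for line in lines) if e is not None]
    events.sort(key=lambda e: e.priority)  # stable sort preserves original log order
    per_sensor = Counter(e.sensor for e in events)
    return events, per_sensor
```

Sorting only by priority keeps the original log order inside each priority level, which avoids relying on the US-style date strings for chronology. Counting events per sensor helps spot a single DIMM or PSU that keeps reappearing.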
Planned firmware updates, checked against a validated compatibility matrix and backed by a rollback plan. Includes CPU microcode updates that address side-channel vulnerabilities. Carried out in agreed maintenance windows.
Analysis of historical temperature data from the BMC, targeted interventions to reduce throttling, fan curve calibration. Often the fastest way to recover the 10-20% of performance lost to throttling.
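Fan curve calibration amounts to choosing the points that map a temperature to a fan duty cycle. The sketch below shows the piecewise-linear interpolation such a curve typically implies; the curve points in `FAN_CURVE` are invented for illustration, and how a given BMC actually stores or applies its curve is vendor-specific and not modelled here.

```python
from bisect import bisect_right

# Hypothetical calibration points: (temperature in °C, fan duty in %).
# Real points depend on chassis, BMC vendor, workload and ambient temperature.
FAN_CURVE = [(30.0, 20.0), (50.0, 35.0), (65.0, 60.0), (80.0, 100.0)]


def fan_duty(temp_c: float, curve=FAN_CURVE) -> float:
    """Interpolate fan duty linearly between curve points, clamped at both ends."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return curve[0][1]
    if temp_c >= temps[-1]:
        return curve[-1][1]
    i = bisect_right(temps, temp_c)  # curve[i - 1] <= temp_c < curve[i]
    (t0, d0), (t1, d1) = curve[i - 1], curve[i]
    return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)
```

For example, `fan_duty(57.5)` falls halfway between the 50 °C and 65 °C points and returns 47.5. Calibrating means moving these points so that sensors stay below the throttling range under real load without running the fans harder than needed.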
We offer hardware-only maintenance contracts and combined hardware + software contracts, in three SLA tiers. Essential: annual health check, on-site intervention within 5 business days. Business: semi-annual health check, scheduled hardware refresh, on-site intervention within 2 business days. Critical: quarterly health check, on-site intervention within 4 business hours in Lombardy, a pre-allocated pool of cold spares, and a dedicated technical account.
It depends on load, thermal environment and criticality. A pragmatic rule: a complete hardware refresh every 24-36 months for servers in continuous production, and a health check with SEL log analysis every 6-12 months.
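The rule of thumb above translates into simple date arithmetic. This is a minimal sketch that defaults to the short end of each range (24 months for the refresh, 6 for the health check); those defaults are an assumption suited to heavily loaded machines, and the function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day (e.g. 31 Jan + 1 month -> end of Feb)."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_interventions(last_refresh: date, last_health_check: date,
                       refresh_months: int = 24, check_months: int = 6) -> dict:
    """Next due dates for each intervention (short end of the ranges by default)."""
    return {
        "hardware_refresh": add_months(last_refresh, refresh_months),
        "health_check": add_months(last_health_check, check_months),
    }


def overdue(due: dict, today: date) -> list:
    """Names of interventions whose due date is today or already past."""
    return sorted(name for name, when in due.items() if when <= today)
```

For a server in a cooler, less critical environment, passing `refresh_months=36` and `check_months=12` applies the long end of the same rule.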
Deep cleaning, replacement of the thermal paste under the CPU heatsinks, verification of internal connectors, re-tightening of RAM heatsinks, BMC/CMOS battery replacement, visual inspection of capacitors, fan lubrication or replacement, verification of redundant PSUs, critical firmware updates.
Often it is a hardware fault that the OS surfaces as a software error. Typical signs: kernel panics correlated with MCE (Machine Check Exception) events in the logs, Windows BSODs with the WHEA_UNCORRECTABLE_ERROR stop code, random reboots under load. Our analysis starts from the BMC/IPMI logs before touching the OS.
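Correlating panics with MCE events is essentially a time-window join between two kinds of log entries. The sketch below assumes a simplified "ISO timestamp, space, message" line format (real syslog/journal exports need their own parsing) and matches on the `mce: [Hardware Error]` and `Kernel panic` strings that commonly appear in Linux kernel messages; the 30-minute window is an arbitrary example. On Windows the equivalent input would be WHEA-Logger entries from the System event log, which this sketch does not handle.

```python
from datetime import datetime, timedelta

# Marker strings as they commonly appear in Linux kernel messages.
# The "<ISO timestamp> <message>" line layout is an assumption of this sketch.
MCE_MARKER = "mce: [Hardware Error]"
PANIC_MARKER = "Kernel panic"


def parse(line: str):
    """Split an assumed '<ISO timestamp> <message>' line into (datetime, message)."""
    stamp, _, message = line.partition(" ")
    return datetime.fromisoformat(stamp), message


def panics_preceded_by_mce(lines, window=timedelta(minutes=30)):
    """Return (panic_time, last_mce_time) for each panic with an MCE within `window` before it."""
    events = sorted((parse(line) for line in lines if line.strip()), key=lambda e: e[0])
    last_mce = None
    hits = []
    for ts, msg in events:
        if MCE_MARKER in msg:
            last_mce = ts
        elif PANIC_MARKER in msg and last_mce is not None and ts - last_mce <= window:
            hits.append((ts, last_mce))
    return hits
```

A panic that repeatedly follows machine-check events points towards hardware (CPU, memory, VRM) rather than the OS, which is why the analysis starts from hardware logs.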
Yes, with planning. We verify the vendor compatibility matrix, check the release notes for licensing or feature impact, and prepare a rollback plan. The update is performed in an agreed window, after a full configuration backup. The post-update state is validated with stress tests before the server returns to production.
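The pre-update checks described above can be expressed as a gate that lists every blocking problem before a window is booked. In this sketch the matrix contents, the model name `model-x`, the component names and the version numbers are all hypothetical, and versions are assumed to be purely numeric dotted strings; real vendor matrices carry far more detail.

```python
# Hypothetical compatibility matrix: server model -> component -> validated
# target versions, plus minimum versions other components must already run.
MATRIX = {
    "model-x": {
        "bios": {"validated": {"2.19.1"}, "requires": {"bmc": "6.10.0"}},
        "bmc": {"validated": {"6.10.0", "7.00.0"}, "requires": {}},
    },
}


def vtuple(version: str) -> tuple:
    """Turn a numeric dotted version such as "6.10.0" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def check_update(model, installed, component, target, config_backup_done, matrix=MATRIX):
    """Return blocking problems; an empty list means the update can be scheduled."""
    entry = matrix.get(model, {}).get(component)
    if entry is None:
        return [f"{component} on {model}: not in the compatibility matrix"]
    problems = []
    if target not in entry["validated"]:
        problems.append(f"{component} {target}: not a validated version")
    for dep, minimum in entry["requires"].items():
        current = installed.get(dep)
        if current is None or vtuple(current) < vtuple(minimum):
            problems.append(f"{dep} must be >= {minimum} first (found {current})")
    if not config_backup_done:
        problems.append("no full configuration backup: rollback not possible")
    return problems
```

Returning every problem at once, rather than stopping at the first, lets the whole update sequence (for example BMC before BIOS) be planned in a single maintenance window.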
A targeted intervention to reduce CPU thermal throttling and bring sensors back to nominal ranges. It includes: reading historical data from the BMC (CPU, DIMM, VRM, ambient temperatures), airflow inspection, cleaning heatsinks and filters, thermal paste replacement (it degrades measurably after 3-4 years), fan curve calibration, and sometimes replacing fans that are inadequate for the current load. Often the fastest way to recover 10-20% of performance lost to throttling.
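Reading the BMC history usually comes down to asking how often, and for how long, the CPU sat at its throttling temperature. This minimal sketch assumes the history has already been exported as a plain list of CPU temperatures sampled at a fixed interval; `throttle_c` is model-specific and has to come from the vendor documentation, and the function name is hypothetical.

```python
from statistics import mean


def thermal_summary(samples_c, throttle_c):
    """Summarise CPU temperature samples (°C) taken at a fixed interval.

    throttle_c: temperature at which this CPU model starts throttling (model-specific).
    """
    if not samples_c:
        raise ValueError("no samples")
    hot = [t >= throttle_c for t in samples_c]
    longest = run = 0
    for is_hot in hot:
        run = run + 1 if is_hot else 0  # length of the current consecutive hot streak
        longest = max(longest, run)
    return {
        "max_c": max(samples_c),
        "mean_c": mean(samples_c),
        "share_at_throttle": sum(hot) / len(samples_c),
        "longest_hot_run": longest,  # in samples; multiply by the interval for duration
    }
```

Running the same summary on data collected before and after the intervention gives a like-for-like view of whether `share_at_throttle` and `longest_hot_run` actually dropped.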
If your servers were last serviced more than three years ago, they are probably losing performance to thermal throttling and accumulating events in the logs. A health check plus a hardware refresh is an investment that pays for itself in recovered useful life.