Document ID: SOC2-AVAIL-001 | Version: 1.0 | Date: March 2026 | Classification: Internal | Owner: Head of Engineering / CISO
Purpose
This policy establishes [Organisation Name]'s availability commitments and the controls required to meet them, in accordance with the SOC 2 Availability Trust Services Category (A1.1–A1.3).
1. Availability Commitments
| Service | Target Uptime | Measurement Window | RTO | RPO |
|---|---|---|---|---|
| Core product (API + application) | 99.9% | Monthly (excluding maintenance windows) | 1 hour | 15 minutes |
| Customer-facing web application | 99.9% | Monthly | 2 hours | 1 hour |
| Data ingestion pipeline | 99.5% | Monthly | 4 hours | 30 minutes |
| Internal tools | 99.0% | Monthly | 8 hours | 4 hours |
RTO (Recovery Time Objective): Maximum acceptable time to restore service after an outage. RPO (Recovery Point Objective): Maximum acceptable data loss measured in time.
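To make the uptime targets concrete, the percentages in the table can be converted into a monthly downtime budget. The following is a minimal sketch that assumes a 30-day measurement window and ignores the maintenance-window exclusion; the function name `downtime_budget_minutes` is illustrative and not part of this policy.

```python
# Sketch: convert an uptime target into allowed downtime per window.
# Assumption: a 30-day month; maintenance-window exclusions are not modelled.

def downtime_budget_minutes(target_pct: float, days_in_window: int = 30) -> float:
    """Return the minutes of downtime permitted by an uptime target."""
    window_minutes = days_in_window * 24 * 60
    return window_minutes * (1 - target_pct / 100)


# Targets taken from the Availability Commitments table.
targets = {
    "Core product (API + application)": 99.9,
    "Customer-facing web application": 99.9,
    "Data ingestion pipeline": 99.5,
    "Internal tools": 99.0,
}

for service, pct in targets.items():
    print(f"{service}: {downtime_budget_minutes(pct):.1f} min/month")
# 99.9% -> 43.2 min, 99.5% -> 216.0 min, 99.0% -> 432.0 min
```

Under these assumptions, a single incident that reaches the 1 hour RTO for the core product would already exceed that month's 99.9% budget of 43.2 minutes.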
2. Redundancy Requirements
| Component | Requirement |
|---|---|
| Application layer | Minimum 2 instances; auto-scaling with load balancer |
| Database | Multi-AZ deployment; read replicas for scale |
| Storage | Cross-region replication for customer data |
| Network | Redundant connectivity; CDN for static assets |
| DNS | Multiple DNS providers; TTL ≤60 seconds for rapid failover |
| Secrets management | HA secrets manager (not single-node) |
3. Backup Requirements
| Data Type | Backup Frequency | Retention | Encryption | Offsite Copy |
|---|---|---|---|---|
| Production database | Continuous WAL + daily snapshot | 30 days daily; 12 months monthly | ✓ AES-256 | ✓ Cross-region |
| Application configuration | On every change | 90 days | ✓ | ✓ |
| Audit logs | Real-time | 12 months | ✓ | ✓ |
| Customer exports | On creation | 90 days | ✓ | ✓ |
Backup Restore Testing: Backup restores must be tested quarterly to confirm that data can be restored within the applicable RTO. Results are documented and reviewed by the Engineering Lead.
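One way to record the outcome of a quarterly restore test is to compare the measured restore time against the RTO of the service. The sketch below is illustrative only; the `RestoreTest` record and the helper names are hypothetical and do not describe an existing tool.

```python
# Sketch: flag restore tests whose measured duration exceeds the RTO.
# Assumption: RestoreTest and the helpers below are hypothetical names.
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RestoreTest:
    service: str
    restore_duration: timedelta  # measured during the test
    rto: timedelta               # from the Availability Commitments table


def within_rto(test: RestoreTest) -> bool:
    """A restore passes when it completes within the RTO."""
    return test.restore_duration <= test.rto


def failed_tests(tests: list[RestoreTest]) -> list[str]:
    """Return the services whose restore took longer than their RTO."""
    return [t.service for t in tests if not within_rto(t)]


results = [
    RestoreTest("Core product", timedelta(minutes=45), timedelta(hours=1)),
    RestoreTest("Data ingestion pipeline", timedelta(hours=5), timedelta(hours=4)),
]
print(failed_tests(results))  # ['Data ingestion pipeline']
```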
4. Performance Monitoring
The following must be monitored continuously with alerting:
- API response times (p50, p95, p99)
- Error rates (4xx, 5xx)
- Infrastructure resource utilisation (CPU, memory, disk, network)
- Database query performance
- Queue depths and processing lag
- External dependency health (third-party APIs)
Breaches of alerting thresholds must page the on-call rotation (PagerDuty). P1 incidents (full outage) trigger immediate escalation.
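The latency percentiles listed above can be computed from raw response-time samples and compared against thresholds. The following is a minimal sketch using the nearest-rank method; the threshold values are placeholders chosen for illustration, since this policy does not define numeric limits.

```python
# Sketch: nearest-rank percentiles (p50, p95, p99) and a threshold check.
# Assumption: the thresholds below are illustrative placeholders only.

def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # ceil(p * n / 100) using integer arithmetic to avoid float rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


def breached(summary: dict[str, float], thresholds_ms: dict[str, float]) -> list[str]:
    """Return the percentile names whose value exceeds its threshold."""
    return [name for name, limit in thresholds_ms.items() if summary[name] > limit]


samples = [float(ms) for ms in range(1, 101)]          # 1 ms .. 100 ms
summary = latency_summary(samples)
print(summary)                                          # {'p50': 50.0, 'p95': 95.0, 'p99': 99.0}
print(breached(summary, {"p50": 60, "p95": 90, "p99": 120}))  # ['p95']
```

In practice the monitoring platform usually computes these percentiles itself; the sketch only shows what the alert condition means.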
5. Maintenance Windows
Scheduled maintenance that may affect availability:
- Notified to customers via status page and email ≥48 hours in advance
- Scheduled during lowest-traffic periods (weekday 02:00–05:00 customer local time)
- Duration limited to 4 hours maximum for planned maintenance
- Emergency maintenance permitted with concurrent notification
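The notice and duration rules above lend themselves to an automated check before a maintenance window is published. This is a hedged sketch: the function name is hypothetical, and the time-of-day rule is deliberately left out because it depends on each customer's local time zone.

```python
# Sketch: validate a planned maintenance window against the notice and
# duration limits in this section. The time-of-day rule is not checked.
from datetime import datetime, timedelta

MIN_NOTICE = timedelta(hours=48)
MAX_DURATION = timedelta(hours=4)


def maintenance_violations(
    notified_at: datetime, starts_at: datetime, ends_at: datetime
) -> list[str]:
    """Return the rules that a planned maintenance window would break."""
    problems = []
    if starts_at - notified_at < MIN_NOTICE:
        problems.append("notice shorter than 48 hours")
    if ends_at - starts_at > MAX_DURATION:
        problems.append("duration longer than 4 hours")
    return problems


print(maintenance_violations(
    notified_at=datetime(2026, 3, 2, 9, 0),
    starts_at=datetime(2026, 3, 4, 2, 0),   # 41 hours after notification
    ends_at=datetime(2026, 3, 4, 5, 0),     # 3 hours long
))  # ['notice shorter than 48 hours']
```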
6. Capacity Management
Infrastructure capacity is reviewed monthly to ensure adequate headroom:
- Production systems maintain ≥30% headroom on CPU and memory at peak load
- Storage capacity reviewed monthly; expansion triggered at 70% utilisation
- Annual capacity planning exercise conducted before peak usage periods
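The headroom and storage thresholds above reduce to simple arithmetic: 30% headroom means peak utilisation of at most 70%. The sketch below assumes peak utilisation figures are already available as percentages; the function and parameter names are illustrative.

```python
# Sketch: evaluate monthly capacity figures against the thresholds above.
# Assumption: inputs are peak utilisation percentages from monitoring.

MIN_HEADROOM_PCT = 30.0       # CPU and memory headroom at peak load
STORAGE_EXPANSION_PCT = 70.0  # storage utilisation that triggers expansion


def headroom_pct(peak_utilisation_pct: float) -> float:
    return 100.0 - peak_utilisation_pct


def capacity_findings(
    cpu_peak_pct: float, mem_peak_pct: float, storage_used_pct: float
) -> list[str]:
    """Return the capacity actions implied by the thresholds."""
    findings = []
    if headroom_pct(cpu_peak_pct) < MIN_HEADROOM_PCT:
        findings.append("CPU headroom below 30%")
    if headroom_pct(mem_peak_pct) < MIN_HEADROOM_PCT:
        findings.append("memory headroom below 30%")
    if storage_used_pct >= STORAGE_EXPANSION_PCT:
        findings.append("storage expansion required")
    return findings


print(capacity_findings(cpu_peak_pct=76.0, mem_peak_pct=55.0, storage_used_pct=70.0))
# ['CPU headroom below 30%', 'storage expansion required']
```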
7. Review
This policy is reviewed annually and after any availability incident resulting in an SLA breach.
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | March 2026 | [Author] | Initial issue |