Recovery Point Objective (RPO)
Five to Nine has structured its systems to keep the RPO as low as possible. Our maximum RPO is 24 hours, which follows from our daily backup schedule. This figure will decrease as our usage grows and we increase the frequency of our backups.
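As an illustration of what the 24-hour figure means in practice, the check below compares the age of the most recent backup with the RPO. It is a minimal sketch: the function names and the timestamps are hypothetical and are not part of our tooling.

```python
from datetime import datetime, timedelta, timezone

# Maximum tolerated data loss, taken from the RPO stated above.
RPO = timedelta(hours=24)


def rpo_exposure(last_backup_at: datetime, now: datetime) -> timedelta:
    """Return how much data (measured in time) would be lost if a failure happened at `now`."""
    return now - last_backup_at


def within_rpo(last_backup_at: datetime, now: datetime, rpo: timedelta = RPO) -> bool:
    """Return True when the most recent backup is recent enough to satisfy the RPO."""
    return rpo_exposure(last_backup_at, now) <= rpo


if __name__ == "__main__":
    # Hypothetical timestamps, for illustration only.
    now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
    last_backup = datetime(2024, 1, 1, 18, 0, tzinfo=timezone.utc)
    print(within_rpo(last_backup, now))
```

Taking backups more often shortens the largest possible gap between the last backup and a failure, which is why the RPO falls as backup frequency rises.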
Recovery Time Objective (RTO)
Five to Nine uses AWS services together with automation tools such as TravisCI. With these tools in place, our maximum RTO is 10 hours. Our engineering team is committed to resolving issues and restoring services as quickly as possible.
Amazon Web Services
Five to Nine runs on Amazon Web Services (AWS). Several of its features are central to our backup and disaster recovery (DR) plans. Amazon handles physical disasters affecting the underlying infrastructure.
Database backups
We host our data on Amazon RDS. We back up our databases daily and will increase the backup frequency as our usage grows.
Server backups
We run separate servers/environments for different purposes, such as our live site, testing, and staging. We use Amazon Route 53 for DNS, which gives us flexibility during recovery: if our Elastic Beanstalk (EB) instance fails and cannot be restored, we can launch a new instance and point the live domain to it. We can also save snapshots of an instance.
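Repointing the live domain can be sketched as follows. The function builds a change batch in the shape accepted by the Route 53 `ChangeResourceRecordSets` operation (in boto3, `change_resource_record_sets(HostedZoneId=..., ChangeBatch=...)`). The function name, the domain names, the CNAME record type, and the 60-second TTL are assumptions made for illustration. A domain at the zone apex would need an alias record instead of a CNAME.

```python
def build_repoint_change(domain: str, new_target: str, ttl: int = 60) -> dict:
    """Build a Route 53 change batch that points `domain` at `new_target`.

    UPSERT creates the record if it is missing and overwrites it otherwise,
    so the same call works for a first setup and for a failover.
    """
    return {
        "Comment": "Repoint live domain to replacement environment",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": domain,
                    "Type": "CNAME",  # assumption: the live domain is a subdomain
                    "TTL": ttl,  # assumption: a short TTL so the switch propagates quickly
                    "ResourceRecords": [{"Value": new_target}],
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    batch = build_repoint_change(
        "www.example.com",
        "replacement-env.us-east-1.elasticbeanstalk.com",
    )
    print(batch["Changes"][0]["ResourceRecordSet"]["Name"])
```

Building the change batch separately from the API call keeps the sketch free of credentials and lets the record be reviewed before it is applied.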
Asset backups
AWS provides the ability to back up S3 buckets, and we will use it for our assets. These backups will also be available during recovery.
Server monitoring
Five to Nine also plans to add monitoring tools, such as Monit, to the servers. Monit will watch services such as Puma and PostgreSQL and start or restart them as needed.
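The check-and-restart behavior described above can be illustrated with a small sketch. This is not Monit and not its configuration. It is a hypothetical Python stand-in in which the `is_running` and `restart` callables are supplied by the caller and the service state is simulated.

```python
from typing import Callable, Dict, List


def supervise(
    services: List[str],
    is_running: Callable[[str], bool],
    restart: Callable[[str], None],
) -> Dict[str, str]:
    """Check each service once and restart any that is not running.

    Returns a map of service name -> action taken ("ok" or "restarted").
    """
    actions: Dict[str, str] = {}
    for name in services:
        if is_running(name):
            actions[name] = "ok"
        else:
            restart(name)
            actions[name] = "restarted"
    return actions


if __name__ == "__main__":
    # Simulated service state, for illustration only.
    state = {"puma": False, "postgresql": True}
    print(supervise(list(state), lambda n: state[n], lambda n: state.update({n: True})))
```

A real monitoring tool runs this kind of check on a fixed interval and usually adds alerting and restart limits, which the sketch leaves out.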
Recovery
As described above, we have structured backups in place. We can use these backups together with tools such as TravisCI and our monitoring tools to meet our RTO and RPO. In case of a disaster, we will follow these steps:
Identify the problem as soon as possible.
Assign the issue a priority level:
P0: Critical – 1 hour
P1: High – 8 hours
P2: Medium – 24 hours
P3: Low – 48 hours
If the issue can be resolved within the time allotted to its priority level, our engineering team will fix it according to that priority.
If the problem is severe enough that it will take longer than expected, our first priority is to get the servers up and running again using the backups or new resources.
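The steps above can be sketched as a small lookup. The sketch assumes that the time next to each priority level is a target resolution window measured from the moment the problem is identified, which the plan does not state explicitly. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Priority levels and time windows as listed in the steps above.
PRIORITY_WINDOWS = {
    "P0": ("Critical", timedelta(hours=1)),
    "P1": ("High", timedelta(hours=8)),
    "P2": ("Medium", timedelta(hours=24)),
    "P3": ("Low", timedelta(hours=48)),
}


def resolution_deadline(priority: str, identified_at: datetime) -> datetime:
    """Return the time by which an issue of this priority should be resolved."""
    _label, window = PRIORITY_WINDOWS[priority]
    return identified_at + window


def restore_first(priority: str, estimated_fix: timedelta) -> bool:
    """Return True when the estimated fix overruns the window.

    In that case the last step applies: bring the servers back up from
    backups or new resources before continuing with the fix.
    """
    _label, window = PRIORITY_WINDOWS[priority]
    return estimated_fix > window


if __name__ == "__main__":
    # Hypothetical incident, for illustration only.
    identified = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    print(resolution_deadline("P1", identified).isoformat())
    print(restore_first("P1", timedelta(hours=10)))
```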
Business Continuity Plan | Five To Nine
Our continuity plan consists of the following:
Business Impact Analysis
Both of our analytical teams (technical and non-technical) will analyze the behavior of our application.
Recovery Strategy
The team will create tickets in the ice-box for every area that needs improvement, such as poor performance or session/cookie storage issues. If an issue is critical, our engineering team will get the system up and running in less than 4 hours.
Plan Development
We will plan the development process and decide how to proceed with the tickets in the ice-box, then move them into the backlog with timelines that fit the planned sprints.
Testing
Our engineering team will rigorously test the changes made for each ticket.