Recovery Time Objective vs Reality: Closing the Gap
Most IT teams can't meet their recovery time objective—and they don't even know it. In this episode of The Backup Wrap-up, Curtis and Prasanna explain why your RTO is probably fantasy, who should actually be setting it (hint: not you), and what recovery time actual really means. We cover the critical difference between objectives and reality, why testing is non-negotiable, and how to have honest conversations with business leadership about what's achievable. Learn about DR drills, chaos engineering, tabletop exercises, and why measuring your actual recovery times is the only way to close the gap. Stop feeling like a failure and start building realistic, tested recovery plans that actually work when disaster strikes.
You found The Backup Wrap-up, your go-to podcast for all things
Speaker:backup, recovery, and cyber recovery.
Speaker:In this episode, we're tackling one of the biggest lies in IT,
Speaker:your recovery time objective.
Speaker:I don't care what your RTO documentation says or what you
Speaker:believe you've promised your bosses.
Speaker:If you haven't tested it, you can't meet it.
Speaker:Period. Prasanna and I break down why most organizations are living in fantasy
Speaker:land when it comes to recovery time objective, and more importantly, what
Speaker:you can actually do to address that gap.
Speaker:If you've ever felt that pit in your stomach when someone
Speaker:asks you about recovery times,
Speaker:this is your episode.
Speaker:Let's get real about RTO.
Speaker:By the way, if you don't know who I am, I'm W. Curtis Preston, AKA Mr.
Speaker:Backup, and I've been passionate about backup and recovery ever since
Speaker:I had to tell my boss that there were no backups of that production
Speaker:database that we had just lost.
Speaker:I don't want that to happen to you, and that's why I do this.
Speaker:On this podcast, we turn unappreciated backup admins into cyber recovery heroes.
Speaker:This is The Backup Wrap-up.
Speaker:Hi, and welcome to The Backup Wrap-up.
Speaker:I'm your host, W. Curtis Preston, AKA Mr. Backup, and I have with
Speaker:me the rarest of all beasts, lately
Speaker:anyway, Prasanna Malaiyandi. How's it going,
Speaker:Prasanna?
Speaker:I am good, Curtis.
Speaker:I know it's been a
Speaker:It's been, it's been a minute.
Speaker:we've
Speaker:why, that's why the listeners have been listening to like repeats.
Speaker:Uh, yeah.
Speaker:'cause you of course you're gonna blame it on me with I know, I know.
Speaker:I was working the election, I was working the, uh, I, for those who don't
Speaker:know, I'm gonna, I'm a poll worker, you know, and I'm not doing other things.
Speaker:Site
Speaker:Yeah.
Speaker:I, I am a site manager of the Yeah, the Bonsall Vote Center.
Speaker:In San Diego.
Speaker:And so we did have our special election and so I worked for 11 days, not
Speaker:including the setup and tear down day.
Speaker:So I've been a little busy.
Speaker:Okay.
Speaker:What, how many voters?
Speaker:Yeah, you've been a little busy and how many voters Yeah.
Speaker:Did you have
Speaker:Uh, we had like, like 75 over the first 10 days.
Speaker:And then on the last day we had about 400.
Speaker:Um.
Speaker:Oh
Speaker:Which was, which is, which is a lot.
Speaker:Um, and I, you know, I love, I love, I love democracy.
Speaker:I love people.
Speaker:I want everybody to, uh, to, to, to, to vote.
Speaker:Um, you know, if you don't vote, you don't get to bitch.
Speaker:That's my,
Speaker:But please don't wait
Speaker:But yeah, for the love of God, look into, look into your, your site,
Speaker:most, or your state, most likely, has
Speaker:early voting. Look into early voting, and early vote or vote by mail.
Speaker:Right?
Speaker:Um, those 400 people could have come any time in the previous
Speaker:10 days, and we would've, they could have voted just the same.
Speaker:Um, yeah.
Speaker:Anyway, so please vote.
Speaker:Um,
Speaker:Well, welcome back.
Speaker:Yeah.
Speaker:So, um.
Speaker:I wanted to, we're, we're gonna kind of, you know, we're kind of
Speaker:redoing things after a, you know, a couple different phases here.
Speaker:And, uh, we're gonna just try to do some, some hot topics
Speaker:that, um, I think are important.
Speaker:And one of them that we're gonna talk about this week is RTO.
Speaker:And specifically which recovery time objective.
Speaker:Of course, we're, we're gonna what?
Speaker:What?
Speaker:Yeah.
Speaker:Return to office.
Speaker:So we're gonna talk about return to office and then, um, and you know, what it is
Speaker:and why it is fantasy for most people.
Speaker:And then what, what they could do, um, you know, to, to address that.
Speaker:So first off, you want to define recovery time objective.
Speaker:Yeah, it's basically your objective, right?
Speaker:Your goal for how long it should take you to recover from some disaster, right?
Speaker:And get back to a good known spot.
Speaker:This is including things like recovering your data, reconfiguring
Speaker:your network, right? And different
Speaker:disasters might have different recovery time objectives, so it's also important
Speaker:to remember, like, recovering a file may be a lot less in terms of RTO than,
Speaker:say, recovering an entire data center.
Speaker:If it, uh, something
Speaker:Yeah, it's interesting you brought up that that's actually a hotly debated topic as
Speaker:to whether or not RTO should ever change.
Speaker:I agree with you that, um, the RTO is situational, situationally dependent,
Speaker:um, and that, you know, if, if you've been attacked by ransomware,
Speaker:for example, there's no way you're gonna meet sort of what
Speaker:I would call a normal RTO.
Speaker:Uh, and the same, you know, and the same with like, if it's a complete
Speaker:disaster that wipes out your entire data center and you have to physically
Speaker:build a building into which to put your servers or something like that,
Speaker:that RTO should be, um, larger than, you know, we lost a single server.
Speaker:Or, like you said, we lost a single file.
Speaker:Or if you go tell like your application admin who needs to recover data, oh, by
Speaker:the way, it's gonna take one week or two weeks to recover your data because that's
Speaker:the RTO you set for like site disasters.
Speaker:They're also probably gonna be unhappy,
Speaker:Yeah, absolutely.
Speaker:wait, wait, wait.
Speaker:That makes
Speaker:So that's another really important thing that you brought up right there,
Speaker:which is the, and, and this is a really important concept that goes
Speaker:through almost everything that we teach, and that is that you, meaning
Speaker:the backup admin, the sysadmin in charge of backups, whoever you happen
Speaker:to be, you do not determine the RTO.
Speaker:Right.
Speaker:The business unit determines the RTO or whatever, whatever term is appropriate at
Speaker:your governmental entity or NGO, right?
Speaker:Um, that is the, the entity that determines, uh, the the
Speaker:recovery time objective because it's based on finances, right?
Speaker:It's based on, uh, you know, like if it's a business, it's based on
Speaker:how much money are we going to lose.
Speaker:While we are down, right?
Speaker:Um, if it's a governmental organization, it's based, it,
Speaker:it's very different, right?
Speaker:It, it, it's more along the lines of how much damage to our organization
Speaker:like reputationally will happen based on how long we're down.
Speaker:And also how much more difficult will it be to redo the things that
Speaker:we, you know, to, to do the things we had to do while we were down.
Speaker:Uh, you might have to switch to, you know, to paper in the meantime.
Speaker:And, uh, so you, but, but the point is, all of these calculations are
Speaker:things that the, the business or management should be doing, not
Speaker:those, uh, in charge of backups.
Speaker:Uh, what role do you think the, the, the backup people play in determining,
Speaker:uh, recovery time objective?
Speaker:I think it is basically to figure out, okay, based on what the business has asked
Speaker:for, say they come back and say, okay, my RTO, recovery time objective, is one
Speaker:day.
Speaker:Based on that, here are some options that we can do technology wise,
Speaker:and I think their goal is to come back and say, okay, here's how much
Speaker:it will cost you if you want to support that recovery time objective
Speaker:Yeah.
Speaker:And, and you know, in the very beginning this is gonna be ballpark numbers, right?
Speaker:Um, well the first thing I would say is you come back and you go, okay,
Speaker:you've asked for A, we do B, right?
Speaker:We do A times four.
Speaker:Um, right.
Speaker:So, well, let let me ask you this.
Speaker:Why do you think, uh, I have, I have my opinions, uh, I'm
Speaker:curious, why do you think
Speaker:most organizations, if they have an RTO, or even if it's poorly documented,
Speaker:et cetera, et cetera, et cetera,
Speaker:if we have an agreed-upon RTO, why are most organizations
Speaker:completely unable to meet that RTO?
Speaker:Well, the biggest thing is they probably haven't tested
Speaker:to understand, like, is it actually, like, and that's why I said, like, when you
Speaker:asked the definition of RTO, right?
Speaker:It's your desire, it's your objective.
Speaker:It doesn't mean what you will
Speaker:actually hit because there are so many other things involved.
Speaker:Like we talked about.
Speaker:Maybe part of your RTO is just bringing back the data or the
Speaker:application, but then what about,
Speaker:like, making sure I'm able to procure the servers to recover or get those
Speaker:up and running.
Speaker:Uh,
Speaker:maybe I,
Speaker:need to bring up Active Directory or Entra or whatever it's called
Speaker:now, whatever Microsoft calls
Speaker:it's, it was rebranded while we were on this recording.
Speaker:Yeah.
Speaker:right?
Speaker:But all of these other things, which maybe you don't necessarily have control
Speaker:of, and maybe you're only thinking as a backup admin of, Hey, I need to recover
Speaker:the application or just the data, restore
Speaker:Well, and also in addition, and again, let, let's make sure that
Speaker:we, we, we say that the recovery time objective has been met or not
Speaker:met when you, when the application.
Speaker:Is fully up and running and available for use by the user, right?
Speaker:It's not, oh, well I did my restore.
Speaker:We've got a four-hour RTO. I, I did my restore and it only took four hours.
Speaker:No, the question is, is the application back up and running?
Speaker:And that includes it, like I said, any hardware, uh, procurement, which, which
Speaker:hopefully you're doing in the cloud.
Speaker:But any hardware procurement, any
Speaker:stuff you gotta do.
Speaker:And if we're talking about ransomware, all of the, the, the stuff you gotta
Speaker:do to make sure that the, the server is ready to, uh, be restored and the
Speaker:restore, depending on what type of thing you're recovering from, the
Speaker:actual restore may be the smallest part of the, uh, recovery time
Speaker:actual, is the term that we use.
Speaker:Based on your sort of consulting,
Speaker:Yeah.
Speaker:Mm-hmm.
Speaker:this right, what would you estimate is that ratio between time to
Speaker:actually restore the data or the application versus the end-to-end
Speaker:RTO?
Speaker:and I?
Speaker:I'm
Speaker:Yeah.
Speaker:If, if we're not talking about ransomware, it's like 80/20.
Speaker:Right.
Speaker:Uh, meaning 80% of the time spent doing the, the restore, the other 20%,
Speaker:in a modern day scenario where we're probably, uh, gonna do this in the
Speaker:cloud so that we can, you know, snap our fingers and we have the hardware that
Speaker:we need, uh, we're doing the restore.
Speaker:It's, well, you know, well tested, uh, although often it is not, right?
Speaker:And then there's some amount of time to do some initial functionality
Speaker:testing to make sure that all the dependencies have been met.
Speaker:And then, um, you know, and then we're, we're ready to roll.
Speaker:Right?
Speaker:So I'd say it's like 80/20. In a, in a ransomware scenario,
Speaker:it's like, you know, 10/90,
Speaker:Yeah,
Speaker:right?
Speaker:yeah,
Speaker:gonna spend most of your time making sure that you're recovering
Speaker:to a, uh, pristine environment.
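The 80/20 and 10/90 ratios above can be turned into a rough back-of-the-envelope estimate. This is only a sketch: the function name and the four-hour restore figure are hypothetical, and the ratios are the ballpark numbers from the conversation, not measurements.

```python
def estimate_end_to_end_hours(restore_hours: float, restore_share: float) -> float:
    """Estimate total recovery time when the restore is only a share of it.

    restore_share is the fraction of the whole recovery spent on the restore
    itself (0.80 for the conventional case, 0.10 for ransomware, per the
    rough ratios quoted in the episode).
    """
    if not 0 < restore_share <= 1:
        raise ValueError("restore_share must be greater than 0 and at most 1")
    return restore_hours / restore_share


if __name__ == "__main__":
    measured_restore_hours = 4.0  # hypothetical number from a restore test
    conventional = estimate_end_to_end_hours(measured_restore_hours, 0.80)
    ransomware = estimate_end_to_end_hours(measured_restore_hours, 0.10)
    print(f"Conventional outage: about {conventional:.1f} hours end to end")
    print(f"Ransomware: about {ransomware:.1f} hours end to end")
```

The point of the arithmetic is that the same four-hour restore supports very different recovery times depending on how much work surrounds it, so an estimate like this is a starting point for a test, not a substitute for one.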
Speaker:yeah.
Speaker:And I think that's one thing you just touched upon in your previous statement,
Speaker:which was like, sometimes things change.
Speaker:And we are talking and sort of like understanding why do most people's
Speaker:RTOs not meet what is expected?
Speaker:Do you wanna touch on some
Speaker:Well,
Speaker:I know
Speaker:yeah, so I'm gonna say the number one reason is that they simply don't have a
Speaker:backup or disaster recovery system that is capable of meeting that RTO, just, period.
Speaker:Hmm.
Speaker:They didn't do, uh, and, and this is,
Speaker:this is quite possibly, they, I, I remember when, when I, you know,
Speaker:back, go back 30 years, right?
Speaker:That, that we knew, we abs-, everyone knew that our backup system wasn't
Speaker:anywhere near capable of meeting the RTOs that we had discussed.
Speaker:Even though we didn't use that term back then. I, I'm sure the term was
Speaker:available, but I didn't use it, and,
Speaker:we just knew that it, it just wasn't, wasn't possible.
Speaker:Right.
Speaker:Um, I mean, in some cases it was laughably impossible, right?
Speaker:Um, and that we had servers that it took us a week.
Speaker:It took, took us a week to get a full backup.
Speaker:Okay.
Speaker:Like, how are we gonna meet a four-hour RTO if it takes
Speaker:a week to do a full backup?
Speaker:Right?
Speaker:Uh, and by the way, our next episode we're gonna talk about
Speaker:RPO recovery point objective.
Speaker:So it's a, it's very much a sister episode to this, but that, I'd say
Speaker:that's the number one reason is that people's backup systems, and, and again,
Speaker:I use the term backup very broadly.
Speaker:Anything that,
Speaker:that brings the server back to the way it looked, you know, before the
Speaker:disaster is a backup system to me.
Speaker:There.
Speaker:Well, you just use 'em for different purposes, disaster recovery, et cetera.
Speaker:Um, but that's the number one reason.
Speaker:The other reason that's very closely related to that is
Speaker:that they have no idea, right?
Speaker:They, they haven't tested right.
Speaker:They, they've got a system, they've got a clue.
Speaker:Right.
Speaker:And they're like, oh, well it takes us, you know, three hours
Speaker:to, or four hours to back up.
Speaker:Therefore, we should be able to do a four hour restore.
Speaker:There's a lot of ifs in that, right?
Speaker:Yeah.
Speaker:the, the other thing is, a, as you know, restore's often a
Speaker:lot slower than backup, right?
Speaker:For a number of reasons that, you know, are all over the
Speaker:place that, that they, they
Speaker:the
Speaker:ahead.
Speaker:is like incremental
Speaker:Yeah.
Speaker:Yeah.
Speaker:Forever incremental.
Speaker:Right.
Speaker:That I would probably say is the biggest
Speaker:Yeah.
Speaker:That you're, that you're piecing together a restore from many, many, many stuff.
Speaker:Uh, I, I think if you're, if you, if you have a proper design that alone
Speaker:shouldn't, um, you know, impact you.
Speaker:If you are doing a, a, you know, sort of the old school full restore, followed
Speaker:by each incremental restore, and that means you're actually restoring some
Speaker:files multiple times, then Absolutely.
Speaker:Right.
Speaker:If you're doing, if you, if you have a, if you have a, a system that is
Speaker:properly, uh, developed, right?
Speaker:That fixes that issue where if we know a file has changed, then we're not
Speaker:gonna restore that file multiple times.
Speaker:We're just gonna restore the latest version of the file.
Speaker:If you have that, that's not really the problem.
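The difference between the old-school "full, then every incremental" restore and a restore that only pulls the latest version of each file can be sketched as follows. The catalog format (a mapping of file path to version label) and both function names are hypothetical, made up for illustration. Real backup products track this in their own catalogs, and this sketch ignores deleted files.

```python
from typing import Dict, List, Tuple

# Hypothetical catalog format: one backup is a mapping of file path -> version label.
Backup = Dict[str, str]


def naive_restore_plan(full: Backup, incrementals: List[Backup]) -> List[Tuple[str, str]]:
    """Old-school plan: restore the full, then replay every incremental in order.

    A file that changed in several incrementals is written several times.
    """
    plan = list(full.items())
    for incremental in incrementals:
        plan.extend(incremental.items())
    return plan


def latest_only_restore_plan(full: Backup, incrementals: List[Backup]) -> List[Tuple[str, str]]:
    """Plan that restores each file exactly once, using its newest version."""
    latest = dict(full)
    for incremental in incrementals:  # later incrementals overwrite earlier entries
        latest.update(incremental)
    return sorted(latest.items())


if __name__ == "__main__":
    full = {"/db/data.db": "sun", "/etc/app.conf": "sun"}
    incrementals = [{"/db/data.db": "mon"}, {"/db/data.db": "tue"}]
    print(len(naive_restore_plan(full, incrementals)), "file writes, naive")
    print(len(latest_only_restore_plan(full, incrementals)), "file writes, latest only")
```

In this toy example the database file is written three times under the naive plan and once under the latest-only plan, which is the wasted work being described.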
Speaker:But you do have the issue of dedupe, right?
Speaker:You have the, the dedupe tax that quite often really rears its ugly
Speaker:head when we go to do a restore.
Speaker:Now why would that be?
Speaker:Why would that be the case Prasanna?
Speaker:Because when you're deduplicating data, you're throwing away a whole
Speaker:Mm-hmm.
Speaker:But the problem is when you need to read it, you're basically doing random reads
Speaker:across the entire system in order to be able to recreate that single file.
Speaker:Because you might have old blocks from one part of the disk and a
Speaker:different part of the file from a different part of the disk.
Speaker:And so you end up with all these random reads, which as we know, our disk
Speaker:drives are not very good at doing random
Speaker:Yeah, it, it's the, it's the ultimate fragmented file system, right?
Speaker:Uh, you are just, you're absolutely guaranteeing that everything
Speaker:you need is everywhere, right?
Speaker:Um, and, um, the, that, that is absolutely one of the cases.
Speaker:And, and if we're coming from tape.
Speaker:Right.
Speaker:Uh, which is probably less likely for most people.
Speaker:But if we're coming from tape, then we really do start talking about that,
Speaker:the, the, the forever incremental stuff and, uh, you know, because you're having
Speaker:to load all these tapes, uh, but also, the network can get in the way.
Speaker:There's also, depending on what
Speaker:RAID, uh, we're using, right?
Speaker:If we're using RAID, and, and we're using RAID, right?
Speaker:Everybody's using RAID.
Speaker:Yeah,
Speaker:depending on whether or not you opted for, does anybody
Speaker:opt for RAID 10 these days?
Speaker:I don't know.
Speaker:I
Speaker:I don't think so.
Speaker:I think everybody does RAID 6, right?
Speaker:Or, or, or something.
Speaker:So RAID, dual parity or whatever, and that has a write penalty, right?
Speaker:So for a number of reasons, restores are often slower than
Speaker:backup, and you will never know
Speaker:until you do what, Prasanna?
Speaker:You
Speaker:Exactly.
Speaker:And again, go ahead.
Speaker:Well, I'm gonna bring, I'm gonna bring out a story.
Speaker:Sorry.
Speaker:Uh, going,
Speaker:okay.
Speaker:going back to going back to my first, you know, the first time that things were
Speaker:really, really bad was that time when we had a new backup system and we had
Speaker:used a compression feature on, on the way in, and it was software compression.
Speaker:And long story short, when we went to, uh, there, there was a, there
Speaker:was the, um, we went to do the first major restore after we needed it.
Speaker:Right?
Speaker:We, we didn't test restores, we only tested backups.
Speaker:And uh, when we went to do it, uh, the, it was a DDS, and it was like,
Speaker:blink, blink, long pause, right?
Speaker:Blink.
Speaker:Blink.
Speaker:And when we called in to support, they were like, yeah, it's working as designed.
Speaker:And basically we had not
Speaker:tested this at all.
Speaker:And not only was it slow, it wouldn't work.
Speaker:It, it was just literally without going, without taking too much time,
Speaker:it just literally wouldn't work.
Speaker:Right.
Speaker:And, uh, unless we like tripled the size of RAM or something.
Speaker:Right.
Speaker:And, um, so yeah, you, you just do not know how your system is going to
Speaker:perform until you go to do a restore.
Speaker:Yeah.
Speaker:And.
Speaker:One thing similar to that story is you should also do a realistic restore test.
Speaker:Don't just be like, oh, I'm gonna just restore a file, or
Speaker:I'm just gonna restore a VM.
Speaker:I'm good
Speaker:Yeah.
Speaker:Right?
Speaker:Because that may not be a realistic scenario for when you have to
Speaker:recover a full application suite or your entire environment.
Speaker:So make sure you're doing the right, like, type of
Speaker:Yeah, absolutely.
Speaker:It should,
Speaker:any
Speaker:it should be whatever the, whatever the thing is that we're
Speaker:setting the RTO for, right?
Speaker:Uh, you don't have to restore the entire environment, but you need to do
Speaker:representative restore tests, right?
Speaker:Um, entire servers, entire environments, entire recovery groups.
Speaker:What's a recovery group Prasanna?
Speaker:It is a group of things you need to restore in order for
Speaker:your application to come back.
Speaker:So it might be your database server plus your storage, plus your Active
Speaker:Directory or, uh, system, right?
Speaker:Plus whatever else is needed in order to get that production application back
Speaker:exactly.
Speaker:And so, and, and so that becomes important too.
Speaker:I was actually gonna comment on that, because there's an order
Speaker:of operations you have to do, and so you have to account for that
Speaker:when you calculate your RTO. It's not like, oh, I can just restore my, uh,
Speaker:database, have it up and running before I have Active Directory up and running.
Speaker:It's not
Speaker:Right.
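The order-of-operations point about recovery groups can be sketched as a dependency graph. Everything specific here is an assumption for illustration: the component names, which component depends on which, and the restore durations are all made up. The sketch uses `graphlib.TopologicalSorter` from the Python standard library (3.9 and later).

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical recovery group: component -> components that must be up first.
RECOVERY_GROUP: Dict[str, Set[str]] = {
    "active_directory": set(),
    "storage": set(),
    "database": {"active_directory", "storage"},
    "application": {"database", "active_directory"},
}

# Hypothetical restore durations in hours, as if taken from restore tests.
RESTORE_HOURS: Dict[str, float] = {
    "active_directory": 1.0,
    "storage": 2.0,
    "database": 3.0,
    "application": 0.5,
}


def restore_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return one valid order in which every dependency comes first."""
    return list(TopologicalSorter(deps).static_order())


def earliest_finish_hours(deps: Dict[str, Set[str]], hours: Dict[str, float]) -> float:
    """Best case: independent components restore in parallel (critical path)."""
    finish: Dict[str, float] = {}
    for name in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps.get(name, ())), default=0.0)
        finish[name] = start + hours[name]
    return max(finish.values())


if __name__ == "__main__":
    print("Restore order:", restore_order(RECOVERY_GROUP))
    print("Serial total:", sum(RESTORE_HOURS.values()), "hours")
    print("Parallel best case:", earliest_finish_hours(RECOVERY_GROUP, RESTORE_HOURS), "hours")
```

Even the parallel best case is longer than the database restore alone, which is why the time for the whole group, not a single server, is what gets compared to the RTO.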
Speaker:And, and by the way, that one, one of the things that prompted this episode,
Speaker:uh, Kaseya did a 2025 state of the backup industry, uh, and they said that more
Speaker:than 60% of respondents believed that they could recover under, in, under a day.
Speaker:However, only 35% could actually do that in reality, which is, that's
Speaker:quite, that's quite a, a gap there.
Speaker:Um.
Speaker:Yeah.
Speaker:Another interesting thing was that only 10% of businesses reported
Speaker:no outages in the last 12 months.
Speaker:Uh, which means that 90% tested their backup systems the hard way.
Speaker:Uh, not quite as hard as, uh, our Alaskan friend, but, uh, which for those of
Speaker:you that haven't heard that episode, uh, he tested his DR system by deleting
Speaker:the entire server for the entire
Speaker:data center and then restoring, and that was his first test.
Speaker:And it was like, gee, I hope it works.
Speaker:Don't do it like that.
Speaker:Um,
Speaker:but hey, it worked out
Speaker:yeah, exactly right.
Speaker:Um, and remember, again, going back to the things that fit into the RTO, right?
Speaker:You know, you, you also have to, to include things like
Speaker:detecting that there's a problem.
Speaker:Right, because the RTO clock starts the moment the outage happens, not
Speaker:the moment the restore happens, right?
Speaker:So, uh, the moment you have the outage and then you're like, what's going on?
Speaker:Right?
Speaker:Because so many times the, the symptom that gets your attention has nothing to
Speaker:do with the thing that actually went bad.
Speaker:Right.
Speaker:Uh, I mean, it does have something to do with it, but it's not
Speaker:the thing that went bad, right?
Speaker:So you gotta figure that out.
Speaker:You gotta understand how bad it is if it's a ransomware attack.
Speaker:Again, you gotta figure out, you know, how bad this, you know, how big the scope is.
Speaker:You might have to get approvals, um, you know, all these different things, right?
Speaker:Yeah.
Speaker:And well, just one thing to add to that, because I was thinking
Speaker:about the, uh, what was the.
Speaker:Company,
Speaker:Rackspace.
Speaker:with their hosted Exchange.
Speaker:Right.
Speaker:I think one of the things to also consider.
Speaker:Uh, when you're thinking about RTO is, in order to bring my app back up and
Speaker:running, do I need to restore all my data?
Speaker:As an example,
Speaker:maybe I only need a subset of my data in order for my application to come up,
Speaker:and I can slowly backfill all my old data that's archived or other things
Speaker:like that. I can still get people up and running and ready to go without
Speaker:waiting for everything to be done.
Speaker:And so there might also be slight nuances depending on the application
Speaker:of what the expectations are.
Speaker:Yeah.
Speaker:The other thing I would say regarding that Rackspace outage, if you're in
Speaker:the middle of your recovery or you're about to begin your recovery, don't
Speaker:change all the rules, right? In, in their case, they're like, we tested how to
Speaker:do this recovery, but you know, just before they went to do the recovery,
Speaker:they're like, ah, what if we just move everything over to Microsoft 365?
Speaker:Right?
Speaker:And it's like, oh, well that would mean that we have to, like,
Speaker:basically you, you can't, you can no longer restore the Exchange
Speaker:databases directly into the user.
Speaker:Uh, you have to, um, you'll have to restore it and then migrate
Speaker:the data over individually, which is a much bigger process.
Speaker:Much, it's gonna just take much, much longer, and it ended up taking months.
Speaker:You may recall, and there were, uh, some lawsuits regarding that.
Speaker:So make sure that whatever scenario in which you do, uh, you do the testing,
Speaker:you, you, you have to do the testing.
Speaker:So, um,
Speaker:Yep.
Speaker:I know we talked earlier about, okay, that 24 hour RTO for some businesses,
Speaker:but there are some industries, right?
Speaker:Where even like seconds make a big difference, right?
Speaker:Yeah, definitely.
Speaker:Yeah, definitely, like financial trading firms, banking organizations. The more you
Speaker:can attach a real number when you can say one hour of downtime costs us this much.
Speaker:If, if you can do that, if the business can do that.
Speaker:One, $1 billion.
Speaker:Um, yeah, I'm sorry, I, I gotta do the, the pinky, right?
Speaker:Um, if you can do that, the more you can do that, the, the, the, the
Speaker:much more equipped you will be as a, you know, backup and DR person
Speaker:to be able to make enhancements to the backup and recovery system if needed.
Speaker:Right.
Speaker:So let's talk about, uh, some of the things that you
Speaker:could do to close this gap.
Speaker:Obviously, the first, the first thing is if you can have an
Speaker:iterative discussion on, uh, okay.
Speaker:You said you want one minute, we can do 10 hours.
Speaker:Right.
Speaker:Let's figure out, you know, let's get the, let's get the RTO set to somewhere near,
Speaker:um, you know, uh, realistic, that we, that we can actually meet, right?
Speaker:And you can, you can say, we're gonna set the RTO for now at this.
Speaker:We're gonna move towards, uh, a better RTO at a later, a later time.
Speaker:Um, any thoughts there?
Speaker:Yeah, no, I think that makes sense because it also takes time to implement
Speaker:new technologies because if, say for instance, your RTO is 10 hours based
Speaker:on your existing infrastructure, and now they're like, oh, we need
Speaker:an hour or 10 minutes, right?
Speaker:You're now going to need to think of something very different that's
Speaker:gonna elongate the time it takes, and so it really is important to
Speaker:ask the question, do you need.
Speaker:Yeah.
Speaker:Yeah.
Speaker:The help, just,
Speaker:start with
Speaker:yeah, everybody's gonna say zero and zero for your RTO and RPO, right?
Speaker:So it is just, you gotta justify it and you gotta say, well, if it's
Speaker:really worth $10 million every minute,
Speaker:then you need to give us, you know, whatever the number is.
Speaker:Right.
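The "justify it" conversation comes down to simple arithmetic once the business supplies a cost of downtime. This sketch uses the $10 million per minute figure mentioned above purely as an example. The function names and the 10-hour and 1-hour recovery times are hypothetical, and the cost-of-downtime number itself has to come from the business, not from the backup team.

```python
def downtime_cost(cost_per_minute: float, recovery_minutes: float) -> float:
    """Money lost during one outage that takes recovery_minutes to recover."""
    return cost_per_minute * recovery_minutes


def avoided_cost_per_outage(
    cost_per_minute: float, current_minutes: float, proposed_minutes: float
) -> float:
    """How much less one outage would cost if recovery got faster."""
    return downtime_cost(cost_per_minute, current_minutes - proposed_minutes)


if __name__ == "__main__":
    cost_per_minute = 10_000_000  # example figure from the conversation
    current = 10 * 60             # hypothetical: today we recover in 10 hours
    proposed = 60                 # hypothetical: the business asks for 1 hour
    print(f"One outage today: ${downtime_cost(cost_per_minute, current):,.0f}")
    print(f"Avoided per outage: ${avoided_cost_per_outage(cost_per_minute, current, proposed):,.0f}")
```

A number like this is what lets the cost of a faster recovery system be weighed against the cost of staying where you are.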
Speaker:Um, so then if we're gonna do testing, if we can automate that testing, the more we
Speaker:can automate testing, the more the, you know, the better that things are gonna be.
Speaker:Doing it very regularly, doing it small, it's sort of like, uh, the
Speaker:same as opinions that I have on testing your, it's kind of like in
Speaker:cybersecurity where you have a company that actively tries to send phishing,
Speaker:you know, phishing tests to the users to see if, um, to
Speaker:see if they fall for it, right?
Speaker:Same thing here, where over there, more frequent, smaller bite-sized testing is
Speaker:preferred to the once a year I have to do this and it takes two hours, right?
Speaker:Keeping it on the mind, keeping a recovery mindset is really important.
Speaker:So I think regular DR drills is part of that.
Speaker:And I think having the regular DR drills is important because if something
Speaker:changes in your environment, you
Speaker:Yeah,
Speaker:rather than sort of that
Speaker:exactly.
Speaker:And then there's also the concept of chaos engineering, um, whi, which, you
Speaker:know, like the chaos monkey, right?
Speaker:You wanna talk about that?
Speaker:Yep.
Speaker:Yep, yep, yep.
Speaker:Yeah.
Speaker:So.
Speaker:Just try breaking things in your environment, see what happens and
Speaker:see, did I miss something that I wasn't backing up? As an example,
Speaker:maybe you forgot to back up Active Directory, and now something
Speaker:happened to it, you lost all the data there and you realize, oh, I can't recover
Speaker:my application because I don't actually have a backup of Active Directory.
Speaker:And so you start to understand the dependencies in your
Speaker:environment and point out sort of
Speaker:issues that you might not foresee, different failure scenarios, or, like, if
Speaker:the network goes down. Or, the other thing is, it's not even just technology, right?
Speaker:It might be even a person.
Speaker:I know Curtis, you used to talk about at the bank doing
Speaker:testing with someone who did not
Speaker:Yeah, exactly.
Speaker:Exactly.
Speaker:Uh, well back then, I didn't write the book, but Yeah.
Speaker:Yeah.
Speaker:I wa I wasn't Mr. Backup yet.
Speaker:I wa I was, I was Mr. Backup junior.
Speaker:Um, and, and then, you know, the idea is if, if you do this, the whole, the
Speaker:whole idea of doing this on a frequent basis to get better at it, to create
Speaker:and, and, and improve your runbooks, to create and improve decision trees.
Speaker:What do we do when this happens?
Speaker:Right?
Speaker:Um, we also, we didn't talk, uh, at all about, um, uh, tabletop exercises.
Speaker:Those are, uh, obviously a great, uh, uh, tool here.
Speaker:Uh, you know, do 'em at lunch.
Speaker:Do 'em so that they're not like, so again, do them frequently, in smaller,
Speaker:where the whole world isn't, you know, don't do 'em like just before
Speaker:your, your performance review time.
Speaker:Which makes them like much more stressful.
Speaker:Do them frequently and, and, and have fun at it.
Speaker:And, and then learn from it and improve your runbooks,
Speaker:improve your decision trees.
Speaker:Cross train your teams.
Speaker:Do what we were talking about before.
Speaker:Don't use the, don't rely on one person.
Speaker:Uh, you know, you know, because that one person might not be available.
Speaker:Uh.
Speaker:You know, uh, at that time, right?
Speaker:And then measure, again, measure and report reality.
Speaker:Here's where we are.
Speaker:Make sure that everyone is on the same page.
Speaker:We've asked for this, we've agreed to this for now, we would like to get to here.
Speaker:Here's where we are.
Speaker:Report those gaps.
Speaker:Uh, and then let, let business leadership decide
Speaker:what to do about that.
Speaker:It is not your responsibility.
Speaker:Right.
Speaker:I do remember like
Speaker:Yep.
Speaker:bad because the backup system wasn't capable, but it's like, I'm not magic.
Speaker:All I can do is make recommendations.
Speaker:Right.
Speaker:And I do remember, by the way, I do remember when I, and I had a shell
Speaker:script that was doing everything right.
Speaker:We had like, I dunno, like 50 servers and I was doing all this with like a, a Unix,
Speaker:you know, shell script.
Speaker:Right.
Speaker:And at some point I couldn't, but, and all of it was based on that each
Speaker:server could fit on a tape drive.
Speaker:And then one day we bought a server that it didn't fit on
Speaker:50 tape drives, right?
Speaker:On 50 tapes.
Speaker:And, and it, it just, that and other servers that weren't
Speaker:quite that bad, it just broke.
Speaker:It broke my ability to do it right.
Speaker:And I said, I'm just not, I can't do that.
Speaker:And then, and I just went to the boss and I said, Hey, I can't do this.
Speaker:And she said, well, aren't there like commercial products that do this,
Speaker:that we can like spend money on?
Speaker:Oh, because you flipping
Speaker:I was, I was flipping, I was flipping out what I was doing,
Speaker:and I, I remember feeling like a failure because I couldn't fix this.
Speaker:Right.
Speaker:I wasn't that good at scripting.
Speaker:I, I, I don't think anybody could deal with the 50, you know, the right.
Speaker:But I, I remember at the time feeling like a failure, and I guess I'm
Speaker:saying, don't, try not to feel that way.
Speaker:Right?
Speaker:Go and give an honest assessment of where you're at.
Speaker:And, um, even if you're the one that put you in that scenario, right.
Speaker:Um, I, I remember another story of a guy that told me that he bought a
Speaker:particular vendor's dedupe product that had a 90% dedupe tax, meaning that the restore
Speaker:speed was 10% of the backup speed.
Speaker:And, and he's like, I don't know what to do.
Speaker:I'm like, well, you have to tell your boss.
Speaker:And he's like, I'm the one that recommended the system.
Speaker:It's okay.
Speaker:You gotta be honest.
Speaker:You get, because you can't get there from, you can't get there from here
Speaker:if you, if you don't address that.
Speaker:Right?
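The 90% dedupe tax in that story translates directly into restore time. The sketch below assumes, as the story defines it, that the tax is the fraction of backup speed lost on restore, so a 90% tax means restores run at 10% of backup speed. The function name and the four-hour backup are hypothetical.

```python
def restore_hours_with_tax(backup_hours: float, dedupe_tax: float) -> float:
    """Estimate restore time when restores run slower than backups.

    dedupe_tax is the fraction of backup speed lost on restore:
    0.90 means the restore runs at 10% of the backup speed.
    """
    if not 0 <= dedupe_tax < 1:
        raise ValueError("dedupe_tax must be at least 0 and less than 1")
    restore_speed_fraction = 1 - dedupe_tax
    return backup_hours / restore_speed_fraction


if __name__ == "__main__":
    backup_hours = 4.0  # hypothetical: the full backup takes four hours
    estimate = restore_hours_with_tax(backup_hours, 0.90)
    print(f"Estimated restore time: about {estimate:.0f} hours")
```

Under these assumptions, a backup that takes four hours becomes a restore of roughly forty, which is why "it takes four hours to back up, so we can do a four-hour restore" does not hold without a test.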
Speaker:Yeah, so one question I wanted
Speaker:Yeah.
Speaker:Curtis.
Speaker:There is a term though, right?
Speaker:So you
Speaker:Mm-hmm.
Speaker:and then you have like, okay, you're doing these tests, you
Speaker:actually figure out like, okay,
Speaker:Yeah.
Speaker:long it takes.
Speaker:There's a term for that though, and I don't think it's
Speaker:No, it's nowhere near.
Speaker:Yeah,
Speaker:it's
Speaker:nowhere near as As.
Speaker:Yeah, thanks.
Speaker:Nowhere near as widely used as RTO.
Speaker:Right?
Speaker:And it's RTA recovery time actual.
Speaker:Some people say recovery time reality, it doesn't matter.
Speaker:Just have a different term.
Speaker:Don't say
Speaker:our RTO is an hour when what you're saying is this is how fast you can
Speaker:recover. Your RTO is your objective.
Speaker:Your other thing, I don't care what you call it, recovery time actual is good.
Speaker:This is where we are.
Speaker:The difference between your recovery time actual and your recovery
Speaker:time objective is the gap that you need to address with whatever changes in
Speaker:process, documentation, or quite possibly enhancements to your backup system.
Speaker:Yeah.
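Reporting the gap between recovery time actual and recovery time objective can be as simple as the sketch below. The function name, the 24-hour objective, and the drill results are all hypothetical. Using the worst drill as the RTA is a conservative choice made here for illustration, not a standard.

```python
from statistics import median
from typing import List


def rta_report(rto_hours: float, drill_hours: List[float]) -> dict:
    """Summarize measured recovery times against the agreed objective."""
    if not drill_hours:
        raise ValueError("no drill results: an untested RTO cannot be reported as met")
    worst = max(drill_hours)
    return {
        "rto_hours": rto_hours,
        "median_rta_hours": median(drill_hours),
        "worst_rta_hours": worst,
        "gap_hours": max(0, worst - rto_hours),  # 0 means the objective was met
        "meets_rto": worst <= rto_hours,
    }


if __name__ == "__main__":
    # Hypothetical: a 24-hour objective and three end-to-end drill results.
    report = rta_report(rto_hours=24, drill_hours=[20, 30, 26])
    for key, value in report.items():
        print(f"{key}: {value}")
```

A report like this is the "here's where we are" number to take to business leadership, who then decide whether to change the objective or fund closing the gap.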
Speaker:All right.
Speaker:I think, I think we've covered enough.
Speaker:And then next, next week we're gonna cover recovery
Speaker:point objective, which is.
Speaker:It It is.
Speaker:Yeah.
Speaker:It's weird.
Speaker:Like all the, yeah.
Speaker:Uh, and this is going to be basically how much data we agree that we
Speaker:can lose, which is something very different than how long the system is.
Speaker:Yeah, exactly.
Speaker:We should, we should recover in zero minutes and we should lose zero data.
Speaker:We all agree.
Speaker:That would be amazing.
Speaker:Uh, it's also not gonna happen.
Speaker:Well, thanks for, uh, thanks for joining me again.
Speaker:I am.
Speaker:Enjoy these.
Speaker:I
Speaker:Yeah, I'm glad.
Speaker:Yeah.
Speaker:more.
Speaker:I think now, now that we, you've figured out your world and I figured out my
Speaker:new world, uh, we, we should be good.
Speaker:And, uh, thanks to the listeners. You're why we do this.
Speaker:Uh, that is a wrap.