Last week, in Part I of my series on Paranoid DBA Practices, we learned that our jobs are unforgiving and that we all make mistakes from time to time. This week, we will discuss what we can do to reduce the chance of an error occurring.
Poka-Yoke for DBAs!
I am a big proponent of Poka-Yoke. Poka-Yoke is a Japanese term that means “fail-safing” or “mistake-proofing”. Wikipedia’s definition of Poka-Yoke is: “its purpose is to eliminate product defects by preventing, correcting or drawing attention to human errors as they occur.”
Since I’m a car nut, here are a couple of automotive Poka-Yoke examples. You can’t take the keys out of most modern cars until the car is in park. In addition, most cars won’t allow you to shift out of park until the key is in the "ON" position. How about gas caps that have a little tether that prevents us from driving off without the cap? Most gas caps are also attached using a ratchet assembly that ensures proper tightness and prevents overtightening.
Take a look around and you’ll see dozens of Poka-Yokes during your daily activities:
- The little holes in bathroom sinks that prevent overflows
- Microwaves that stop when the door is opened
- Dryers that do the same thing
- Lawn mowers that have a safety bar that must be depressed before they will run
- Disc brakes that begin to make noise before the pads are completely worn down
- Rumble strips on roads
The list really is endless. At RDX, we have checklists, process documentation, best practices and sign-off sheets: the works. Here are some general ones that I recommend.
The Second Set of Eyes
As I have stated in previous blogs, I have over 20 years of experience using Oracle and have done my fair share of database recoveries. During my career as an Oracle instructor, I assisted in hundreds of database recoveries in Oracle's classroom environments. Even so, if I had access to a fellow DBA, I would still have them review my recovery strategy and recovery steps before I began the recovery process. Backup and recovery is just one example. Whatever process you are performing, a second opinion may prevent you from making a mistake, and a review from a fellow DBA can save you. Some have described me as having an ego (I have no idea where they get that opinion), but it doesn't prevent me from asking others for help.
This is a standard best practice here at RDX. Any critical activity, such as a monitoring install, has a detailed checklist that is signed off by the installer, complete with screenshots of configuration parameters. QA personnel review the checklist and then sign off that the review is complete.
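If you like to see an idea in code, here is a minimal Python sketch of that two-signature rule. The class and method names are hypothetical and are not RDX's actual tooling; the sketch simply refuses to let the installer sign until every step is checked, and refuses to let the same person act as their own reviewer.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChecklistItem:
    description: str
    done: bool = False


@dataclass
class Checklist:
    """Hypothetical SOP checklist that needs two different people to close it."""
    name: str
    items: List[ChecklistItem] = field(default_factory=list)
    performed_by: Optional[str] = None
    reviewed_by: Optional[str] = None

    def check(self, index: int) -> None:
        """Mark one step as done."""
        self.items[index].done = True

    def sign_off(self, performer: str) -> None:
        # The performer may only sign once every step has been checked.
        if not all(item.done for item in self.items):
            raise ValueError("cannot sign off: unchecked steps remain")
        self.performed_by = performer

    def review(self, reviewer: str) -> None:
        # Poka-Yoke: the review must come from a second set of eyes.
        if self.performed_by is None:
            raise ValueError("cannot review: the performer has not signed off")
        if reviewer == self.performed_by:
            raise ValueError("the reviewer must be someone other than the performer")
        self.reviewed_by = reviewer

    @property
    def closed(self) -> bool:
        return self.performed_by is not None and self.reviewed_by is not None
```

The point of putting the checks inside the object is that skipping the second set of eyes becomes an error rather than a matter of discipline.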
A few years before I joined RDX, I worked for a shop that subscribed to the "everybody in one big room" philosophy. I guess it was supposed to allow everyone to work together as a team and become "at one with each other". It may have achieved that purpose, but it sure didn't make it easy to concentrate on your work. There were so many conversations going on that they had to pump in white noise. The constant "whhhsssssshhhssshhh" made me feel like I was a crew member of the Starship Enterprise.
Like most DBA units, our particular area was often populated with developers and O/S technicians. Many conversations were going on at once, some of which could be described as "animated". The environment did not allow you to concentrate on the task at hand, so we often had to retreat to small conference rooms to work on critical tasks.
The point I'm trying to make is this: no matter what type of environment you work in, if you can concentrate there, fine, but if you can't, find a spot where you can. Block off some time, redirect questions to other DBAs and concentrate on the task at hand. Don't attempt to answer questions and code a complex script at the same time. The more complex and critical the activity is, the less multitasking you should do; otherwise, it's a recipe for trouble. Once you are done, follow rule number one and have someone review your work. This is another best practice we follow at RDX.
What Database Are You Working In?
Working in the wrong database is a common problem for database experts as well as their less experienced counterparts. How many times have YOU found yourself running statements in the wrong environment? Every database product has commands that show you which database you are connected to. Do yourself a favor: USE THEM.
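The exact command depends on the product; in Oracle, for example, you can query the NAME column of v$database. Better yet, make your scripts check for you. Below is a minimal Python sketch that uses SQLite only because it ships with Python: it reads the connected database file via PRAGMA database_list and refuses to run a statement unless that file matches the name you expected. The function and exception names are hypothetical; for another product you would swap in that product's own "which database am I in?" query.

```python
import os
import sqlite3


class WrongDatabaseError(RuntimeError):
    """Raised when a statement is about to run against an unexpected database."""


def connected_database(conn: sqlite3.Connection) -> str:
    """Return the file name of the 'main' database this connection points at."""
    # Each row of PRAGMA database_list is (seq, name, file).
    for _seq, name, path in conn.execute("PRAGMA database_list"):
        if name == "main":
            # An in-memory database reports an empty file name.
            return os.path.basename(path) if path else ":memory:"
    raise WrongDatabaseError("could not determine the connected database")


def run_guarded(conn: sqlite3.Connection, expected: str, sql: str, params=()):
    """Execute sql only if the connection really points at the expected database."""
    actual = connected_database(conn)
    if actual != expected:
        raise WrongDatabaseError(
            f"expected {expected!r} but connected to {actual!r}; nothing was run"
        )
    return conn.execute(sql, params)
```

In use, every statement in a script goes through run_guarded with the name of the database you believe you are in, so a script pointed at the wrong environment stops before it does anything.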
At RDX, we have an entire library of customer-specific repeatable processes. Since we are a remote services provider, documentation is critical. Repetition is the foundation of a high-quality support environment: if the scripts and administrative processes worked correctly the first time, chances are they will continue to work correctly in the future. We have dedicated, repeatable SOP (Standard Operating Procedure) templates.
Automating and documenting complex administrative processes, such as production-to-decision-support database refreshes and application upgrades, allows future iterations of these activities to be executed more quickly and with fewer errors. As you continue reading my blogs, you'll understand the importance I place on documentation. Here at RDX, we have built our entire foundation of customer support on documentation and database support best practices.
Have you ever tried to refresh an ERP application test environment from production when that test environment didn't have enough space to hold all of production's data? 4,000 steps later, you begin to second-guess your choice of profession. The more complex the process is, the greater the need for detailed documentation becomes.
The moral of this story: if you don't want to be the only one who can perform that 900-step ERP application production-to-test refresh, script it and then document it.
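Here is one way to start scripting a long procedure: a minimal Python runbook runner, assuming each step can be wrapped as a Python function (in real life many steps would call out to database utilities). It runs the steps in order, logs each one, stops at the first failure, and returns the step to resume from. The names are hypothetical, and a runner like this complements the written documentation rather than replacing it.

```python
import logging
from typing import Callable, List, Tuple

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("refresh")

# A step is a human-readable description plus the action that performs it.
Step = Tuple[str, Callable[[], None]]


def run_runbook(steps: List[Step], start_at: int = 0) -> int:
    """Run steps in order, stopping at the first failure.

    Returns the index of the next step to run: len(steps) when everything
    succeeded, or the index of the failed step so the run can be resumed.
    """
    for i in range(start_at, len(steps)):
        description, action = steps[i]
        log.info("step %d/%d: %s", i + 1, len(steps), description)
        try:
            action()
        except Exception:
            # Log the traceback and stop; never barrel on to the next step.
            log.exception("step %d failed; stopping so a human can look", i + 1)
            return i
    log.info("all %d steps completed", len(steps))
    return len(steps)
```

Stopping on the first failure, instead of continuing, is deliberate: in a refresh, a later step that runs on top of a failed earlier one is usually how a small problem becomes a big one.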
RDX has also implemented SOP checklists for all critical or complex procedures we perform. From monitoring installations to DBA onboarding training, checklists are used for all key activities. Designated personnel review the completed checklists and inspect the work on a regular basis to ensure quality.
Saving Time vs. Creating a Problem
At one of my previous employers, I once watched an onsite consultant perform a rather complex set of administrative tasks to solve a problem. He was rapidly flipping back and forth between at least 15 active screens, copying and pasting and editing and copying and pasting and editing… I call this particular activity "Multiple Screen Syndrome". He also had several other screens open that were connected to other databases. He was multitasking to the extreme. My advice: take a break, take a breath and look at what you are doing.
How about a UNIX command like rm -r /u0*/ora*/prod*/*/*.*? Depending on what those wildcards match, a single command like this can wipe out the files for multiple databases in multiple directories, all in one painful swoop. How many times have you heard of a command like this causing mass mayhem? When you make a mistake like this, you become immortalized in conversations for years to come. Get a few technicians together after work and eventually the conversation will include "remember when Bob so-and-so ran that command by mistake and wiped out the entire O/S on our production web server?" You can't tell me you haven't heard stories like this.
As someone who has a lot of experience in this profession, I would rather you take your time than showcase your multitasking and time-saving skills. The more complex and critical the activity, the more basic you should become in your plan of attack. Trust me when I say I won't be impressed with your time-saving cut-and-paste and wildcard expertise if it is even remotely dangerous.
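If you must use wildcards, make them show their work first. Below is a minimal Python sketch of a guarded delete, assuming you know roughly how many paths the pattern should match: it expands the pattern with glob, refuses to touch anything if the expansion is larger than expected, and only lists the matches unless you explicitly pass dry_run=False. The function name and parameters are hypothetical, not a standard utility.

```python
import glob
import os
import shutil
from typing import List


def guarded_remove(pattern: str, expected_max: int, dry_run: bool = True) -> List[str]:
    """Expand a wildcard pattern and remove the matches only when it looks safe.

    expected_max is the largest number of paths you believe the pattern should
    match; if it expands to more than that, nothing is touched. With
    dry_run=True (the default) the matches are only listed, never removed.
    """
    matches = sorted(glob.glob(pattern))
    if len(matches) > expected_max:
        raise RuntimeError(
            f"{pattern!r} matched {len(matches)} paths, "
            f"expected at most {expected_max}; nothing was removed"
        )
    for path in matches:
        print(("would remove: " if dry_run else "removing: ") + path)
        if not dry_run:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)  # like rm -r on a directory
            else:
                os.remove(path)
    return matches
```

The typical flow is to run it once as a dry run, read the list, and only then repeat the call with dry_run=False. Making the safe behavior the default is the Poka-Yoke here.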
Safety First Mindset
I once saw a DBA log in to a production database using a particular schema account. He then logged in to a different database and dropped the schema with the same name. I asked him why he had logged in to the first database as the very schema he was dropping in the second. He stated, "The database product won't let you drop a schema that has someone connected to it. No matter what happens after this, I'm positive that I won't drop the user in this database by mistake." I like that Safety First mindset in a DBA.
You need to think Safety First when you are performing any complex or critical activity. Take the time to put one or two safeguards in place, like the DBA did when he dropped the user.
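One safeguard you can build into your own scripts is to make the operator retype exactly what is about to be destroyed. The minimal Python sketch below, with hypothetical names, asks for both the database name and the object name and returns True only when both match exactly (case included). The ask parameter defaults to Python's built-in input but can be replaced, which is also how you would test it.

```python
from typing import Callable


def confirm_destructive(action: str, target: str, database: str,
                        ask: Callable[[str], str] = input) -> bool:
    """Return True only if the operator retypes both the database and the target."""
    print(f"About to {action} {target!r} in database {database!r}.")
    # Forcing the operator to type the names makes them read what they are about to hit.
    typed_db = ask("Type the database name to continue: ").strip()
    typed_target = ask("Type the object name to continue: ").strip()
    return typed_db == database and typed_target == target
```

A script would call it just before the destructive statement, for example confirm_destructive("drop user", "APP_OWNER", "TESTDB"), and exit without doing anything when it returns False.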
Other DBAs may call you paranoid; I'll call you an experienced DBA who would rather be safe than sorry.
The intent of this blog was not to provide you with a laundry list of recommendations. It was intended to get your creative juices flowing so you can think of different ways to protect yourself against problems.
Thanks for Reading,
Director Of Service Delivery