Every professional who works with databases, whether a programmer, a DBA, a system administrator or an analyst, should be able to solve the unexpected problems that appear during a normal working day. This ability, also known as troubleshooting, is essential to keep computing systems and applications up and running in support of the enterprise's business.
Based on this scenario, this article discusses the attitudes a professional can take when an unexpected problem arises in the database during normal daily operations. The discussion is presented as a series of questions, each with possible alternatives to choose from in order to solve the proposed problem. The alternatives focus on operational aspects, behavioral actions and the technical procedures needed to solve the issue.
Knowing the best option among many in a problematic situation is one of the most important and appreciated skills an IT professional can have, especially when working with databases and mission-critical applications. The benefits of these skills include greater trust in the professional, advanced technical knowledge, the ability to justify the purchase of new resources and the autonomy to work without constant supervision.
This second part of the article presents more problematic situations. We then discuss some of the situations and their alternatives, and talk about the best attitude given the context in which the DBA must handle the unexpected problem.
Situation 6) After sitting through a long, boring two-hour meeting about the migration of systems that don't even use databases, the DBA finds an urgent note on the desk stating the following:
"The development team successfully applied a patch to an application component and, as always, we restarted the servers (including the database server). However, although the development team made no changes to the database, Oracle is now reporting the error ORA-12560: TNS: protocol adapter error. We demand a fix for this problem before 14:00, because at that time we are going to demonstrate the new features of the application to the board of directors and the president, and if we do not please this audience we can expect several layoffs in the IT department.
The development team"
After taking a deep breath, canceling the lunch plans and trying to keep calm, what should the DBA do?
a) Complain to the development team, arguing that they should not have restarted the database server without consulting the DBA first.
b) Go and complain to the development manager about the lack of procedures, the team's lack of professionalism and the disrespect for the DBA's effort to keep all the databases up and running.
c) Make sure that the Oracle server is started and that a local connection to it is possible. Examine the connection string used by the application, check the physical connection between the clients and the server, review the IP settings, inspect the contents of the ORACLE_HOME and ORACLE_SID environment variables and, finally, verify whether the database status is NOMOUNT, MOUNT or UPGRADE.
d) Try to keep calm again and go talk to the development team in order to understand which steps they followed and which actions they took to restart the database server. Invite the developers to solve the problem together with the DBA by retracing each step taken by the development team.
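Part of the checklist in alternative c can be scripted. The Python sketch below is only illustrative: it reports which of the ORACLE_HOME and ORACLE_SID variables are unset and whether a TCP port on the database host accepts connections. The function names are hypothetical, and the host name and port 1521 (a common default for the Oracle listener) are assumptions, not details taken from the note.

```python
import os
import socket


def missing_oracle_vars(env=None):
    """Return the Oracle environment variables that are unset or empty."""
    env = os.environ if env is None else env
    required = ("ORACLE_HOME", "ORACLE_SID")
    return [name for name in required if not env.get(name)]


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call with a placeholder host and the assumed listener port:
# can_reach("dbserver.example.com", 1521)
```

A reachable port only shows that something is listening there; the local connection test and the database status check from alternative c still have to be done on the server itself.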
Situation 7) After migrating the database of a legacy system to a new one, the DBA made a mistake: he/she executed an UPDATE statement with a WHERE clause that was too broad. The statement changed more data than it should have, and neither the DBA nor the developers noticed the modifications. After one week of problems and corrections in the style of a Three Stooges comedy episode, the problem was solved. Now the DBA and the two main developers involved must attend a meeting where there is a good chance that someone will take the blame and be fired. What do you, as the DBA, say in this type of meeting, knowing that mistakes were made by the whole staff?
a) Start the meeting saying: "let's not look for anyone to blame".
b) Admit your share of the problem and shift the responsibility to all the developers who did not test the application.
c) Never admit that you made a mistake, state that everything that happened was because you were working too much and finish by saying that more time is needed to find out what really happened. In other words: try to play the innocent.
d) Acknowledge your mistakes, but do not dwell on regrets or flaws. Make clear that the mistake was not intentional and that it will not happen again, because you have already taken measures to fix what was wrong and to avoid a repetition of this type of problem.
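One possible measure against an overly broad WHERE clause is to run the UPDATE inside a transaction and commit only when the number of affected rows matches the expectation. The sketch below illustrates the idea with Python's sqlite3 module standing in for the real database; the guarded_update helper, the customers table and the row limit are all hypothetical.

```python
import sqlite3


def guarded_update(conn, sql, params, max_rows):
    """Run an UPDATE and keep it only if it touched at most max_rows rows."""
    cur = conn.execute(sql, params)
    if cur.rowcount > max_rows:
        conn.rollback()  # too many rows changed: undo the statement
        raise RuntimeError(
            f"UPDATE touched {cur.rowcount} rows, expected at most {max_rows}"
        )
    conn.commit()
    return cur.rowcount


# Hypothetical demo table standing in for the migrated data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO customers (status) VALUES (?)", [("old",)] * 5)
conn.commit()

try:
    # This WHERE clause is too broad: it matches every row instead of one.
    guarded_update(
        conn, "UPDATE customers SET status = ? WHERE id > ?", ("new", 0), max_rows=1
    )
except RuntimeError as exc:
    print(exc)
```

The same pattern applies in any database that supports transactions: run the statement, compare the affected row count with what was expected, and roll back when they disagree.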
Situation 8) One of the developers needs to create a procedure that automatically exports the data of a specific table. Without asking the DBA for help, this developer used a procedure that sends commands to the operating system to check whether a folder exists, delete the folder if it does and recreate it with the exported files. This approach worked well in the development and test environments, but the production environment had a different folder structure. As a result, running the stored procedure in production deleted some important operating system files and compromised the server: it is no longer possible to log in locally or remotely. And now, DBA, what should you do?
a) Put both hands on your head and think to yourself: "This is going to be a long, long day. I knew I should have stayed in bed this morning..."
b) Put all the blame on the developer and, as punishment, assign him/her the task of reinstalling the operating system, the database and all the configurations.
c) Warn the development and infrastructure managers about what happened and coordinate with the IT staff on the best way to solve the situation, taking into account the resources needed and the downtime required by a complete reinstallation of the server.
d) Face the developer and ask in a serious tone: "Is this some kind of April Fools' joke?"
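The accident in this situation comes from deleting a folder whose location was never validated. As a hedged illustration, the Python sketch below recreates an export folder only when it lies strictly inside an approved base directory. The function name and the idea of a single approved base directory are assumptions; the developer's actual procedure is not shown in the article.

```python
import shutil
from pathlib import Path


def recreate_export_dir(base_dir, target_dir):
    """Delete and recreate target_dir, but only if it lies strictly inside base_dir."""
    base = Path(base_dir).resolve()
    target = Path(target_dir).resolve()
    # target.parents lists every ancestor of target; base must be one of them.
    if base not in target.parents:
        raise ValueError(f"refusing to delete {target}: not inside {base}")
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True)
    return target
```

Because both paths are resolved before the comparison, a relative path or a symbolic link that points outside the approved directory is rejected instead of being deleted.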
Situation 9) The development manager hired a junior DBA to help the senior DBA with the simplest, most repetitive tasks. After briefly explaining the database environment to the new employee, the senior DBA explained how a simple task should be done by following defined steps without any deviation. However, trying to make a good impression and be noticed for his/her skills, the junior DBA changed some steps of the task: instead of running an already tested script, he/she altered three statements to obtain a performance gain. On returning from lunch, the junior DBA told the senior DBA that the task had been completed successfully. The senior DBA, already worried about the change because it was a very complex script, checked the results of the task: three tables had been created without primary and foreign keys, each holding more than 2 GB of data (the junior DBA removed the constraints from the script to make the data import faster and forgot to recreate them). What do you do now, DBA?
a) Ask whether the junior DBA can finish every sentence with "extra size for only 25 cents more, sir?", since he/she will need to memorize this phrase for his/her next job.
b) Give the junior DBA a very reproachful look, revert the script to the previous version and start the data import process again without saying a single word to the junior DBA.
c) Tell every developer that there is an error in the import process, without providing further details. Tell the junior DBA to pay more attention while you fix the problem, and say nothing else. After the problem is fixed, call the junior DBA in for a private talk and scold him/her about the importance of following orders and, at least in the beginning, of not trying anything new alone without talking to others.
d) Call the attention of everyone in the IT department, managers included, and state that there is a strong candidate for the employee of the month award. Also note that the database will be unavailable for at least the whole day because of a new, untested modification made by the junior DBA.
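A check like the one the senior DBA performed can be automated as a step that runs after every import. The sketch below is a minimal illustration using SQLite through Python's sqlite3 module; the function name and the demo tables are hypothetical, and on Oracle or SQL Server the equivalent check would query that product's own catalog views instead of the SQLite PRAGMA used here.

```python
import sqlite3


def tables_without_primary_key(conn):
    """Return the names of user tables that declare no primary key column."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    missing = []
    for table in tables:
        # PRAGMA table_info returns one row per column; index 5 is the pk flag.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        if not any(col[5] for col in columns):
            missing.append(table)
    return missing


# Hypothetical demo: one table keeps its key, the other was imported without it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, qty INTEGER)")
print(tables_without_primary_key(conn))
```

Running such a check before announcing that a task is finished turns a forgotten constraint into a quick correction instead of a surprise for the rest of the team.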
Commentaries about the situations
Before commenting on each situation and its possible alternatives, it is important to remember that the DBA should always try not to follow his/her primary instinct, which is to solve every problem as soon as it appears. Urgent situations, killer deadlines, technical faults and communication issues happen all the time and require the DBA's attention. However, the IT area has evolved over the past 10 years: nowadays we have agile methodologies, test-driven development, continuous integration, pair programming and other practices that reduce the need for the DBA to stop everything in order to fix an urgent problem.
Just as we have evolved in the handling of unexpected situations, we can also rely on techniques that prevent problems from reaching critical levels. Techniques such as estimates, forecasts, organization and proactive rather than reactive attitudes are increasingly employed by DBAs who try not only to reduce the occurrence of unexpected situations but also to prepare the environment so that a problem can be solved quickly, safely and in line with the methodology, rules and norms, without quick fixes and workarounds that are only temporary.
Commentaries about situation 1: The first situation presented in this article is very common: a problem that takes the database offline is detected at the beginning of working hours. The DBA should resist the temptations presented in alternatives a, b and c, remembering that it is mandatory to inform managers and developers of what is happening instead of going straight to fixing the problem. Note that alternative c represents the professional who works hard but forgets that the technical solution is only one of the steps needed when this scenario occurs. Alternatives a and b are not recommended because in them the professional fails to take any action.
Commentaries about situation 2: In this situation it is necessary to think first about damage control, i.e. informing the web users that the blog and part of the corporate portal are offline. It is not wrong to feel like complaining, as in alternatives a and b, because this type of scenario is ideal for letting frustrations flourish; complaining, however, is a terrible way to fix the issue. The negligent approach of alternative d should also be avoided because it represents the unnecessary prolongation of the task's execution, one of the most common unethical corporate practices used to inflate the number of hours worked.
Commentaries about situation 3: Situation 3 shows two important facts: 1) the need to work outside normal business hours, especially in cases of migration and maintenance; and 2) the feeling of revenge that arises in some professionals when a superior decides not to go ahead with the DBA's decision. The professional who harbors this anger and desire for revenge must remember that, in the corporate world, the right place to settle these issues is beyond the front desk, i.e. outside the company.
Alternative a contains a quick fix: a temporary workaround that can save the day but will generate many problems in the near future if no permanent action is taken. The solution presented in alternative b should never even be considered, since it shows a lack of team play and reveals serious feelings of revenge. There is nothing wrong with alternative c if there is an agreement stating that it is OK to contact the developer outside business hours. Alternative d is the most recommended because it first tries to notify the person responsible and also creates a plan B that helps solve the problem as a team, instead of a lone wolf approach in which individual actions break the unity of the team.
Commentaries about situation 4: Situation 4 is about dialog and how a DBA should handle errors and problems caused by developers who access the database. Here it is important to remember that all of us are human and make mistakes, even the DBA. This means that we must expect mistakes from everyone. Losing one's head and acting on very strong feelings (such as anger) will not prevent future errors because, again, we are all human and, by definition, we will make mistakes.
The script in Listing 1 shows that the analyst ran the DELETE statement without an explicit transaction, so the statement was committed automatically. This means that after the execution of the DELETE statement there is no way to get the erased data back (except from a backup). Maybe what the developer wanted was to test the use of explicit transactions and, if this is the case, he/she should have placed BEGIN TRANSACTION at the beginning of the script (or used SET IMPLICIT_TRANSACTIONS ON) so that the DELETE could still be undone with ROLLBACK.
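The difference an explicit transaction makes can be shown with a small sketch. It uses SQLite through Python's sqlite3 module as a stand-in for the database in the scenario, and the orders table is hypothetical. The connection is opened in autocommit mode so that the only transaction is the one the script starts itself.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so
# transactions exist only where the script opens them explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(1,), (2,), (3,)])


def count_orders():
    """Return the number of rows currently visible in the orders table."""
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


conn.execute("BEGIN")               # the explicit transaction starts here
conn.execute("DELETE FROM orders")  # the delete is not committed yet
rows_inside = count_orders()        # the open transaction sees an empty table
conn.execute("ROLLBACK")            # undo the delete
rows_after = count_orders()         # the rows are back

print(rows_inside, rows_after)
```

Without the BEGIN, the same DELETE would have been committed as soon as it ran, which is the situation described in the scenario.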
Although alternative a seems reasonable and is adopted by many experienced DBAs in this type of situation, acting in a radical manner is seldom beneficial to anyone. Alternative c shows benevolence on the part of the DBA and could be employed; however, exposing one person's error to the entire staff can be humiliating. Alternative b is the most appropriate course of action in this situation. Besides, copying the data is a responsibility of the senior DBA, and he/she should not punish anyone by delegating this responsibility. Alternative d reveals a dangerous profile: the DBA who likes to make people owe favors and be in his/her debt, in order to call the favor in later. This habit of doing and collecting favors is another unethical work practice that must be eliminated from the corporation, because this type of environment makes people afraid of asking for help and makes the DBA resemble a 1950s gangster.
Commentaries about situation 5: The situation presented in this scenario is very common in IT departments: a problem occurs and only after a while is the issue forwarded to the DBA and, as Murphy's law suggests, at the worst possible time.
It suffices to say that the negligent attitude presented in alternative a should be avoided, even if the DBA thinks that it is not worthwhile to work for the company or if the DBA is leaving to work somewhere else. While the professional holds the DBA title, he/she must be aware of his/her duties and responsibilities and act as a professional, not as an amateur.
Alternative b is wrong because a SQL Server restore process must start with a full backup, not with a transaction log backup. Alternative c is also wrong because point-in-time recovery is not possible with a full backup alone. Alternative d is the right one only if it is acceptable to lose what was done in the database between 12:50 and the moment the DBA started the recovery process. Perhaps the best solution for this situation is to restore the backup on another server and try to merge the data between the two databases in order to lose as little data as possible.
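The ordering rule behind these comments (a full backup first, then the transaction log backups in sequence up to the desired moment) can be expressed as a small planning function. The sketch below is a simplified model, not a restore tool: it ignores differential and tail-log backups, the function name is hypothetical, and the backup times in the example are invented for illustration.

```python
from datetime import datetime


def plan_restore(backups, target):
    """Return the backups to restore, in order, to reach the target time.

    backups is a list of (kind, finished_at) tuples, kind being "full" or "log".
    """
    fulls = [b for b in backups if b[0] == "full" and b[1] <= target]
    if not fulls:
        raise ValueError("no full backup taken before the target time")
    base = max(fulls, key=lambda b: b[1])  # latest full backup before the target
    plan = [base]
    logs = sorted(
        (b for b in backups if b[0] == "log" and b[1] > base[1]),
        key=lambda b: b[1],
    )
    for log in logs:
        plan.append(log)
        if log[1] >= target:
            break  # this log backup contains the target time
    return plan


# Invented schedule: one full backup at midnight and a log backup every six hours.
day = datetime(2024, 1, 1)
backups = [
    ("log", day.replace(hour=12)),
    ("full", day),
    ("log", day.replace(hour=6)),
    ("log", day.replace(hour=18)),
]
for kind, finished_at in plan_restore(backups, day.replace(hour=12, minute=50)):
    print(kind, finished_at)
```

If no log backup reaches the target time, the returned plan simply ends with the last log available, which means the database can only be recovered up to that backup.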
Murphy's Law: Murphy's law is an adage or epigram that is typically stated as: "Anything that can go wrong will go wrong". Its creator was the American Air Force captain Edward Murphy, who was the first victim of his own law. He was one of the engineers who worked on tests of the effects of rapid deceleration on aircraft pilots.
This second part of the article presented some unexpected situations that may happen during the daily work of a DBA and discussed the problems and the alternatives to solve them. The first five situations were explained along with some possible attitudes that can be taken to correct them. Next, each situation was commented on in order to guide the DBA's behavior when an emergency happens.
It is important to remember that the DBA must resist the primary instinct to solve the problem as fast as possible. Each scenario presents a different situation that must be analyzed according to the rules and procedures of the company. It is also recommended not to panic, to always focus on the problem and its circumstances, and to take responsibility for the problem in the first place.
Each unexpected situation can advance the DBA's career, since knowing how to deal with unexpected problems is a major skill for every professional who works with databases. Besides, the expertise gained while solving daily problems is appreciated by companies and can bring benefits such as the trust of superiors, greater visibility among peers and the chance of being a strong candidate for the next promotion to a leadership position in the company.