SQL Best Practices in less than 20 minutes 2/3


Uploaded by roughsealtd on 27.01.2008

Transcript:
Tried-and-true recipe number 2 consists in making existing indexes unusable.
Whenever you search for a key in an index, it's very similar to looking up a word in a dictionary.
The database server compares the key being searched to what it finds in the tree nodes and descends
the tree according to that comparison, just as you move backwards and forwards in the dictionary
depending on the first and last words on a page. The word you are looking for is like the index key.
Now, if I give you a dictionary and ask you to find the number of the page where the word
'rill' is located, that question is very similar to this SQL query.
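Something along these lines, sketched against a hypothetical dictionary table (the names word and page_no are mine, not from the slides):

    SELECT page_no
    FROM   english_dictionary
    WHERE  word = 'rill';

With an index on word, the server can home in on the entry by moving back and forth in the tree, exactly like you in the dictionary.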
Suppose now that the dictionary is an English-French dictionary. If I ask you to find the page
of the English word whose French translation is 'ru', you are at a loss, because
you can no longer browse the dictionary.
The only option left is to start reading the dictionary from A to Z and check the translation
of every word. You will never do it, but a database server will, and this happens as soon as you
apply a function to an indexed column that you want to search.
What you will do, though, and this is what you should do in SQL too, is go to the French-English
part of the dictionary, find the English translation of 'ru', and then use the word
you found to do a regular search.
Applying functions at the wrong place is a very common mistake, particularly with dates.
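To make this concrete, here is the date case as a sketch, assuming a hypothetical orders table with an index on order_date (names and dates are invented):

    -- The function is applied to the indexed column: the index is unusable,
    -- every row must be read and converted
    SELECT COUNT(*)
    FROM   orders
    WHERE  TRUNC(order_date) = DATE '2008-01-15';

    -- The computation moves to the constant side: the index can be used
    SELECT COUNT(*)
    FROM   orders
    WHERE  order_date >= DATE '2008-01-15'
    AND    order_date <  DATE '2008-01-16';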
Tried-and-true recipe number 3 is to hard-code everything.
All products don’t behave exactly the same in this respect, but if you want to hit Oracle
particularly hard, this is a very good idea.
When SQL queries are built by program and simple concatenation, what happens when the
value of an identifier changes? For the DBMS, it’s a
brand new query, and it goes through analysis, often called
parsing, and the optimizer recomputes an execution plan, using a lot of CPU in the process.
This is a very simple Oracle PL/SQL example to show the extent of the damage.
In the first example, I repeatedly execute a query, concatenating into the statement the value
of the id that identifies the row to return. It doesn't run for ages, but the important thing
to note is that the elapsed time is all CPU time, and parsing represents about two thirds of it.
In the second example, I execute the same query but pass the value of the identifier
as a parameter. The result is that parsing has dropped to something negligible, and the run
time is divided by a factor of two and a half for the very same process.
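Roughly, the two patterns compare like this (a sketch, not the demo code; the items table, its columns and the loop bounds are all invented):

    DECLARE
      n NUMBER;
    BEGIN
      -- Hard-coded value: every iteration produces a brand new statement
      -- that must be parsed and optimized
      FOR i IN 1 .. 10000 LOOP
        EXECUTE IMMEDIATE
          'SELECT amount FROM items WHERE id = ' || i INTO n;
      END LOOP;

      -- Bind variable: a single statement text, parsed once, executed 10000 times
      FOR i IN 1 .. 10000 LOOP
        EXECUTE IMMEDIATE
          'SELECT amount FROM items WHERE id = :id' INTO n USING i;
      END LOOP;
    END;
    /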
It is so obvious an improvement that both Oracle and Microsoft have done something to
try and fix poorly written code. You can ask the SQL engine to
force, more or less aggressively, the translation of constant
values to parameters so that queries are parsed only once.
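For reference, those settings look like this (behaviour varies between versions, and some_db stands for whatever your database is called):

    -- Oracle: have the engine replace literals with system-generated bind variables
    ALTER SESSION SET cursor_sharing = FORCE;

    -- SQL Server: the corresponding database-level setting
    ALTER DATABASE some_db SET PARAMETERIZATION FORCED;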
But the problem is with the distribution of data. When your main criterion is something
like an indexed status indicator, which is usually very unevenly distributed, you may have
cases where the value that is passed is very common and not selective at all,
and other cases where the value is very rare and very selective.
In such a case, you don't want all constants to be replaced by parameters, because if you have
statistics telling the optimizer how frequent the different values are, hiding the actual value
behind a parameter makes it very easy to derail the optimizer.
It will have to choose, for instance, between using an index on a date, which is best
for common status values, and using an index on the status, which is best for infrequent values.
For many database products, the choice of the execution plan depends on the first parameter
value that is encountered. This can cause plan instability, and queries that run fast in the
morning and slowly in the afternoon if the query has been reparsed with the 'wrong' parameter
value in between.
In such a case, what you really want is to have TWO different statements that will execute
differently.
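One way to get those two statements, sketched as an Oracle procedure (orders, status, order_date and the 'PENDING' value are all hypothetical):

    CREATE OR REPLACE PROCEDURE get_orders(p_status    IN  VARCHAR2,
                                           p_from_date IN  DATE,
                                           p_result    OUT SYS_REFCURSOR) IS
    BEGIN
      IF p_status = 'PENDING' THEN
        -- Rare, selective value: its own statement, for which the optimizer
        -- will favour the index on status
        OPEN p_result FOR
          SELECT order_id FROM orders
          WHERE  status = 'PENDING'
          AND    order_date >= p_from_date;
      ELSE
        -- Common values: a different statement, better served by the index
        -- on the date
        OPEN p_result FOR
          SELECT order_id FROM orders
          WHERE  status = p_status
          AND    order_date >= p_from_date;
      END IF;
    END;
    /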
Tried-and-true recipe number 4 is to abuse procedural database access.
In my experience, it is no longer big bad SQL queries that are the biggest performance
issue.
Missing indexes are also, for the most part, a thing of the past – actually, many tables
I come across are over-indexed.
Most performance issues today come from too many fast or moderately fast queries. If you
take a look at the following procedure, for instance (found in production), there is no reason
to insert data and then update what has just been inserted: it is very easy to insert the right
data directly.
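Reduced to a sketch, with invented table, columns and VAT rate, the pattern is this:

    -- Insert, then immediately update the row that was just inserted
    INSERT INTO invoices (invoice_id, customer_id, amount)
    VALUES (:id, :cust, :amount);

    UPDATE invoices
    SET    vat_amount = amount * 0.196
    WHERE  invoice_id = :id;

    -- The same result in a single statement: compute the value at insert time
    INSERT INTO invoices (invoice_id, customer_id, amount, vat_amount)
    VALUES (:id, :cust, :amount, :amount * 0.196);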
But I think that what explains the issue best is to show you weekend shopping, as I believe
it is practiced by most developers.
First they jump into their car.
Then they drive a little, until they arrive at the supermarket.
At this point there is something of an orientation run among the shelves of the various departments,
until they find what they want,
then head for the cashier,
drive back home,
put the milk in the fridge,
then start again for the next item on the shopping list.
This is exactly the way most programs use databases.
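In SQL terms, the shopping trip looks something like this (a sketch with invented tables):

    BEGIN
      -- One round trip to the database per item on the list
      FOR item IN (SELECT product_id FROM shopping_list) LOOP
        INSERT INTO cart (product_id, price)
          SELECT product_id, price
          FROM   products
          WHERE  product_id = item.product_id;
      END LOOP;
    END;
    /

    -- The set-based equivalent: one statement, one trip
    INSERT INTO cart (product_id, price)
    SELECT p.product_id, p.price
    FROM   products p
    JOIN   shopping_list s ON s.product_id = p.product_id;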
Where things become really funny is when you consider the cures that are usually recommended.
First, black magic. Here you have a DBA in traditional DBA garb performing the magical
parameter dance.
Then you hear suggestions such as “we need some more powerful hardware”,
“We need more bandwidth”,
“we must tune the search query”,
or “we should try and improve I/Os”.
All of this will improve speed, somewhat.
But is that really the issue? Well, I don't think so.
Just keep in mind that unitary processing kills.
Tried-and-true recipe number 4.5 is to put user-defined functions everywhere.
Why 4.5? Because it's actually a procedural variant.
I am going to take an example that speaks to everyone: money.
Suppose we want to convert a certain amount from a foreign currency into our own currency.
We can write a function that takes the amount and the currency, queries the table that
holds currency rates, and returns the value converted at the last known exchange rate.
The query in the function is extremely fast.
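Such a function might look like this (a sketch; to_local, currency_rates and transactions are invented names, not the ones from the demo):

    CREATE OR REPLACE FUNCTION to_local(p_amount   IN NUMBER,
                                        p_currency IN VARCHAR2)
    RETURN NUMBER IS
      v_rate NUMBER;
    BEGIN
      -- Last known rate for the currency
      SELECT rate
      INTO   v_rate
      FROM   currency_rates
      WHERE  currency  = p_currency
      AND    rate_date = (SELECT MAX(rate_date)
                          FROM   currency_rates
                          WHERE  currency = p_currency);
      RETURN p_amount * v_rate;
    END;
    /

    -- Called once per row of a big table:
    SELECT SUM(to_local(amount, currency))
    FROM   transactions;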
If I run this query against a 2-million-row table, it becomes awfully slow. I am cheating
a bit here: I got my result in a little over 9 minutes.
I rewrote the function and brought the elapsed time down to one minute and sixteen seconds,
an improvement by a factor of more than seven.
How did I do it? My magic touch …
Actually, even a big international bank deals in very few currencies.
You will have the local currency, the currencies of the main trade partners,
then dollars, euros, British pounds, yen and the like.
It's always the same values, and if you call the function thousands of times to retrieve
the exchange rate for the same currency, you are doing a lot of useless work.
So what I did was use session variables as a cache, storing the various exchange rates
as I referenced them and fetching each one only once, to reduce the number of calls to
the SQL engine.
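One way to cache the rates for the duration of a session, sketched here in Oracle PL/SQL with a package variable playing the role of the session variable (all names invented; in MySQL, @session variables serve the same purpose):

    CREATE OR REPLACE PACKAGE currency_cache AS
      FUNCTION to_local(p_amount IN NUMBER, p_currency IN VARCHAR2) RETURN NUMBER;
    END currency_cache;
    /

    CREATE OR REPLACE PACKAGE BODY currency_cache AS
      TYPE rate_tab IS TABLE OF NUMBER INDEX BY VARCHAR2(3);
      g_rates rate_tab;   -- kept in memory for the duration of the session

      FUNCTION to_local(p_amount IN NUMBER, p_currency IN VARCHAR2) RETURN NUMBER IS
      BEGIN
        IF NOT g_rates.EXISTS(p_currency) THEN
          -- First reference to this currency in the session: hit the table once
          SELECT rate
          INTO   g_rates(p_currency)
          FROM   currency_rates
          WHERE  currency  = p_currency
          AND    rate_date = (SELECT MAX(rate_date)
                              FROM   currency_rates
                              WHERE  currency = p_currency);
        END IF;
        RETURN p_amount * g_rates(p_currency);
      END to_local;
    END currency_cache;
    /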
If you think the result is awesome, think twice: if you use a simple join, it’s even
faster than the improved function, and it’s even
more striking with Oracle and SQL Server than with MySQL.
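The join version, still with the same invented tables:

    SELECT t.transaction_id,
           t.amount * r.rate AS local_amount
    FROM   transactions t
    JOIN   currency_rates r ON r.currency = t.currency
    WHERE  r.rate_date = (SELECT MAX(r2.rate_date)
                          FROM   currency_rates r2
                          WHERE  r2.currency = t.currency);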
For the optimizer, a function is a black box, and you have a lot of context switches between
the regular execution of a statement and the function call. At a micro level, it's another
case of a chatty application, not unlike the shopping algorithm you have seen.