Oracle Architecture in less than 10 minutes


Uploaded by roughsealtd on 09.02.2009

Transcript:
Hello,
My name is Stephane Faroult, I am a database performance consultant.
The first Oracle database I ever installed was a beta version of Oracle 5,
an awful number of years ago.
I have published two books with O’Reilly, "The Art of SQL" and more recently "Refactoring
SQL Applications".
For me, there is a clear link between performance and an understanding of what happens inside
the database.
I’ve had the opportunity to witness that for many developers the "persistence layer"
is kind of cloudy.
I’d like to take you behind the scenes, and explain, without getting too much into details,
how everything works.
I’ll start by explaining what happens when you start the database, something which is
most often automatically performed by scripts when a
machine boots. I’ll talk more precisely about the Unix
implementation of Oracle, but it’s not massively different under Windows.
An administrator who wants to manually start a database (called a server by some other
products) must have at least two important variables set
in their environment or in the registry: a variable that says
where the Oracle distribution has been installed, called ORACLE_HOME, and a variable
that identifies the database, called ORACLE_SID.
You can have several different databases running on the same machine, possibly under
different versions of Oracle.
The administrator will run the sqlplus program, and "connect" - actually the connection is
a dummy at this stage; it’s just a check that the program
is invoked from a Unix or Windows account with the right
privileges. You can then run the magical startup command.
The program starts by reading a file called spfile.ora that contains the parameters
for the database identified by ORACLE_SID. It’s
a binary file, and you may hear about the init.ora file,
a text file that historically predates the binary file. With a binary file, parameters
are modified using sqlplus, and many of them can
be dynamically changed while the database is up and running.
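For instance, with an spfile you change a parameter from sqlplus rather than by editing a file; a sketch (the parameters and sizes below are just examples):

```sql
-- Change a dynamic parameter in the running instance and record it in the spfile
ALTER SYSTEM SET sga_target = 2G SCOPE=BOTH;

-- A static parameter can only be recorded in the spfile,
-- to take effect at the next restart
ALTER SYSTEM SET memory_max_target = 4G SCOPE=SPFILE;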
Database parameters will give the size of a shared memory area that is allocated at
this stage and which is called the System Global Area, or SGA.
This is where Oracle works. A number of utility processes will
also be started. The number increases with each version of Oracle; some of them are mandatory,
others are optional. Under Windows, they aren’t
processes but threads.
All the processes run the same oracle program and have the SGA in their address space.
It sometimes looks like they are consuming a lot of memory, but actually most of this
memory is shared.
When memory is allocated and processes started, SQL*Plus displays the total SGA size as well
as the size of four subparts, the fixed part, the
variable part, the database buffers and the redo buffers.
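A sketch of what this looks like in a sqlplus session (the environment values and byte counts below are made up for illustration):

```
$ export ORACLE_HOME=/opt/oracle/product/19c ORACLE_SID=ORCL
$ sqlplus / as sysdba

SQL> STARTUP
ORACLE instance started.

Total System Global Area 2147484192 bytes
Fixed Size                  8898848 bytes
Variable Size             520093696 bytes
Database Buffers         1610612736 bytes
Redo Buffers                7878912 bytes
Database mounted.
Database opened.
```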
At this point, Oracle is going to open and read a file that is so critical that it’s
always duplicated: the control file, found at a location that
is specified in the parameter file. This is where Oracle finds
the names of all the data files, temporary files and log files that together make up
the database.
The control file also allows Oracle to check global consistency: whether something needs
to be restored, and whether the database was properly shut down
or some unfinished transactions will have to be rolled back.
Without a control file, your database is dead.
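Once the database is up, you can check where the duplicated copies live; a sketch (two equivalent ways):

```sql
-- Where are the (duplicated) control files?
SHOW PARAMETER control_files
SELECT name FROM v$controlfile;
```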
After passing this stage, SQL*Plus will display that the database is mounted, which mostly
means that all files are identified. The only step that
remains is opening the various files for read/write
operations.
Once the files are opened, the database is open and users who aren’t database administrators
can connect. That is, they can connect if they are logged
onto the machine that hosts the database.
If you want to enable users or an application server to connect from another machine, which
is the most common case, you have to launch, with the lsnrctl
start command, another program called tnslsnr; this is
the “listener” that waits for incoming connections. After displaying a lot of intimate
details about its configuration, found in the listener.ora file,
the program runs in the background and everything is
in place.
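A sketch of the two commands an administrator would typically use here:

```
$ lsnrctl start      # reads listener.ora, then leaves tnslsnr running in the background
$ lsnrctl status     # checks that the listener is up and which services it knows about
```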
Let’s now contemplate a client program that runs on another machine. If it wants to access
the database, it needs to provide, first of all, three pieces
of information: the name of the host on which Oracle runs,
the port on which tnslsnr is listening, and the service, which is in practice the identifier
of the database you want to connect to. There are
several ways to provide these pieces of information. You can directly provide everything in the
function that performs the connection request; this is for
instance what you do with JDBC and a pure Java driver. If your client is using the Oracle
client libraries, you can also provide an alias, and your client
will transparently fetch the associated information from
a file called tnsnames.ora.
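As an illustration, here are the two forms side by side; the host, port and service names are placeholders:

```
# JDBC thin-driver URL: host, port and service given directly
jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL

# Equivalent tnsnames.ora entry, so that clients can simply use the alias ORCL
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.example.com)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = ORCL))
  )
```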
These two methods are by far the most common ones, but there are other possibilities, such
as using an LDAP directory or system utilities such
as the Network Information Services as a substitute to
the tnsnames.ora file.
Obviously Oracle will not give you uncontrolled access to the data. You must sign on, so that
you are identified and the database clearly knows
what you are allowed to do. Usually, you will provide a
username and a password. Authentication through an LDAP directory is also sometimes
possible.
In practice, your client issues a connection request that is handled by the listener. The
listener will either spawn a sub-process that will
run the oracle code and become the client’s dedicated server,
or redirect the connection to an already existing shared server, which is a much less common
configuration.
The server process is, so to speak, the client’s proxy on the Oracle side. It can access the
SGA, which is shared among all processes. It will
also allocate a private memory area called the PGA,
where it stores what is private to the process and to the session. But let’s take a closer
look at the various parts of the SGA. There isn’t much
to say about the fixed system area. But the variable area
contains the shared pool, which holds compiled versions of queries, as well as various buffer
pools that will contain for instance data dictionary
information.
The data buffer contains data from the tables but also index structures and various work
areas. Everything is split into pages, or blocks; the
block is Oracle’s basic storage unit, and a typical size
for a block is 8KB. Whenever you query or change the content of a table, you operate in memory
on blocks that are in this cache.
Finally, a fourth area is used to store information before it is written to log files, to ensure
transactional consistency even in the event of a crash.
The parts that really matter to understand how Oracle works are the shared pool, the
data buffer and the log buffer. This is where the dedicated
server process will operate, as well as the main utility
processes started with the database. Let’s see what happens when the client issues
an SQL query. The query arrives as text to the server
process that will start by computing a hash value that is used as a checksum. Then the
server will check whether this query has already been
recently executed, and will inspect the shared pool.
If the query is found, the analysis stops there; this phase is called soft parsing, and
the server process can execute the query.
If a prior query with exactly the same text cannot be found, then we have hard parsing,
which is CPU-intensive and locks some shared resources.
SQL syntax must be analyzed, the objects that are referred
to in the query must be identified, some synonyms may have to be translated, and the best execution
plan must be determined among sometimes a very large number of possible execution plans.
It may cause recursive queries on the data dictionary and input/output operations. This
is why you shouldn’t concatenate identifiers that change
with every call into the text of a query; you should
pass them as parameters instead, as if the query were a function.
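A sketch of the difference (the table and column names are made up):

```sql
-- Hard-parsed each time: the literal makes the text unique, so its hash
-- never matches anything already in the shared pool
SELECT cust_name FROM customers WHERE cust_id = 1234;
SELECT cust_name FROM customers WHERE cust_id = 1235;

-- Soft-parsed after the first execution: the text stays identical,
-- and the value is passed as a bind variable
SELECT cust_name FROM customers WHERE cust_id = :cust_id;
```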
Once the query is analyzed and compiled, it is going to be loaded into the shared pool
and executed. It will then reference pieces of data that
may or may not be in the data buffer. If data blocks aren’t
in memory, the server process fetches them from the data files and loads them. Blocks
that have been modified are written back asynchronously to the
data files by one or several database writer processes.
When you update data, the mechanisms for analyzing the statement and loading data blocks into
memory are the same. But before modifying data,
initial values are copied to a work area called the “undo segment”.
Another process that wants to read data being modified isn’t blocked, but reads data from
the undo segment instead of reading from the table
block. Initial and new values are also copied to the redo buffer;
if the redo buffer fills up, the utility process LGWR, or log writer, dumps its content to
a log file.
When you commit the transaction, Oracle records that the initial value is no longer required
for rolling back the change, records that the
change is validated, and the LGWR process writes the
transaction to the log file. This time the operation is synchronous, and the server process
waits for the go-ahead before the call returns to the
client.
That means that if you commit every single update in a loop, you will waste a lot of
time just waiting for acknowledgment.
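A minimal PL/SQL sketch of the difference (the table name is made up):

```sql
-- Slow: one synchronous log-file write per row
BEGIN
  FOR r IN (SELECT order_id FROM pending_orders) LOOP
    UPDATE pending_orders SET status = 'DONE' WHERE order_id = r.order_id;
    COMMIT;  -- waits for LGWR on every iteration
  END LOOP;
END;
/

-- Faster: a single wait for LGWR at the end of the transaction
BEGIN
  UPDATE pending_orders SET status = 'DONE';
  COMMIT;
END;
/
```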
This is a very quick introduction, but I hope it will allow you to understand better what
happens.
Thank you for your attention.