RAC nodes regularly freeze for about 10 seconds

  • Marcin Szarek

    RAC nodes regularly freeze for about 10 seconds

    Hi!

    For a few months we have been suffering from a mysterious problem with
    Oracle 10g RAC (more details on the server configuration at the bottom).
    At regular intervals (every 5 minutes) the nodes of our cluster "freeze":
    for a couple of seconds the operating system, for some mysterious reason,
    does nothing in userspace, as far as we can tell. Every single userspace
    process stops for this period. After a few seconds the system comes back
    to life and all suspended processes contend for CPU time and other
    resources, which in effect leads to higher load.

    We have investigated in many ways, which brought us to the following
    conclusions:
    - when the Oracle instance is stopped, the freezes disappear
    - since we reduced the shared_servers parameter from 60 to 20, the
    freezes have been slightly shorter (by about 4 seconds)
    - right after an instance restart the freezes are unnoticeable, but as
    time goes by they grow back to 6-9 seconds

    Unfortunately, investigating this is very hard. /var/log/messages reports
    nothing, dmesg reports nothing, and the Oracle alert log has nothing to
    say either.
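
    Since none of the logs record the freezes, one option is a small
    userspace probe that timestamps them itself, so they can be lined up
    against other activity on the node. The following is only a rough sketch
    and not something from our setup: the names find_stalls and watch and the
    default values are made up for illustration, and it assumes a Python 3
    interpreter is available on the node, which a stock RHEL 3 install may
    not provide.

```python
import time


def find_stalls(timestamps, interval, threshold):
    """Return (start, lost) pairs for samples that arrived much later than expected.

    timestamps: sample times in seconds, taken roughly every `interval` seconds.
    A stall is reported when the extra delay exceeds `threshold` seconds.
    """
    stalls = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        lost = (cur - prev) - interval
        if lost > threshold:
            stalls.append((prev, lost))
    return stalls


def watch(interval=0.1, threshold=1.0, duration=600.0):
    """Sleep in short steps and print a line whenever a wake-up is badly late."""
    prev = time.monotonic()
    end = prev + duration
    while prev < end:
        time.sleep(interval)
        now = time.monotonic()
        lost = (now - prev) - interval
        if lost > threshold:
            # Wall-clock time is printed so the event can be matched to other logs.
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print("%s stalled for about %.1f s" % (stamp, lost))
        prev = now
```

    Calling watch() on a node for longer than one 5-minute cycle should print
    one line per freeze if the probe itself is suspended along with every
    other userspace process. find_stalls applies the same check to timestamps
    that have already been collected.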

    Do you have any idea what may be misconfigured or damaged? Could you
    please suggest some further tests?


    Thank you in advance for any followups!



    Database server characteristics
    -------------------------------
    OS: RHEL 3 ES
    Kernel: 2.4.21-32.ELsmp
    Oracle: 10.2.0.1.0
    Storage: SAN accessed by QLogic HBA
    Cluster storage: OCFS v.1


    --
    - MARCIN SZAREK <me@some.where.com> -
    -- There are only 10 types of people in this world; --
    -- those who understand binary, and those who don't --
