Bo's Oracle Station

Views: 8006 | Replies: 14

Asking for advice on a very strange database state

Posted 2016-12-7 17:15:59
Mr. Tang:
I'm looking at an 11.2.0.1 database showing a very strange set of symptoms:
1. Middleware and PL/SQL client sessions that are already connected keep working normally.
2. New PL/SQL client connections fail with some kind of out-of-memory error.
3. On the database server itself, `sqlplus / as sysdba` reports "Connected to an idle instance", as if the database were not open.
4. If I run `startup` at that point, it complains that the parameter file cannot be found.
5. I have checked `echo $ORACLE_HOME`, `echo $ORACLE_SID`, and `echo $ORACLE_BASE`; all are correct.
6. I re-ran `oraenv` and re-sourced `.bash_profile`, but sqlplus still reports an idle instance.
7. The problem appears intermittently. If I do nothing, it clears up on its own after an hour or so; if it's urgent, the only fix is to reboot the whole OS.
8. The database lives on a plain file system; there is no Grid Infrastructure.

The strange part is that the database is actually open the whole time: EM works, already-connected clients keep running, and the ORA_SMON process is alive.
I remember you said in class not to run 11.2.0.1 in production.
Given all this, can we conclude it is a version bug?
I'm worried that after installing 11.2.0.4 and migrating via export/import, the problem will remain.

The OS and database were installed by someone else. When this started they asked for more memory, but the problem persisted, and now they want even more memory. I don't think that adds up, so I suggested installing a newer version and importing the data via dump.
They have more or less accepted that plan, and now I feel I've dug myself into a hole: I proposed it, so I'll be the one doing it... and if after all that effort the problem is still there, I'll look bad all around.
I suddenly regret opening my mouth!
Mr. Tang, please point me in the right direction.

Posted 2016-12-8 10:03:10
Last edited by botang on 2016-12-8 10:06

This doesn't look like a bug either; it is very odd. Most likely the operating system is running short of scheduling/IPC resources (semaphores). Check /etc/sysctl.conf:
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled.  See sysctl(8) and
# sysctl.conf(5) for more details.

# Controls IP packet forwarding
net.ipv4.ip_forward = 0

# Controls source route verification
net.ipv4.conf.default.rp_filter = 1

# Do not accept source routing
net.ipv4.conf.default.accept_source_route = 0

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 0

# Controls whether core dumps will append the PID to the core filename.
# Useful for debugging multi-threaded applications.
kernel.core_uses_pid = 1

# Controls the use of TCP syncookies
net.ipv4.tcp_syncookies = 1

# Disable netfilter on bridges.
net.bridge.bridge-nf-call-ip6tables = 0
net.bridge.bridge-nf-call-iptables = 0
net.bridge.bridge-nf-call-arptables = 0
fs.file-max = 65536
kernel.sem = 250 32000 100 128
kernel.shmall = 2097152
kernel.shmmni = 4096
kernel.shmmax = 2022342656
net.core.rmem_default=134217728
net.core.rmem_max=134217728
net.core.wmem_default=134217728
net.core.wmem_max=134217728
net.core.optmem_max=134217728
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30

In particular:
kernel.sem = 250 32000 100 128
Consider raising the maximum tenfold:
kernel.sem = 250 320000 100 128

Give it a try. The settings above are what the EBS suite configures even on 9i databases. After changing the file, run `sysctl -p`; the change takes effect without a reboot.
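A minimal sketch of what the suggested change does, assuming the standard meaning of the four kernel.sem fields (SEMMSL, SEMMNS, SEMOPM, SEMMNI); only the system-wide semaphore total, SEMMNS, is scaled:

```shell
# The four fields of kernel.sem, in order:
#   SEMMSL  max semaphores per set
#   SEMMNS  max semaphores system-wide
#   SEMOPM  max operations per semop() call
#   SEMMNI  max number of semaphore sets
current="250 32000 100 128"

# Apply the suggested 10x increase to SEMMNS only:
proposed=$(echo "$current" | awk '{ $2 = $2 * 10; print $1, $2, $3, $4 }')
echo "kernel.sem = $proposed"   # -> kernel.sem = 250 320000 100 128

# On the real host you would then edit /etc/sysctl.conf and run:
#   sysctl -p                       # reload without a reboot
#   cat /proc/sys/kernel/sem        # confirm the live values
```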

OP | Posted 2016-12-8 10:40:10
Thank you, Mr. Tang. I am not the administrator of this database. The OS is CentOS, and the system has had constant small problems since it went live.
I once looked at some of the basic database parameters and found nearly everything at the default; only processes and open_cursors had been changed, to roughly 1000 and 500 respectively.
Archiving is off, the redo logs are 50 MB with a single member per group, there is a single control file, and even the OS desktop feels off, as if something is wrong with the language settings; I can't quite put my finger on it.
After the class I thought about going back to check the prerequisites for the database software installation, including /etc/sysctl.conf and the dependent packages.
But I don't know a good way to check them after the fact, like the prerequisite-check screen the installer shows.
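The installer's prerequisite screen essentially compares the kernel parameters against documented minimums, so a rough after-the-fact check can be scripted. A sketch: the minimum values below are my recollection of the 11.2 install guide for Linux, so verify them against the guide before relying on this; the sample file reuses values posted earlier in this thread.

```shell
# Write a small sample in sysctl.conf format (values from this thread).
cat > /tmp/sysctl.sample <<'EOF'
fs.file-max = 65536
kernel.shmmni = 4096
kernel.shmmax = 2022342656
EOF

check() {  # usage: check KEY MINIMUM
    actual=$(awk -F= -v k="$1" '$1 ~ k { gsub(/ /,"",$2); print $2 }' /tmp/sysctl.sample)
    if [ -z "$actual" ]; then
        echo "$1: MISSING (minimum $2)"
    elif [ "$actual" -lt "$2" ]; then
        echo "$1: $actual TOO LOW (minimum $2)"
    else
        echo "$1: $actual ok"
    fi
}

# Minimums as I recall them from the 11.2 install guide -- treat as assumptions.
check fs.file-max   6815744    # -> fs.file-max: 65536 TOO LOW (minimum 6815744)
check kernel.shmmni 4096       # -> kernel.shmmni: 4096 ok
```

Run against the real /etc/sysctl.conf instead of the sample, this immediately flags settings like the fs.file-max = 65536 seen above.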

Thanks for the reminder; I will suggest they verify whether /etc/sysctl.conf was set up according to the requirements.
It may well have been installed by clicking "ignore" all the way through.
The person who set this up was Liu Zhiyun; I have suspected a problem here for a long time, which is why I asked you about him in class.

Posted 2016-12-8 10:45:16 from mobile
He did Red Hat and Cisco work; we were colleagues back in the day.

OP | Posted 2016-12-12 10:39:30
Mr. Tang, today I finally got to see this database's logs, and there are some clues.
The alert log shows:
2016-12-12 10:02:27.244000 +08:00
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc:
2016-12-12 10:02:33.246000 +08:00
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc:
2016-12-12 10:02:34.247000 +08:00
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc:
2016-12-12 10:02:46.287000 +08:00
Thread 1 cannot allocate new log, sequence 135700
Checkpoint not complete
  Current log# 3 seq# 135699 mem# 0: /u01/app/oracle/oradata/orcl/redo03.log
2016-12-12 10:02:47.853000 +08:00
Thread 1 advanced to log sequence 135700 (LGWR switch)
  Current log# 1 seq# 135700 mem# 0: /u01/app/oracle/oradata/orcl/redo01.log
2016-12-12 10:02:50.258000 +08:00
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc:
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc:
2016-12-12 10:04:12.276000 +08:00
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc:
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc:
2016-12-12 10:04:44.284000 +08:00
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc:
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Errors in file /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc:
2016-12-12 10:05:16.286000 +08:00
Thread 1 cannot allocate new log, sequence 135701
Checkpoint not complete
  Current log# 1 seq# 135700 mem# 0: /u01/app/oracle/oradata/orcl/redo01.log
2016-12-12 10:05:18.173000 +08:00
Thread 1 advanced to log sequence 135701 (LGWR switch)
  Current log# 2 seq# 135701 mem# 0: /u01/app/oracle/oradata/orcl/redo02.log
2016-12-12 10:05:53.173000 +08:00
Process W000 died, see its trace file
2016-12-12 10:05:57.181000 +08:00
Process W000 died, see its trace file
2016-12-12 10:06:01.188000 +08:00
Process W000 died, see its trace file
2016-12-12 10:06:05.185000 +08:00
Process W000 died, see its trace file
2016-12-12 10:06:07.408000 +08:00
Process m000 died, see its trace file
2016-12-12 10:06:09.192000 +08:00

1. "Process m000 died" is probably the entry corresponding to Oracle clients being unable to connect.
2. Before it there are repeated "kkjcre1p: unable to spawn jobq slave process" messages. What do they mean? Are they caused by the processes limit being exhausted?
3. Is this the key entry? Does it mean the three 50 MB log groups switch too fast for the checkpoint to complete?
   2016-12-12 10:05:16.286000 +08:00
   Thread 1 cannot allocate new log, sequence 135701
   Checkpoint not complete
   Current log# 1 seq# 135700 mem# 0: /u01/app/oracle/oradata/orcl/redo01.log
   Immediately after that entry comes:
   2016-12-12 10:05:18.173000 +08:00
   Thread 1 advanced to log sequence 135701 (LGWR switch)
   Current log# 2 seq# 135701 mem# 0: /u01/app/oracle/oradata/orcl/redo02.log
   2016-12-12 10:05:53.173000 +08:00
   Process W000 died, see its trace file
   2016-12-12 10:05:57.181000 +08:00
   Process W000 died, see its trace file
   Is there a causal relationship here?
   If we raise processes and enlarge the redo logs, will that solve the problem?
4. Also, does this relate to the machine's memory, or to how much shows as free? There are 32 GB of physical memory, 24 GB given to the database, and /dev/shm = 24 GB.
   Could insufficient OS memory also be a factor?

     [oracle@trace]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              20G  3.3G   16G  18% /
tmpfs                  24G   15G  9.3G  62% /dev/shm
/dev/sda1             194M   31M  153M  17% /boot
/dev/mapper/vg_oa-lv_oa
                      227G   52G  164G  24% /u01

[oracle@trace]$ free
             total       used       free     shared    buffers     cached
Mem:      32883988   32595872     288116          0      36128   17887192
-/+ buffers/cache:   14672552   18211436
Swap:      8191992    2373984    5818008
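On question 4: most of `used` in the first `free` line is reclaimable page cache, so the effectively available memory is free + buffers + cached, which is what the `-/+ buffers/cache` row reports. A quick check over the numbers posted above:

```shell
# The "Mem:" row from the free output above:
#         total       used       free     shared    buffers     cached
free_out='Mem:      32883988   32595872     288116          0      36128   17887192'

# Effectively available = free + buffers + cached
# (matches the 18211436 in the "-/+ buffers/cache" row).
echo "$free_out" | awk '{ printf "available_kb=%d\n", $4 + $6 + $7 }'
# -> available_kb=18211436
```

So roughly 18 GB is actually reclaimable/available; the box is not starved of memory.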
     

OP | Posted 2016-12-12 11:09:13
I just looked at the log a bit more:
adrci> show alert -p "message_text like 'Checkpoint not complete%'"

It turns out the situation has been serious all along: continuous checkpoint failures.
Mr. Tang, I can understand the checkpoint failures themselves: the logs switch too fast, one checkpoint runs into the next, and the writes can't keep up, right?
But can a problem like this really make `sqlplus / as sysdba` report an idle instance, so that even the administrator cannot use sqlplus?

I measured the switch frequency: in practice a switch happens in under three minutes, clearly too often. I understand that this causes I/O pressure, keeps lgwr busy, and drives the redo-sync waits up.
But even so, by the time log 3 is full, about five minutes have passed since log 1 was filled (log 2 took about two and a half minutes to fill, then log 3 another two and a half).
Aren't five minutes enough for log 1 to become inactive? Log 1 is only 50 MB.


2016-12-12 10:02:46.288000 +08:00
Checkpoint not complete
2016-12-12 10:05:16.286000 +08:00
Checkpoint not complete
2016-12-12 10:15:22.032000 +08:00
Checkpoint not complete
2016-12-12 10:17:51.489000 +08:00
Checkpoint not complete
2016-12-12 10:20:21.232000 +08:00
Checkpoint not complete
2016-12-12 10:22:48.343000 +08:00
Checkpoint not complete
2016-12-12 10:25:17.480000 +08:00
Checkpoint not complete
2016-12-12 10:27:20.724000 +08:00
Checkpoint not complete
2016-12-12 10:29:40.872000 +08:00
Checkpoint not complete
2016-12-12 10:32:03.656000 +08:00
Checkpoint not complete
2016-12-12 10:34:23.677000 +08:00
Checkpoint not complete
2016-12-12 10:36:41.485000 +08:00
Checkpoint not complete
2016-12-12 10:39:03.512000 +08:00
Checkpoint not complete
2016-12-12 10:41:23.535000 +08:00
Checkpoint not complete
2016-12-12 10:43:45.124000 +08:00
Checkpoint not complete
2016-12-12 10:46:05.637000 +08:00
Checkpoint not complete
2016-12-12 10:48:26.511000 +08:00
Checkpoint not complete
2016-12-12 10:50:44.482000 +08:00
Checkpoint not complete
2016-12-12 10:53:02.218000 +08:00
Checkpoint not complete
2016-12-12 10:55:13.911000 +08:00
Checkpoint not complete
2016-12-12 10:57:26.044000 +08:00
Checkpoint not complete
2016-12-12 10:59:36.813000 +08:00
Checkpoint not complete
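The "under three minutes" figure can be checked directly against the timestamps above; a small sketch using the first and last of the 22 entries (21 intervals):

```shell
# First and last "Checkpoint not complete" timestamps from the adrci output above.
first="10:02:46"
last="10:59:36"
n=21                                  # 22 entries -> 21 intervals

to_sec() { echo "$1" | awk -F: '{ print $1*3600 + $2*60 + $3 }'; }

span=$(( $(to_sec "$last") - $(to_sec "$first") ))
echo "total span: ${span}s over $n intervals -> avg $(( span / n ))s per switch"
# -> avg 162s per switch, i.e. well under 3 minutes
```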

Posted 2016-12-12 20:32:41
Last edited by botang on 2016-12-12 20:36

kkjcre1p: unable to spawn jobq slave process

1. I strongly suspect processes was left at its default (150) when the database was created. That is too small.
2. Also check job_queue_processes: is it 1000, or has it been lowered to a very small value?
3. Frequent "Checkpoint not complete" comes from undersized log groups: each log switch triggers an incremental checkpoint, so small logs mean many of them.
4. Even a healthy database reports "Checkpoint not complete" after a log switch from time to time.
5. "Thread 1 cannot allocate new log, sequence 135701" by itself only says the current log group is full and a switch is due; a normal database prints it at switch time too.
6. j000 and w000 dying unexpectedly is related to job_queue_processes. Have the kernel parameters been checked? Look at the trace files of those two processes.

"If we raise processes and enlarge the redo logs, will that solve the problem?"
Your line of analysis is on the right track.

The machine's memory allocation looks fine.
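As a first pass on point 6, the alert log and traces can simply be scanned for the failure patterns. A sketch over an inlined sample (on the server you would point grep at the real alert log and at /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ instead):

```shell
# Inline a small sample of the alert-log lines quoted in this thread.
cat > /tmp/alert.sample <<'EOF'
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Process J000 died, see its trace file
kkjcre1p: unable to spawn jobq slave process
Process W000 died, see its trace file
EOF

grep -c "unable to spawn" /tmp/alert.sample     # -> 2 spawn failures
grep -c "^Process .* died" /tmp/alert.sample    # -> 3 dead background processes
```

Correlating the counts per hour against the outage windows would show whether the spawn failures line up with the "idle instance" episodes.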





OP | Posted 2016-12-13 11:57:53
Last edited by lujiaguai on 2016-12-13 15:51

Mr. Tang, I thought I already understood this area; clearly I still have a long way to go.

1. The processes parameter was actually 1000, not 150; yesterday I raised it to 1450.
2. `show parameter job_queue_processes;` shows 1000.
3. I have also seen many "Checkpoint not complete" messages on other unmanaged databases with undersized log groups. While the checkpoint has not completed, is the database stuck? And if the gap before the next "Thread 1 advanced to log sequence ****" is long, does that mean the storage cannot keep up?
     Take this as an example: at 10:05:16 "cannot allocate new log, Checkpoint not complete", but by 10:05:18 the switch had gone through ("Thread 1 advanced to log sequence"):
     2016-12-12 10:05:16.286000 +08:00
     Thread 1 cannot allocate new log, sequence 135701
     Checkpoint not complete
     Current log# 1 seq# 135700 mem# 0: /u01/app/oracle/oradata/orcl/redo01.log
     2016-12-12 10:05:18.173000 +08:00
     Thread 1 advanced to log sequence 135701 (LGWR switch)
     Current log# 2 seq# 135701 mem# 0: /u01/app/oracle/oradata/orcl/redo02.log
     Mr. Tang, can I read this as: the gap was under three seconds, and during those seconds the database was stalled? We felt nothing only because the gap was so short.
     If the storage were poor, would the gap widen and cause more serious problems?
     Neither the classroom lab nor this production machine at night, when idle, ever reports "cannot allocate new log / Checkpoint not complete"; it appears only under load, when logs switch frequently. So is it normal for a production system to show this? Does seeing it mean we should enlarge the logs or add groups?
4. After I enlarged the logs to 1 GB yesterday, "Checkpoint not complete" stopped appearing; instead I now get "Thread 1 cannot allocate new log, sequence **** / Private strand flush not complete". How does that differ from "Checkpoint not complete"?
     In the two excerpts below, "cannot allocate new log", "Private strand flush not complete", and "Thread 1 advanced to log sequence" all happen within the same second, whereas yesterday "Checkpoint not complete" and "Thread 1 advanced to log sequence" were more than two seconds apart.
     Mr. Tang, is it fair to say that "Private strand flush not complete" is the healthy state,
     while "Checkpoint not complete" appearing in production means the three log groups cannot switch in time, something like one current plus two active?
     Tue Dec 13 09:56:44 2016
     Thread 1 cannot allocate new log, sequence 135749
     Private strand flush not complete
     Current log# 1 seq# 135748 mem# 0: /u01/app/oracle/flash_recovery_area/ORCL/onlinelog/o1_mf_1_d4w6v9wz_.log
     Thread 1 advanced to log sequence 135749 (LGWR switch)
     Current log# 3 seq# 135749 mem# 0: /u01/app/oracle/flash_recovery_area/ORCL/onlinelog/o1_mf_3_d4w6on9c_.log
     Tue Dec 13 10:47:22 2016
     Thread 1 cannot allocate new log, sequence 135750
     Private strand flush not complete
     Current log# 3 seq# 135749 mem# 0: /u01/app/oracle/flash_recovery_area/ORCL/onlinelog/o1_mf_3_d4w6on9c_.log
     Thread 1 advanced to log sequence 135750 (LGWR switch)
     Current log# 2 seq# 135750 mem# 0: /u01/app/oracle/flash_recovery_area/ORCL/onlinelog/o1_mf_2_d4w6snnh_.log

5. The J000 trace file is /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc
     The file is long; I have pasted it below. It contains masses of lines like: Process J000 is dead (pid=32723 req_ver=24 cur_ver=24 state=KSOSP_SPAWNED)
     That looks like process starvation. Yesterday I found an article describing a similar problem: http://www.cnblogs.com/vtable9999/articles/2716445.html
     The article confirms that when processes is exhausted you get exactly the failure described at the top of this thread: new client connections report out-of-memory, and sqlplus on the server reports an idle instance.
     I raised processes to 1450 yesterday and the earlier problem has not recurred; I ran `select count(*) from v$process;` three times this morning and each time got only a bit over 200.
     If the recent failures happened because the process count exceeded 1000, is that connected to the rapid log switching?

Mr. Tang, should this be treated as two separate problems or as one?
On the surface, over-fast log switching is one thing and a runaway process count is another.
Should I observe for a while and see whether, with the logs enlarged, the process count still blows up?

Also, I have not touched the sysctl parameters yet. The article above about processes causing the idle-instance symptom never reaches a firm conclusion or fix, and it raises memory questions too.
The absurd part is that this is a tiny database, with barely 5 GB of data files, yet at this redo rate it would generate at least 12 GB of logs a day if archiving were on, and the process count climbs that high?
24 GB of memory is already allocated to it, and the vendor still blames the problem on too little memory.
Mr. Tang, what would you advise? If we scrap this setup and rebuild on OEL 6.5 with 11.2.0.4, dump the few GB of data straight in, and regather statistics, is that the sounder approach? I remember you saying in class that trying to fix a problem on top of a wrong premise is bad methodology.

Finally, here is the trace file:
[oracle@OADB1 ~]$ cat /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc
Trace file /u01/app/oracle/diag/rdbms/orcl/ecology/trace/ecology_cjq0_7593.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1
System name:    Linux
Node name:      OADB1
Release:        2.6.32-220.el6.x86_64
Version:        #1 SMP Wed Nov 9 08:03:13 EST 2011
Machine:        x86_64
Instance name: ecology
Redo thread mounted by this instance: 1
Oracle process number: 29
Unix process pid: 7593, image: oracle@OADB1 (CJQ0)

*** 2016-11-22 22:00:00.065
*** SESSION ID:(987.5578) 2016-11-22 22:00:00.065
*** CLIENT ID:() 2016-11-22 22:00:00.065
*** SERVICE NAME:(SYS$BACKGROUND) 2016-11-22 22:00:00.065
*** MODULE NAME:() 2016-11-22 22:00:00.065
*** ACTION NAME:() 2016-11-22 22:00:00.065

Setting Resource Manager plan SCHEDULER[0x3004]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3005]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-11-23 22:00:00.024
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3006]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-11-24 22:00:00.173
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3007]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-11-25 22:00:00.009
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3008]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-11-26 06:00:00.006
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3009]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-11-27 06:00:00.011
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3003]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-11-28 22:00:00.028
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3004]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-11-29 22:00:00.037
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3005]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-11-30 22:00:00.034
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
*** 2016-12-01 09:38:10.939
Process J000 is dead (pid=32723 req_ver=24 cur_ver=24 state=KSOSP_SPAWNED).
*** 2016-12-01 09:38:11.997
Process J000 is dead (pid=32725 req_ver=19 cur_ver=19 state=KSOSP_SPAWNED).
*** 2016-12-01 09:38:17.999
Process J000 is dead (pid=32727 req_ver=25 cur_ver=25 state=KSOSP_SPAWNED).
*** 2016-12-01 09:38:19.010
Process J000 is dead (pid=32729 req_ver=20 cur_ver=20 state=KSOSP_SPAWNED).
*** 2016-12-01 09:39:55.043
Process J000 is dead (pid=435 req_ver=26 cur_ver=26 state=KSOSP_SPAWNED).
*** 2016-12-01 09:39:56.044
Process J000 is dead (pid=437 req_ver=21 cur_ver=21 state=KSOSP_SPAWNED).
*** 2016-12-01 09:44:22.100
Process J000 is dead (pid=793 req_ver=28 cur_ver=28 state=KSOSP_SPAWNED).
*** 2016-12-01 09:44:23.101
Process J000 is dead (pid=796 req_ver=1212 cur_ver=1212 state=KSOSP_SPAWNED).
*** 2016-12-01 09:45:10.038
Process J000 is dead (pid=844 req_ver=29 cur_ver=29 state=KSOSP_SPAWNED).
*** 2016-12-01 09:45:11.039
Process J000 is dead (pid=846 req_ver=1224 cur_ver=1224 state=KSOSP_SPAWNED).
*** 2016-12-01 10:07:51.013
Process J000 is dead (pid=2710 req_ver=35 cur_ver=35 state=KSOSP_SPAWNED).
*** 2016-12-01 10:07:52.042
Process J000 is dead (pid=2716 req_ver=30 cur_ver=30 state=KSOSP_SPAWNED).
*** 2016-12-01 10:17:24.587
Process J000 is dead (pid=3598 req_ver=1707 cur_ver=1707 state=KSOSP_SPAWNED).
*** 2016-12-01 10:17:25.626
Process J000 is dead (pid=3600 req_ver=40 cur_ver=40 state=KSOSP_SPAWNED).
*** 2016-12-01 10:31:38.106
Process J000 is dead (pid=4711 req_ver=45 cur_ver=45 state=KSOSP_SPAWNED).
*** 2016-12-01 10:31:40.120
Process J000 is dead (pid=4715 req_ver=33 cur_ver=33 state=KSOSP_SPAWNED).
*** 2016-12-01 10:40:25.025
Process J000 is dead (pid=5450 req_ver=47 cur_ver=47 state=KSOSP_SPAWNED).
*** 2016-12-01 10:40:26.047
Process J000 is dead (pid=5453 req_ver=34 cur_ver=34 state=KSOSP_SPAWNED).
*** 2016-12-01 10:40:27.048
Process J000 is dead (pid=5456 req_ver=25 cur_ver=25 state=KSOSP_SPAWNED).
*** 2016-12-01 11:03:28.390
Process J000 is dead (pid=7335 req_ver=2397 cur_ver=2397 state=KSOSP_SPAWNED).
*** 2016-12-01 11:03:29.413
Process J000 is dead (pid=7337 req_ver=2482 cur_ver=2482 state=KSOSP_SPAWNED).
*** 2016-12-01 11:03:31.415
Process J000 is dead (pid=7341 req_ver=2398 cur_ver=2398 state=KSOSP_SPAWNED).
*** 2016-12-01 11:06:17.453
Process J000 is dead (pid=7613 req_ver=2440 cur_ver=2440 state=KSOSP_SPAWNED).
*** 2016-12-01 11:06:18.454
Process J000 is dead (pid=7616 req_ver=2483 cur_ver=2483 state=KSOSP_SPAWNED).
*** 2016-12-01 11:07:34.471
Process J000 is dead (pid=7821 req_ver=2460 cur_ver=2460 state=KSOSP_SPAWNED).
*** 2016-12-01 11:07:35.472
Process J000 is dead (pid=7823 req_ver=2484 cur_ver=2484 state=KSOSP_SPAWNED).
*** 2016-12-01 11:26:31.812
Process J000 is dead (pid=9801 req_ver=2744 cur_ver=2744 state=KSOSP_SPAWNED).
*** 2016-12-01 11:26:32.828
Process J000 is dead (pid=9803 req_ver=2489 cur_ver=2489 state=KSOSP_SPAWNED).
*** 2016-12-01 11:26:34.830
Process J000 is dead (pid=9807 req_ver=2745 cur_ver=2745 state=KSOSP_SPAWNED).
*** 2016-12-01 11:49:36.203
Process J000 is dead (pid=11853 req_ver=2494 cur_ver=2494 state=KSOSP_SPAWNED).
*** 2016-12-01 11:49:37.228
Process J000 is dead (pid=11855 req_ver=559 cur_ver=559 state=KSOSP_SPAWNED).
*** 2016-12-01 11:49:38.229
Process J000 is dead (pid=11858 req_ver=3092 cur_ver=3092 state=KSOSP_SPAWNED).
*** 2016-12-01 12:12:39.586
Process J000 is dead (pid=13623 req_ver=3437 cur_ver=3437 state=KSOSP_SPAWNED).
*** 2016-12-01 12:12:41.602
Process J000 is dead (pid=13627 req_ver=3438 cur_ver=3438 state=KSOSP_SPAWNED).
*** 2016-12-01 12:12:43.604
Process J000 is dead (pid=13631 req_ver=529 cur_ver=529 state=KSOSP_SPAWNED).
*** 2016-12-01 12:35:39.958
Process J000 is dead (pid=15265 req_ver=3783 cur_ver=3783 state=KSOSP_SPAWNED).
*** 2016-12-01 12:35:40.983
Process J000 is dead (pid=15267 req_ver=2507 cur_ver=2507 state=KSOSP_SPAWNED).
*** 2016-12-01 12:35:42.985
Process J000 is dead (pid=15271 req_ver=3784 cur_ver=3784 state=KSOSP_SPAWNED).
*** 2016-12-01 12:58:44.340
Process J000 is dead (pid=17013 req_ver=2513 cur_ver=2513 state=KSOSP_SPAWNED).
*** 2016-12-01 12:58:45.341
Process J000 is dead (pid=17015 req_ver=562 cur_ver=562 state=KSOSP_SPAWNED).
*** 2016-12-01 12:58:46.342
Process J000 is dead (pid=17017 req_ver=4129 cur_ver=4129 state=KSOSP_SPAWNED).
*** 2016-12-01 13:21:47.678
Process J000 is dead (pid=18639 req_ver=4474 cur_ver=4474 state=KSOSP_SPAWNED).
*** 2016-12-01 13:21:49.694
Process J000 is dead (pid=18643 req_ver=4475 cur_ver=4475 state=KSOSP_SPAWNED).
*** 2016-12-01 13:21:50.695
Process J000 is dead (pid=18645 req_ver=563 cur_ver=563 state=KSOSP_SPAWNED).
Setting Resource Manager plan SCHEDULER[0x3006]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-12-01 22:00:00.025
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3007]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-12-02 22:00:00.176
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3008]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-12-03 06:00:00.024
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3009]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-12-04 06:00:00.005
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3003]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-12-05 22:00:00.043
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3004]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-12-06 22:00:00.007
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
*** 2016-12-07 15:49:47.908
Process J000 is dead (pid=8013 req_ver=6 cur_ver=6 state=KSOSP_SPAWNED).
*** 2016-12-07 15:49:48.978
Process J000 is dead (pid=8015 req_ver=4 cur_ver=4 state=KSOSP_SPAWNED).
*** 2016-12-07 15:49:54.980
Process J000 is dead (pid=8019 req_ver=7 cur_ver=7 state=KSOSP_SPAWNED).
*** 2016-12-07 15:49:55.981
Process J000 is dead (pid=8021 req_ver=5 cur_ver=5 state=KSOSP_SPAWNED).
*** 2016-12-07 15:51:02.017
Process J000 is dead (pid=8065 req_ver=8 cur_ver=8 state=KSOSP_SPAWNED).
*** 2016-12-07 15:51:03.018
Process J000 is dead (pid=8067 req_ver=6 cur_ver=6 state=KSOSP_SPAWNED).
*** 2016-12-07 15:51:09.021
Process J000 is dead (pid=8069 req_ver=9 cur_ver=9 state=KSOSP_SPAWNED).
*** 2016-12-07 15:51:10.022
Process J000 is dead (pid=8071 req_ver=7 cur_ver=7 state=KSOSP_SPAWNED).
*** 2016-12-07 15:52:56.045
Process J000 is dead (pid=8189 req_ver=10 cur_ver=10 state=KSOSP_SPAWNED).
*** 2016-12-07 15:52:57.047
Process J000 is dead (pid=8193 req_ver=8 cur_ver=8 state=KSOSP_SPAWNED).
*** 2016-12-07 16:00:02.028
Process J000 is dead (pid=8634 req_ver=908 cur_ver=908 state=KSOSP_SPAWNED).
*** 2016-12-07 16:00:03.029
Process J000 is dead (pid=8640 req_ver=11 cur_ver=11 state=KSOSP_SPAWNED).
*** 2016-12-07 16:00:04.731
Process J000 is dead (pid=8649 req_ver=9 cur_ver=9 state=KSOSP_SPAWNED).
*** 2016-12-07 16:00:06.733
Process J000 is dead (pid=8653 req_ver=909 cur_ver=909 state=KSOSP_SPAWNED).
*** 2016-12-07 16:17:07.993
Process J000 is dead (pid=10091 req_ver=919 cur_ver=919 state=KSOSP_SPAWNED).
*** 2016-12-07 16:17:09.995
Process J000 is dead (pid=10095 req_ver=1810 cur_ver=1810 state=KSOSP_SPAWNED).
*** 2016-12-07 16:17:10.996
Process J000 is dead (pid=10097 req_ver=12 cur_ver=12 state=KSOSP_SPAWNED).
*** 2016-12-07 16:31:38.716
Process J000 is dead (pid=11556 req_ver=1157 cur_ver=1157 state=KSOSP_SPAWNED).
*** 2016-12-07 16:31:39.717
Process J000 is dead (pid=11558 req_ver=865 cur_ver=865 state=KSOSP_SPAWNED).
*** 2016-12-07 16:37:40.006
Process J000 is dead (pid=12284 req_ver=1160 cur_ver=1160 state=KSOSP_SPAWNED).
*** 2016-12-07 16:37:41.007
Process J000 is dead (pid=12290 req_ver=866 cur_ver=866 state=KSOSP_SPAWNED).
*** 2016-12-07 16:40:11.930
Process J000 is dead (pid=12631 req_ver=5928 cur_ver=5928 state=KSOSP_SPAWNED).
*** 2016-12-07 16:40:12.931
Process J000 is dead (pid=12633 req_ver=1161 cur_ver=1161 state=KSOSP_SPAWNED).
*** 2016-12-07 16:40:13.932
Process J000 is dead (pid=12646 req_ver=867 cur_ver=867 state=KSOSP_SPAWNED).
*** 2016-12-07 16:40:15.934
Process J000 is dead (pid=12650 req_ver=1162 cur_ver=1162 state=KSOSP_SPAWNED).
*** 2016-12-07 16:40:16.935
Process J000 is dead (pid=12652 req_ver=1603 cur_ver=1603 state=KSOSP_SPAWNED).
*** 2016-12-07 17:03:13.361
Process J000 is dead (pid=14784 req_ver=6274 cur_ver=6274 state=KSOSP_SPAWNED).
*** 2016-12-07 17:03:15.363
Process J000 is dead (pid=14789 req_ver=6275 cur_ver=6275 state=KSOSP_SPAWNED).
*** 2016-12-07 17:03:16.365
Process J000 is dead (pid=14791 req_ver=868 cur_ver=868 state=KSOSP_SPAWNED).
*** 2016-12-07 17:03:17.366
Process J000 is dead (pid=14793 req_ver=1604 cur_ver=1604 state=KSOSP_SPAWNED).
*** 2016-12-07 17:03:18.367
Process J000 is dead (pid=14795 req_ver=6276 cur_ver=6276 state=KSOSP_SPAWNED).
*** 2016-12-07 17:26:14.747
Process J000 is dead (pid=16475 req_ver=1175 cur_ver=1175 state=KSOSP_SPAWNED).
*** 2016-12-07 17:26:15.748
Process J000 is dead (pid=16477 req_ver=6620 cur_ver=6620 state=KSOSP_SPAWNED).
*** 2016-12-07 17:26:17.750
Process J000 is dead (pid=16502 req_ver=1605 cur_ver=1605 state=KSOSP_SPAWNED).
*** 2016-12-07 17:26:18.751
Process J000 is dead (pid=16504 req_ver=6621 cur_ver=6621 state=KSOSP_SPAWNED).
*** 2016-12-07 17:26:19.752
Process J000 is dead (pid=16506 req_ver=1176 cur_ver=1176 state=KSOSP_SPAWNED).
Setting Resource Manager plan SCHEDULER[0x3005]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-12-07 22:00:00.033
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3006]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-12-08 22:00:00.030
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3007]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-12-09 22:00:00.038
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3008]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-12-10 06:00:00.055
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Setting Resource Manager plan SCHEDULER[0x3009]:DEFAULT_MAINTENANCE_PLAN via scheduler window
*** 2016-12-11 06:00:00.013
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
*** 2016-12-12 10:02:26.202
Process J000 is dead (pid=28564 req_ver=199 cur_ver=199 state=KSOSP_SPAWNED).
*** 2016-12-12 10:02:27.244
Process J000 is dead (pid=28568 req_ver=133 cur_ver=133 state=KSOSP_SPAWNED).
*** 2016-12-12 10:02:33.246
Process J000 is dead (pid=28571 req_ver=200 cur_ver=200 state=KSOSP_SPAWNED).
*** 2016-12-12 10:02:34.247
Process J000 is dead (pid=28573 req_ver=134 cur_ver=134 state=KSOSP_SPAWNED).
*** 2016-12-12 10:02:50.258
Process J000 is dead (pid=28583 req_ver=201 cur_ver=201 state=KSOSP_SPAWNED).
*** 2016-12-12 10:02:51.259
Process J000 is dead (pid=28586 req_ver=135 cur_ver=135 state=KSOSP_SPAWNED).
*** 2016-12-12 10:04:12.276
Process J000 is dead (pid=28643 req_ver=1466 cur_ver=1466 state=KSOSP_SPAWNED).
*** 2016-12-12 10:04:13.277
Process J000 is dead (pid=28645 req_ver=202 cur_ver=202 state=KSOSP_SPAWNED).
*** 2016-12-12 10:04:44.284
Process J000 is dead (pid=28670 req_ver=1467 cur_ver=1467 state=KSOSP_SPAWNED).
*** 2016-12-12 10:04:45.285
Process J000 is dead (pid=28672 req_ver=203 cur_ver=203 state=KSOSP_SPAWNED).
*** 2016-12-12 10:08:28.002
Process J000 is dead (pid=28938 req_ver=7303 cur_ver=7303 state=KSOSP_SPAWNED).
*** 2016-12-12 10:08:29.003
Process J000 is dead (pid=28940 req_ver=1469 cur_ver=1469 state=KSOSP_SPAWNED).
*** 2016-12-12 10:13:13.441
Process J000 is dead (pid=29432 req_ver=7374 cur_ver=7374 state=KSOSP_SPAWNED).
*** 2016-12-12 10:13:15.443
Process J000 is dead (pid=29436 req_ver=205 cur_ver=205 state=KSOSP_SPAWNED).
*** 2016-12-12 10:31:38.802
Process J000 is dead (pid=31075 req_ver=7650 cur_ver=7650 state=KSOSP_SPAWNED).
*** 2016-12-12 10:31:40.804
Process J000 is dead (pid=31079 req_ver=7651 cur_ver=7651 state=KSOSP_SPAWNED).
*** 2016-12-12 10:36:14.772
Process J000 is dead (pid=31437 req_ver=7720 cur_ver=7720 state=KSOSP_SPAWNED).
*** 2016-12-12 10:36:15.773
Process J000 is dead (pid=31439 req_ver=1478 cur_ver=1478 state=KSOSP_SPAWNED).
*** 2016-12-12 10:36:17.775
Process J000 is dead (pid=31443 req_ver=3291 cur_ver=3291 state=KSOSP_SPAWNED).
*** 2016-12-12 10:59:19.136
Process J000 is dead (pid=895 req_ver=1482 cur_ver=1482 state=KSOSP_SPAWNED).
*** 2016-12-12 10:59:20.137
Process J000 is dead (pid=897 req_ver=8067 cur_ver=8067 state=KSOSP_SPAWNED).
*** 2016-12-12 10:59:21.138
Process J000 is dead (pid=900 req_ver=3292 cur_ver=3292 state=KSOSP_SPAWNED).
*** 2016-12-12 11:22:22.482
Process J000 is dead (pid=3388 req_ver=398 cur_ver=398 state=KSOSP_SPAWNED).
*** 2016-12-12 11:22:23.483
Process J000 is dead (pid=3403 req_ver=2206 cur_ver=2206 state=KSOSP_SPAWNED).
*** 2016-12-12 11:22:25.485
Process J000 is dead (pid=3407 req_ver=399 cur_ver=399 state=KSOSP_SPAWNED).
  

OP | Posted 2016-12-13 17:03:41
Last edited by lujiaguai on 2016-12-13 17:04

Following my class notes, I enabled checkpoint logging and watched for an afternoon:
Tue Dec 13 16:45:26 2016
Thread 1 cannot allocate new log, sequence 135756
Private strand flush not complete
  Current log# 3 seq# 135755 mem# 0: /u01/app/oracle/flash_recovery_area/ORCL/onlinelog/o1_mf_3_d4w6on9c_.log
Beginning log switch checkpoint up to RBA [0x2124c.2.10], SCN: 3935999647
Thread 1 advanced to log sequence 135756 (LGWR switch)
  Current log# 2 seq# 135756 mem# 0: /u01/app/oracle/flash_recovery_area/ORCL/onlinelog/o1_mf_2_d4w6snnh_.log
Tue Dec 13 16:50:32 2016
Completed checkpoint up to RBA [0x2124c.2.10], SCN: 3935999647

At the log switch a checkpoint begins: Beginning log switch checkpoint up to RBA [0x2124c.2.10], SCN: 3935999647
and it runs until: Completed checkpoint up to RBA [0x2124c.2.10], SCN: 3935999647
Five minutes elapsed in between; in other words, dbwr needed five minutes to get the blocks to disk.

Before the logs were enlarged, one switch took a bit over two minutes, so two switches fit within five minutes: the previous checkpoint had not completed when a new one was triggered.
That is why "checkpoint not complete" kept being reported.
But Mr. Tang, is this really harmless? On another unmanaged database I have seen even more frequent switching, two to four times per minute, also with constant "checkpoint not complete",
yet that database runs fine and nobody has complained. It does differ from this one: it is 11.2.0.4 with archiving on, while this one is 11.2.0.1 with archiving off.

Mr. Tang, can checkpoints be allowed to remain in this never-completed state forever?
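The timing argument above can be reduced to arithmetic; a sketch using the round figures from this thread (about 150 s per switch before the resize, about 300 s from "Beginning" to "Completed checkpoint", 3 log groups):

```shell
switch_s=150   # observed: a bit over 2 minutes per log switch (pre-resize)
ckpt_s=300     # observed: ~5 minutes from "Beginning" to "Completed checkpoint"
groups=3

# With N groups, LGWR wraps back around to a given log after (N-1) more
# switches. If the checkpoint protecting that log has not finished by then,
# the switch must wait: "Checkpoint not complete".
wrap_s=$(( (groups - 1) * switch_s ))

if [ "$ckpt_s" -ge "$wrap_s" ]; then
    echo "checkpoint (${ckpt_s}s) outlives the wrap time (${wrap_s}s): expect 'Checkpoint not complete'"
else
    echo "checkpoint completes before LGWR wraps: switches proceed cleanly"
fi
```

With the logs enlarged to 1 GB the switch interval grows far beyond the checkpoint time, which matches the disappearance of the message.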

Posted 2016-12-13 21:00:01
Quoting lujiaguai (2016-12-13 17:03): "Can checkpoints be allowed to remain in this never-completed state forever?"

No. Some distance further down in the log you will see "Completed checkpoint up to RBA".
As covered in class: fast_start_mttr_target incremental checkpoints are all registered lazily. At the SCN where such a checkpoint starts, memory still holds dirty blocks older than that SCN that have not yet been written. If there are many of them, you will naturally see "Checkpoint not complete" often, and the matching "Completed checkpoint up to RBA" arrives one or two checkpoint events later.
Check that fast_start_mttr_target is not 0.

Quoting: "At log switch a checkpoint begins: Beginning log switch checkpoint up to RBA [0x2124c.2.10], SCN: 3935999647 ... five minutes elapsed in between, i.e. dbwr needed five minutes to write to disk."
It certainly was not writing continuously for five minutes.

Quoting: "Can I read this as: the gap was under three seconds, and during those seconds the database was stalled? ... If the storage were poor, would the gap widen and cause more serious problems? ... Does seeing this log mean we should enlarge the logs or add groups?"

Frequent "cannot allocate new log" caused by log groups that are too few and too small is not normal just because it appears often.
The error you got after enlarging the logs shows the data blocks are not being written fast enough:
give it one more dbwr.

The process problem has no direct causal link to the above. Try raising the resource allocation of the three automated maintenance tasks controlled by DEFAULT_MAINTENANCE_PLAN by 10% and see. The operating system parameters also need to be checked.
