Amardeep Sidhu

Just another technology blog

[FATAL] [INS-44000] Passwordless SSH connectivity is not setup

Wed, 2021-03-10 06:47

I faced this while running the installer to set up a 2-node RAC (version 19.8) on an Oracle SuperCluster. The error reported in the log is:

[FATAL] [INS-44000] Passwordless SSH connectivity is not setup from the local node node1 to the following nodes:
[node2]
[INS-06006] Passwordless SSH connectivity not set up between the following node(s): [node2]

From the error it appears that ssh is not set up between the two nodes, but that is not actually the case; the error message is a bit misleading. It turned out to be an issue with scp in OpenSSH version 8.x. Running the setup with the -debug option gives the clue:

<protocol error: filename does not match request>

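The reason is a new check introduced in OpenSSH version 8.x: the scp client now verifies that the filenames sent by the remote side match what was requested on the command line. It is explained here, here and here. MOS note 2555697.1 also talks about it.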
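To see quickly whether a node is likely to be affected, the major version can be pulled out of the ssh version banner. This is a minimal sketch, not part of the original troubleshooting: the helper name openssh_major is made up for illustration, and it assumes the banner starts with "OpenSSH_" as it does on stock OpenSSH builds.

```shell
# Hypothetical helper: read an ssh version banner on stdin and print the
# OpenSSH major version (prints nothing if the banner has another format).
openssh_major() {
    sed -n 's/^OpenSSH_\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Typical use (ssh -V writes its banner to stderr):
#   ssh -V 2>&1 | openssh_major
```

A result of 8 or higher suggests the stricter scp filename check is in play.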

The workaround is to pass the -T option to scp, which disables the new check. You can rename the scp binary to something like scp.original and create a new shell script in its place like this:

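cd /usr/bin
mv scp scp.original
# New scp: a wrapper that adds -T; "$@" keeps arguments with spaces intact
printf '%s\n' '#!/bin/sh' 'exec /usr/bin/scp.original -T "$@"' > scp
chmod 555 scp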

This time the install should succeed. Revert the change (move scp.original back to scp) once the install is done.

Categories: BI & Warehousing

Doing an Exadata mixed cells config with OEDA

Tue, 2020-10-27 08:23

Earlier versions of OEDA didn’t allow you to have mixed cells in the configuration, i.e. both High Capacity (HC) and Extreme Flash (EF). The way to deal with that configuration was to deploy the system with either HC or EF cells and then manually configure the remaining cells.

I am not sure when it changed, but the newer versions allow you to have mixed types of cells in a single OEDA configuration. Once you select the hardware, there is an additional option called Enable Additional Storage, where you can select the other type of cells. The minimum number of cells has to be three to use this option. Also, the cells that are physically at the bottom of the rack should be selected as main storage and the other cells should be added as additional storage, as that is how OEDA builds the configuration files.

Once this is selected, on the Diskgroups screen, select the Diskgroup layout as custom. You can then create multiple diskgroups and select the cells for each diskgroup (as EF & HC cells can’t be part of the same diskgroup).

Once the configuration is generated, it can be deployed with OneCommand without any manual intervention. A small feature, but it makes life easier by getting rid of all the manual steps.


Implementing ZDLRA – Part 2

Tue, 2020-10-06 07:36

In part 1, we discussed a few things that you should take care of before implementing a ZDLRA. In this post, we will discuss a few more things that you should review before or at the time of implementation:

  1. If you are getting two ZDLRAs (one each for the primary and standby sites), there are two ways they can be deployed. One scenario is where all the primary databases (or the databases that have no standby) back up to the RA at the primary site and then the data is replicated from the primary RA to the RA at the standby site. This works well for the DBs that have no standby database. For the DBs that do have a standby database, there is a better architecture that can be deployed. In that scenario, the primary databases back up to the primary RA and the standby databases back up to the standby RA. That saves you all the traffic over the replication network. Oracle has published a whitepaper on how to do this configuration. A few of the instructions in this paper are a bit dated, but it gives a good overall idea of how to do the implementation.
  2. Keep an eye on the features supported for different DB versions. An interesting one is that real-time redo shipping from standby databases is supported on 12c+ databases only. It is not supported for 11g. There could be other similar things. MOS note 1995866.1 has these details.
  3. Depending upon the ZDLRA software version being deployed, it may need a minimum version of EM and the ZDLRA plugin. MOS note 2542836.1 has these details.
  4. After discovering the primary and standby databases in EM, make sure their primary-standby relationship is reflected.
  5. Real-time redo sent to the ZDLRA is compressed, but the archive log backups will be compressed only if you use compression in the RMAN command. It is always good to include a backup archivelog command in the daily incremental job to make sure that no archive log is missed.
  6. Many environments have separate networks for backup traffic. Make sure the backup traffic to the ZDLRA uses the DB server’s backup network. If that is not the case, you may need to add an explicit route on the DB server for the ZDLRA client/VIP/scan IPs.
  7. There are going to be different users that you will need to use: one OS user for deploying the EM agent and one DB user that will be used to run the backups. Depending upon your environment, these may be the oracle OS user and the SYS DB user, or some other named users created for this purpose.
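For point 6, one way to see which interface the DB server would use to reach a ZDLRA address is to look at the output of ip route get. The helper below is only a sketch: route_dev is a made-up name, the address and interface names in the comments are placeholders, and it assumes the usual iproute2 output where the interface follows the word "dev".

```shell
# Hypothetical helper: read one line of `ip route get` output on stdin
# and print the interface name that follows the "dev" keyword.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Typical use (192.0.2.10 stands in for a ZDLRA client/VIP/scan IP):
#   ip route get 192.0.2.10 | route_dev
```

If the interface printed is not the backup network interface, that is the case where an explicit route may be needed.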

In the next few posts, we will discuss some of the issues I have faced while doing ZDLRA implementations for some customers.

PS: Fernando Simon has written some brilliant posts related to ZDLRA on his blog. I highly recommend reviewing all of them.


PRVF-4657 : Name resolution setup check for “db-scan” (IP address: x.x.x.101) failed

Fri, 2020-09-25 09:11

A quick note about an error I faced while running root.sh on an Exadata machine. The configuration tools failed with the following error:

Error is PRVF-4657 : Name resolution setup check for "db-scan" (IP address: x.x.x.101) failed

I did an nslookup on the scan name and it all seemed good. So why the error? After spending another 5 minutes, I looked at /etc/hosts and there it was. Someone had populated /etc/hosts on the DB nodes with all the hostname entries, including the scan name. Something like:

x.x.x.101	db-scan.example.com	db-scan
x.x.x.102	db-scan.example.com	db-scan
x.x.x.103	db-scan.example.com	db-scan

/etc/hosts can return only one IP for a hostname, whereas DNS is supposed to return 3 IPs for the scan name; hence the problem. The solution is to comment out the scan name entries in /etc/hosts on all the DB nodes and let the system do the name resolution via DNS.
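A quick way to confirm that no active scan entries are left on a node is to count the non-comment lines in /etc/hosts that mention the name. This is a small sketch; count_hosts_entries is a made-up helper name, and db-scan is just the example name from the error above.

```shell
# Hypothetical helper: count non-comment entries in a hosts file that
# list NAME as a hostname or alias.
# Usage: count_hosts_entries NAME [FILE]   (FILE defaults to /etc/hosts)
count_hosts_entries() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == n) { c++; break } }
        END { print c + 0 }
    ' "$file"
}
```

After the fix, count_hosts_entries db-scan should print 0 on every DB node.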


Implementing ZDLRA – Part 1

Wed, 2020-09-09 08:03

Zero Data Loss Recovery Appliance (ZDLRA) is Oracle’s solution for database backups. It has many advantages over other backup solutions that are available in the market. This post has a brief introduction to ZDLRA and a few links for further reading. This is a quick post about a few of the things that you should keep in mind if you are planning to get a ZDLRA (RA in short). Of course, there is a lot more that is needed while executing the whole plan, but these are some of the basics:

  1. The very first thing is capacity planning. Depending upon the number & sizes of the DBs that you plan to back up, you need to choose the required configuration. In most cases, someone from Oracle would be doing this for you, but you should actively participate in the exercise by providing all the necessary information so that the calculations can be as accurate as possible.
  2. Another thing that plays an important role in deciding the capacity needed is the retention period, i.e. the period for which you would like to keep the backups in the RA. The more days you keep, the more space you will need.
  3. Another important thing to consider is whether you are getting only one RA (for the primary or the standby site) or getting two of them, i.e. one each for the primary and standby sites. The two scenarios need different types of configuration (including the bandwidth requirements between the primary and standby sites), so it needs to be planned accordingly.
  4. One more aspect you need to consider is long-term retention. It could be Oracle Cloud object storage or some tape solution.
  5. Once you have enabled DB backups to ZDLRA, you will need to stop all other backups. Plan that accordingly. Oracle provides a way to run the legacy and ZDLRA backups together, but that is for a short duration, i.e. while you are migrating from legacy backups to ZDLRA. It is not really a way to run 2 backup strategies together for the long term.

In the next post, I will talk about a few more things that are important at the time of the actual implementation.


Using Secure Fabric for network isolation in KVM environments on Exadata

Fri, 2020-07-17 10:51

Exadata storage software version 20.1 introduces a new feature called “Secure Fabric” for KVM based multi cluster deployments (Exadata X8M). It enables network isolation between multiple tenants (i.e. RAC clusters built on KVM VMs). This feature is the counterpart of InfiniBand Partitioning on OVM based systems. There are customers who, in such scenarios, want that the VMs of one RAC shouldn’t be able to see the traffic of the other RAC’s VMs. This feature achieves that. Similar to Pkeys in IB switches, it uses a double VLAN tagging system where the first tag identifies the network partition and the second tag denotes the membership level of the VM. The Exadata documentation has more details.

The minimum Exadata software version needed to enable this feature is 20.1. This release comes with RoCE switch firmware version 7.0(3)I7(8).

Starting Jun 2020, OEDA supports this configuration and the feature can be enabled in OEDA itself. To enable it in OEDA, under Cluster Networks click on the Advanced button and you will see the Enable Secure Fabric option.

Once this option is enabled, you will see VLANs enabled for the private network. While doing the deployment, OneCommand will take care of the configuration needed.

As per the documentation, at present there is no way to enable it on existing systems other than doing a re-deployment.


Exadata Virtualized DB node restore

Mon, 2020-05-11 11:01

There are two common scenarios when we may need this:

  • An existing DB node has crashed and is unrecoverable (due to some failure, with no backups available; though some of these steps may be needed even if backups were available).
  • We have an existing Exadata rack that is virtualized. Now there is a new DB node and the existing clusters need to be extended to include the VMs on this new node.

I recently faced the first scenario where a virtualized DB node crashed and wasn’t recoverable. A bare metal DB node restore is a relatively simple procedure where we just have to reimage the node, create the needed directories, users etc and add it to the RAC cluster. In case of virtualization, the creation of VMs is an additional step that needs to be done. That makes it slightly more complex.

So the scenario is that we have an Exadata quarter rack where DB node1 has issues and needs to be reimaged and reconfigured. There are multiple VMs (and so multiple RAC clusters) created. As one of the DB nodes has gone down, each RAC cluster is running with one less instance. The failed node will need to be cleaned up from the RAC configuration before adding it back. Here are the steps that we need to follow to restore it:

  1. Reimage the node using an ISO and make it ready for creation of User Domains (aka VMs)
  2. Create the required VMs
  3. Create the required users, set up ssh with other nodes
  4. Clear the failed node configuration from existing RAC clusters
  5. Add the newly created VMs back to the respective RAC clusters

Now let’s discuss these steps in detail.

  1. Reimage : The simplest way to reimage an Exadata node is to connect the ISO (we can download the ISO for the version we need from MOS note 888828.1) using ILOM, set the next boot device to CD-ROM, reboot/reset the node and let it boot from CD-ROM. Most of the installation is automated and doesn’t ask any questions. Once it is done installing, ipconf starts in interactive mode and asks for all the information like name servers, NTP servers, IP addresses and hostnames for the various network interfaces etc. Once done, it will boot into the Linux partition. Since we need to virtualize the node, we need to switch it to OVS by running a script called /opt/oracle.SupportTools/switch_to_ovm.sh. It will reboot the node into the OVS partition. The next step is to run /opt/oracle.SupportTools/reclaimdisks.sh -free -reclaim to reclaim the space used by the bare metal partition. At this point we are done with the reimaging part. To use ILOM in a browser and be able to access the console, we need a Java enabled Windows/Linux system. And if there is a firewall between that system and the server, this link lists the ports that need to be allowed in the firewall.
  2. VMs creation : The next step is the creation of VMs. We will use OneCommand to achieve this. In this case, we had the original XML file used for deployment. Now we need to edit that configuration and remove the existing node’s details from it. We can import the XML into OEDA, make the required changes and save the configuration files. This needs to be done carefully, as a simple mistake like a duplicate IP may cause issues with the ASM/DBs running on the other node. Once this is done, we can download the OneCommand patch (MOS note 888828.1) and run the create VMs step of OneCommand. As we have only one node in the XML file, it is not going to touch the existing configuration.
  3. Create users : Now we need to create the users on the newly created VMs. OneCommand’s create users step can be used here. It will create users on all the VMs. There are some things that we need to do manually here. The first is to remove the binaries from the Grid & DB homes: as we are going to use addnode.sh to add the new nodes to the existing RAC clusters, the binaries are going to be copied from an existing node. Then we need to change the ownership of the Grid & DB home directory trees to oracle:oinstall. Also, for each VM, we need to set up passwordless ssh with the respective other VM (& vice versa) that is going to be part of the cluster.
  4. Clear failed node config : Next we need to clear the failed node’s configuration from each of the RAC clusters. That is pretty much the standard stuff we do in RAC.
  5. Add the new nodes : This again is just the standard addnode stuff we do in RAC.

I have used the terms VM and Node interchangeably here but the context should make it clear if I am referring to the physical node or a VM. There is another method to do this using OEDACLI and it is documented in Exadata documentation. That automates a lot of these things. Check this link for the details.


dbnodeupdate.sh appears to be stuck

Fri, 2019-12-20 19:07

I was patching an Exadata db node from 18.1.5.0.0.180506 to 19.3.2.0.0.191119. It had been more than an hour and dbnodeupdate.sh appeared to be stuck. Trying to ssh to the node was giving “connection refused” and the console had this output (some output removed for brevity):

[  458.006444] upgrade[8876]: [642/676] (72%) installing exadata-sun-computenode-19.3.2.0.0.191119-1...
<>
[  459.991449] upgrade[8876]: Created symlink /etc/systemd/system/multi-user.target.wants/exadata-iscsi-reconcile.service, pointing to /etc/systemd/system/exadata-iscsi-reconcile.service.
[  460.011466] upgrade[8876]: Looking for unit files in (higher priority first):
[  460.021436] upgrade[8876]: /etc/systemd/system
[  460.028479] upgrade[8876]: /run/systemd/system
[  460.035431] upgrade[8876]: /usr/local/lib/systemd/system
[  460.042429] upgrade[8876]: /usr/lib/systemd/system
[  460.049457] upgrade[8876]: Looking for SysV init scripts in:
[  460.057474] upgrade[8876]: /etc/rc.d/init.d
[  460.064430] upgrade[8876]: Looking for SysV rcN.d links in:
[  460.071445] upgrade[8876]: /etc/rc.d
[  460.076454] upgrade[8876]: Looking for unit files in (higher priority first):
[  460.086461] upgrade[8876]: /etc/systemd/system
[  460.093435] upgrade[8876]: /run/systemd/system
[  460.100433] upgrade[8876]: /usr/local/lib/systemd/system
[  460.107474] upgrade[8876]: /usr/lib/systemd/system
[  460.114432] upgrade[8876]: Looking for SysV init scripts in:
[  460.122455] upgrade[8876]: /etc/rc.d/init.d
[  460.129458] upgrade[8876]: Looking for SysV rcN.d links in:
[  460.136468] upgrade[8876]: /etc/rc.d
[  460.141451] upgrade[8876]: Created symlink /etc/systemd/system/multi-user.target.wants/exadata-multipathmon.service, pointing to /etc/systemd/system/exadata-multipathmon.service.

There was not much that I could do, so I just waited. I also created an SR with Oracle Support and they also suggested waiting. It started moving after some time and completed successfully. When the node finally came up, I found that there was an NFS mount entry in /etc/rc.local, and that was what had caused the problem. For the second node, we commented this out and it was all smooth. It is important to comment out all NFS entries during patching to avoid such issues. I had commented out the ones in /etc/fstab, but the one in rc.local was an unexpected one.
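A pre-patching check along these lines could have caught the stray entry. The sketch below simply lists uncommented lines that mention "nfs" in whichever files are passed to it; find_nfs_entries is a made-up helper name, and the match is a heuristic (a mount line that never spells out nfs would be missed).

```shell
# Hypothetical helper: print "file: line-number: text" for every
# uncommented line that mentions nfs (case-insensitive) in the given files.
# Files that are missing or unreadable are skipped silently.
find_nfs_entries() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        awk -v f="$f" '!/^[ \t]*#/ && tolower($0) ~ /nfs/ { print f ": " FNR ": " $0 }' "$f"
    done
}

# Typical use before starting dbnodeupdate.sh:
#   find_nfs_entries /etc/fstab /etc/rc.local /etc/rc.d/rc.local
```

An empty result means nothing obvious is left to comment out in those files.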


Understanding grid disks in Exadata

Mon, 2019-02-18 07:07

The use of Exadata storage cells seems to be a very poorly understood concept. A lot of people are confused about how exactly ASM makes use of the disks from storage cells. Many folks assume there is some sort of RAID configured in the storage layer, whereas there is nothing like that. I will try to explain some of the concepts in this post.

Let’s take the example of an Exadata quarter rack that has 2 db and 3 storage nodes (node means a server here). A few things to note:

  • The space for installing binaries on db nodes comes from the local disks installed in the db nodes (4 x 600GB disks, expandable to 8, configured in RAID5). In case you are using OVM, the same disks are used for keeping configuration files, virtual disks for VMs etc.
  • All of the ASM space comes from storage cells. The minimum configuration is 3 storage cells.

So let’s try to understand what makes up a storage cell. There are 12 disks in each storage cell (the latest X7 cells come with 10 TB disks). As mentioned above, there are 3 storage cells in a minimum configuration, so we have a total of 36 disks. There is no RAID configured in the storage layer. All the redundancy is handled at the ASM level. So to create a disk group:

  • First of all, cell disks are created on each storage cell. 1 physical disk makes 1 cell disk. So a quarter rack has 36 cell disks.
  • To divide the space among the various disk groups (by default only two disk groups are created: DATA & RECO; you can choose how much space to give to each of them), grid disks are created. A grid disk is a partition on the cell disk; a slice of a disk, in other words. Every cell disk must contribute a slice to each of the disk groups. We can’t have something like, say, DATA using 18 disks out of 36 and RECO using the other 18. That is not supported. Let’s say you decide to allocate 5 TB to DATA grid disks and 4 TB to RECO grid disks (out of 10 TB on each disk, approx 9 TB is what you get as usable). So you will divide each cell disk into 2 parts, 5 TB and 4 TB, and you will have 36 slices of 5 TB each and 36 slices of 4 TB each.
  • The DATA disk group will be created using the 36 5 TB slices, where the grid disks from each storage cell constitute one failgroup.
  • Similarly, the RECO disk group will be created using the 36 4 TB slices.
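The arithmetic above can be sketched in a few lines of shell. The function name grid_disk_summary is made up for illustration, and the figures it prints are raw slice totals before any ASM redundancy is taken into account, so the usable space will be lower.

```shell
# Hypothetical helper: given the number of storage cells, disks per cell
# and the DATA/RECO slice sizes in TB, print the cell disk count, the
# number of failgroups (one per cell) and the raw size of each disk group.
# Usage: grid_disk_summary CELLS DISKS_PER_CELL DATA_TB RECO_TB
grid_disk_summary() {
    cells="$1"; disks="$2"; data_tb="$3"; reco_tb="$4"
    cell_disks=$((cells * disks))          # 1 physical disk = 1 cell disk
    echo "cell_disks=$cell_disks"
    echo "failgroups=$cells"
    echo "data_raw_tb=$((cell_disks * data_tb))"   # one DATA slice per cell disk
    echo "reco_raw_tb=$((cell_disks * reco_tb))"   # one RECO slice per cell disk
}

# Quarter rack example from the text: 3 cells, 12 disks each, 5 TB + 4 TB slices
grid_disk_summary 3 12 5 4
```

For the quarter rack this reports 36 cell disks, 3 failgroups, 180 TB raw for DATA and 144 TB raw for RECO.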

What we have discussed above is a quarter rack scenario with High Capacity (HC) disks. There can be somewhat different configurations too:

  • Instead of HC disks, you can have the Extreme Flash (EF) configuration, which uses flash cards in place of disks. Everything remains the same except the numbers: instead of 12 HC disks there will be 8 flash cards.
  • With X3, I think, Oracle introduced an eighth rack configuration. In an eighth rack configuration the db nodes come with half the cores (of quarter rack db nodes) and the storage cells come with 6 disks each. So here you would have only 18 disks in total. Everything else works in the same way.

Hope this clarifies some of the doubts about grid disks.



ORA-04080: trigger ‘PRICE_HISTORY_TRIGGERV1’ does not exist

Tue, 2019-01-22 07:45

It is actually a dumb one. I was disabling triggers in a schema and ran this SQL to generate the disable statements. (Example from here)

HR@test> select 'alter trigger '||trigger_name|| ' disable;' from user_triggers where table_name='PRODUCT';

'ALTERTRIGGER'||TRIGGER_NAME||'DISABLE;'
--------------------------------------------------------------------------------
alter trigger PRICE_HISTORY_TRIGGERv1 disable;

HR@test> alter trigger PRICE_HISTORY_TRIGGERv1 disable;
alter trigger PRICE_HISTORY_TRIGGERv1 disable
*
ERROR at line 1:
ORA-04080: trigger 'PRICE_HISTORY_TRIGGERV1' does not exist


HR@test>

WTF? It is there but the disable didn’t work. I was in a hurry, tried to connect through SQL Developer and disable it there, and it worked! Double WTF! Then I spotted the problem. Someone had created it with one letter of the name in lowercase. Oracle folds unquoted identifiers to uppercase, so the unquoted name never matches. To make it work, we need to use double quotes.

HR@test> alter trigger "PRICE_HISTORY_TRIGGERv1" disable;

Trigger altered.

HR@test>

One of the reasons why you shouldn’t use case sensitive names in Oracle. That is stupid.
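If the trigger names are first spooled to a file, the disable statements can be generated with every name already wrapped in double quotes, so mixed-case names don't trip things up. This is a minimal sketch: quote_disable is a made-up helper name, and it assumes one trigger name per line with no embedded double quotes.

```shell
# Hypothetical helper: read trigger names on stdin (one per line) and
# print an ALTER TRIGGER ... DISABLE statement with the name double-quoted,
# so the exact case stored in the data dictionary is preserved.
quote_disable() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue    # skip blank lines
        printf 'alter trigger "%s" disable;\n' "$name"
    done
}

# Example
printf '%s\n' 'PRICE_HISTORY_TRIGGERv1' | quote_disable
```

The same effect can be had in the generating query itself by concatenating the double quotes around trigger_name.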


Error while running ggsci

Sat, 2019-01-12 08:21

This was another issue that I faced while trying to configure GoldenGate in HA mode. ggsci was working fine after normal installation but after configuring it in HA mode and trying to run ggsci, it resulted in this:

[oragg@node2 product]$ ggsci
Oracle GoldenGate Command Interpreter for Oracle
Version 12.3.0.1.4 OGGCORE_12.3.0.1.0_PLATFORMS_180415.0359_FBO
Linux, x64, 64bit (optimized), Oracle 12c on Apr 16 2018 00:53:30
Operating system character set identified as UTF-8.
Copyright (C) 1995, 2018, Oracle and/or its affiliates. All rights reserved.
2019-01-08 16:28:37.913
CLSD: An error occurred while attempting to generate a full name. Logging may not be active for this process
Additional diagnostics: CLSU-00100: operating system function: sclsdgcwd failed with error data: -1
CLSU-00103: error location: sclsdgcwd2
(:CLSD00183:)
GGSCI (exadatadb02.industowers.com) 1>

There were no obvious clues in the error message, but a little searching revealed that it had something to do with permissions. It was on Exadata, so I ran an strace of ggsci to see if it could give some clues. There we go:

[oragg@node2 product]$ strace ggsci
.
.
mkdir("/u01/app/oracle/product/12.1.0.2/dbhome_4/log/exadatadb02", 01777) = -1 EACCES (Permission denied)

That is the Oracle database home that GoldenGate is supposed to use. It is trying to create a directory at the mentioned path and is not able to do so. There was another directory called client needed inside it. I created both of them, set the needed permissions & the sticky bit, and it worked fine. It was working fine on the other node, so I could check the permissions over there and do the same on this node.
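The fix can be sketched as a small function. The name make_log_dirs is made up, and mode 1777 is taken from the mkdir call in the strace output; applying the same mode to the client directory is an assumption, so compare with a working node before using it.

```shell
# Hypothetical helper: create <home>/log/<host> and <home>/log/<host>/client
# and make both world-writable with the sticky bit (mode 1777, as seen in
# the mkdir call in the strace output; the client mode is an assumption).
# Usage: make_log_dirs ORACLE_HOME HOSTNAME
make_log_dirs() {
    home="$1"
    host="$2"
    mkdir -p "$home/log/$host/client" || return 1
    chmod 1777 "$home/log/$host" "$home/log/$host/client"
}

# Typical use, run as the owner of the database home:
#   make_log_dirs /u01/app/oracle/product/12.1.0.2/dbhome_4 exadatadb02
```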


Failed to execute the command “”/u01/app/xag/bin/clsecho”

Tue, 2019-01-08 11:22

I was configuring GoldenGate in HA mode by following this document. Everything worked ok but in the end while running agctl config goldengate to view the configuration of GoldenGate resource, it was failing with the following error:

[oracle@exadatadb02 ~]$ agctl config goldengate GG_TARGET
Failed to execute the command ""/u01/app/xag/bin/clsecho" -p xag -f xag -m 5080 "GG_TARGET"" (rc=134), with the message:
Oracle Clusterware infrastructure fatal error in clsecho.bin (OS PID 126367_140570897783808): Internal error (ID (:CLSB00107:)) - Error -1 (ORA-08275) determining Oracle base
/u01/app/xag/bin/clsecho: line 45: 126367 Aborted (core dumped) ${CRS_HOME}/bin/clsecho.bin "$@"
Failed to execute the command ""/u01/app/xag/bin/clsecho" -p xag -f xag -m 5081 "/u01/app/oragg/product"" (rc=134), with the message:

If you look at the ORA-08275 part of the error ("determining Oracle base"), it sounds kinda obvious that it is not able to figure out where the ORACLE_BASE is. But somehow it didn’t strike me at that moment, so I started looking around. If we look at the command it is running, it runs clsecho. This is simply a shell script which in turn calls $CRS_HOME/bin/clsecho.bin. The script sets various environment variables and that is where the problem was. There are lines like:

ORACLE_BASE=
export ORACLE_BASE

Nowhere in the script is the value of ORACLE_BASE set, and that was causing the issue. I changed the first line to set the ORACLE_BASE location and it worked fine after that. There was another issue I faced with ggsci after doing the xag configuration. I will do another blog post on that.
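Spotting such an empty assignment is easy to script. The helper below is a sketch with a made-up name (empty_assignments); it only looks for lines of the form VAR= with nothing after the equals sign, optionally prefixed with export.

```shell
# Hypothetical helper: print "line-number:text" for every line in FILE
# that assigns an empty value to VAR (for example "ORACLE_BASE=").
# Usage: empty_assignments VAR FILE
empty_assignments() {
    var="$1"
    file="$2"
    grep -nE "^[[:space:]]*(export[[:space:]]+)?${var}=[[:space:]]*\$" "$file"
}

# Typical use:
#   empty_assignments ORACLE_BASE /u01/app/xag/bin/clsecho
```

No output (and a non-zero exit status from grep) means the variable is not being blanked out in that script.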


dbca doesn’t list diskgroups

Wed, 2018-12-26 09:31

This is an Exadata machine running GI version 18.3.0.0.180717 and DB version 12.1.0.2.180717. On one of the DB nodes, dbca doesn’t list the diskgroups. It works fine on the other node.

I checked the dbca trace and found that the kfod command was failing. I tried to run it manually and got the same error:

[oracle@exadb01 ~]$ /u01/app/18.0.0.0/grid/bin/kfod op=groups verbose=true
KFOD-00300: OCI error [-1] [OCI error] [Could not fetch details] [-105777048]

KFOD-00105: Could not open pfile 'init@.ora'
[oracle@exadb01 ~]$

I ran it with strace then:

[oracle@exadb01 ~]$ strace /u01/app/18.0.0.0/grid/bin/kfod op=groups verbose=true
execve("/u01/app/18.0.0.0/grid/bin/kfod", ["/u01/app/18.0.0.0/grid/bin/kfod", "op=groups", "verbose=true"], [/* 18 vars */]) = 0
brk(0) = 0x2641000
.
.
.
.
.
open("/u01/app/18.0.0.0/grid/dbs/ab_+ASM1.dat", O_RDONLY) = -1 EACCES (Permission denied)
geteuid() = 1003
open("/u01/app/18.0.0.0/grid/rdbms/mesg/kfodus.msb", O_RDONLY) = 13
fcntl(13, F_SETFD, FD_CLOEXEC) = 0
lseek(13, 0, SEEK_SET) = 0
read(13, "\25\23\"\1\23\3\t\t\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"…, 280) = 280
lseek(13, 512, SEEK_SET) = 512
read(13, "\352\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"…, 512) = 512
lseek(13, 1024, SEEK_SET) = 1024
read(13, ".\1=\1E\1M\1X\1\352\3\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"…, 512) = 512
lseek(13, 1536, SEEK_SET) = 1536
read(13, "\n\0d\0\0\0D\0e\0\1\0e\0f\0\1\0\230\0g\0\1\0\306\0h\0\2\0\325\0"…, 512) = 512
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), …}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f43f85f2000
write(1, "KFOD-00300: OCI error [-1] [OCI "…, 78KFOD-00300: OCI error [-1] [OCI error] [Could not fetch details] [-132605848]
) = 78

The EACCES (Permission denied) on ab_+ASM1.dat just before the kfod error caught my attention. When I checked, the oracle user actually wasn’t able to read the file. The permissions looked like this:

[root@exadb01 dbs]# ls -ltr
total 20
-rw-r--r-- 1 oragrid oinstall 3079 May 14 2015 init.ora
-rw-r--r-- 1 oragrid oinstall 587 Dec 12 15:33 initbackuppfile.ora
-rw-rw---- 1 oragrid asmadmin 1656 Dec 20 14:26 ab_+ASM1.dat
-rw-rw---- 1 oragrid oinstall 1544 Dec 20 14:26 hc_+APX1.dat
-rw-rw---- 1 oragrid oinstall 1544 Dec 21 16:57 hc_+ASM1.dat
[root@exadb01 dbs]#

Whereas on node2 they were like:

[oracle@exadb02 dbs]$ ls -ltr 
total 16
-rwxrwxrwx 1 oragrid oinstall 3079 Dec 12 14:52 init.ora
-rwxrwxrwx 1 oragrid oinstall 1544 Dec 21 16:57 hc_+ASM2.dat
-rw-rw---- 1 oragrid oinstall 1720 Dec 21 16:57 ab_+ASM2.dat
-rwxrwxrwx 1 oragrid oinstall 1544 Dec 21 16:57 hc_+APX2.dat
[oracle@exadb02 dbs]$

Since the oracle user isn’t a member of the asmadmin group, it is not able to read the mentioned file. Changing the ownership to oracle:oinstall fixed the issue.
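The same conclusion can be reached mechanically from an ls -l listing. The sketch below uses a made-up helper name (cannot_read) and only applies the plain owner/group/other read bits; ACLs and root privileges are ignored, and the group list passed in is whatever id -Gn reports for the user.

```shell
# Hypothetical helper: read `ls -l` lines on stdin and print the names of
# regular files that USER, belonging to the listed groups, cannot read
# according to the owner/group/other read bits.
# Usage: ls -l DIR | cannot_read USER GROUP...
cannot_read() {
    user="$1"; shift
    grps=" $* "
    awk -v u="$user" -v g="$grps" '
        NF >= 9 && $1 ~ /^-/ {
            mode = $1; owner = $3; grp = $4
            if (owner == u)                 ok = (substr(mode, 2, 1) == "r")
            else if (index(g, " " grp " ")) ok = (substr(mode, 5, 1) == "r")
            else                            ok = (substr(mode, 8, 1) == "r")
            if (!ok) print $NF
        }'
}

# Typical use:
#   ls -l /u01/app/18.0.0.0/grid/dbs | cannot_read oracle $(id -Gn oracle)
```

Fed the node1 listing above with groups that don't include asmadmin, it reports only ab_+ASM1.dat.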


New web based OEDA for Exadata

Wed, 2018-11-21 03:17

It started with an xls sheet (called the dbm configurator). Then OEDA (Oracle Exadata Deployment Assistant) was introduced, a Java based GUI tool for entering all the information needed to configure an Exadata machine. Now, with the latest patch released in Oct, OEDA has changed again to become a web based tool. It is deployed on WebLogic and comes with some new features as well. SuperCluster deployments will continue to use the Java based OEDA tool. The new interface has support for Exadata, ZDLRA and ExaCC. It is backward compatible and can import the XMLs generated by older versions of OEDA. Some of the new features include the ability to configure single instance homes, create more than 2 diskgroups, create more than 1 database home and database, allow ILOMs to have a different subnet etc.

To configure the OEDA application, you need to unzip the contents and run the installWls script with the -p switch (which specifies the port). It will deploy the application on WebLogic and give you the URL to access OEDA. The interface is similar to the older version; it just runs in a browser and has some new features added. MOS note 2460104.1 and the Exadata documentation have more details:

Using Oracle Exadata Deployment Assistant


Garbled display while running FMW installer on Linux

Sat, 2017-11-18 04:56

A colleague faced this while running the FMW installer on a Linux machine: the installer window came up with garbled text.

This thread gave a clue that it could have something to do with fonts. So I checked which font-related packages were installed.

[root@someserver ~]# rpm -aq |grep -i font
stix-fonts-1.1.0-5.el7.noarch
xorg-x11-font-utils-7.5-20.el7.x86_64
xorg-x11-fonts-cyrillic-7.5-9.el7.noarch
xorg-x11-fonts-ISO8859-1-75dpi-7.5-9.el7.noarch
xorg-x11-fonts-ISO8859-9-100dpi-7.5-9.el7.noarch
xorg-x11-fonts-ISO8859-9-75dpi-7.5-9.el7.noarch
libXfont-1.5.2-1.el7.x86_64
xorg-x11-fonts-ISO8859-14-100dpi-7.5-9.el7.noarch
xorg-x11-fonts-ISO8859-1-100dpi-7.5-9.el7.noarch
xorg-x11-fonts-75dpi-7.5-9.el7.noarch
xorg-x11-fonts-ISO8859-2-100dpi-7.5-9.el7.noarch
libfontenc-1.1.3-3.el7.x86_64
xorg-x11-fonts-ethiopic-7.5-9.el7.noarch
xorg-x11-fonts-100dpi-7.5-9.el7.noarch
xorg-x11-fonts-misc-7.5-9.el7.noarch
fontpackages-filesystem-1.44-8.el7.noarch
fontconfig-2.10.95-11.el7.x86_64
xorg-x11-fonts-ISO8859-2-75dpi-7.5-9.el7.noarch
xorg-x11-fonts-ISO8859-14-75dpi-7.5-9.el7.noarch
xorg-x11-fonts-Type1-7.5-9.el7.noarch
xorg-x11-fonts-ISO8859-15-75dpi-7.5-9.el7.noarch
[root@someserver ~]#

stix-fonts looked suspicious to me, so I removed it with rpm -e stix-fonts.

That actually fixed the issue. After this the installer window displayed fine.


root.sh fails with CRS-2101:The OLR was formatted using version 3

Sat, 2017-11-18 04:33

Got this while trying to install 11.2.0.4 RAC on Redhat Linux 7.2. root.sh fails with a message like

ohasd failed to start
Failed to start the Clusterware. Last 20 lines of the alert log follow:
2017-11-09 15:43:37.883:
[client(37246)]CRS-2101:The OLR was formatted using version 3.

This is bug 18370031. The patch needs to be applied before running root.sh.


Presenting at Cloud day event of North India Chapter of AIOUG

Mon, 2017-11-06 05:47

I will be presenting a session titled “An 18 pointers guide to setting up an Exadata machine” at Cloud Day being organized by North India chapter of AIOUG. Vivek Sharma is doing multiple sessions on various cloud and performance related topics. You can register for the event here

https://www.meraevents.com/event/aioug-nic-cloud-day 

 


ksplice kernel updates and Exadata patching

Sun, 2017-11-05 11:32

If you have installed a one-off ksplice fix for the kernel on Exadata, remember to uninstall it before you do a kernel upgrade, e.g. regular Exadata patching. Such fixes are kernel version specific, so they may not work with the newer version of the kernel.


ORA-15040 ORA-15042 with EXTERNAL redundancy Diskgroup

Fri, 2017-11-03 12:57

A colleague was working on an ASM issue (a standalone one, version 11.2.0.3 on AIX) at one of the customer sites. Later on, I also joined him. The issue was that the customer had added a few new disks to an existing diskgroup. Everything went well and the rebalance kicked in. After some time, something happened and all of a sudden the diskgroup was dismounted. On trying to mount the diskgroup again, it was giving

ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "27" is missing from group number "2"

Here is the relevant text from the ASM alert log:

ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 19: No such device
Additional information: -1
Additional information: 1048576
WARNING: Write Failed. group:2 disk:27 AU:1005 offset:0 size:1048576
Fri Nov 03 10:55:27 2017
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_dbw0_58983380.trc:
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 19: No such device
Additional information: -1
Additional information: 4096
WARNING: Write Failed. group:2 disk:27 AU:0 offset:16384 size:4096
NOTE: cache initiating offline of disk 27 group DATADG
NOTE: process _dbw0_+asm1 (58983380) initiating offline of disk 27.3928481273 (DISK_01) with mask 0x7e in group 2
Fri Nov 03 10:55:27 2017
WARNING: Disk 27 (DISK_01) in group 2 mode 0x7f is now being offlined
WARNING: Disk 27 (DISK_01) in group 2 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 2, dsk = 27/0xea27ddf9, mask = 0x6a, op = clear
ERROR: failed to copy file +DATADG.263, extent 1952
GMON updating disk modes for group 2 at 36 for pid 9, osid 58983380
ERROR: Disk 27 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
ERROR: ORA-15080 thrown in ARB0 for group number 2
Errors in file /u01/app/oracle/diag/asm/+asm/+ASM1/trace/+ASM1_arb0_57672234.trc:
ORA-15080: synchronous I/O operation to a disk failed
Fri Nov 03 10:55:27 2017
NOTE: stopping process ARB0
WARNING: Disk 27 (DISK_01) in group 2 mode 0x7f offline is being aborted
WARNING: Offline of disk 27 (DISK_01) in group 2 and mode 0x7f failed on ASM inst 1
NOTE: halting all I/Os to diskgroup 2 (DATADG)
Fri Nov 03 10:55:28 2017
NOTE: cache dismounting (not clean) group 2/0xDEB72D47 (DATADG)
NOTE: messaging CKPT to quiesce pins Unix process pid: 62128816, image: oracle@tiiproddb1.murugappa.co.in (B000)
NOTE: dbwr not being msg'd to dismount
Fri Nov 03 10:55:28 2017
NOTE: LGWR doing non-clean dismount of group 2 (DATADG)
NOTE: LGWR sync ABA=124.7138 last written ABA 124.7138
NOTE: cache dismounted group 2/0xDEB72D47 (DATADG)
SQL> alter diskgroup DATADG dismount force /* ASM SERVER */ 

At this stage disk 27 was not readable even with dd, which means something was wrong with the disk itself. Since it is an external redundancy diskgroup, not much can be done until the disk becomes available again.
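A dd read test of this kind can be wrapped in a small helper. This is only a sketch: the function name check_readable and the device path in the usage comment are made up for illustration, and the 10 MB read size is an arbitrary choice.

```shell
# check_readable DEVICE: try to read the first 10 MB of a device with dd.
# Prints the result and returns non-zero if the read fails.
check_readable() {
  dev="$1"
  # bs is given in bytes (1 MB); dd's own messages are discarded.
  if dd if="$dev" of=/dev/null bs=1048576 count=10 2>/dev/null; then
    echo "readable: $dev"
  else
    echo "NOT readable: $dev"
    return 1
  fi
}

# Usage (hypothetical device path, use the one backing the ASM disk):
# check_readable /dev/rhdisk27
```

If the read fails for a disk that the operating system still lists, the problem is most likely below ASM, which is the point at which to involve the storage team.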

Speaking to the storage team cleared the air. First, the disk had gone offline at the storage level, which is why even dd was not able to read it. Second, all these disks were thin provisioned from the storage (over-provisioning of the storage space to improve utilization, similar to over-provisioning of CPU cores in the virtualization world). This particular disk 27 was meant for some other purpose but was wrongly allocated to this diskgroup, and the actual space available in its pool was less than what was needed. The moment the disks were added to the diskgroup, the rebalance kicked in and ASM started writing data to the disk. Within a few minutes the space filled up and the storage software took the disk offline. Since ASM couldn’t write to the disk, the diskgroup was dismounted.

Fortunately, there was another disk in the same pool that was still unused. The storage guy dropped that disk, which freed up some space in the pool, and then brought disk 27 back online. The diskgroup got mounted and the rebalance kicked in again. Finally, we dropped disk 27 from the diskgroup, which started another rebalance. Once that rebalance completed, the disk was free to be taken offline.

 

Categories: BI & Warehousing

TNS-12543: TNS:destination host unreachable

Fri, 2017-07-14 23:53

Scenario: setting up a physical standby from Exadata to a non-Exadata single instance. tnsping from the standby to the primary works fine, but tnsping from the primary to the standby fails with:

TNS-12543: TNS:destination host unreachable

I am able to ssh to the standby from the primary and can ping it as well, but tnsping doesn’t work. The error description suggests that something is blocking the access. In this case it was iptables, which was enabled on the standby server.

Stopping the service resolved the issue.

service iptables stop
chkconfig iptables off
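To confirm whether the listener port is reachable at the TCP level, independent of tnsping, a quick check can be run from the primary before and after changing the firewall. This is a rough sketch that assumes bash with /dev/tcp support and the coreutils timeout command. The function name port_open and the host name in the usage comment are hypothetical, and 1521 is only assumed to be the listener port.

```shell
# port_open HOST PORT: returns 0 if a TCP connection can be opened
# within 5 seconds, non-zero otherwise (refused, filtered, or timed out).
port_open() {
  host="$1"
  port="$2"
  # The child bash receives host and port as $0 and $1.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage (hypothetical host name; 1521 is assumed to be the listener port):
# port_open standby-host 1521 && echo "reachable" || echo "blocked or unreachable"
```

If ssh and ping work but this check fails for the listener port, a host firewall on the target is a likely suspect.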

The error is an obvious one, but sometimes it just doesn’t strike you that it could be something as simple as that.

Categories: BI & Warehousing
