Question: My replication feed is not working and is in the blocked state. What could be wrong?
Answer 1: Check the logs on the cluster where you created the feed. If you see the following, the username or password you entered when creating the feed is incorrect. Please correct the password and try again.
192.168.7.84 <180>2014-12-30 15:01:22,452 FEEDS.REPLICATION WARNING: got 401 -- permission problem
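You can search for this failure directly on the log host. A sketch, assuming the logs land in /var/log/caringo/castor.log (the usual location on a CSN — adjust the path for your syslog setup):

```shell
# Look for replication feed authentication failures (HTTP 401).
# The log path is an assumption; adjust for your environment.
grep 'FEEDS.REPLICATION' /var/log/caringo/castor.log | grep '401'
```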
You can also check via SNMP. Assuming an SNMP password of "caringo" and a source node IP address of 192.168.7.89:
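A query along these lines will confirm the SNMP password is accepted by the source node. This is a sketch assuming the net-snmp tools are installed; it walks the standard system subtree rather than a Swarm-specific OID:

```shell
# Walk the standard SNMPv2 system subtree on the source node.
# A response confirms the community string "caringo" is accepted;
# a timeout or authentication failure indicates a bad password.
snmpwalk -v2c -c caringo 192.168.7.89 system
```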
Answer 2: If you are using a CSN earlier than CSN 7 (i.e., CSN 6.5, or CSN 3.1.5 with Swarm 6.5+), the cluster.proxyIPAddress parameter must be configured in cluster.cfg on the source side of the replication feed. CSN 7.x auto-populates this parameter. If you are not using a CSN environment, you need to add this parameter manually on the source side and reboot the cluster.
This problem is easy to spot in the Swarm logs on the target side (the cluster you are replicating TO):
192.168.116.95 <179>2014-12-30 15:06:25,297 SCSP.READER ERROR: Could not HEAD source cluster via 192.168.7.89:80: CAStor could not prepare connection to 192.168.7.89 for HEAD / HTTP/1.1: TimeoutError. ((request:30938192 connection:140011236418408 label:replication feed ReplicationFeed from 192.168.7.89:80))
In this environment, 192.168.116.95 is on the private side of the target, and 192.168.7.89 is the PRIVATE side of the source. Because the target cluster has no route to the private side of the source environment, you need to add the proxy parameter on the source side CSN and reboot the source cluster.
For example, here is the cluster IP address of the source side in this environment:
[root@b-csn7 caringo]# ifconfig bond1:1
bond1:1   Link encap:Ethernet  HWaddr 00:0C:29:31:17:32
          inet addr:10.1.1.176  Bcast:10.1.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
Therefore, I will now add 10.1.1.176 to the cluster.cfg:
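The resulting entry in cluster.cfg would look like this (parameter name from Answer 2, value from the ifconfig output above):

```
cluster.proxyIPAddress = 10.1.1.176
```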
Then reboot the source side cluster. The source side nodes tell the target side which address to retrieve the streams from, which is why the source side must be rebooted.
Answer 3: If you have multiple destinations listed in your replication feed, you will see the following in the source side Swarm logs if at least one of the destinations is not valid:
192.168.7.84 <179>2014-12-30 15:50:25,298 SCSP.CLIENT ERROR: connection setup exception to 10.1.1.220:80: ConnectError: 113: No route to host.
192.168.7.84 <179>2014-12-30 15:50:25,299 SCSP.CLIENT ERROR: <From File "/home/build/castor-g13.3.3-x86_64-debug/linux/scripts/../results/pybuild/lib/python/caringo/castor/protocol/scsp/client.py", line 2007, in connectToHost>
Traceback (most recent call last):
  File "/home/build/castor-g13.3.3-x86_64-debug/linux/scripts/../results/pybuild/lib/python/caringo/castor/protocol/scsp/client.py", line 2002, in connectToHost
ConnectError: ConnectError: 113: No route to host.
In this case, I added 10.1.1.220 to the Remote cluster proxy or cluster host(s) list in the feed definition, but that host is not reachable.
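A quick way to verify each listed destination from the source side is a HEAD request with a short timeout. A sketch, using the hosts from this example environment — substitute the hosts from your own feed definition:

```shell
# Try a HEAD request to each host listed in the feed's
# "Remote cluster proxy or cluster host(s)" field.
# Any host that fails here will block the feed.
for host in 10.1.1.42 10.1.1.220; do
    if curl -sI --connect-timeout 5 "http://${host}/" >/dev/null; then
        echo "${host}: reachable"
    else
        echo "${host}: NOT reachable"
    fi
done
```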
NOTE: if your destination is an SCSP Proxy (a dual-network CSN, for example), you do not need to list all of the remote nodes in the replication feed definition. The nodes on the remote private side are unreachable from the source anyway, and the feed gleans their addresses from the proxy.
Answer 4: If you have verified that the primary CSN can reach the DR CSN, and that the DR CSN can reach the primary side CSN, but you are still seeing messages like the following in the log at the DR site:
where 192.168.7.84 is the private IP address of the DR site node and 10.1.1.42 is the primary side's public CSN IP, then iptables on the DR CSN is disabled.
If you see no errors on the DR CSN regarding reachability of the primary CSN's IP address, it is possible that the primary CSN's iptables rules are not configured properly.
The CSN uses iptables to perform port address translation for the storage nodes' communication with the remote cluster. Without it, the nodes cannot communicate with the opposite cluster, even if they communicate fine with a local writing application (one with an interface in the private storage node network). Where iptables is not configured properly, you will also likely see health report communication errors in the logs.
To check for this problem, run iptables -L on the CSNs. If the default rules are missing, run /opt/caringo/csn/bin/setfirewall.sh on the CSN to restore them.
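The check and the fix can be run as follows. A sketch — rule and chain names vary by CSN version, so look for non-empty NAT rules rather than an exact match:

```shell
# List the filter rules, then the NAT rules; the CSN's port
# address translation rules live in the nat table.
iptables -L -n
iptables -t nat -L -n

# If the rules are missing, restore the CSN defaults:
/opt/caringo/csn/bin/setfirewall.sh
```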