OpenSplice DDS Forum

rtps_init: failed to find a free participant index for domain


mindriot   

Hi all,

 

I am in a situation where I need to create a lot of independent processes on the same machine, all of which create DDS participants.

 

With the default OSPL configuration, I can create exactly 10. Process number 11 fails to start, quitting with the error message

 

rtps_init: failed to find a free participant index for domain 1

 

This error comes from https://github.com/PrismTech/opensplice/blob/master/src/services/ddsi2/code/q_init.c#L540 and is due to an internal, hard-coded limit of 10 attempts.
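As far as I can tell, the logic is roughly the following (my paraphrase for illustration, not the actual q_init.c code): each participant index maps to a fixed set of UDP ports, so DDSI2 probes indices in order until it finds one whose ports it can bind, and gives up after 10 attempts.

    /* Paraphrase for illustration only, not the actual q_init.c code. */
    #include <stdio.h>
    #include <stdlib.h>

    /* Stand-in for "are the ports belonging to this index still free?" */
    static int ports_free_for_index(int idx)
    {
        return idx >= 10; /* pretend processes 0..9 already hold their ports */
    }

    int main(void)
    {
        const int max_attempts = 10; /* the hard-coded limit */
        for (int idx = 0; idx < max_attempts; idx++) {
            if (ports_free_for_index(idx)) {
                printf("claimed participant index %d\n", idx);
                return 0;
            }
        }
        fprintf(stderr, "rtps_init: failed to find a free participant index for domain 1\n");
        exit(1); /* the real code also exits here, which is part of my problem */
    }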

 

My questions:

  • Why on earth is there a hard-coded limit like that, with no way to configure it other than patching the source code?
  • Why is there a limit of 10 participants per machine? Is there a good reason for this?
  • Apparently I need to explicitly set a participant index to something other than auto. How do I do that?
  • I do not want to assign fixed indices; this should work automatically. Is there any way I can try different indices myself? Note that the initialization code calls exit() on any failure, so I cannot check for errors without restarting the software.

How do I get out of this situation?

 

Thanks in advance.

mindriot   

Note that the source code seems to indicate that the participant index has to be

 

a non-negative integer less than 120

So to me it would seem sensible to increase the max_attempts constant to 120, or to make it configurable...

mindriot   

Hi Hans,

 

thanks for the heads-up. In general that would be the right way to do it, but I should have mentioned that in my scenario I cannot use multicast at all (I have systems in different subnets and IT refuses to give me a VLAN). I have a fixed set of peers in my configuration, but multiple participants on each peer machine.

 

I'll give it a try anyway. What should I expect to happen? Could a ParticipantIndex of 'none' work when I have explicit peer addresses?
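For reference, the relevant part of my configuration looks roughly like this (element names as I understand them from the deployment guide; the peer addresses are just placeholders):

    <DDSI2Service name="ddsi2">
      <General>
        <AllowMulticast>false</AllowMulticast>
      </General>
      <Discovery>
        <!-- currently 'auto'; the question is whether 'none' or a fixed number helps -->
        <ParticipantIndex>auto</ParticipantIndex>
        <Peers>
          <Peer Address="192.168.1.10"/>
          <Peer Address="192.168.2.20"/>
        </Peers>
      </Discovery>
    </DDSI2Service>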

 

Thanks,

m

erik   

Hi,

 

What "goes wrong" when you raise the limit is that the unicast discovery will start blasting even larger numbers of packets into the network, and it has to do so periodically (the SPDPInterval). For each peer address, it sends a unicast packet to all N port numbers, so before you know it, the burst will be huge. There are some obvious ways of mitigating that, but those break support for asymmetrical discovery.

 

Obviously, it is a bit silly that it is hard-coded at 10. It is a historical artefact (as is the call to "exit") that is simply never an issue in federated (shared-memory) deployments, because there the limit applies to the number of DDSI2 instances rather than to application participants, nor is it an issue in environments that support multicast. How this found its way into the product I am unfortunately not at liberty to tell, but I suspect you would understand if you knew ... Anyway, it never got its priority raised because it never became a real issue ... such is life.

 

Please feel free to raise the limit and recompile; that has by far the shortest turn-around time. A periodic burst of packets is presumably better than a non-working system. In that case, please raise the limit in two (...) places: the one you found, and at https://github.com/PrismTech/opensplice/blob/master/src/services/ddsi2/code/q_addrset.c#L58. I will start the process of making it configurable, eliminating the call to exit(), and considering mitigations for the resulting packet bursts, but please be aware that whatever we do internally may take a while to reach GitHub. I have very little influence on that.

 

The decisions about what is freely available in the community edition and what is not are what they are, and you're welcome to use the community edition. It just happens that sometimes the commercial edition is a better proposition on technical grounds, and from your description I think yours is one of those cases. Since I don't know why you are using the community edition, for all I know you may be in a position to consider switching to the commercial package.

 

If you are, you might want to look at the traffic overhead caused by having 10 hosts with 20 autonomous processes each, compared to having 10 hosts each running a shared-memory deployment with 20 attached processes. The specification is freely downloadable, but I can give you the short summary: the mandated discovery protocol is quadratic in the number of "participants" (scare quotes because in OpenSplice that means the number of DDSI2 instances, not application participants), and if multiple participants on a single node all subscribe to the same data, many copies of it have to be sent. Both problems disappear in shared-memory deployments. (If you really want to scale up, you enter the territory of our Cloud package, with proper scalable discovery.)
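As a back-of-the-envelope illustration (the numbers are only there for scale):

    standalone:  10 hosts x 20 processes = 200 DDSI2 instances
                 ~200 x 199 = 39,800 discovery relationships to maintain
    federated:   10 DDSI2 instances (one per host)
                 ~10 x 9 = 90 discovery relationships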

 

If you can't go commercial and bump into issues of scale, the best I can advise is to look at the "poor man's" shared memory mode: multiple threads in a single application. A bit of trickery with the run-time linker can go a long way ...

 

Best regards,

Erik

