OpenSplice DDS Forum

Best Practices for Memory Management when using Java

JimHayes

Greetings,

 

I'm using the OpenSplice Java API, and I suspect that managing memory is not quite as simple as the OpenSplice Java Reference Guide suggests.

 

The 'Memory Management' section of the reference says the following:

 

When objects are being created, they will occupy memory resources. Release of the memory resources is the responsibility of the Java garbage collector. The memory of an object is released, after all references to this object have run out of scope or have explicitly been removed (set to null).

 

I think the reality is more complex.

If you rely on the garbage collector to clean up your data readers and writers, you will end up leaking memory.

 

Suppose you wanted to write one instance to a topic and you chose to implement that in the following way:

 

public void writeMyObject() {
   MyObject myObject = [create and populate the object];
   MyObjectDataWriter dataWriter = [create data writer];
   dataWriter.write(myObject, HANDLE_NIL.value);
}

 

After the method completes, the dataWriter instance is out of scope, and if it were a normal Java object it would be garbage collected. But it is not a normal Java object, because it is associated through JNI with a native data writer. JNI will maintain a reference to the Java instance as long as the native instance exists (which will be forever, since we never deleted the writer). This JNI reference prevents the Java garbage collector from freeing the memory associated with the data writer.

 

If you run code similar to the above example and then run a heap analyzer (like 'visualvm'), you will see data readers and writers in memory that never go away, because JNI is pointing at them.

 

So clearly I need to delete the data writer explicitly.

I have tried deleting the data writer immediately after writing the data, but when I do this the reader never receives the data.

It looks like I need to write the data, wait for enough time to elapse so that the reader gets a chance to read the data, and then delete the data writer.

 

I have found myself writing data like the following example, where I write some data and then schedule the data writer and instance for cleanup after a short wait:

 

public void writeMyObject() {
   MyObject myObject = [create and populate the object];
   MyObjectDataWriter dataWriter = [create data writer];
   dataWriter.write(myObject, HANDLE_NIL.value);

   // schedule the data writer for deletion and the instance for disposal
   scheduleForDeletion(dataWriter);
   scheduleForDisposal(myObject);
}

 

This seems wrong to me and I'm sure that I am misunderstanding how I should be managing my resources.

 

So what is the proper way to read and write data, using the Java API, such that we do not leak memory?

 

Thanks in advance,

Jim

namruuh

Hi Jim,

 

The text regarding memory management that you quoted is correct: when all references to the datawriter are gone, the datawriter is automatically deleted. However, the publisher at which the datawriter was created also maintains a reference to it. As soon as that reference is gone (via the delete_datawriter call, for example), the memory will be freed. Please see the Java Reference Manual, section 3.4.1.3, which documents the create_datawriter operation and details how a datawriter can be deleted.
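
For example, a minimal sketch with the classic DCPS Java API (the Helper/narrow pattern and the QoS/status constants follow the standard generated-code conventions, so adapt the names to your own generated types):

// assumes 'publisher' and 'topic' already exist and stay alive
MyObjectDataWriter dataWriter = MyObjectDataWriterHelper.narrow(
      publisher.create_datawriter(topic,
                                  DATAWRITER_QOS_USE_TOPIC_QOS.value,
                                  null,                   // no listener
                                  STATUS_MASK_NONE.value));

dataWriter.write(myObject, HANDLE_NIL.value);
// ... typically many more writes over the application's lifetime ...

// only when the writer is really no longer needed:
publisher.delete_datawriter(dataWriter);
dataWriter = null; // the Java proxy can now be garbage collected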

 

Now, as far as your use case goes: it is a rather strange one. You do not need to create a datawriter per object that you want to write and then delete the datawriter. More commonly, one creates a datawriter per type of object and writes each object with that datawriter using its write call. If you never want to send an update for that object again, you simply use the unregister_instance call on the datawriter to indicate system-wide that your writer will no longer publish updates for it. However, if you do plan to write more updates for that object, it is wiser to leave it registered, as that is a performance gain.
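
For example (a sketch reusing one long-lived datawriter; the register_instance call is optional, since write registers the instance implicitly):

long handle = dataWriter.register_instance(myObject);
dataWriter.write(myObject, handle);
// ... further updates for this instance reuse the same writer and handle ...

// only when no more updates will ever be published for this instance:
dataWriter.unregister_instance(myObject, handle);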

 

Also, you state that you never read the object you wrote if you delete the datawriter directly after writing it. I suspect this is a problem with how you read: when you delete the datawriter you are telling the system that you will not write any more updates for that object, which causes a state change at the readers of that object. Its instance state goes from ALIVE to NOT_ALIVE_NO_WRITERS, and if you read only ALIVE instances then you would in effect never see the object, because it no longer matches your read condition. You should not need to delay the deletion of your writer just to get the object sent out, but again, deleting the datawriter each time is not recommended at all. I would highly advise changing that pattern to a more efficient one.
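
You can check this by reading with an instance-state mask that also accepts not-alive instances; a sketch (MyObjectSeqHolder is the generated sequence holder for your type):

MyObjectSeqHolder samples = new MyObjectSeqHolder();
SampleInfoSeqHolder infos = new SampleInfoSeqHolder();

reader.read(samples, infos,
            LENGTH_UNLIMITED.value,
            ANY_SAMPLE_STATE.value,
            ANY_VIEW_STATE.value,
            ANY_INSTANCE_STATE.value); // includes NOT_ALIVE_NO_WRITERS

// ... process samples.value ...
reader.return_loan(samples, infos);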

 

I hope this helps you forward a bit. I would also recommend studying our tutorial a little more, to understand in more detail which steps happen and why they are done, as there seem to be some misunderstandings on various levels.

JimHayes

Emiel, thank you for your response.

 

I appreciate your feedback.

 

I'm sure there is some fundamental concept that I have failed to grasp.

 

I wonder if I could provide a quick example of what I am doing, and maybe this would expose a flaw in my approach:

 

Let's say that I have a topic for some object, let's call it 'Example'.

 

And let's say that the following relationships exist:

 

Object 'A' is publishing 'Example' objects, and object 'B' needs to read the 'Example' objects published by 'A'.

Object 'C' is publishing 'Example' objects, and object 'D' needs to read the 'Example' objects published by 'C'.

 

A ---> B

C ---> D

 

I want to ensure that object 'D' does not read objects published by object 'A', and that object 'B' does not read objects published by 'C'.

 

In order to do this I have been using partitions (sketched below):

The communication between 'A' and 'B' would be on partition "A to B".

The communication between 'C' and 'D' would be on partition "C to D".

 

This results in 2 data readers and 2 data writers:

Object 'A' uses a DataWriter for partition "A to B".

Object 'B' uses a DataReader for partition "A to B".

Object 'C' uses a DataWriter for partition "C to D".

Object 'D' uses a DataReader for partition "C to D".
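
Concretely, I set the partition on the publisher QoS (and analogously on the subscriber QoS for the readers), roughly like this:

PublisherQosHolder pubQos = new PublisherQosHolder();
participant.get_default_publisher_qos(pubQos);
pubQos.value.partition.name = new String[] { "A to B" };
Publisher publisher = participant.create_publisher(pubQos.value,
                                                   null,                   // no listener
                                                   STATUS_MASK_NONE.value);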

 

In your response to me, you stated that one would commonly create one datawriter for a type of objects.

 

If I were to do this, and have one ExampleDataWriter (used by 'A' and 'C' in this example) and one ExampleDataReader (used by 'B' and 'D'), how could I ensure that object 'B' only reads objects published by 'A' and object 'D' only reads objects published by 'C'?

 

So this is why I have ended up with lots of data reader and writer instances per object type: the application that I am implementing has many pairs of objects like 'A and B' and 'C and D'.

 

Does this seem like a flawed approach?

 

Thank you for your time (and patience)!

 

Jim

namruuh

Hi Jim,

 

In DDS we use specific terminology when talking about problems, which helps to create a common understanding. Given your example, I must admit I am a little confused as to how to relate everything back to DDS terminology. I am especially confused by the use of the word 'object', as it seems to have an ambiguous meaning. What I think you have said is:

 

You have defined a topic type struct in IDL, called 'Example'. Based on this type you have used the create_topic call to make the topic known within DDS. The create_topic call basically ensures that everyone in the system now knows that a topic for this type, with the specific topic name and QoS settings, exists.
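
That is, something along these lines, using the generated type support (a sketch; the names follow the code generator's conventions):

ExampleTypeSupport typeSupport = new ExampleTypeSupport();
typeSupport.register_type(participant, typeSupport.get_type_name());

Topic topic = participant.create_topic("Example",
                                       typeSupport.get_type_name(),
                                       TOPIC_QOS_DEFAULT.value,
                                       null,                   // no listener
                                       STATUS_MASK_NONE.value);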

Now you are talking about objects 'A', 'B', 'C' and 'D'. I am not sure what those are; part of me thinks these are different DDS applications, but the other part believes everything is happening within one application. Could you enlighten me?

 

Either way, each application or thread or 'object' seems to use the same topic to communicate data to a receiving application/thread/'object', and to prevent the data flows from overlapping you are using partitions to separate them. Because you use many different partitions, you create multiple datareaders/datawriters, which is OK in a way; my main concern in my previous post was that you created and then deleted a datawriter around each write of an instance of that topic. That is highly inefficient and ill advised.

 

I looked up your other posts on this forum and saw that your first approach was to create a lot of topics that were used just once; you moved away from that when it proved to be a problem. Now you are sort of doing the same with datareaders and datawriters. Having a lot of readers and writers is not bad in itself, and OpenSpliceDDS scales very well with many of them on one node, but I would avoid constantly creating and deleting them, as that inevitably costs performance.

 

Your indication that you seem to have just one topic also rings a mild warning bell: maybe the information model is not correct, and too much information is clumped into one topic, which creates the need to separate the information flows. If that were the case, each 'object' would likely use a few common attributes of the 'Example' topic and otherwise use different parts of it. But if that is not the case, then it could be fine as it is.

 

I do not fully understand your use case, which makes giving advice on the best approach difficult. From the input received, I would say that either you have an out-of-the-ordinary use case and thus an out-of-the-ordinary use of DDS, or the DDS concepts are not yet fully understood and the approach for this application is based on those misunderstandings, or on technology previously used to solve a similar problem. It could also be that I just do not fully understand the use case yet, and that with further clarification it is not as out of the ordinary as I thought.

 

All in all, I would recommend seeking some assistance outside of this forum. We have a few excellent sales engineers who are dedicated to helping evaluating customers (and of course paying customers) by spending time and effort on understanding your specific use case and needs, and how OpenSpliceDDS fits into them. If you contact our sales team, you can always look into what is possible in that area of support, even if what you are doing now is a mere prototype for a possible later project.

I am not just saying that as a sales pitch (although selling OpenSpliceDDS support obviously puts the bread on the table ;)), but because I honestly think your situation warrants more specialized help in making sure the basis for implementing your use case is correct, so that a successful use of OpenSpliceDDS for your application is achieved. That is difficult to accomplish here on the forum alone.

JimHayes

Hi Emiel,

 

Thanks for getting back to me so quickly.

 

First some clarification:

 

I don't always create a datawriter just to write a single object; I used that as an example to demonstrate my suspicion that treating DataWriters (or DataReaders) as regular Java objects can cause a memory leak if the reference to the DataWriter goes out of scope before the DataWriter is explicitly deleted from the publisher. When this happens, you no longer have a reference to the DataWriter, so you cannot tell the publisher to delete it, and the garbage collector will never reclaim that memory since JNI is pointing at it. I placed my example in a method with the intention of showing a DataWriter go out of scope, but I think I ended up adding confusion instead. :-(

 

 

Additionally, my previous example only referenced a single topic because I wanted to keep the example as simple as possible.

I had no intention of giving the impression that I am designing an application with one single monolithic topic used to transport all the data in my app.

 

 

About 'A', 'B', 'C' and 'D':

 

'A' is a Java object that has a reference to a DataWriter and is responsible for publishing data on a particular topic/partition.

'B' is a Java object that has a reference to a DataReader and is responsible for listening to data published on that same topic/partition.

'A' and 'B' (and 'C' and 'D') are objects participating in a distributed application that uses DDS to transport data between its various 'nodes'.

 

Back to my issues with memory management:

 

Let's say that when 'A' is instantiated, it creates the DataWriter instance it is going to use to publish data, and that it keeps its reference to this DataWriter for the duration of its lifetime. Every time 'A' needs to publish data, it reuses this same DataWriter instance. At some point there will be no references to 'A', and the Java garbage collector will decide to reclaim the memory used by 'A'. I believe this is where a potential pitfall lies for Java programmers: the DataWriter owned by 'A' needs to be explicitly deleted from the publisher before 'A' is garbage collected. If publisher.delete_datawriter(dataWriter) is not invoked at this point, then the garbage collector will never release the resources associated with this DataWriter.
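
In other words, it seems 'A' needs an explicit cleanup method that its owner must remember to call before dropping the last reference; a sketch (the class and field names are just for illustration):

public class A {
   private final Publisher publisher;
   private final MyObjectDataWriter dataWriter;

   public A(Publisher publisher, MyObjectDataWriter dataWriter) {
      this.publisher = publisher;
      this.dataWriter = dataWriter;
   }

   // Must be called before 'A' becomes unreachable; otherwise the
   // native writer (and its JNI reference) is never released.
   public void close() {
      publisher.delete_datawriter(dataWriter);
   }
}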

 

 

I think I need to work on the clarity of my questions in addition to my understanding of DDS :-)

 

Thanks again,

Jim

namruuh

Hey Jim,

 

I am glad the create/write/delete of datawriters is just a simplification, though from your initial post that did not seem to be the case, which indeed caused the confusion. Not using a monolithic topic is also very good; I just had a small concern that this might be the case and wanted to mention it, as not having a properly designed information model can lead to a lot of rework in the future, and the earlier something like that is caught the better.

 

Now, as far as your memory issues go, there is an easy way out. In Java, whenever an object is about to be garbage collected, the 'finalize' method is invoked by the garbage collector. This gives you the chance to do some last-minute clean-up actions, and in your case it gives you the chance to delete the datawriter. Check out this link (first hit on Google): http://www.janeg.ca/scjp/gc/finalize.html . With references to the datawriter and the publisher (to call delete_datawriter on) maintained, you ensure that when your Java object is garbage collected it deletes the datawriter in the finalize call and with that releases all resources associated with it.
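
Roughly (a sketch, assuming your object keeps both the publisher and the datawriter as fields):

@Override
protected void finalize() throws Throwable {
   try {
      // last-chance cleanup: remove the publisher's reference so the
      // native writer (and its JNI reference) is released
      publisher.delete_datawriter(dataWriter);
   } finally {
      super.finalize();
   }
}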

Clint

The key to memory management is your data structures, as well as how much performance you need and when. The tradeoff is often between memory and CPU cycles.

 

For example, a lot of memory can be occupied by caching, which exists specifically to improve performance by avoiding expensive operations.

 

So think through your data structures and make sure you don't keep things in memory longer than you have to. If it's a web app, avoid storing a lot of data in the session, avoid holding static references to huge pools of memory, etc.

 

Clint

