Fasterj

Using SharedHashMap

1 - Using SharedHashMap | 2 - "no-copy" mode | 3 - Concurrency handling and thread safety | 4 - Appendix: interfaces supported | View All

In this article Jack Shirazi and Peter Lawrey give a worked example of using SharedHashMap, a high performance persisted off-heap hash map, shareable across processes.
Published March 2014, Authors Jack Shirazi and Peter Lawrey

ProcessInstanceLimiter

This article describes how to use SharedHashMap by way of an example of limiting the number of operating system processes that can be started with the common code. If you are just looking for that process limiting capability without further details, it's directly available from the ProcessInstanceLimiter class, part of the OpenHFT distribution.

What is SharedHashMap?

SharedHashMap provides a hash map implementation that uses a memory mapped file to store its entries. This has two important implications:

Since the file is memory mapped, the map can be shared across processes.

Since its storage is backed by a file, the entries are also persistent.
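
To make the memory mapped file idea concrete, here is a minimal sketch using only the standard java.nio API. It is not how SharedHashMap is implemented; the class name MappedFileSketch and its methods are hypothetical, chosen for illustration. One mapping writes a long into a file and a second, independent mapping reads it back, which is the same mechanism that lets separate processes see each other's data and lets the data outlive the writer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileSketch {

    // Maps the first 8 bytes of the file read-write and stores a long there.
    static void writeLong(Path file, long value) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, Long.BYTES);
            buf.putLong(0, value);
            buf.force(); // ask the OS to flush the change to the backing file
        }
    }

    // Opens a separate read-only mapping of the same file and reads the long back.
    static long readLong(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, Long.BYTES);
            return buf.getLong(0);
        }
    }

    // Convenience wrapper: write through one mapping, read through another.
    static long roundTrip(long value) {
        try {
            Path file = Files.createTempFile("mapped-sketch", ".dat");
            file.toFile().deleteOnExit();
            writeLong(file, value);
            return readLong(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read back: " + roundTrip(System.currentTimeMillis()));
    }
}
```

In this sketch both mappings live in one JVM, but the reader could equally be a different process opening the same path.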

SharedHashMap is primarily targeted at high performance off-heap memory storage of entries for low latency applications, to avoid GC overheads on that data. To fully support this goal, SharedHashMap can be used in a "no-copy" mode (shown later in this article).

Because it uses resources outside the heap, SharedHashMap is not a map that you want to use as a default map; instead it should be used when you have a particular need, typically because you either want to share entries across processes, or because you want off-heap memory storage in a map format. If you only want persistence of map entries, you can use SharedHashMap, but a map implementation using a journaled log could be more efficient.

This article shows how to use SharedHashMap, and how to optimise its use. The article works through a real-world example: preventing multiple instances of an application from starting, by using SharedHashMap's shared map to coordinate across processes. The example is specifically chosen to use SharedHashMap within a multi-process interaction.

Using SharedHashMap as a shared map

To get started, we first need to create our shared map. This is done via a map builder and, unlike most other maps, a SharedHashMap needs a file location when it is constructed (the file doesn't need to exist; the builder will create it):

	SharedHashMapBuilder builder = new SharedHashMapBuilder();
	String shmPath = System.getProperty("java.io.tmpdir") + System.getProperty("file.separator") + "SHMTest1";
	//Declare as a ConcurrentMap rather than Map if you want to use putIfAbsent()
	Map<String, SHMTest1Data> theSharedMap = builder.create(new File(shmPath), String.class, SHMTest1Data.class);

As you can see, SharedHashMap supports generics. The value object is of type SHMTest1Data, whose implementation is based around a simple array:

	public static class SHMTest1Data implements Serializable {
		private long[] time;
		public SHMTest1Data(int maxNumberOfProcessesAllowed) {
			this.time = new long[maxNumberOfProcessesAllowed];
		}
		public int getMaxNumberOfProcessesAllowed() {
			return this.time.length;
		}
		public void setTimeAt(int index, long time) {
			this.time[index] = time;
		}
		public long getTimeAt(int index) {
			return this.time[index];
		}
	}

Our example is intended to support having a configurable number of processes running concurrently, so the value object is basically an array of timestamps, each providing a slot for a process that is allowed to run concurrently. The idea is that each process will repeatedly update the timestamp in its own slot, thus signalling that the slot is taken by a running process. If there are no free slots, a new process is not allowed to run.

The next step after constructing the map is to access our shared data object:

	SHMTest1Data data = theSharedMap.get("whatever");
	if (data == null) {
		//From 1.8, we could use putIfAbsent() as that's been added to the Map interface.
		//Alternatively we can cast to SharedHashMap and use putIfAbsent().
		//But for this test just bang it in, doesn't matter if something
		//else gets in first as you'll see below
		data = new SHMTest1Data(2);
		theSharedMap.put("whatever", data);
	}

We're using the map just as you would use any map; get the value for a specific key, and if the entry is not present, populate it. For the example, we'll use a value of "2" for the number of concurrent processes that are allowed to run (new SHMTest1Data(2)).
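
The comments in the snippet above mention putIfAbsent() as an alternative to the get-then-put sequence. The following is a minimal sketch of that pattern against the standard ConcurrentMap interface, with a plain on-heap ConcurrentHashMap standing in for the shared map and a long[] standing in for SHMTest1Data. The class name GetOrCreateSketch and the helper getOrCreate are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class GetOrCreateSketch {

    // Returns the value stored under key, creating it with the given
    // number of slots only if no value is present yet.
    static long[] getOrCreate(ConcurrentMap<String, long[]> map, String key, int slots) {
        long[] fresh = new long[slots];
        // putIfAbsent returns the previous value, or null if ours was stored.
        long[] existing = map.putIfAbsent(key, fresh);
        return existing != null ? existing : fresh;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, long[]> map = new ConcurrentHashMap<>();
        long[] first = getOrCreate(map, "whatever", 2);
        long[] second = getOrCreate(map, "whatever", 5); // ignored: entry already exists
        System.out.println(first.length + " slots, then " + second.length + " slots");
    }
}
```

With an on-heap map the second call returns the very same array; with a map that copies values to and from a file, as described in this article, it would return a fresh copy of the stored value instead.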

What exactly happened under the covers in those few lines? First the builder created a SharedHashMap which stores onto the file we told it to use (it created and initialised the file if it didn't previously exist); then we got the SHMTest1Data value object if it existed, or created and stored it if it didn't. The actual storage positions of the key ("whatever") and value (new SHMTest1Data(2)) within the file are handled under the covers by the SharedHashMap implementation, and the key and value are copied into the file and from the file as serialised objects (SHMTest1Data implements Serializable). At this point you probably stop and say "whoa, serialised objects, they're slow and inefficient and object creation intensive, that's not going to be high performance", and you'd be right, which is why SharedHashMap supports much more efficient storage techniques, which we'll see later in the article. Strings are a known special case and are already handled efficiently (more specifically, so is any object implementing CharSequence).

Now that we've set up our map and data object, we'll use it. Here comes one of the gotchas of using shared memory: if you are copying objects to and from the shared memory, you have to be aware that something else can be altering the object in between your usages, and handle that. Later, when we get to the "no-copy" mode, we'll be able to manage this more easily, but for now we'll handle it by just re-accessing the object each time. The following piece of code gets the list of timestamps from the SHMTest1Data object, pauses 300ms, then does it again. This will allow us to compare the two lists and see if there is a slot which is not changing:

		data = theSharedMap.get("whatever");
		long[] times1 = new long[data.getMaxNumberOfProcessesAllowed()];
		for (int i = 0; i < times1.length; i++) {
			times1[i] = data.getTimeAt(i);
		}
		pause(300L);
		data = theSharedMap.get("whatever");
		long[] times2 = new long[data.getMaxNumberOfProcessesAllowed()];
		for (int i = 0; i < times2.length; i++) {
			times2[i] = data.getTimeAt(i);
		}

Note that we access the shared map again each time we want to get the latest data from the SHMTest1Data object. If we didn't do this, we'd get stale data, as the SHMTest1Data object in memory is a copy of the one in the shared file, not a direct reference into the object (again, later we'll see the "no-copy" mode which references the object in the file directly).
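
The stale-copy effect can be reproduced without SharedHashMap at all. The sketch below is a stand-in, not the library: a byte array plays the role of the shared file, and the hypothetical put() and get() methods copy a Serializable value in and out of it with standard Java serialisation. A copy fetched before another party's update keeps its old contents until get() is called again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class StaleCopySketch {

    // Simplified stand-in for SHMTest1Data.
    static class Slots implements Serializable {
        private static final long serialVersionUID = 1L;
        final long[] time;
        Slots(int n) { this.time = new long[n]; }
    }

    // Stands in for the shared file: holds the serialised form of the value.
    private byte[] stored;

    // Copies the value INTO the store as serialised bytes.
    void put(Slots value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            stored = bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Copies the value OUT of the store: every call builds a new object.
    Slots get() {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stored))) {
            return (Slots) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        StaleCopySketch store = new StaleCopySketch();
        store.put(new Slots(2));
        Slots mine = store.get();        // our copy
        Slots theirs = store.get();      // "another process" takes its own copy
        theirs.time[0] = System.currentTimeMillis();
        store.put(theirs);               // ...and writes it back
        System.out.println("stale copy: " + mine.time[0]);
        System.out.println("re-read:    " + store.get().time[0]);
    }
}
```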

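Now it's simply a matter of applying the algorithm: a slot whose timestamp is unchanged between the two reads is not being updated by any running process, so it is free and this process can take it; if every slot has changed, all slots are taken and this process must not run.

The full working implementation is available in SHMTest1.java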
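
As a rough illustration of the comparison step (not the code from SHMTest1.java), the sketch below takes the two timestamp snapshots, named times1 and times2 as in the snippet above, and looks for a slot that did not change between them. The class name FreeSlotSketch, the method findFreeSlot and the -1 "no free slot" convention are assumptions made for this example.

```java
class FreeSlotSketch {

    // Returns the index of the first slot whose timestamp is the same in
    // both snapshots (nobody is updating it), or -1 if every slot changed.
    static int findFreeSlot(long[] times1, long[] times2) {
        int slots = Math.min(times1.length, times2.length);
        for (int i = 0; i < slots; i++) {
            if (times1[i] == times2[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] before = {1000L, 0L};
        long[] after = {1300L, 0L}; // slot 0 is being updated, slot 1 is not
        int slot = findFreeSlot(before, after);
        if (slot < 0) {
            System.out.println("no free slot, this process must not run");
        } else {
            System.out.println("taking slot " + slot);
        }
    }
}
```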

There is the same gotcha as above to be aware of; we need to handle concurrency conflicts of the SHMTest1Data object getting updated by other processes. In our case, a simple retry mechanism works fine, along the lines of:

	while( (data = theSharedMap.get("whatever")).getTimeAt(slotindex) != timenow) 	{
		...
		data.setTimeAt(slotindex, timenow);
		theSharedMap.put("whatever", data);
	}

There are some subtleties in the actual implementation of this example, e.g. what to do if the conflict cannot be recovered from; for full details look at the source in SHMTest1.java
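
One possible shape for such a retry, with a bound on the number of attempts, is sketched below. It is not taken from SHMTest1.java: the class RetrySketch, its copy-in/copy-out get() and put() methods (a plain in-memory stand-in for the shared map) and the maxAttempts parameter are all assumptions for illustration. The method writes the timestamp, then re-reads the value to confirm the write is still there.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RetrySketch {

    private final Map<String, long[]> backing = new ConcurrentHashMap<>();

    // Copy-out: callers never get a reference to the stored array.
    long[] get(String key) {
        long[] value = backing.get(key);
        return value == null ? null : value.clone();
    }

    // Copy-in: later changes to the caller's array do not affect the store.
    void put(String key, long[] value) {
        backing.put(key, value.clone());
    }

    // Writes timeNow into the given slot and re-reads to confirm it.
    // Returns false if the entry is missing or the write could not be
    // confirmed within maxAttempts write attempts.
    boolean claim(String key, int slot, long timeNow, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long[] data = get(key);
            if (data == null) {
                return false; // assumption: the entry must already exist
            }
            if (data[slot] == timeNow) {
                return true;  // our update is visible in the store
            }
            data[slot] = timeNow;
            put(key, data);   // writes the whole array back
        }
        long[] last = get(key);
        return last != null && last[slot] == timeNow;
    }

    public static void main(String[] args) {
        RetrySketch store = new RetrySketch();
        store.put("whatever", new long[2]);
        System.out.println("claimed: " + store.claim("whatever", 1, System.currentTimeMillis(), 3));
    }
}
```

Because the whole array is written back on each attempt, an update that another process makes to a different slot between the read and the write can be overwritten. The re-read is what lets each process notice when its own slot has been clobbered and repair it.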

Efficient Marshalling

Before we move to the "no-copy" implementation, there's a quick optimisation of this example we can do: moving from Serializable to Externalizable. This immediately makes our implementation much more efficient, and only requires adding the relevant simple read and write methods to SHMTest1Data:

	public void writeExternal(ObjectOutput out) throws IOException {
		out.writeInt(time.length);
		for (int i = 0; i < time.length; i++) {
			out.writeLong(time[i]);
		}
	}
	public void readExternal(ObjectInput in) throws IOException,ClassNotFoundException {
		int length = in.readInt();
		time = new long[length];
		for (int i = 0; i < time.length; i++) {
			time[i] = in.readLong();
		}
	}

You can see and test this implementation in SHMTest2.java
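
For readers who want to try the Externalizable version in isolation, here is a self-contained round trip through standard Java serialisation. It is a sketch rather than the code in SHMTest2.java: the class names ExternalizableSketch and Slots and the roundTrip helper are hypothetical. Note the public no-argument constructor, which the Externalizable contract requires so that the object can be instantiated before readExternal() is called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class ExternalizableSketch {

    // Simplified stand-in for SHMTest1Data.
    public static class Slots implements Externalizable {
        private long[] time;

        // Required by Externalizable: used when the object is read back.
        public Slots() { this.time = new long[0]; }

        public Slots(int n) { this.time = new long[n]; }

        public int length() { return time.length; }
        public void setTimeAt(int index, long t) { time[index] = t; }
        public long getTimeAt(int index) { return time[index]; }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(time.length);
            for (long t : time) {
                out.writeLong(t);
            }
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            time = new long[in.readInt()];
            for (int i = 0; i < time.length; i++) {
                time[i] = in.readLong();
            }
        }
    }

    // Serialises the value to bytes and reads it back as a new object.
    static Slots roundTrip(Slots value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Slots) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Slots original = new Slots(2);
        original.setTimeAt(1, System.currentTimeMillis());
        Slots copy = roundTrip(original);
        System.out.println(copy.length() + " slots, slot 1 = " + copy.getTimeAt(1));
    }
}
```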

Another interface is also supported here: instead of Serializable or Externalizable you can use the net.openhft.lang.io.serialization.BytesMarshallable interface. It works similarly to Externalizable, but instead of supplying ObjectInput/ObjectOutput objects to the read/write methods, it supplies an instance of net.openhft.lang.io.Bytes, which provides many features including atomic updates and compare-and-swap capability within the marshalling. We're not going into this in more detail here, but the simplest implementation (almost identical to the Externalizable one above) is available in SHMTest3.java


Last Updated: 2024-07-15
Copyright © 2007-2024 Fasterj.com. All Rights Reserved.
All trademarks and registered trademarks appearing on Fasterj.com are the property of their respective owners.
URL: http://www.fasterj.com/articles/sharedhashmap1a.shtml