Skip to main content

Set Best Pracrives

Best Practices for Using Set Implementations in Java​

When working with Set implementations in Java, it is important to choose the correct implementation and follow best practices to ensure good performance, maintainability, and correctness.


1. Choosing the Right Implementation​

Different Set implementations serve different purposes.

Use HashSet When​

  • Fast lookups are required
  • Order does not matter
  • You want the best average performance

Use LinkedHashSet When​

  • You need to maintain insertion order
  • You still want fast operations

Use TreeSet When​

  • You need sorted elements
  • You need navigation methods like first(), last(), ceiling()

2. Avoiding Common Pitfalls​

Adding Duplicate Elements​

Sets do not allow duplicates. The add() method returns false if the element already exists.

Set<String> fruits = new HashSet<>();

boolean added = fruits.add("Apple");
added = fruits.add("Apple"); // returns false

Using null with TreeSet​

TreeSet does not allow null.

Set<Integer> numbers = new TreeSet<>();
// numbers.add(null); // Throws NullPointerException

Modifying a Set During Iteration​

This causes ConcurrentModificationException.

Safe approach:

Iterator<String> it = fruits.iterator();

while (it.hasNext()) {
String fruit = it.next();

if (fruit.equals("Banana")) {
it.remove();
}
}

Forgetting Comparators with TreeSet​

If objects are not Comparable, you must provide a Comparator.

Set<String> names = new TreeSet<>(Comparator.reverseOrder());

3. Performance Comparison​

Operation HashSet LinkedHashSet TreeSet


Add O(1) O(1) O(log n) Remove O(1) O(1) O(log n) Contains O(1) O(1) O(log n) Ordering None Insertion order Sorted Memory Low Medium Higher


4. Thread Safety​

Most Set implementations are not thread-safe.

Option 1: Synchronized Wrapper​

Set<String> syncSet =
Collections.synchronizedSet(new HashSet<>());

Option 2: Concurrent Collection​

Set<String> concurrentSet =
new ConcurrentSkipListSet<>();

5. Useful Patterns​

Removing Duplicates​

List<String> fruits =
Arrays.asList("Apple","Banana","Apple");

Set<String> unique =
new HashSet<>(fruits);

Sorting a Set​

Set<Integer> numbers =
new HashSet<>(Arrays.asList(5,2,8,1));

Set<Integer> sorted =
new TreeSet<>(numbers);

Mathematical Set Operations​

Set<Integer> set1 = new HashSet<>(Arrays.asList(1,2,3));
Set<Integer> set2 = new HashSet<>(Arrays.asList(2,3,4));

Set<Integer> union = new HashSet<>(set1);
union.addAll(set2);

Set<Integer> intersection = new HashSet<>(set1);
intersection.retainAll(set2);

Set<Integer> difference = new HashSet<>(set1);
difference.removeAll(set2);

6. Large Scale Application Recommendations​

Primitive Collections​

Libraries like:

  • Eclipse Collections
  • FastUtil

reduce memory overhead caused by boxing.


Profile Performance​

Use profilers to detect inefficient Set usage.


Immutable Sets​

Set<String> set =
Set.of("Apple","Banana","Cherry");

Immutable sets prevent modification.


7. Difference Between Set Implementations in Java​

FeatureHashSetLinkedHashSetTreeSet
TypeSetSetSet
OrderingNo guaranteed orderMaintains insertion orderSorted order (natural or comparator)
Duplicates AllowedNoNoNo
Null Elements AllowedYes (one null allowed)Yes (one null allowed)No
Thread-SafeNoNoNo
Underlying Data StructureHash tableHash table + linked listRed-black tree
Add OperationO(1)O(1)O(log n)
Remove OperationO(1)O(1)O(log n)
Contains OperationO(1)O(1)O(log n)
Memory OverheadLowModerateHigh
Use CasesFast lookups when order is not importantWhen insertion order must be preservedWhen elements must be stored in sorted order
Iterator TypeFail-fastFail-fastFail-fast

Explanation of Key Differences​

1. Ordering​

All three implementations (ArrayList, LinkedList, Vector) maintain the insertion order, meaning elements are stored and retrieved in the same sequence they were added.

2. Duplicates and Null Elements​

Since all three are implementations of the List interface, they allow:

  • Duplicate elements
  • Null values

3. Thread-Safety​

  • ArrayList and LinkedList are not thread-safe.
    They are designed for single-threaded environments and offer better performance.

  • Vector is thread-safe because its methods are synchronized.
    However, this synchronization makes it slower compared to ArrayList and LinkedList.

4. Underlying Data Structure​

  • ArrayList

    • Uses a dynamic array to store elements
    • Efficient for random access
    • Slower for frequent insertions and deletions
  • LinkedList

    • Uses a doubly-linked list
    • Efficient for adding/removing elements at both ends
    • Slower for random access
  • Vector

    • Uses a dynamic array, similar to ArrayList
    • Has additional synchronization overhead

5. Performance​

ArrayList​

  • Add: O(1) amortized (occasionally O(n) when resizing)
  • Get: O(1) due to index-based access
  • Remove: O(n) because elements may need to be shifted

LinkedList​

  • Add/Remove at Ends: O(1)
  • Get: O(n) because traversal is required
  • Remove from Middle: O(n)

Vector​

  • Similar to ArrayList in time complexity
  • Slower due to synchronization

6. Memory Overhead​

  • ArrayList
    Minimal memory overhead since it stores elements in an array.

  • LinkedList
    Higher memory overhead because each node stores:

    • previous pointer
    • next pointer
  • Vector
    Moderate memory overhead due to synchronization.

7. Use Cases​

ArrayList​

Ideal when:

  • Random access is frequent
  • Thread-safety is not required
  • Dataset size is large but mostly static

LinkedList​

Suitable for:

  • Implementing queues
  • Implementing stacks
  • Implementing deques
  • Frequent insertions and deletions

Vector​

Primarily used in:

  • Legacy systems
  • Situations where built-in synchronization is required

8. Iterator Behavior​

  • ArrayList and LinkedList

    • Provide fail-fast iterators
    • Throw ConcurrentModificationException if modified during iteration
  • Vector

    • Provides fail-safe iterators via Enumeration
    • Does not throw exceptions if modified during iteration

Summary​

  1. Choose the right Set implementation.
  2. Avoid modifying collections during iteration.
  3. Understand time complexity of operations.
  4. Use concurrent collections when needed.
  5. Prefer immutable sets for read‑only data.