Removing duplicate elements from a doubly linked list is a common data-structures task that every programmer should understand. Whether you are preparing for an interview, implementing a data-cleaning routine, or optimizing a legacy system, knowing the standard techniques and pitfalls for deduplicating a doubly linked list will save you time and bugs. A doubly linked list gives you forward and backward pointers, which makes removal operations straightforward as long as you keep pointer updates consistent. In this topic we will cover the most practical approaches: a linear-time method using a hash set, an in-place method for sorted lists, and an O(n²) constant-space method. We’ll also discuss implementation details, pseudocode, complexity, edge cases, and testing tips.
Why deduplicate a doubly linked list?
Duplicates can arise for many reasons: bad input, repeated inserts, merges of datasets, or application-logic errors. Removing duplicates reduces memory use, prevents repeated processing of the same value, and often improves correctness for algorithms that assume unique elements. Because a doubly linked list keeps both next and prev pointers, you can remove a node in O(1) time once you find it, but you must adjust both pointers carefully to keep the list consistent.
Assumptions and basics
A typical doubly linked list node has at least three fields: a value (or data), a next pointer, and a prev pointer. We assume a singly owned list with a head pointer (and optionally a tail). Removing a node requires relinking its neighbours:
- If node.prev exists, set node.prev.next = node.next.
- If node.next exists, set node.next.prev = node.prev.
- If node is head, update head to node.next.
- Optionally deallocate or free node memory.
Keeping these rules in mind prevents broken links and memory leaks.
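The relinking rules above can be sketched in Python. The Node class and the unlink helper are illustrative assumptions for this article, not part of any standard library:

```python
class Node:
    """Minimal doubly linked list node: a value plus prev/next pointers."""
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def unlink(head, node):
    """Remove node from the list and return the (possibly updated) head."""
    if node.prev is not None:
        node.prev.next = node.next
    else:
        head = node.next            # node was the head
    if node.next is not None:
        node.next.prev = node.prev
    node.prev = node.next = None    # clear stale links to the removed node
    return head
```

Returning the head from the helper keeps the head-update rule explicit; in a garbage-collected language, clearing the removed node's pointers stands in for the free step.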
Method 1: Linear time with extra space (hash set)
This is the most practical approach for an unsorted doubly linked list. Traverse the list once, keep a set (hash table) of values you’ve already seen, and remove a node when its value is already in the set. Time complexity is O(n) on average and space complexity is O(n). This method is fast and simple to implement.
Pseudocode (hash set)
    current = head
    seen = empty set
    while current != null
        next_node = current.next          // store next because current might be removed
        if current.value in seen
            // remove current
            if current.prev != null
                current.prev.next = current.next
            else
                head = current.next
            if current.next != null
                current.next.prev = current.prev
            free(current)                 // language dependent
        else
            seen.add(current.value)
        current = next_node
This code keeps the traversal stable by saving next_node before any deletion, so pointer updates are safe.
Complexity and practical notes
- Time: O(n) on average, O(n²) in the worst case if hash operations degrade.
- Space: O(n) for the seen set.
- Best when duplicates are common and extra memory is affordable.
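A runnable Python sketch of the hash-set method follows. The Node class and the from_list/to_list helpers are illustrative assumptions used for building and inspecting test lists:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None

def from_list(values):
    """Build a doubly linked list from a Python list; return the head."""
    head = prev = None
    for v in values:
        node = Node(v)
        if prev is None:
            head = node
        else:
            prev.next, node.prev = node, prev
        prev = node
    return head

def to_list(head):
    """Collect list values front to back, for inspection in tests."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

def dedupe_hashset(head):
    """Remove duplicates from an unsorted list: O(n) time, O(n) space."""
    seen = set()
    current = head
    while current is not None:
        next_node = current.next          # save before any removal
        if current.value in seen:
            if current.prev is not None:
                current.prev.next = current.next
            else:
                head = current.next       # removing the head
            if current.next is not None:
                current.next.prev = current.prev
        else:
            seen.add(current.value)
        current = next_node
    return head
```

For example, dedupe_hashset(from_list([3, 1, 3, 2, 1, 3])) keeps the first occurrence of each value and yields the list 3, 1, 2.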
Method 2: Linear time, no extra space (sorted list)
If the doubly linked list is sorted by value, removing duplicates is even simpler and requires no extra data structure. Since duplicates appear consecutively, a single pass comparing a node to its next neighbor is enough to remove repeated elements. This method runs in O(n) time and O(1) extra space.
Pseudocode (sorted list)
    current = head
    while current != null and current.next != null
        if current.value == current.next.value
            dup = current.next
            current.next = dup.next
            if dup.next != null
                dup.next.prev = current
            free(dup)
        else
            current = current.next
Because duplicates are adjacent, you never need to backtrack or use a set. This is the preferred method for sorted lists.
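The sorted-list pass translates directly into Python. As before, the Node class and list-building helpers are illustrative assumptions:

```python
class Node:
    def __init__(self, value, prev=None):
        self.value, self.prev, self.next = value, prev, None

def from_list(values):
    """Build a doubly linked list from a Python list; return the head."""
    head = prev = None
    for v in values:
        node = Node(v, prev)
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head

def to_list(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

def dedupe_sorted(head):
    """Remove duplicates from a sorted list: O(n) time, O(1) extra space."""
    current = head
    while current is not None and current.next is not None:
        if current.value == current.next.value:
            dup = current.next
            current.next = dup.next       # splice dup out
            if dup.next is not None:
                dup.next.prev = current
        else:
            current = current.next        # advance only when no removal happened
    return head
```

Note that current only advances when no duplicate was spliced out; staying put lets a run of three or more equal values be collapsed in successive iterations.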
Method 3: Constant space, O(n²) time (nested loops)
When memory is extremely constrained and the list is unsorted, you can remove duplicates using nested loops: for each node, scan the subsequent nodes and remove any with the same value. This method requires no extra storage but costs O(n²) time and can be slow for large lists.
Pseudocode (nested loops)
    current = head
    while current != null
        runner = current.next
        while runner != null
            next_runner = runner.next
            if runner.value == current.value
                // remove runner
                if runner.prev != null
                    runner.prev.next = runner.next
                if runner.next != null
                    runner.next.prev = runner.prev
                free(runner)
            runner = next_runner
        current = current.next
This method is straightforward but best reserved for small lists or environments where memory allocation is not allowed.
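A Python sketch of the nested-loop method, with the same illustrative Node and helper assumptions as the earlier examples:

```python
class Node:
    def __init__(self, value, prev=None):
        self.value, self.prev, self.next = value, prev, None

def from_list(values):
    """Build a doubly linked list from a Python list; return the head."""
    head = prev = None
    for v in values:
        node = Node(v, prev)
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head

def to_list(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

def dedupe_quadratic(head):
    """Remove duplicates with O(1) extra space and O(n^2) time."""
    current = head
    while current is not None:
        runner = current.next
        while runner is not None:
            next_runner = runner.next     # save before a possible removal
            if runner.value == current.value:
                if runner.prev is not None:
                    runner.prev.next = runner.next
                if runner.next is not None:
                    runner.next.prev = runner.prev
            runner = next_runner
        current = current.next
    return head
```

Because the runner always starts after current, the head node (the first occurrence of each value) is never removed, so head can be returned unchanged.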
Edge cases and pitfalls
When implementing removal logic, consider the following edge cases to avoid bugs:
- Empty list: head is null, so there is nothing to do.
- Single-node list: removing duplicates should handle the head correctly.
- All nodes are duplicates: the list collapses to a single node; the head never becomes null unless the list started empty.
- Tail-node removal: ensure the tail pointer is updated if your list tracks one.
- Concurrent modification: if multiple threads can touch the list, synchronize access.
- Memory management: in languages without garbage collection, free removed nodes properly to avoid leaks.
Also check whether duplicates are defined by value equality or by a key extracted from the node (for example, comparing only an id field).
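One way to support key-based equality is to parameterize the seen set on a key function. This sketch assumes the same illustrative Node and helpers as above, plus a hypothetical dedupe_by_key function:

```python
class Node:
    def __init__(self, value, prev=None):
        self.value, self.prev, self.next = value, prev, None

def from_list(values):
    head = prev = None
    for v in values:
        node = Node(v, prev)
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head

def to_list(head):
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

def dedupe_by_key(head, key=lambda node: node.value):
    """Hash-set dedup, where duplicates are decided by key(node)."""
    seen = set()
    current = head
    while current is not None:
        next_node = current.next
        k = key(current)
        if k in seen:
            if current.prev is not None:
                current.prev.next = current.next
            else:
                head = current.next
            if current.next is not None:
                current.next.prev = current.prev
        else:
            seen.add(k)
        current = next_node
    return head
```

For example, with values that are (id, name) pairs, passing key=lambda n: n.value[0] deduplicates by id alone, keeping the first record seen for each id.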
Testing and validation
Robust testing helps verify your implementation. Useful test cases include:
- Empty list.
- Single element list.
- List with no duplicates.
- List where every element is the same.
- Duplicates at the head, middle, and tail positions.
- Mixed data types or null values within nodes (if applicable).
Automated unit tests should assert list length, the order of remaining elements, and pointer consistency (every node’s next.prev equals the node and vice versa). For languages with manual memory management, use tools to check for leaks.
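The pointer-consistency check described above can be written as a small helper. The Node class and from_list builder are illustrative assumptions:

```python
class Node:
    def __init__(self, value, prev=None):
        self.value, self.prev, self.next = value, prev, None

def from_list(values):
    head = prev = None
    for v in values:
        node = Node(v, prev)
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head

def check_consistency(head):
    """Assert that every next/prev pair in the list is mutually consistent."""
    assert head is None or head.prev is None
    node = head
    while node is not None:
        if node.next is not None:
            assert node.next.prev is node   # next.prev points back at node
        if node.prev is not None:
            assert node.prev.next is node   # prev.next points forward at node
        node = node.next
```

Running this check after every deduplication call in your unit tests catches the most common bug class: a removal that updates next but forgets prev, or vice versa.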
Memory and performance considerations
Choose the method based on your constraints. If performance matters and you can afford extra memory, the hash-set approach is ideal. If memory is limited but the list is sorted, use the sorted-list approach. For extreme memory constraints and small lists, the nested-loop method can be acceptable. In high-performance systems, also consider cache locality and allocation overhead: removing many nodes may fragment memory, so pooling or batch deallocation strategies can help.
Conclusion and best practices
Removing duplicates from a doubly linked list is a routine but crucial operation. The most practical strategy for unsorted lists is to use a hash set for O(n) time. For sorted lists, exploit adjacency and do it in-place with O(1) extra space. If you cannot allocate additional memory, the nested-loop O(n²) solution works for small inputs. Whichever approach you choose, be meticulous about updating both next and prev pointers and about testing edge cases. Clear, well-documented code and thorough unit tests will make your deduplication routine reliable and maintainable.