Be Careful When Modifying Data While Using a Java Iterator
As this semester begins to wrap up, I figured I’d share a little story of how I got very, very familiar with Java iterators.
Real World Context
For context, I teach a second-year software components course which serves as the last hurdle for students trying to get into the major. Naturally, this course is very stressful for students, and I often have to work extra hard to give them every opportunity to succeed.
Unfortunately, this semester we got swept up in the pandemic and had to convert to online teaching. As a result, we had to make some quick decisions about instruction that changed the way students were going to learn. In particular, we converted all our paper exams to online quizzes.
For some students, this was a major blessing. After all, these quizzes weren’t any more difficult than the exams, and we made them open book. In other words, we made the class just a little bit easier for them to pass.
Of course, students were all over the globe, and they weren’t able to get the help they needed. In addition, students weren’t taking their studies as seriously as they would for an exam. This combination created some pretty abysmal quiz scores.
By the time we hit the fourth quiz, students were pretty upset. In fact, I heard from several instructors that their students were tired of the “trick questions.” As an instructor, this was a little frustrating to hear because these were pretty typical exam questions. We weren’t exactly ramping up the difficulty just for them, yet this was the first time I was hearing these complaints.
Example Problem
Then, something weird happened. We gave them a question that I didn’t really know the answer to, and it went a little something like the following:
What is the value of the Set<NaturalNumber> nums variable after the following code fragment?
1 2 3 4 5 6 7 | Set<NaturalNumber> nums = new SomeSetImplementation<>(); nums.add( new NaturalNumber2( 1 )); nums.add( new NaturalNumber2( 5 )); nums.add( new NaturalNumber2( 6 )); for (NaturalNumber n : nums) { n.increment(); } |
Naturally, the students’ options are as follows:
- nums = {1, 5, 6, 2, 6, 7}
- nums = {2, 6, 7}
- nums = {1, 5, 6}
- Cannot tell from the information provided.
Now, for context, there are a few in-house components in this example.
First, a NaturalNumber is a mutable class which represents an unbounded non-negative integer. In other words, a NaturalNumber can range from zero to infinity. In addition, a NaturalNumber can be modified using a series of basic math operations like the following:
increment()
: adds 1 tothis
add(NaturalNumber n)
: adds n tothis
In addition, this question makes use of a Set
which is akin to a mathematical set. The idea here is that a Set
has two main properties:
- A
Set
lacks duplicates (i.e. {1, 2, 1} is not a legal set). - A
Set
is unordered (i.e. {1, 2, 3} and {3, 2, 1} are equivalent).
For reference, both of these components are thoroughly documented on the course website, if you’re interested in reading more details. All components are written using Design by Contract, so each method will include a proper contract where the precondition is denoted by @requires and the postcondition is denoted by @ensures.
In addition, we label each parameter using parameter modes like @restores, @updates, @clears, and @replaces. Of course, that’s a bit out of the scope of this piece.
Breaking Down the Problem
Now, I’ll reiterate that I wasn’t sure exactly which answer was correct at first. Obviously, the first answer (i.e. {1, 5, 6, 2, 6, 7}) is incorrect as incrementing the underlying value doesn’t add new values to the Set
—or so I thought. Using that same logic, I also assumed the third set (i.e. {1, 5, 6}) was obviously incorrect because we’re clearly mutating the underlying values.
At this point, I was fairly confident that the second answer (i.e. {2, 6, 7}) was correct, as were 87% of my students. Of course, I had the answer key, so I had to challenge myself to understand why the correct answer was actually the final answer (i.e. “Cannot tell from the information provided.”).
Now, based on the title of this article, you might already be way ahead of me. That’s fine! However, I didn’t jump to that conclusion right away. Instead, I took a step back and decided to actually draw out the Set
.
Of course, there are a couple major hiccups when you try to do that. First, as I mentioned previously, a Set
has no order. As a result, how do we reason about which element goes first during iteration? Do we try all possible configurations?
These were questions that I wasn’t ready to grapple with. Luckily, as it turns out, iterating by order of appearance saves us a lot of time. Take a look:
1 2 3 | { 1 , 5 , 6 } // Initial state { 2 , 5 , 6 } // After incrementing the first element { 2 , 6 , 6 } // After incrementing the second element |
Uh oh! We broke our first rule: a Set
must not contain duplicates. Therefore, we cannot tell what the resulting Set
will look like. My final answer is D: “Cannot tell from the information provided.”
Unfortunately, this explanation wasn’t exactly satisfying to me. Like, I get that a Set
cannot contain duplicates, but what are the practical ramifications of breaking that rule? In other words, if it’s so bad, why do we even give the user access to the underlying data?
In my opinion, users should only be able to access the data when they remove it. In general, I think the library does a great job of doing that. If Set
didn’t implement Iterable
, we’d be in the clear.
Introducing Java Iterators
Which brings me to an even weirder problem: Java iterators. In order for this code to work, Set
has to implement Iterable which means defining an Iterator for the underlying architecture.
Now, if you’ve ever written your own Iterator, you know that you need to do something like the following:
1 2 3 4 5 6 7 8 | new Iterator<T>() { @Override public boolean hasNext() { ... } @Override public T next() { ... } @Override public void remove() { ... } } |
Here, the basic idea is that we define some sort of structure which can serve as a lazy data structure. If you’re familiar with generator expressions from other languages like Python, it’s the same idea: we create an object which can return one item at a time from some sequence of items.
In practice, an Iterator
works by continuing to provide items through the next()
method until there is nothing left to return (which may never happen). In bounded sequences, we know when to stop because the hasNext()
method will return false
. Together, these methods can serve as the core of a looping mechanism:
1 2 3 | while (iter.hasNext()) { T item = next(); } |
By making a class implement Iterable
, we can then leverage a bit of Java syntactic sugar called the for-each loop:
1 | for (T item: collection) { ... } |
Java Iterator Caveats
In the problem defined above, we were able to loop over the Set
because it implements Iterable
.
Of course, just because we’re able to loop over the data structure doesn’t mean we won’t run into problems. After all, the Iterator
class has a few rules of its own. Perhaps the most important rule can be found in the description of the remove()
method:
Removes from the underlying collection the last element returned by this iterator (optional operation). This method can be called only once per call to next()
. The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.
Java 8 Docs (captured 04/23/2020)
Remember how I said that modifying a NaturalNumber
is bad because it could result in duplicates. Well, based on this definition, modifying a Set
could result in unpredictable behavior regardless.
Of course, this raises a question for me: what does it mean to modify the underlying collection. For Java Collections, for-each loops disallow the addition or removal of an item from a collection. In those cases, we can expect to see a ConcurrentModificationException
(docs).
Now, that error isn’t universal. After all, how could an Iterator
possibly know if a collection had been modified? As it turns out, that behavior is custom baked into the next()
method for each collection. With the List
collection, for example, the ConcurrentModificationException
is thrown when the size of the list changes. In other words, the integrity of the data structure is checked on every invocation of next()
.
Since the collections leverage generic types, it’s impossible to account for all the different types of situations that could arise. As a result, there’s no way for next()
to detect if any data was mutated without tracking state. For example, checking if any values have changed in a list might require storing a copy of the previous state and checking against that previous state regularly. That ain’t cheap!
To make matters worse, we haven’t really talked about what effects modifying underlying data could have on the actual iteration process. For example, if next()
somehow depends on the underlying data, changing it would clearly change what comes next.
Imagine for a second that we had an Iterator
for a list whose items must implement Comparable
. Then, we made this Iterator
in such a way that it always returned the next value in sorted order. If we were to then modify underlying values, we could create a loop which never traverses the entire list:
1 2 | [ 1 , 2 , 3 ] // next() returns 1 which we scale by 5 [ 5 , 2 , 3 ] // hasNext() claims there are no other values |
Now, that’s not ideal. Typically, you’d expect a for-each loop to actually traverse an entire data structure, and this simply isn’t doing that.
Revisiting the Set Problem
At this point, we’ve had a chance to talk about the Set
problem from two different angles:
- What happens if we invalidate a
Set
by generating duplicates? - What happens if we invalidate a for-each loop by modifying the underlying data structure?
Now, I want to take an opportunity to talk about what can actually happen while executing the problem snippet:
1 2 3 4 5 6 7 | Set<NaturalNumber> nums = new SomeSetImplementation<>(); nums.add( new NaturalNumber2( 1 )); nums.add( new NaturalNumber2( 5 )); nums.add( new NaturalNumber2( 6 )); for (NaturalNumber n : nums) { n.increment(); } |
Assuming the Iterator
for Set
has no fancy modification detection, one possible outcome is the same Set
most people would expect: {2, 6, 7}.
Another possible outcome is we get a Set
where only some of the values are incremented. Perhaps, as I stated before, the next()
method depends on underlying data to make its decision about what comes next.
In this scenario, we could end up with any combination of incremented outputs:
- {2, 5, 6}
- {1, 6, 6}
- {1, 5, 7}
- {2, 6, 6}
- {2, 5, 7}
- {1, 6, 7}
In either scenario, we’re not exactly safe. Sure, the Set
looks the same, but is it really the same?
Let’s imagine for a second that the Set
is implemented using a hash table. This offers the advantage of being able to quickly check for duplicates, but it requires a bit more maintenance. For example, if we want to change a value in the Set
, we have to recompute the hash and check for collisions.
When we modify the NaturalNumber
directly, we skip this maintenance phase. As a result, our hash table will still contain the original three hashes. When someone checks if the Set
contains two, for example, the method will incorrectly return false
.
Of course, this is an implementation detail. It’s very possible that no issues are detected at all. The program continues to run smoothly, and no one bats an eye. As with all implementation details, however, we can’t depend on their assumed behavior. In other words, the program is still unpredictable.
As a minor aside, the Java implementation of Set
actually calls out this exact issue:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Java Set Documentation (view on 04/24/2020)
Looks like it’s pretty tough to put together a Set
implementation that doesn’t have problems with mutable types. I wonder what that says about mutable types…
What’s the Takeaway?
In the end, I think the Iterator
documentation is written in a way that leaves it up to the user to play nice. In other words, when it says:
The behavior of an iterator is unspecified if the underlying collection is modified while the iteration is in progress in any way other than by calling this method.
It truly means “in any way.” Of course, I was never able to confirm these suspicions, so I’d be interested to see what other folks have to say.
In the meantime, if you liked this article, I’d love it if you took the opportunity learn how you can help grow the site a bit. In that article, you’ll learn about my mailing list as well as my Patreon.
Otherwise, here are a few related posts just for you:
Otherwise, thanks for sticking around. Hopefully, my late night grad school ramblings were useful to you!
Published on Java Code Geeks with permission by Jeremy Grifski, partner at our JCG program. See the original article here: Be Careful When Modifying Data While Using a Java Iterator Opinions expressed by Java Code Geeks contributors are their own. |
What is Adapter Design Pattern: A brief explanation
As mentioned earlier, the adapter design pattern falls in the category of structural design patterns. Structural patterns are mainly concerned with the composition of classes and objects. They follow the inheritance principle of object-oriented paradigms. The adapter design pattern makes one interface conform to another, providing a uniform abstraction of different interfaces. So, how do we define – what is adapter design pattern?