Scala Tutorial – code blocks, coding style, closures, scala documentation project
Preface
This is part 12 of tutorials for first-time programmers getting into Scala. Other posts are on this blog, and you can get links to those and other resources on the links page of the Computational Linguistics course I’m creating these for. Additionally you can find this and other tutorial series on the JCG Java Tutorials page.
This post isn’t so much a tutorial as a comment on coding style, with a few pointers on how code blocks in Scala work. It was prompted by a pattern I noticed in my students’ code: they were packing everything into one-liners, with map after map after map. These map-over-mapValues-over-map sequences can be almost incomprehensible, both for another person reading the code and even for the person who wrote it. I do admit to a fair amount of guilt in using such sequences of operations in class lectures and even in some of these tutorials. That style works well in the REPL, and when there is lots of surrounding text to explain what the piece of code is doing, but it seems to have given a bad model for writing actual code. Oops!
So, taking a step back, it is important to break operation sequences up a bit, but it isn’t always obvious to beginners how to do so. Also, some students indicated that they had gotten the impression that one should try to pack everything onto one line if possible, and that breaking things up was somehow less advanced or less Scala-like. Quite the contrary: it is crucial to use strategies that allow readers of your code to see the logic behind your statements. This isn’t just for others. You are likely to be a reader of your own code, often months after you originally wrote it, and you want to be kind to your future self.
A simple example
Here is an example of what you can do to give your code more breathing space. It’s not a very meaningful example, but it serves the purpose without being very complex. We begin by creating a list of all the letters in the alphabet.
scala> val letters = "abcdefghijklmnopqrstuvwxyz".split("").toList.tail
letters: List[java.lang.String] = List(a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z)
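One caveat, offered as a hedged aside: the .tail above relies on split("") returning an empty string as its first element, which is how the Java versions current at the time of writing behaved. On Java 8 and later, split("") no longer produces that leading empty string, so .tail would drop "a" instead. Here is a minimal sketch that avoids the issue by building the list from a character range (the object name LettersSketch is only for illustration, and wrapping things in an object just makes the snippet easy to paste into the REPL or a file).

```scala
object LettersSketch {
  // Build the alphabet from a range of characters and turn each Char into a
  // String, so the result does not depend on how split("") treats the start
  // of the string.
  val letters: List[String] = ('a' to 'z').map(_.toString).toList
}
```

With this, LettersSketch.letters can stand in for letters in the rest of the example.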
Okay, now here’s our (pointless) task: we want to create a map from every letter (from ‘a’ to ‘x’) to a list containing that letter and the two letters that follow it in reverse alphabetical order. (Did I mention this was a pointless task in and of itself?) Here’s a one-liner that can do it.
scala> letters.zip((1 to 26).toList.sliding(3).toList).toMap.mapValues(_.map(x => letters(x-1)).sorted.reverse)
res0: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] = Map(e -> List(g, f, e), s -> List(u, t, s), x -> List(z, y, x), n -> List(p, o, n), j -> List(l, k, j), t -> List(v, u, t), u -> List(w, v, u), f -> List(h, g, f), a -> List(c, b, a), m -> List(o, n, m), i -> List(k, j, i), v -> List(x, w, v), q -> List(s, r, q), b -> List(d, c, b), g -> List(i, h, g), l -> List(n, m, l), p -> List(r, q, p), c -> List(e, d, c), h -> List(j, i, h), r -> List(t, s, r), w -> List(y, x, w), k -> List(m, l, k), o -> List(q, p, o), d -> List(f, e, d))
That did it, but that one-liner isn’t clear at all, so we should break things up a bit. Also, what is “_” and what is “x”? (By which I mean, what are they in terms of the logic of the program? We know they are ways of referring to the elements being mapped over, but they don’t help the human reading the code understand what is going on.)
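To make the point about “_” and “x” concrete, here is a small sketch on a made-up list of numbers (the names PlaceholderSketch, viaUnderscore and viaName are just for illustration). Both versions compute the same thing; the only difference is whether the element being mapped over gets a name that tells the reader what it is.

```scala
object PlaceholderSketch {
  val numbers = List(1, 2, 3)

  // The underscore is shorthand for "the element being mapped over".
  val viaUnderscore = numbers.map(_ + 1)

  // The same function with an explicit, descriptive parameter name.
  val viaName = numbers.map(number => number + 1)
}
```

In a tiny case like this the underscore is perfectly readable; the named form starts to pay off once the body of the function gets longer.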
Let’s start by creating the sliding list of number ranges.
scala> val ranges = (1 to 26).toList.sliding(3).toList
ranges: List[List[Int]] = List(List(1, 2, 3), List(2, 3, 4), List(3, 4, 5), List(4, 5, 6), List(5, 6, 7), List(6, 7, 8), List(7, 8, 9), List(8, 9, 10), List(9, 10, 11), List(10, 11, 12), List(11, 12, 13), List(12, 13, 14), List(13, 14, 15), List(14, 15, 16), List(15, 16, 17), List(16, 17, 18), List(17, 18, 19), List(18, 19, 20), List(19, 20, 21), List(20, 21, 22), List(21, 22, 23), List(22, 23, 24), List(23, 24, 25), List(24, 25, 26))
It’s quite clear what that is now. (The sliding function is a beautiful thing, especially for natural language processing problems.)
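As a rough illustration of why sliding is handy for language processing, here is a sketch that collects the pairs of adjacent words (bigrams) in a made-up sentence. The sentence and the names SlidingSketch, words and bigrams are assumptions for the example only.

```scala
object SlidingSketch {
  // A made-up sentence, split into word tokens on single spaces.
  val words: List[String] = "the dog chased the cat".split(" ").toList

  // sliding(2) yields every window of two adjacent words.
  val bigrams: List[List[String]] = words.sliding(2).toList
  // List(List(the, dog), List(dog, chased), List(chased, the), List(the, cat))
}
```

Changing the argument to 3 would give word trigrams in the same way that sliding(3) gave us triples of numbers above.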
Next, we zip the letters with the ranges and create a Map from the pairs using toMap. This produces a Map from letters to lists of three numbers. Note that the lengths of the two lists are different: letters has 26 elements and ranges has 24, which means that the last two elements of letters (‘y’ and ‘z’) get dropped in the zipped list.
scala> val letter2range = letters.zip(ranges).toMap
letter2range: scala.collection.immutable.Map[java.lang.String,List[Int]] = Map(e -> List(5, 6, 7), s -> List(19, 20, 21), x -> List(24, 25, 26), n -> List(14, 15, 16), j -> List(10, 11, 12), t -> List(20, 21, 22), u -> List(21, 22, 23), f -> List(6, 7, 8), a -> List(1, 2, 3), m -> List(13, 14, 15), i -> List(9, 10, 11), v -> List(22, 23, 24), q -> List(17, 18, 19), b -> List(2, 3, 4), g -> List(7, 8, 9), l -> List(12, 13, 14), p -> List(16, 17, 18), c -> List(3, 4, 5), h -> List(8, 9, 10), r -> List(18, 19, 20), w -> List(23, 24, 25), k -> List(11, 12, 13), o -> List(15, 16, 17), d -> List(4, 5, 6))
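The way zip quietly drops the leftover elements of the longer list is easy to miss, so here is a smaller sketch of the same behavior. The lists and the names ZipSketch, names, numbers, pairs and lookup are made up for the illustration.

```scala
object ZipSketch {
  val names = List("a", "b", "c")
  val numbers = List(1, 2, 3, 4, 5)

  // zip stops at the end of the shorter list, so 4 and 5 are silently dropped.
  val pairs: List[(String, Int)] = names.zip(numbers)

  // toMap turns the list of pairs into a Map from the first element of each
  // pair to the second.
  val lookup: Map[String, Int] = pairs.toMap
}
```

Here pairs has three elements rather than five, just as the zipped letters and ranges have 24 rather than 26.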
Note that we could have broken this into two steps, first creating the zipped list and then calling toMap on it. However, it is perfectly clear what the intent is when one zips two lists (creating a list of pairs) and then uses toMap on it immediately, so this is certainly a case where it makes sense to put multiple operations on a single line.
At this point we could of course process the letter2range Map using a one-liner.
scala> letter2range.mapValues(_.map(x => letters(x-1)).sorted.reverse)
res1: scala.collection.immutable.Map[java.lang.String,List[java.lang.String]] = Map(e -> List(g, f, e), s -> List(u, t, s), x -> List(z, y, x), n -> List(p, o, n), j -> List(l, k, j), t -> List(v, u, t), u -> List(w, v, u), f -> List(h, g, f), a -> List(c, b, a), m -> List(o, n, m), i -> List(k, j, i), v -> List(x, w, v), q -> List(s, r, q), b -> List(d, c, b), g -> List(i, h, g), l -> List(n, m, l), p -> List(r, q, p), c -> List(e, d, c), h -> List(j, i, h), r -> List(t, s, r), w -> List(y, x, w), k -> List(m, l, k), o -> List(q, p, o), d -> List(f, e, d))
This is better than what we started with because we at least know what letter2range is, but it still isn’t clear what is going on after that. To make this more comprehensible, we can break it up over multiple lines and give more descriptive names to the variables. The following produces the same result as above.
letter2range.mapValues (
  range => {
    val alphavalues = range.map (number => letters(number-1))
    alphavalues.sorted.reverse
  }
)
Notice that:
- I named the parameter range rather than _, which is a better indicator of what mapValues is working with.
- After the => I use an opening curly brace {
- The next lines are a block of code that I can use like any block of code, which means I can create variables and break things down into smaller, more understandable steps. For example, the line creating alphavalues makes it clear that we are taking a range and mapping its numbers to the letters at the corresponding positions in the letters list (e.g., the range 2, 3, 4 becomes ‘b’,’c’,’d’). We then sort and reverse that list (okay, so it started out sorted, but you can imagine plenty of times you need to do such sorting).
- The last line of that block is the result of the overall mapValues for that element (here, indicated by the variable range).
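The points above can be seen in a small self-contained sketch, where the block is applied to just two hand-written ranges rather than the whole map. The names BlockSketch and result, and the way letters is built here, are assumptions made so that the snippet runs on its own.

```scala
object BlockSketch {
  val letters: List[String] = ('a' to 'z').map(_.toString).toList

  // The brace opens a block; the last expression in the block is the value
  // that the anonymous function returns for each range.
  val result: List[List[String]] = List(List(1, 2, 3), List(2, 3, 4)).map { range =>
    val alphavalues = range.map(number => letters(number - 1))
    alphavalues.sorted.reverse
  }
  // result is List(List(c, b, a), List(d, c, b))
}
```

Note that there is no return keyword anywhere: alphavalues.sorted.reverse is simply the last expression, so it becomes the result.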
Basically, we get a lot more breathing room, and this becomes even more essential as you dig deeper or do more complex operations during a map-within-a-map operation. Having said that, you should ask yourself whether you should just create and use a function that has a clear semantics and does the job for you. For example, here’s an alternative to the above strategy that is perhaps clearer.
def lookupSortAndReverse (range: List[Int], alpha: List[String]) =
  range.map(number => alpha(number-1)).sorted.reverse
We’ve defined a function that takes a range and a list of letters (called alpha in the function) and produces the sorted and reversed list of letters corresponding to the numbers in the range. In other words, it does what the anonymous function defined after range in the previous code block did. We can thus easily use it in the top-level mapValues operation with completely clear intent and comprehensibility.
letter2range.mapValues(range => lookupSortAndReverse(range, letters))
Of course, you should especially consider creating such functions if you use the same operation in multiple places.
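A hedged aside for readers on newer Scala versions: as of Scala 2.13, mapValues on a Map is deprecated, so you may see a warning when running the code in this post. Here is a sketch that gets the same result with an ordinary map over the (key, value) pairs while still using the helper function. The names HelperSketch and letter2following, and the way letters is built, are assumptions for the illustration.

```scala
object HelperSketch {
  val letters: List[String] = ('a' to 'z').map(_.toString).toList
  val ranges: List[List[Int]] = (1 to 26).toList.sliding(3).toList
  val letter2range: Map[String, List[Int]] = letters.zip(ranges).toMap

  // Look up the letter for each (1-based) number in the range, then sort
  // and reverse the resulting list of letters.
  def lookupSortAndReverse(range: List[Int], alpha: List[String]): List[String] =
    range.map(number => alpha(number - 1)).sorted.reverse

  // Transform each value with a plain map over (key, value) pairs.
  val letter2following: Map[String, List[String]] =
    letter2range.map { case (letter, range) => letter -> lookupSortAndReverse(range, letters) }
}
```

The pattern case (letter, range) gives both parts of each pair a meaningful name, which fits the spirit of this post.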
Closures
One final note. I passed the letters list into the lookupSortAndReverse function so that its value was bound to the function-internal variable alpha. You may wonder whether I needed to do that, or whether it is possible to access the letters list directly in the function. In fact you can: provided that letters has already been defined, we can do the following.
def lookupSortAndReverseCapture (range: List[Int]) =
  range.map(number => letters(number-1)).sorted.reverse

letter2range.mapValues(range => lookupSortAndReverseCapture(range))
This is called a closure, meaning that the function has incorporated free variables (here, letters) that come from outside its own scope. I generally don’t use this strategy with named functions like this, but there are many natural situations for using closures. In fact you do it all the time when you are creating anonymous functions as arguments to functions like map and mapValues and their cousins. As a reminder, here was the map-within-a-mapValues anonymous function we defined before.
letter2range.mapValues (
  range => {
    val alphavalues = range.map (number => letters(number-1))
    alphavalues.sorted.reverse
  }
)
The letters variable has been “closed over” in the anonymous function range => { … }, which is not very different from what we did with the closure-style lookupSortAndReverseCapture function.
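If the idea of closing over a variable still feels abstract, here is about the smallest sketch of it I can think of, using a made-up value called offset (the names ClosureSketch, offset, addOffset and shifted are all just for illustration).

```scala
object ClosureSketch {
  // A value defined outside the functions below.
  val offset = 10

  // addOffset has one parameter, but its body also uses offset, a free
  // variable from the enclosing scope: the function closes over it.
  def addOffset(number: Int): Int = number + offset

  // The anonymous function passed to map closes over offset in the same way.
  val shifted = List(1, 2, 3).map(number => number + offset)
}
```

Neither function receives offset as an argument; both simply pick it up from the surrounding scope, just as the functions above pick up letters.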
All the code in one spot
Since there are some dependencies between the different steps in this tutorial that could get things mixed up, here’s all the code in one spot so that you can run it easily.
// Get a list of the letters
val letters = "abcdefghijklmnopqrstuvwxyz".split("").toList.tail

// Now create a map from each letter to a list containing itself
// and the two letters after it, in reverse alphabetical
// order. (Bizarre, but hey, it's a simple example. BTW, we lose y and
// z in the process.)
letters.zip((1 to 26).toList.sliding(3).toList).toMap.mapValues(_.map(x => letters(x-1)).sorted.reverse)

// Pretty unintelligible. Let's break things up a bit
val ranges = (1 to 26).toList.sliding(3).toList
val letter2range = letters.zip(ranges).toMap
letter2range.mapValues(_.map(x => letters(x-1)).sorted.reverse)

// Okay, that's better. But it is easier to interpret the latter if we break things up a bit
letter2range.mapValues (
  range => {
    val alphavalues = range.map (number => letters(number-1))
    alphavalues.sorted.reverse
  }
)

// We can also do the one-liner coherently if we have a helper function.
def lookupSortAndReverse (range: List[Int], alpha: List[String]) =
  range.map(number => alpha(number-1)).sorted.reverse

letter2range.mapValues(range => lookupSortAndReverse(range, letters))

// Note that we can "capture" the letters value, though this
// requires letters to be defined before lookupSortAndReverseCapture
// in the program.
def lookupSortAndReverseCapture (range: List[Int]) =
  range.map(number => letters(number-1)).sorted.reverse

letter2range.mapValues(range => lookupSortAndReverseCapture(range))
Wrapup
Hopefully this will encourage you to use a clearer coding style, and it demonstrates some aspects of code blocks that you may not have been aware of. However, this just scratches the surface of writing clearer code; a lot of it will come with time and practice, and with realizing how necessary it is when you look back at code you wrote months ago.
One easy thing you can do to write better code is to stick to established coding conventions. For example, see the coding guidelines for Scala on the Scala documentation project. There is also a lot of other very useful material there, including tutorials, and it is actively evolving and growing!
Reference: First steps in Scala for beginning programmers, Part 12 from our JCG partner Jason Baldridge at the Bcomposes blog.