Scala

Refactoring with Kleisli Composition

For quite awhile we have been maintaining an application that processes XML and JSON data. Usually the maintenance consists of fixing defects and adding minor features, but sometimes it requires refactoring old code.

Consider, for example, a function that extracts an XML node by path:

import scala.xml.{Node => XmlNode}
 
def getByPath(path: List[String], root: XmlNode): Option[XmlNode] =
  path match {
    case name::names =>
      for {
        node1 <- root.child.find(_.label == name)
        node2 <- getByPath(names, node1)
      } yield node2
    case _ => Some(root)
  }

This function works fine, but requirements change and now we need to:

  • Extract nodes from JSON and other tree-like data structures, not only XML
  • Return a descriptive error message if the node is not found

This post explains how to refactor getByPath to meet the new requirements.

Refactoring with Kleisli Composition

Let’s factor out a piece of code that creates a function to extract a child node by name. We could name it createFunctionToExtractChildNodeByName, but let’s name it child for brevity.

val child: String => XmlNode => Option[XmlNode] =
  name => node => node.child.find(_.label == name)

Now we can make the key observation: our getByPath is a sequential composition of functions that extract child nodes.The code below shows an implementation of this composition:

def compose(getChildA: XmlNode => Option[XmlNode],
            getChildB: XmlNode => Option[XmlNode]): XmlNode => Option[XmlNode] =
  node => for {
            a  <- getChildA(node)
            ab <- getChildB(a)
          } yield ab

Fortunately, the Scalaz library provides a more generic way to compose function A => M[A], where M is a monad. The library defines Kleisli[M, A, B]: a wrapper for A => M[B], which has method >=> to chain the Kleisli wrappers in the same way as andThen chains regular functions. We will call this chain Kleisli composition. The code below provides a composition example:

val getChildA: XmlNode => Option[XmlNode] = child(“a”)
val getChildB: XmlNode => Option[XmlNode] = child(“b”)
 
import scalaz._, Scalaz._
 
val getChildAB: Kleisli[Option, XmlNode, XmlNode] =
  Kleisli(getChildA) >=> Kleisli(getChildB)

Note the point-free style we are using here. It is very common for functional programmers to write functions as a composition of other functions, never mentioning the actual arguments they will be applied to.

The Kleisli composition is exactly what we need to implement our getByPath as the composition of functions extracting child nodes.

import scalaz._, Scalaz._
 
def getByPath(path: List[String], root: XmlNode): Option[XmlNode] =
  path.map(name => Kleisli(child(name)))
    .fold(Kleisli.ask[Option, XmlNode]) {_ >=> _}
    .run(root)

Note the using of Kleisli.ask[Option, XmlNode] as the neutral element of the fold. We need this neutral element to handle a special case when path is Nil. Kleisli.ask[Option, XmlNode] is just an alias of a function from any node to Some(node).

Abstracting over XmlNode

Let’s generalize our solution and abstract it over XmlNode. We can rewrite it as the following generic function:

def getByPathGeneric[A](child: String => A => Option[A])
                       (path: List[String], root: A): Option[A] =
  path.map(name => Kleisli(child(name)))
    .fold(Kleisli.ask[Option, A]) {_ >=> _}
    .run(root)

Now we can reuse this generic function to extract a node from JSON (we use json4s here):

import org.json4s._
 
def getByPath(path: List[String], root: JValue): Option[JValue] = {
  val child: String => JValue => Option[JValue] = name => json =>
    json match {
      case JObject(obj) => obj collectFirst {case (k, v) if k == name => v}
      case _ => None
    }
  getByPathGeneric(child)(path, root)
}

Note that we wrote a new function, child: JValue => Option[JValue], to handle JSON instead of XML, but getByPathGeneric remains unmodified and handles both XML and JSON.

Abstracting over Option

We can generalize getByPathGeneric even further and abstract it over Option with Scalaz, which provides an instance of scalaz.Monad[Option]. So we can rewrite getByPathGeneric as follows:

import scalaz._, Scalaz._
 
def getByPathGeneric[M[_]: Monad, A](child: String => A => M[A])
                                    (path: List[String], root: A): M[A]=
  path.map(name => Kleisli(child(name)))
    .fold(Kleisli.ask[M, A]) {_ >=> _}
    .run(root)

Now we can implement our original getByPath with getByPathGeneric:

def getByPath(path: List[String], root: XmlNode): Option[XmlNode] = {
  val child: String => XmlNode => Option[XmlNode] = name => node =>
    node.child.find(_.label == name)
  getByPathGeneric(child)(path, root)
}

Next we can reuse getByPathGeneric to return an error message if the node is not found.

To do this, we will use scalaz.\/ (aka disjunction), which is a monadic right-biased version of scala.Either. On top of that, Scalaz provides implicit class OptionOps with method toRightDisjunction[B](b: B), which converts Option[A] to scalaz.B\/A so that Some(a) becomes Right(a) and None becomes Left(b). You can find more info about \/ in other blogs.

Thus we can write a function, which reuses getByPathGeneric, to return an error message instead of None if the node is not found:

type Result[A] = String\/A
 
def getResultByPath(path: List[String], root: XmlNode): Result[XmlNode] = {
  val child: String => XmlNode => Result[XmlNode] = name => node =>
    node.child.find(_.label == name).toRightDisjunction(s"$name not found")
  getByPathGeneric(child)(path, root)
}

Conclusion

The original getByPath function handled only XML data and returned None if the node was not found. We also needed it to handle JSON and return a descriptive message instead of None.

We have seen how using Kleisli composition provided by Scalaz can factor out the generic function getByPathGeneric, which we abstracted further using generics (in order to support JSON) and disjunction (to generalize over Option).

Reference: Refactoring with Kleisli Composition from our JCG partner Michael Dagaev at the Wix IO blog.

Yoav Abrahami

Yoav is the Chief Architect at Wix.com, working with developers and operations to build the company's future products as well as accelerating and improving development processes. Prior to joining Wix, Yoav was an Architect at Amdocs Cramer OSS division. Yoav holds a MS in Physics and a BS in Computer Science from the Tel Aviv University.
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Sebastian
Sebastian
9 years ago

I have enjoyed your article, thank you. Perhaps it would be good to point out that the real benefit of using the Kleisli arrow comes only when one abstracts over the monad. All the rest (abstracting over Node etc.) could also have been done with the original recursive formulation of getByPath. Also, the factoring out of the “glue” code comes at a price. While the original recursive implementation would run in constant space (with tail recursion optimization), the other approach builds up the entire function chain in memory before applying it all at once, requiring space linear in the size… Read more »

Back to top button