Thinking in Transforms—Handling Options
I’ve been thinking a lot about the way I program recently. I even gave a talk about it at the first ElixirConf.
One thing I’m discovering is that transforming data is easier to think about than maintaining state. I bumped into an interesting case of this idea when adding option handling to a library I was writing.
DirWalker—Some Background
I’m working on an app that helps organize large numbers of photos (about 3 TB of them). I needed to be able to traverse all the files in a set of directory trees, and do it lazily. I wrote a GenServer where the state is a list of the paths and files still to be traversed, and the main API returns the next n paths found by traversing the input paths. The code that returns the next path looks something like this:
defp next_path([ path | rest ], result) do
  stat = File.stat!(path)
  case stat.type do
    :directory ->
      # swap the directory for its contents and keep scanning
      next_path([files_in(path) | rest], result)
    :regular ->
      next_path(rest, [ path | result ])
    _ ->
      # anything else (devices and so on) is skipped
      next_path(rest, result)
  end
end
So, if the next file in the list of paths to scan is a directory, we replace it with the list of files in that directory and call ourselves. Otherwise if it is a regular file, we add it to the result and call ourselves on the remaining paths. (The actual code is more complex, as it unfolds the nested path lists, and knows how to return individual paths, but this code isn’t the point of this post.)
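To make the recursion above concrete, here is a minimal, eager sketch of the same idea. It is not the DirWalker source: the module name WalkSketch, the base case, and the files_in helper are assumptions of mine, and it sidesteps the nested-list unfolding by concatenating a directory's entries onto the remaining paths.

```elixir
defmodule WalkSketch do
  # Hypothetical stand-in for illustration; not the library's actual code.

  # Returns every regular file found under the given list of paths.
  def walk(paths), do: next_path(paths, [])

  # Nothing left to scan: return what we collected, in discovery order.
  defp next_path([], result), do: Enum.reverse(result)

  defp next_path([path | rest], result) do
    stat = File.stat!(path)

    case stat.type do
      :directory ->
        # Replace the directory with its children and keep going.
        next_path(files_in(path) ++ rest, result)

      :regular ->
        next_path(rest, [path | result])

      _ ->
        # Devices and other special files are skipped.
        next_path(rest, result)
    end
  end

  # Assumed helper: full paths of the entries directly inside a directory.
  defp files_in(dir) do
    dir |> File.ls!() |> Enum.map(&Path.join(dir, &1))
  end
end

# Build a tiny throwaway tree and walk it.
root = Path.join(System.tmp_dir!(), "walk_sketch_demo")
File.rm_rf!(root)
File.mkdir_p!(Path.join(root, "sub"))
File.write!(Path.join(root, "a.txt"), "a")
File.write!(Path.join([root, "sub", "b.txt"]), "b")

files = WalkSketch.walk([root])
IO.inspect(Enum.sort(files))
```

Unlike the GenServer described earlier, this walks everything in one go; it only illustrates the shape of the recursion.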
The Real World Intrudes
Having added my DirWalker library to Hex.pm, I got a feature request: could it be made to return the File.Stat structure along with the path to the file?
I wanted to add this capability, but also to make it optional, so I started coding using what felt like the obvious approach:
defp next_path([ path | rest ], opts, result) do
  stat = File.stat!(path)
  case stat.type do
    :directory ->
      next_path([files_in(path) | rest], opts, result)
    :regular ->
      return_value = if opts.include_stat do
        {path, stat}
      else
        path
      end
      next_path(rest, opts, [ return_value | result ])
    _ ->
      next_path(rest, opts, result)
  end
end
So, the function now has nested conditionals—never a good sign—but it is livable-with.
Then I thought, “while I’m making this change, let’s also add an option to return directory paths along with file paths.” And my code explodes in terms of complexity:
defp next_path([ path | rest ], opts, result) do
  stat = File.stat!(path)
  case stat.type do
    :directory ->
      if opts.include_dir_names do
        return_value = if opts.include_stat do
          {path, stat}
        else
          path
        end
        next_path([files_in(path) | rest], opts, [return_value | result])
      else
        next_path([files_in(path) | rest], opts, result)
      end
    :regular ->
      return_value = if opts.include_stat do
        {path, stat}
      else
        path
      end
      next_path(rest, opts, [ return_value | result ])
    _ ->
      next_path(rest, opts, result)
  end
end
Moose Lends a Paw
So, lots of duplication, and the code is pretty much unreadable. Time to put down the keyboard and take Moose for a walk.
As it stands, the options map represents some state—the values of the two options passed to the API. But we really want to think in terms of transformations. So what happens if we instead think of the options as transformers?
Let’s look at the include_stat option first. If set, we want to return a tuple containing a path and a stat structure; otherwise we return just a path. The first case is a function that looks like this:
fn path, stat -> { path, stat } end
and the second case looks like this:
fn path, _stat -> path end
So, if the include_stat value in our options was one of these two functions, rather than a boolean value, our main code becomes simpler:
defp next_path([ path | rest ], opts, result) do
  stat = File.stat!(path)
  case stat.type do
    :directory ->
      if opts.include_dir_names do
        return_value = opts.include_stat.(path, stat)
        next_path([files_in(path) | rest], opts, [return_value | result])
      else
        next_path([files_in(path) | rest], opts, result)
      end
    :regular ->
      return_value = opts.include_stat.(path, stat)
      next_path(rest, opts, [ return_value | result ])
    _ ->
      next_path(rest, opts, result)
  end
end
We can do the same thing with include_dir_names. Here the two functions are
fn (path, result) -> [ path | result ] end
and
fn (_path, result) -> result end
and now our main function becomes:
defp next_path([ path | rest ], opts, result) do
  stat = File.stat!(path)
  case stat.type do
    :directory ->
      # include_stat shapes the entry; include_dir_names decides whether to keep it
      return_value = opts.include_stat.(path, stat)
                     |> opts.include_dir_names.(result)
      next_path([files_in(path) | rest], opts, return_value)
    :regular ->
      next_path(rest, opts, [ opts.include_stat.(path, stat) | result ])
    _ ->
      next_path(rest, opts, result)
  end
end
Changing the options from being simple state into things that transform values according to the meaning of each option has tamed the complexity of the next_path function.
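The effect is easy to see without touching the filesystem. The following sketch plugs each pair of functions into the same two-step pipeline; the sample path, the plain map standing in for a File.Stat structure, and the variable names are all made up for illustration.

```elixir
# Hypothetical sample values; the map stands in for a %File.Stat{} struct.
path = "/photos/2014"
stat = %{type: :directory}
result = ["/photos/a.jpg"]

# The two choices for include_stat.
with_stat = fn path, stat -> {path, stat} end
without_stat = fn path, _stat -> path end

# The two choices for include_dir_names.
keep_dir = fn entry, result -> [entry | result] end
drop_dir = fn _entry, result -> result end

# The pipeline never changes; only the functions plugged into it do.
step = fn include_stat, include_dir_names ->
  include_stat.(path, stat) |> include_dir_names.(result)
end

IO.inspect(step.(without_stat, keep_dir))
# → ["/photos/2014", "/photos/a.jpg"]
IO.inspect(step.(with_stat, keep_dir))
# → [{"/photos/2014", %{type: :directory}}, "/photos/a.jpg"]
IO.inspect(step.(without_stat, drop_dir))
# → ["/photos/a.jpg"]
IO.inspect(step.(with_stat, drop_dir))
# → ["/photos/a.jpg"]
```

Four option combinations, no conditionals: each behavior is chosen once, up front, and the pipeline stays flat.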
But we don’t want the users of our API to have to set up transforming functions—that would force them to know our internal implementation details. So on the way in, we want to map their options (which are booleans) into our functions.
defp setup_mappers(opts) do
  %{
    include_stat:
      one_of(opts[:include_stat],
             fn (path, _stat) -> path end,
             fn (path, stat) -> {path, stat} end),
    include_dir_names:
      one_of(opts[:include_dir_names],
             fn (_path, result) -> result end,
             fn (path, result) -> [ path | result ] end)
  }
end

# pick the transformer that matches the user's boolean option
defp one_of(bool, if_false, if_true) do
  if bool, do: if_true, else: if_false
end
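Putting the pieces together, a self-contained version might look like the sketch below. The module name OptionWalker, the eager walk/2 entry point, the empty-list base case, and the files_in helper are my assumptions for the sake of a runnable example; the real library is a lazy GenServer and differs in detail.

```elixir
defmodule OptionWalker do
  # Hypothetical module; a sketch of the technique, not DirWalker itself.

  # Callers pass plain booleans; they never see the transformer functions.
  def walk(paths, opts \\ []) do
    next_path(paths, setup_mappers(opts), [])
  end

  defp next_path([], _mappers, result), do: Enum.reverse(result)

  defp next_path([path | rest], mappers, result) do
    stat = File.stat!(path)

    case stat.type do
      :directory ->
        new_result =
          mappers.include_stat.(path, stat)
          |> mappers.include_dir_names.(result)

        next_path(files_in(path) ++ rest, mappers, new_result)

      :regular ->
        next_path(rest, mappers, [mappers.include_stat.(path, stat) | result])

      _ ->
        next_path(rest, mappers, result)
    end
  end

  # Turn the user's boolean options into transformer functions.
  defp setup_mappers(opts) do
    %{
      include_stat:
        one_of(
          opts[:include_stat],
          fn path, _stat -> path end,
          fn path, stat -> {path, stat} end
        ),
      include_dir_names:
        one_of(
          opts[:include_dir_names],
          fn _path, result -> result end,
          fn path, result -> [path | result] end
        )
    }
  end

  defp one_of(bool, if_false, if_true) do
    if bool, do: if_true, else: if_false
  end

  # Assumed helper: full paths of the entries directly inside a directory.
  defp files_in(dir) do
    dir |> File.ls!() |> Enum.map(&Path.join(dir, &1))
  end
end

# A throwaway directory holding a single file.
root = Path.join(System.tmp_dir!(), "option_walker_demo")
File.rm_rf!(root)
File.mkdir_p!(root)
photo = Path.join(root, "a.jpg")
File.write!(photo, "not really a photo")

plain = OptionWalker.walk([root])
with_dirs = OptionWalker.walk([root], include_dir_names: true)
with_stat = OptionWalker.walk([root], include_stat: true)

IO.inspect(plain)
IO.inspect(with_dirs)
IO.inspect(with_stat)
```

Here plain holds just the file path, with_dirs also holds the directory itself, and with_stat pairs the file path with its File.Stat structure. The booleans are looked at exactly once, in setup_mappers.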
If you’re interested in all the gritty details, the code is on GitHub.
My Takeaway
I wrote my first OO program (in Simula) back in 1974 (which is probably before most Elixir programmers were born—sigh). During the intervening years, I’ve developed many reflexes that made object-oriented development easier. And now I’m having to rethink that tacit knowledge.
Programming in Elixir encourages me to move away from state and to think about transformations. As I force myself to apply this change in thinking at all levels of my code, I discover interesting and delightful new patterns of development.
And that’s why I’m still having a blast, hacking out code, after all these years.