Mini polyglot grep
I wanted to write a little grep program that prints the lines of a given file that contain a given term.
I chose to write it in Haskell, Ruby, Java, Scala and Python, and to see how different the thinking process and the result would be.
The aim was to get an idea of these points:
- how differently these languages deal with IO
- how conditional flows can be managed
- how clean the code would be
- lines of code (LOC)
- performance
As a baseline, I defined the following requirements:
- usage is grep term file (the term and the file to grep are taken as program arguments)
- in case of incorrect usage (not enough arguments), the program prints the usage
- the program arguments should be bound to names and not used directly
- the actual grep work should be defined in a dedicated function
- error cases such as incorrect file paths are ignored
So I started with Haskell.
Haskell has a very interesting way of dealing with IO. IO is part of the “dirty” world, not of the pure one.
So any computation that does IO is kept “isolated” from the pure code, and this separation is explicit in the types, through the IO monad.
To give a quick example of what this implies: a String value obtained from an IO operation has type IO String, not just String.
Haskell version:
LOC: 9
Average Time: real 0m0.047s | user 0m0.042s | sys 0m0.004s
Why mapM_ putStrLn . filter (isInfixOf term) . lines?
Another way of writing this would be putStrLn . unlines . filter (isInfixOf term) . lines (strictly, putStr is the exact equivalent here, because unlines already ends every line with a newline and putStrLn would add one more).
mapM_ takes a monadic function (here putStrLn) and a Foldable structure (here the list of filtered lines), runs the function on each element in order, and discards the results.
Both seem to me like fine solutions, so I just decided to leave it like this.
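For readers more at home in Python, the two shapes discussed above can be sketched roughly as follows. This is only an analogy, not a translation of the Haskell listing, and the names matching_lines, print_each and print_joined are made up for illustration.

```python
def matching_lines(term, text):
    """Pure part: keep the lines of `text` that contain `term`."""
    return [line for line in text.splitlines() if term in line]


def print_each(term, text):
    """Analogue of the mapM_ variant: one print action per matching line."""
    for line in matching_lines(term, text):
        print(line)


def print_joined(term, text):
    """Analogue of the unlines variant: build one string, then print it once."""
    joined = "".join(line + "\n" for line in matching_lines(term, text))
    print(joined, end="")
```

Both functions write the same text to standard output; the difference is only whether the output happens once per line or once for the whole result.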
Java version:
LOC: ±24
Average Time: real 0m0.239s | user 0m0.354s | sys 0m0.050s
The Files class (available since Java 7, with stream-based helpers such as Files.lines added in Java 8) makes this nicer than the traditional while loop over readLine(). Apart from that, not much to say.
Ruby version:
LOC: 10
Average Time: real 0m0.060s | user 0m0.040s | sys 0m0.016s
OK, maybe the foreach call could be split across several lines, but this looks pretty clean and readable.
Scala version:
LOC: 18
Average Time: real 0m0.421s | user 0m0.551s | sys 0m0.092s
Python version:
LOC: 12
Average Time: real 0m0.038s | user 0m0.018s | sys 0m0.015s
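As an illustration of the requirements listed at the top, a minimal Python sketch could look like the one below. It is not necessarily the listing that was measured here; the names grep and main and the exact usage message are assumptions.

```python
import sys


def grep(term, path):
    """Return the lines of the file at `path` that contain `term`."""
    with open(path) as handle:
        return [line.rstrip("\n") for line in handle if term in line]


def main(argv):
    # Incorrect usage (not enough arguments): print the usage and stop.
    if len(argv) < 3:
        print("usage: grep term file")
        return
    # Bind the arguments to names instead of indexing argv later on.
    _, term, path = argv[:3]
    for line in grep(term, path):
        print(line)


if __name__ == "__main__":
    main(sys.argv)
```

As the requirements allow, a missing or unreadable file is not handled and simply raises an exception.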
General conclusions
I think the Ruby version ends up with a very readable core and looks quite clean, but that’s all.
The Python version doesn’t look very interesting. Like Ruby, it has a simpler syntax without all the curly brackets that Java and Scala need, but that is a superficial difference.
The Haskell version ends up being, in my view, the cleanest solution. If I left out the function type signatures, I would end up with only 6 lines to satisfy the requirements.
I don’t need to define flows in an if-else way but can use functions and pattern matching instead, it reads so nicely, and the language deals with IO as something apart (which I find more interesting and challenging)… It’s just something else!
Performance
I used a 4.2MB plain text file as the target file. Here I got a bit surprised by Scala, not just because it took longer than the other languages but also because of how much its time varied. Sometimes it took up to 1.165s, while the other languages always stayed within a very small and constant time range.
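To make this kind of run-to-run variation visible, a small harness along these lines could be used. It is a sketch under assumptions: it records wall-clock time only (roughly the "real" figure, not user or sys), the names time_runs and summarize are made up, and the command shown is a placeholder rather than one of the programs measured here.

```python
import subprocess
import sys
import time


def time_runs(command, runs=10):
    """Run `command` `runs` times; return each run's wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard the program's output so printing does not clutter the terminal.
        subprocess.run(command, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (mean, fastest, slowest) so that outliers stay visible."""
    return sum(timings) / len(timings), min(timings), max(timings)


if __name__ == "__main__":
    # Placeholder command: the Python interpreter doing nothing.
    command = [sys.executable, "-c", "pass"]
    mean, fastest, slowest = summarize(time_runs(command, runs=5))
    print(f"mean {mean:.3f}s | min {fastest:.3f}s | max {slowest:.3f}s")
```

Reporting the slowest run next to the mean is what would expose an occasional slow outlier.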
Rating
- Python ± 0m0.038s
- Haskell ± 0m0.047s
- Ruby ± 0m0.060s
- Java ± 0m0.239s
- Scala ± 0m0.421s
Find the source repository here