Rapture: The Art of the One-Liner -- Jon Pretty (Propensive)
An introduction to the Rapture libraries for IO and related parsing tasks via a series of evocative and idiomatic "one-liners". While no responsible Scala programmer would actually write them as one-liners, the point was well made: the JSON parsing code was very elegant. Jon's remarks on error handling led me to posit a litmus test for this kind of code being "industrial strength".
I want to be able to write similarly idiomatic and elegant code that allows me to process a large number of documents and:
- Return representations of the valid documents (say in terms of case classes)
- Return representations of the invalid documents that:
- Identify the invalid document by some identifier and/or content
- Explain what's invalid about it
- For bonus points, while processing, give incremental counts of valid/invalid documents so I can decide that my failure rate is unacceptable (and something fundamental has gone wrong) or is acceptable and I can either discard or subsequently fix and reprocess the failure cases.
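The shape I'm after can be sketched in a few lines of plain Scala (2.13). Doc, Invalid, and parse below are hypothetical stand-ins, not Rapture API:

```scala
// Sketch: turn raw (id, body) pairs into either a parsed case class or a
// failure record that identifies the document and explains what's wrong.
case class Doc(id: String, title: String)
case class Invalid(id: String, reason: String)

def parse(raw: (String, String)): Either[Invalid, Doc] = raw match {
  case (id, body) if body.nonEmpty => Right(Doc(id, body))
  case (id, _)                     => Left(Invalid(id, "empty body"))
}

val raws = List("a" -> "Rapture notes", "b" -> "", "c" -> "nescala recap")

// partitionMap splits the stream into failures (lefts) and successes (rights)
val (invalid, valid) = raws.partitionMap(parse)
```

The bonus-points incremental counts would fall out of replacing `partitionMap` with a running `foldLeft` over the same `Either` values.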
The First Hit is Always Free: A Skeptic's Look at scalaz' "Gateway Drugs" -- Brendan McAdams (Netflix)
This was an interesting view of scalaz "from the outside", doing a nice job of explaining why scalaz may be of interest to more than just functional programming researchers. It started with a tour of some of the usual Scala and scalaz approaches to error handling -- Option and Validation -- and why they're hard to work with (they're not monadic). Then Brendan explained scalaz's disjunction type (\/) and showed how it makes it easier to accumulate error information while processing data. (The timing of this was interesting, coming on the heels of my error-handling concerns about Rapture, described above.)
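The appeal of the disjunction can be sketched with Scala's own right-biased Either, which composes the same way in a flatMap chain (a stand-in here; \/ itself lives in scalaz):

```scala
// Each step either succeeds or produces an error description.
def parseAge(s: String): Either[String, Int] =
  s.toIntOption.toRight(s"'$s' is not a number")

def checkAdult(n: Int): Either[String, Int] =
  if (n >= 18) Right(n) else Left(s"$n is under 18")

// Monadic chaining: the first failure short-circuits, and its
// description is carried through to the final result.
val ok  = parseAge("42").flatMap(checkAdult)
val bad = parseAge("x").flatMap(checkAdult)
```

This is exactly what makes Option painful by comparison: a None tells you something failed, but not what or why.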
Types out of patmat -- Stephen Compall (McGraw Hill Financial)
I think everybody in the audience who hasn't worked among type theory researchers found this one really tough. I have worked with type theory researchers, but that was 22 years ago, which seems to be about 21 years too long for understanding this talk. My takeaway is that I should learn more about the subtleties of Scala pattern matching, and that I shouldn't expect it to work very well if the type of what I'm matching is complex enough to be interesting to a type theory researcher. I suspect this was a really interesting and informative talk for people who had the necessary background. The high-level warnings are useful to every Scala programmer.
Don't Cross the Streams -- Marc Millstone (Socrata)
This was focused on aspects of counting records in a stream, with adequate or at least understood accuracy, while dealing with bounded memory. The approaches are embodied in the Tallyho project. Marc talked about three kinds of approaches, based respectively on engineering, math and ignorance. There were a number of interesting techniques, some based on hashing, and all essentially unfamiliar to me. Stream processing seems to be an increasingly important application of Scala, especially with the success of Spark streaming. This talk also left me wanting to learn more about stream-lib, Algebird and Shapeless, and to check out the Highly Scalable Blog.
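As a concrete taste of the hashing-based family of techniques, here is a minimal K-Minimum-Values (KMV) sketch for estimating distinct counts in bounded memory. This is my own toy reconstruction of the general idea, not code from Tallyho or stream-lib:

```scala
import scala.collection.mutable
import scala.util.hashing.MurmurHash3

// KMV sketch: track only the k smallest hash values ever seen. If items hash
// uniformly into (0, 1] and the k-th smallest is t, there are ~ (k - 1) / t
// distinct items -- bounded memory regardless of stream length.
class KMV(k: Int) {
  // max-heap holding the k smallest hashes seen so far (head = k-th smallest)
  private val heap = mutable.PriorityQueue.empty[Double]

  // map an item uniformly into (0, 1]
  private def hash(s: String): Double =
    (MurmurHash3.stringHash(s).toLong - Int.MinValue.toLong + 1).toDouble /
      (1L << 32).toDouble

  def add(s: String): Unit = {
    val x = hash(s)
    if (heap.exists(_ == x)) ()                          // duplicate item
    else if (heap.size < k) heap.enqueue(x)
    else if (x < heap.head) { heap.dequeue(); heap.enqueue(x) }
  }

  def estimate: Double =
    if (heap.size < k) heap.size.toDouble else (k - 1) / heap.head
}

val sketch = new KMV(256)
(1 to 10000).foreach(i => sketch.add(s"record-$i"))
val approx = sketch.estimate  // near 10000, with error on the order of 1/sqrt(k)
```

The memory cost is fixed at k doubles, which is the whole point: accuracy is traded for a hard bound on space.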
Apache Spark I: From Scala Collections to Fast Interactive Big Data with Spark -- Evan Chan (Socrata)
A nice overview of Apache Spark from a Scala perspective, emphasizing the smooth transition from serial to parallel Scala collections (Scala 2.10), then to the RDD as a distributed collection, with Spark's laziness as a natural extension of Scala's lazy collections (Streams and Iterators). This material was mostly very familiar to me, but Evan did a great job of emphasizing how natural it all is and of explaining it.
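The laziness half of that progression is easy to demonstrate with the standard library alone; this sketch uses LazyList (Scala 2.13+), since the `.par` step and the RDD itself both need extra dependencies:

```scala
// Like an RDD transformation, mapping a LazyList records the computation
// without running it; only a terminal operation (the analogue of an RDD
// action) forces evaluation.
var evaluated = 0
val squares = LazyList.from(1).map { n => evaluated += 1; n * n }

val before = evaluated                   // still 0: nothing has been forced
val firstThree = squares.take(3).toList  // forcing, like calling .collect()
val after = evaluated                    // 3: exactly the forced elements ran
```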
It was also interesting to hear that the entire Socrata backend is implemented in Scala.
Apache Spark II: Streaming Big Data Analytics with Team Apache, Scala & Akka -- Helena Edelson (Datastax)
At first this seemed like an overview of Spark streaming, with which I was already quite familiar, but it turned out to be more about building quite complex streaming-oriented Spark applications that also used Akka. There were a number of interesting points:
- I had never heard of the Lambda Architecture, of which this approach is an instance
- A LOT of audience members were using Apache Kafka
- A significant but much smaller number were using Apache Cassandra
- An apparently well-known discussion on the apache-spark-user-list of why Spark is based on Scala was summarized as: functional programming, JVM leverage, function serializability, static typing and the REPL
- The KillrWeather project is a reference application whose design is based on the ideas presented
- It's currently difficult to combine stream and historical data in a single application because SparkContext is not serializable
Miniboxing: JVM Generics without the overhead -- Vlad Ureche (EPFL)
Miniboxing is essentially efficient specialization of generics for primitive types that fit in a long integer, achieved by actually fitting them in a long integer, with sometimes dramatic performance improvements and/or code size reductions.
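For context, here is a minimal illustration of the problem miniboxing attacks, using the standard library's @specialized annotation. Plain generics box primitives; @specialized avoids the box but emits one variant per primitive type, which is the code-size explosion miniboxing reduces by packing all primitives into a single Long-based variant:

```scala
// A plain generic field is erased to Object, so storing 42 boxes it
// into a java.lang.Integer.
class Plain[T](val value: T)

// @specialized emits dedicated Int and Long variants with unboxed fields;
// miniboxing's plugin achieves a similar effect with far fewer classes.
class Spec[@specialized(Int, Long) T](val value: T)

val boxed   = new Plain(42).value  // went through a box
val unboxed = new Spec(42).value   // used the Int-specialized variant
```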
It is available to use via a compiler plugin that is undergoing active development and improvement. A comment from the back of the room: "That is an insane amount of documentation for an in-development compiler plugin".
The value of this was illustrated using an image processing example.
Some background reading is available in a post called "Quirks of Scala specialization" by Alex Prokopec.
Towards a Safer Scala -- Leif Wickland (Oracle)
This talk explored the various approaches to helping ensure that a project's Scala code is safe and correct:
- scalac command-line switches such as -deprecation, -Xlint and -Xfatal-warnings
- FindBugs and the associated sbt tool
- Scalastyle -- used in Martin Odersky's course
My take-away was that the compiler parameters, Scalastyle and WartRemover were at the point where I should actually consider using them, and that the remainder were worth watching.
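For reference, wiring those compiler switches into a build is one setting in sbt. A hedged sketch; the most useful flag set varies by Scala version:

```scala
// build.sbt fragment
scalacOptions ++= Seq(
  "-deprecation",      // warn on uses of deprecated APIs
  "-Xlint",            // enable the extra lint checks
  "-Xfatal-warnings"   // promote every warning to a compile error
)
```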