Sunday, October 9, 2016

Learning Spark with Java

In a recent post I discussed the history and motivation of my LearningSpark project on GitHub. While that project is mostly based on the Scala APIs to Apache Spark, I explained why I had begun to explore the Java APIs as well. I also predicted that I would soon introduce a separate project, based solely on Maven and Java, to continue the Java exploration: most Java programmers are much more comfortable with Maven than with sbt, and a separate project allows me to choose the Java version appropriately.

The new learning-spark-with-java project on GitHub is the result. It started with a copy of the examples on the original project, but since I've now adopted Java 8, I rewrote the examples to make use of the latter's lambda expressions, perhaps ironically making the code now look more like the original Scala code.

I'll proceed with this project using the guidelines I listed in the LearningSpark project when I branched out into Java. I will almost definitely not:

  1. Rush to catch up with the Scala examples,
  2. Keep the two sets of examples perfectly (or even well) matched,
  3. Branch out into Python and R as well (seriously, I have no interest in doing this.)

I'll probably still focus on the Scala examples more, as new features seem to mature a little faster in the Scala API. I am unlikely to add to the Java examples in the LearningSpark project, and if they get in the way or create confusion, I may eventually delete them. As always, feedback is welcome, and I'm especially curious to see whether the community finds this project as useful as some people obviously found the earlier one.

No comments:

Post a Comment