Snowpipe Streaming Deep Dive
Disclaimer
In a previous post, we explored how to do stateful streaming using Spark’s Streaming API with the DStream abstraction. Today, I’d like to sail out on a journe...
It was a casual afternoon at the office, writing some code, working on features. Then, all of a sudden, Spark started getting angry. If you run a productio...
Update 10.03.2017 - There is a “gotcha” when using EFS for checkpointing which can be a deal breaker, pricing-wise. I have updated the last part of the p...
Disclaimer: Yes, I know the topic is a bit controversial, and I know most of this information is conveyed in Spark’s documentation for its Streaming API, ...
Disclaimer: What I am about to talk about is an experimental Spark feature. We are about to dive into an implementation detail, which is subject to change...
When working with collections in Scala, or any other high-level programming language, one does not always stop to think about the underlying implementation o...
Disclaimer: What we’re about to look at is an implementation detail valid for the current point in time, and is valid for Scala 2.10.x and 2.11.x (not sur...
Update (01.08.2017): Spark v2.2 has recently come out with a new abstraction for stateful streaming called mapGroupsWithState which I’ve recently blogged abo...
Introduction
When you’re dealing with a large amount of data over the wire, you want to be able to reduce your payload size as much as you can. Using JSON is fine for mos...
Every Scala programmer is familiar with the Option[T] monad. Option[T] is a container which either has a value (Some), or doesn’t have a value (None). It is ...
At work we started using Spark Streaming as the underlying framework for a new project. Spark is a fast and general engine for large-scale data processing, w...
Not long ago an interesting question appeared on StackOverflow. We know that generally, mutable structs are considered evil, primarily for the fact that they...