Apache Spark
Software Development
Apache Spark: Unleashing Big Data Power
Apache Spark is a powerful open-source, distributed computing system that has become a cornerstone in the world of…
Software Development
Where is Apache Spark heading?
I watched (the COVID-19-era version of “attended”) the latest Spark Summit, and in one of the keynotes Reynold Xin from Databricks,…
Enterprise Java
Long Live ETL
Extract, transform, load (ETL) is the process of pulling data from one data system and loading it into another. The data systems involved are called…
Enterprise Java
Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 2)
In Part 1 we learned how to test data lineage collection with Spline from a Spark shell. The same can…
Enterprise Java
Exploring the Spline Data Tracker and Visualization tool for Apache Spark (Part 1)
One interesting and promising open-source project that caught my attention lately is Spline, a data lineage tracking and visualization tool…
Enterprise Java
Insights from Spark UI
As a continuation of the anatomy-of-apache-spark-job post, I will share how you can use the Spark UI to tune a job. I will continue with the same…
Enterprise Java
Anatomy of Apache Spark Job
Apache Spark is a general-purpose, large-scale data processing framework. Understanding how Spark executes jobs is very important for getting the most of…
Enterprise Java
Custom Logs in Apache Spark
Have you ever felt the frustration of a Spark job that runs for hours and then fails due to an infrastructure issue?…
Core Java
Apache Spark RDD and Java Streams
A few months ago, I was fortunate enough to participate in a few PoCs (proofs of concept) that used Apache Spark. There,…