In this part, I will cover what Dataset, DataFrame (not the Pandas DataFrame!) and RDD are, all of which are Spark APIs. DataFrame was introduced after RDD, and Dataset is the newest of the three; it enjoys the benefits of both RDD and data…

As mentioned in part 1, there are two types of operations on an RDD: actions and transformations. In this section, we will discuss actions.
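To make the distinction concrete, here is a minimal sketch in plain Python. It is a toy model, not the Spark API: `ToyRDD` and `traced_double` are hypothetical names invented for illustration, and only the method names (`map`, `filter`, `collect`, `count`) mirror those on a real RDD. The point it tries to show is that transformations only record what should happen, while an action triggers the actual computation.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, NOT the Spark API)."""

    def __init__(self, source, steps=()):
        self._source = list(source)
        self._steps = tuple(steps)  # recorded transformations, not yet run

    # Transformations: return a new ToyRDD and compute nothing.
    def map(self, fn):
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def _run(self):
        # Chain the recorded steps over the source data.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return items

    # Actions: run the recorded steps and return a plain Python value.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


calls = []  # records every element the mapped function actually sees


def traced_double(x):
    calls.append(x)
    return x * 2


numbers = ToyRDD([1, 2, 3, 4])
big = numbers.map(traced_double).filter(lambda x: x > 4)  # transformations only
calls_before_action = list(calls)  # nothing has been computed yet

result = big.collect()             # action: the steps run now
calls_after_action = list(calls)
```

In this sketch `traced_double` is not called until `collect()` runs, which is the behaviour the actions/transformations split is meant to capture. Real Spark adds partitioning, scheduling and fault tolerance on top, none of which is modelled here.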

Before getting started, we need to know about SparkContext, which is the main entry point for Spark functionality. It represents the connection to a…

Before getting to this, you need to know how data normally flows from storage to the CPU.

First, data is stored on the hard disk and loaded into RAM, which is an extremely slow process. Then the data is loaded into the cache and finally into registers. …

Cedric Yang
