DATA ANALYSIS


Google BigQuery, Zeppelin Notebook, Google Cloud Dataproc, Apache Spark, Scala, JDBC

Use Spark and Scala in a Zeppelin notebook on a Dataproc cluster to perform general analysis of the data.

(1) Number of posts in each country

[Screenshot: sw_analysis_output_1]
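A minimal sketch of analysis (1) as a Zeppelin Scala paragraph, assuming a hypothetical posts DataFrame with `post_id` and `country` columns. The inline rows are made-up stand-ins for data that would really be loaded from the database, and `master("local[*]")` only matters when running outside Dataproc (inside Zeppelin, `getOrCreate()` returns the existing `spark` session).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

// Inside Zeppelin this returns the predefined session; master() is only
// relevant when running the sketch locally.
val spark = SparkSession.builder()
  .appName("shared-world-analysis")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical posts data: one row per post with the poster's country.
val posts = Seq(
  (1, "US"), (2, "US"), (3, "FR"), (4, "JP"), (5, "US")
).toDF("post_id", "country")

// Number of posts in each country, largest first.
val postsPerCountry = posts
  .groupBy("country")
  .count()
  .orderBy(desc("count"))

postsPerCountry.show()
```

In Zeppelin, `z.show(postsPerCountry)` would render the same result as an interactive table or chart instead of `show()`'s plain-text output.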

(2) Top interests in each country

[Screenshot: sw_analysis_output_2]
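One way to get the top interests per country is to count each (country, interest) pair and then rank the pairs within each country using a window function. This is a sketch under assumptions: the `country`/`interest` columns, the sample rows and the `topN` cut-off are hypothetical, and `row_number` breaks ties arbitrarily (`rank` would keep tied interests instead).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc, row_number}

val spark = SparkSession.builder()
  .appName("shared-world-analysis")
  .master("local[*]") // only relevant outside Zeppelin/Dataproc
  .getOrCreate()
import spark.implicits._

// Hypothetical data: one row per post, tagged with its country and interest.
val postInterests = Seq(
  ("US", "music"), ("US", "music"), ("US", "travel"),
  ("FR", "food"), ("FR", "food"), ("FR", "art")
).toDF("country", "interest")

val topN = 1 // hypothetical: how many interests to keep per country

// Rank interests inside each country by how often they occur.
val byCountry = Window.partitionBy("country").orderBy(desc("count"))

val topInterestsPerCountry = postInterests
  .groupBy("country", "interest")
  .count()
  .withColumn("rank", row_number().over(byCountry))
  .filter(col("rank") <= topN)
  .orderBy("country", "rank")

topInterestsPerCountry.show()
```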

(3) Top interests of users

[Screenshot: sw_analysis_output_3]
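For interests of users, a reasonable measure is how many distinct users list each interest, so a user who appears twice with the same interest is only counted once. A minimal sketch, assuming a hypothetical user-interest table with `user_id` and `interest` columns (the sample rows are invented, including one deliberate duplicate):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{countDistinct, desc}

val spark = SparkSession.builder()
  .appName("shared-world-analysis")
  .master("local[*]") // only relevant outside Zeppelin/Dataproc
  .getOrCreate()
import spark.implicits._

// Hypothetical user-interest pairs; (3, "music") appears twice on purpose.
val userInterests = Seq(
  (1, "music"), (1, "travel"), (2, "music"),
  (3, "music"), (3, "food"), (3, "music")
).toDF("user_id", "interest")

// Number of distinct users per interest, most popular first.
val topUserInterests = userInterests
  .groupBy("interest")
  .agg(countDistinct("user_id").as("users"))
  .orderBy(desc("users"))

topUserInterests.show()
```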

(4) Top interests of posts

[Screenshot: sw_analysis_output_4]
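If each post can carry several interests, one approach is to flatten them with `explode` so every (post, interest) pair becomes its own row, and then count per interest. This assumes a hypothetical `interests` array column on the posts table; if the real schema stores one interest per row, the `explode` step is unnecessary.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, desc, explode}

val spark = SparkSession.builder()
  .appName("shared-world-analysis")
  .master("local[*]") // only relevant outside Zeppelin/Dataproc
  .getOrCreate()
import spark.implicits._

// Hypothetical posts, each tagged with an array of interests.
val taggedPosts = Seq(
  (1, Seq("music", "travel")),
  (2, Seq("music")),
  (3, Seq("food", "music"))
).toDF("post_id", "interests")

// One row per (post, interest), then the number of posts per interest.
val topPostInterests = taggedPosts
  .select(explode(col("interests")).as("interest"))
  .groupBy("interest")
  .count()
  .orderBy(desc("count"))

topPostInterests.show()
```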

Load the JDBC dependency
Import the required packages and start a Spark session
Set up the connection to the database
Run each of the four analyses:
- load the data from the database into a DataFrame and perform the aggregation
- use z.show() to display the results with Zeppelin's built-in visualization
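The connection and loading steps above can be sketched as a Zeppelin Scala paragraph. This is a minimal sketch, not the project's code: it assumes the JDBC driver JAR is already on the Spark interpreter's classpath (for example, added as a dependency in Zeppelin's interpreter settings), and the environment variable names, the MySQL-style default URL and the `posts` table name are hypothetical placeholders. The `url`, `dbtable`, `user` and `password` keys are Spark's standard JDBC data source options.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: gathers Spark's standard JDBC source options.
def jdbcOptions(url: String, table: String, user: String, password: String): Map[String, String] =
  Map("url" -> url, "dbtable" -> table, "user" -> user, "password" -> password)

// Load one database table into a DataFrame through Spark's JDBC source.
def loadTable(spark: SparkSession, opts: Map[String, String]): DataFrame =
  spark.read.format("jdbc").options(opts).load()

// Hypothetical connection settings read from the environment.
val opts = jdbcOptions(
  sys.env.getOrElse("SW_JDBC_URL", "jdbc:mysql://localhost:3306/example_db"),
  "posts",
  sys.env.getOrElse("SW_DB_USER", "user"),
  sys.env.getOrElse("SW_DB_PASSWORD", "")
)

// In a Zeppelin paragraph, where `spark` and `z` are predefined:
//   val posts = loadTable(spark, opts)
//   z.show(posts.groupBy("country").count())
```

Keeping the options in a plain `Map` lets each of the four analyses reuse the same connection settings and change only the `dbtable` entry.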

Shared World