
DATA ANALYSIS


Google BigQuery, Zeppelin Notebook, Google Cloud Dataproc, Apache Spark, Scala, JDBC

Use Spark and Scala in a Zeppelin notebook running on a Dataproc cluster to perform general analysis of the data.

(1) Number of posts in each country

[Figure: output of analysis (1)]
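The source shows only the output of this step. Here is a minimal sketch of the aggregation behind it, assuming a hypothetical `Post` record with a `country` field (the real table schema is not given). It groups the posts by country and counts each group, which is what a DataFrame `groupBy("country").count()` computes. It uses plain Scala collections so the logic runs without a cluster.

```scala
// Hypothetical record for one row of the posts table; the real schema is not shown in the source.
final case class Post(id: Long, country: String)

object PostsPerCountry {
  // Group posts by country and count each group, largest count first.
  // Ties are broken alphabetically by country so the order is deterministic.
  def countByCountry(posts: Seq[Post]): Seq[(String, Int)] =
    posts
      .groupBy(_.country)
      .map { case (country, group) => country -> group.size }
      .toSeq
      .sortBy { case (country, n) => (-n, country) }
}
```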

(2) Top interests in each country

[Figure: output of analysis (2)]
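Ranking interests within each country is a top-k-per-group problem. The sketch below assumes hypothetical `(country, interest)` rows, one per interest attached to a post. On a DataFrame, the same result would typically come from counting by both columns and then keeping rows where `row_number()` over `Window.partitionBy("country").orderBy(desc("count"))` is at most k. Plain Scala collections are used here so the logic can be checked locally.

```scala
// Hypothetical row: one (country, interest) pair per post interest; real column names are not given in the source.
final case class PostInterest(country: String, interest: String)

object TopInterestsByCountry {
  // For each country, count how often each interest appears and keep the k most frequent.
  // Ties are broken alphabetically by interest so the result is deterministic.
  def topK(rows: Seq[PostInterest], k: Int): Map[String, Seq[(String, Int)]] =
    rows
      .groupBy(_.country)
      .map { case (country, inCountry) =>
        val ranked = inCountry
          .groupBy(_.interest)
          .map { case (interest, hits) => interest -> hits.size }
          .toSeq
          .sortBy { case (interest, n) => (-n, interest) }
          .take(k)
        country -> ranked
      }
}
```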

(3) Top interests of users

[Figure: output of analysis (3)]

(4) Top interests of posts

[Figure: output of analysis (4)]
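Analyses (3) and (4) are both a top-k count over one interest field, applied to users and to posts respectively. This is a hedged sketch that assumes each hypothetical `Tagged` record (a user or a post) carries a list of interests. If interests were stored as an array column in Spark, `explode` followed by `groupBy`/`count` would play the role of `flatMap`/`groupBy` here.

```scala
// Hypothetical record: a user (or post) tagged with zero or more interests.
final case class Tagged(id: Long, interests: Seq[String])

object TopInterests {
  // Flatten every record's interest list, count each interest,
  // and return the k most frequent (ties broken alphabetically).
  def topK(records: Seq[Tagged], k: Int): Seq[(String, Int)] =
    records
      .flatMap(_.interests)
      .groupBy(identity)
      .map { case (interest, hits) => interest -> hits.size }
      .toSeq
      .sortBy { case (interest, n) => (-n, interest) }
      .take(k)
}
```

The same function serves both analyses. Pass user records for (3) and post records for (4).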

Code flow:

  1. Load the JDBC driver dependency
  2. Import the required packages and start a Spark session
  3. Set up the connection to the database
  4. Run each of the four analyses:
    • load the data from the database into a DataFrame and perform the aggregation
    • use z.show() to display the results with Zeppelin's built-in display
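Here is a minimal Spark sketch of the code flow above, covering analysis (1) only. The JDBC URL, table name, `country` column and credentials are hypothetical placeholders, and the JDBC driver JAR (step 1) is assumed to already be on the Spark interpreter's classpath. In Zeppelin, `spark` and `z` are provided by the interpreter, so `getOrCreate()` returns the existing session and `z.show(result)` would replace `result.show()`.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.desc

object AnalysisFlow {
  // Step 3: read one table over JDBC. URL, table name and credentials are placeholders.
  def readTable(spark: SparkSession, url: String, table: String, user: String, password: String): DataFrame = {
    val props = new Properties()
    props.setProperty("user", user)
    props.setProperty("password", password)
    spark.read.jdbc(url, table, props)
  }

  // Step 4, analysis (1): count posts per country, most active countries first.
  // Assumes the posts table has a column named "country" (hypothetical).
  def postsPerCountry(posts: DataFrame): DataFrame =
    posts.groupBy("country").count().orderBy(desc("count"))

  def main(args: Array[String]): Unit = {
    // Step 2: in Zeppelin the interpreter already created `spark`; getOrCreate() reuses it.
    val spark = SparkSession.builder().appName("data-analysis").getOrCreate()
    // Hypothetical MySQL-style URL; substitute the real host, port and database.
    val posts = readTable(spark, "jdbc:mysql://HOST:3306/DATABASE", "posts", "USER", "PASSWORD")
    val result = postsPerCountry(posts)
    // In a Zeppelin paragraph, display the result with: z.show(result)
    result.show()
  }
}
```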