Shared World


Data

This section covers the Big Data components of the shared-world project, which include:

Data Analysis: Scala code and a Zeppelin notebook for Spark programming
Match posts to user interests: Scala jar files that output posts ordered by user interests using SparkSQL
Map of tourist-resident ratio: the SQL queries and table for the interactive map
Linear Regression Model: PySpark code and a Jupyter notebook that create a linear regression model using SparkML
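
The "match posts to user interests" component is implemented in the project as Scala jars using SparkSQL. As a rough illustration of the ordering idea only, here is a minimal plain-Python sketch (not Spark, and not the project's code). It assumes each post carries a set of tags and each user has a set of interests, and it scores a post by the number of tags that overlap with those interests. The names `Post`, `tags`, and `rank_posts`, and the overlap score itself, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical post record: an id plus a set of topic tags."""
    post_id: int
    tags: frozenset


def rank_posts(posts, interests):
    """Return posts ordered by how many tags match the user's interests.

    Assumption: relevance is the count of case-insensitive tag/interest
    overlaps. Posts with equal scores keep their input order.
    """
    wanted = {interest.lower() for interest in interests}

    def score(post):
        return len(wanted & {tag.lower() for tag in post.tags})

    # sorted() is stable, also with reverse=True, so ties keep input order.
    return sorted(posts, key=score, reverse=True)


if __name__ == "__main__":
    posts = [
        Post(1, frozenset({"museum"})),
        Post(2, frozenset({"beach", "food"})),
        Post(3, frozenset({"hiking"})),
    ]
    for post in rank_posts(posts, {"Food", "beach"}):
        print(post.post_id, sorted(post.tags))
```

In a SparkSQL version the same idea would typically be expressed as a join between posts and user interests followed by an ordering on the match count, but the exact schema used by the project is not shown here.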

[Figure: sw_data_workflow]

(1) Set up Dataproc cluster

[Figures: sw_data_setup_1, sw_data_setup_2]

(3) Dataproc Cluster for notebooks

[Figure: sw_data_setup_3]

Cloud application solution for overtourism