LINEAR REGRESSION MODEL

regression_title

World Bank, Google BigQuery, SQL, Jupyter Notebook, Google Dataproc, Apache Spark, SparkMlib, pyspark

Create a linear regression model using table of factors generated from the WorldBank dataset using BigQuery in Jupyter with pyspark. The factors considered for the model are countries with all the following indicators in the dataset (1995 - 2017):

Population, total
GDP (current US$)
Intentional homicides (per 100,000 people)
Life expectancy at birth, total (years)
Unemployment, total (% of total labor force) (modeled ILO estimate)
International tourism, number of departures
International tourism, expenditures (current US$)
International Tourism, number of arrivals
International tourism, receipts (current US$)

Steps:

(1) Get regression input data: Populate a table in BigQuery of all countries with all relevant factors, from the World Bank indicator table, to create input for a linear regression model.

Compile view of all countries with any of the desired indicators from the WorldBank dataset
Create seperate view for each indicator
Combine all these views together so that the only countries selected have an entry in all the views (ie.have all the factors)
Save as a table in BigQuery to be accessed by dataproc regression_input

regression_input

(2) Setup Jupyter notebook: Create a Dataproc cluster and install Jupyter notebook component

(3) Create regression model: Create a linear regression model using input populated by a table in BigQuery in Jupyter Notebook with pyspark.

Create vector for inputs
Use Big Query connector to read data from BigQuery
Conver from Panda DataFrame to Spark Dataframe
Create clean input dataframe for SparkML using vector function
Construct new LinearRegression object and fit the training data to train similar to Google gestation example
Create model with train and test data (similar to LogisticRegression example in INFS3208 lecture on machine learning)

regression_output1 regression_output2

Shared World

Cloud application solution for overtourism

LINEAR REGRESSION MODEL

Steps: