
LINEAR REGRESSION MODEL


World Bank, Google BigQuery, SQL, Jupyter Notebook, Google Dataproc, Apache Spark, Spark MLlib, PySpark

Create a linear regression model in a Jupyter notebook with PySpark, using a table of factors generated from the World Bank dataset in BigQuery. The model considers only countries that have all of the following indicators in the dataset (1995 - 2017):

Steps:

(1) Get regression input data: From the World Bank indicator table, populate a BigQuery table of all countries that have every relevant factor. This table is the input for the linear regression model.

[Figure: regression input]
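The query behind step (1) could look roughly like the sketch below, which builds a BigQuery SQL string that pivots long-format indicator rows into one row per country and year. Everything specific here is an assumption rather than the project's actual setup: the table name `bigquery-public-data.world_bank_wdi.indicators_data`, the column names (`country_code`, `year`, `indicator_code`, `value`), and the two indicator codes, which are placeholders because the indicators actually used are not listed here.

```python
# Hypothetical indicator codes -> column aliases. These two are placeholders
# only; they are not taken from the original project.
INDICATORS = {
    "NY.GDP.PCAP.CD": "gdp_per_capita",
    "SP.DYN.LE00.IN": "life_expectancy",
}

# Assumed name of the World Bank indicator table; adjust to the table in use.
SOURCE_TABLE = "bigquery-public-data.world_bank_wdi.indicators_data"


def build_input_query(indicators, source_table, start_year=1995, end_year=2017):
    """Return SQL producing one row per country and year, keeping only rows
    where every requested indicator has a non-null value."""
    # One conditional aggregate per indicator turns rows into columns.
    pivots = ",\n".join(
        f"  MAX(IF(indicator_code = '{code}', value, NULL)) AS {alias}"
        for code, alias in indicators.items()
    )
    codes = ", ".join(f"'{code}'" for code in indicators)
    return (
        "SELECT\n"
        "  country_code,\n"
        "  year,\n"
        f"{pivots}\n"
        f"FROM `{source_table}`\n"
        f"WHERE year BETWEEN {start_year} AND {end_year}\n"
        f"  AND indicator_code IN ({codes})\n"
        "GROUP BY country_code, year\n"
        # Keep a country-year only if all indicators are present.
        "HAVING COUNT(DISTINCT IF(value IS NOT NULL, indicator_code, NULL)) "
        f"= {len(indicators)}"
    )


query = build_input_query(INDICATORS, SOURCE_TABLE)
print(query)
```

The printed SQL can be pasted into the BigQuery console. To persist the result as a table, one option is to prefix it with `CREATE OR REPLACE TABLE <dataset>.<table> AS`, where the dataset and table names are your own. Note that this sketch filters per country and year; requiring a country to have every indicator in every year would need a stricter condition.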

(2) Set up Jupyter notebook: Create a Dataproc cluster and install the Jupyter notebook component.
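One way to carry out step (2) is through the `gcloud` CLI, which can enable Jupyter as an optional component when the cluster is created. The sketch below only assembles that command in Python; it assumes `gcloud` is installed and authenticated, and the cluster name, region, and project ID are hypothetical placeholders.

```python
import subprocess


def build_create_cluster_command(cluster_name, region, project_id):
    """Return the gcloud argument list for a Dataproc cluster with Jupyter."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--project={project_id}",
        f"--region={region}",
        # Installs the Jupyter optional component on the cluster.
        "--optional-components=JUPYTER",
        # Exposes component web UIs (including Jupyter) via the console.
        "--enable-component-gateway",
    ]


def create_cluster(cluster_name, region, project_id):
    """Run the command; raises CalledProcessError if gcloud fails."""
    subprocess.run(
        build_create_cluster_command(cluster_name, region, project_id),
        check=True,
    )


# Placeholder values for illustration only.
command = build_create_cluster_command(
    "regression-cluster", "us-central1", "my-project"
)
print(" ".join(command))
```

Calling `create_cluster(...)` with real values would actually create (and bill for) the cluster, so the example only prints the command. The same settings are also available in the Cloud Console when creating a cluster.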

(3) Create regression model: In the Jupyter notebook, use PySpark to create a linear regression model from the input table populated in BigQuery.

[Figures: regression output 1, regression output 2]
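Step (3) might be sketched as follows with Spark MLlib's `LinearRegression`. This is an illustration under stated assumptions, not the notebook's actual code: it assumes the spark-bigquery connector is available on the cluster, and the table name, feature columns, and label column passed in are hypothetical. `format_equation` is a small helper added here for readability.

```python
def format_equation(label, intercept, coefficients, feature_cols):
    """Render a fitted linear model as a readable equation string."""
    terms = " ".join(
        f"{coef:+.4f}*{name}" for coef, name in zip(coefficients, feature_cols)
    )
    return f"{label} = {intercept:.4f} {terms}"


def train_linear_regression(table, feature_cols, label_col):
    """Read the input table from BigQuery, fit a model, and report metrics."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.ml.evaluation import RegressionEvaluator
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.regression import LinearRegression
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("worldbank-regression").getOrCreate()

    # Assumes the spark-bigquery connector is on the cluster's classpath.
    df = spark.read.format("bigquery").option("table", table).load()
    df = df.select(*feature_cols, label_col).dropna()

    # MLlib expects all features packed into a single vector column.
    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    data = assembler.transform(df).select("features", label_col)
    train, test = data.randomSplit([0.8, 0.2], seed=42)

    model = LinearRegression(featuresCol="features", labelCol=label_col).fit(train)
    predictions = model.transform(test)

    def metric(name):
        return RegressionEvaluator(
            labelCol=label_col, predictionCol="prediction", metricName=name
        ).evaluate(predictions)

    coefficients = model.coefficients.toArray().tolist()
    return {
        "equation": format_equation(
            label_col, model.intercept, coefficients, feature_cols
        ),
        "rmse": metric("rmse"),
        "r2": metric("r2"),
    }
```

In a notebook cell on the cluster, this could be called as `train_linear_regression("my-project.my_dataset.regression_input", ["gdp_per_capita"], "life_expectancy")`, with every name replaced by the real table and columns. The 80/20 split and the RMSE and R² metrics are choices made for this sketch.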