In this course, you'll learn how to use Spark from Python! Spark is a tool for parallel computation on large datasets, and it integrates well with Python. PySpark is the Python package that makes the magic happen. You'll use this package to work with data about flights from Portland and Seattle. You'll learn to wrangle this data and build a whole machine learning pipeline to predict whether or not flights will be late. Get ready to put some Spark in your Python code and dive into the world of high-performance machine learning!
• Learn about Apache Spark and the Spark 2.0 architecture
• Build and interact with Spark DataFrames using Spark SQL
• Learn how to solve graph and deep learning problems using GraphFrames and TensorFrames, respectively
• Read, transform, and understand data and use it to train machine learning models
• Build machine learning models with MLlib and ML
• Learn how to submit your applications programmatically using spark-submit
• ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering
• Featurization: feature extraction, transformation, dimensionality reduction, and selection
• Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
• Persistence: saving and loading algorithms, models, and Pipelines
Knowledge Prerequisites
• Big Data and Hadoop
• Basic Python data structures
• Basic knowledge of Pandas dataframes and SQL
• Entry-level Data Science
• Anyone interested in Machine Learning
• Intermediate-level learners who know the basics of Machine Learning, including classical algorithms such as linear regression and logistic regression, and who want to explore its different fields
• People who are not yet comfortable with coding but are interested in Machine Learning and want to apply it easily to datasets
• Data analysts who want to level up in Machine Learning
• People who are not satisfied with their job and want to become a Data Scientist
• People who want to add value to their business by using powerful Machine Learning tools
Software Prerequisites
• Apache Spark (downloadable from http://spark.apache.org/downloads.html)
• A Python distribution containing IPython, Pandas, and Scikit-learn
• PySpark
• Anaconda with Python 3.6 (www.anaconda.com)
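
One quick way to check the Python side of the software prerequisites above is to ask the interpreter whether each package can be found. The sketch below is only an illustration, not part of the course material: the helper name `missing_packages` and the `REQUIRED` mapping are hypothetical, and the check covers only the Python packages (IPython, Pandas, Scikit-learn, PySpark), not the Apache Spark download or the Java runtime that Spark needs.

```python
from importlib.util import find_spec

# Import name -> display name for the Python packages listed above.
# (This mapping is an illustrative assumption, not part of the course.)
REQUIRED = {
    "IPython": "IPython",
    "pandas": "Pandas",
    "sklearn": "Scikit-learn",
    "pyspark": "PySpark",
}


def missing_packages(required=REQUIRED):
    """Return display names of packages the current interpreter cannot find."""
    return [label for module, label in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed Python packages can be found.")
```

If the script reports a missing package, installing it into the active Anaconda environment and running the check again should confirm the fix.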