
Python, DataScience and PySpark



In this course you'll learn how to use Spark from Python! Spark is a tool for parallel computation on large datasets, and it integrates well with Python. PySpark is the Python package that makes the magic happen. You'll use this package to work with data about flights from Portland and Seattle. You'll learn to wrangle this data and build a whole machine learning pipeline to predict whether or not flights will be late. Get ready to put some Spark in your Python code and dive into the world of high-performance machine learning!

• Learn about Apache Spark and the Spark 2.0 architecture
• Build and interact with Spark DataFrames using Spark SQL
• Learn how to solve graph and deep learning problems using GraphFrames and TensorFrames, respectively
• Read, transform, and understand data and use it to train machine learning models
• Build machine learning models with MLlib and ML
• Learn how to submit your applications programmatically using spark-submit
• ML Algorithms: common learning algorithms such as classification, regression, clustering, and collaborative filtering
• Featurization: feature extraction, transformation, dimensionality reduction, and selection
• Pipelines: tools for constructing, evaluating, and tuning ML Pipelines
• Persistence: saving and loading algorithms, models, and Pipelines

Knowledge Prerequisites
• Big Data and Hadoop
• Basic Python data structures
• Basic knowledge of Pandas dataframes and SQL
• Entry-level Data Science
• Anyone interested in Machine Learning
• Intermediate-level learners who know the basics of machine learning, including classical algorithms such as linear regression and logistic regression, and who want to explore the different fields of Machine Learning
• Anyone who is not that comfortable with coding but is interested in Machine Learning and wants to apply it easily to datasets
• Data analysts who want to level up in Machine Learning
• Anyone who is not satisfied with their job and wants to become a Data Scientist
• Anyone who wants to add value to their business by using powerful Machine Learning tools

Software Prerequisites
• Apache Spark (downloadable from http://spark.apache.org/downloads.html)
• A Python distribution containing IPython, Pandas and Scikit-learn
• PySpark
• Anaconda with Python 3.6 (www.anaconda.com)



Course Outline


1. Python Introduction
2. Python Functions
3. Object Oriented Programming
4. File Handling and Regular Expressions
5. Data Science
6. Pandas and PySpark
7. PySpark DataFrame Case Study
8. PySpark DataFrame
9. PySpark Machine Learning

Online Corporate Plans


Free


  • 1 Live / Recorded Session
  • Two Sample Modules PDF
  • Experienced Trainer
    (2-7 Years)
  • Learner Dashboard
  • Sample Quiz
  • Customised Content
  • Free Ebook For Reference
  • 24/7 Lab Support


Silver

Rs.3000/Hour Rs.6000/Hour

  • Up to 20 Participants
  • Freshers Level Training
    (0-2 Years)
  • Experienced Trainer
    (2-7 Years)
  • Learner Dashboard
  • 10 Quizzes and Online Test
  • Customised Content
  • Free Ebook For Reference
  • 24/7 Lab Support

Gold

Rs.5000/Hour Rs.10000/Hour

  • Up to 20 Participants
  • Intermediate Level Training
    (2-7 Years)
  • Experienced Trainer
    (7-15 Years)
  • Learner Dashboard
  • 15 Quizzes and 2 Online Tests
  • Customised Content
  • Free Ebook For Reference
  • 24/7 Lab Support

Diamond

Rs.7000/Hour Rs.14000/Hour

  • Up to 20 Participants
  • Expert Level Training
    (7-20 Years)
  • Experienced Trainer
    (15-25 Years)
  • Learner Dashboard
  • 30 Quizzes and 3 Online Tests
  • Customised Content
  • Free Ebook For Reference
  • 24/7 Lab Support
