Programming with Data: Foundations of Python and Pandas

Course Description

For many researchers, modern data analysis requires some kind of programming, whether in R, MATLAB, Stata, or Python. The proliferation of tools and specialized languages for data analysis suggests that general-purpose programming languages like C and Java do not readily address the needs of data scientists; something more is needed.

In this training, you will learn how to accelerate your data analyses using the Python language and Pandas, a library specifically designed for interactive data analysis. Pandas is a massive library, so we will focus on its core functionality: loading, filtering, grouping, and transforming data. Having completed this workshop, you will understand the fundamentals of Pandas, be aware of common pitfalls, and be ready to perform your own analyses. From there, we’ll determine not only the quality of the data and its uses, but also how best to apply it to achieving your business objectives. In short, we’ll help you isolate the signal from the noise.

Summary

Prerequisites
Intermediate-level Python

Length
2.5 hours

Location
Live on oreilly.com

What you'll learn and how you can apply it

  • Use the Split-Apply-Combine technique to calculate grouped summary statistics like mean, median, and standard deviation on your data.
  • Load data from flat files, NumPy, and native Python data structures and compute on them using Pandas.
  • Avoid common pitfalls and “gotchas” in Pandas by understanding the conceptual underpinnings common to most data manipulation libraries and environments.
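
As a rough illustration of the first two points, the sketch below loads small tables from a list of dicts, a NumPy array, and CSV text, then applies Split-Apply-Combine with groupby. The sample data and the column names (group, value, x, y) are made up for this example and are not taken from the course materials.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
records = [
    {"group": "a", "value": 1.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": 2.0},
    {"group": "b", "value": 4.0},
    {"group": "b", "value": 6.0},
]

# Load from native Python data structures (a list of dicts).
df = pd.DataFrame(records)

# Load from a NumPy array, naming the columns explicitly.
arr_df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# Load from a flat file; a StringIO object stands in for a CSV file on disk.
csv_df = pd.read_csv(io.StringIO("group,value\na,1.0\nb,2.0\n"))

# Split-Apply-Combine: split the rows by "group", apply three summary
# statistics to "value", and combine the results into one table.
summary = df.groupby("group")["value"].agg(["mean", "median", "std"])
print(summary)
```

The result has one row per group and one column per statistic. Note that std here is the sample standard deviation (ddof=1), which is the Pandas default.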

This training course is for you if:

You have a solid understanding of Python programming

You want to learn how to load and transform tabular data in Python using Pandas

You want to accelerate your understanding of Pandas by learning general principles and requirements common to tabular data manipulation frameworks

Prerequisites

Intermediate-level programming ability in Python. Attendees should know the difference between a dict, list, and tuple. Familiarity with control flow (if/else/for/while) and error handling (try/except) is required. No statistics background is required.
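
As an informal self-check, the short sketch below touches each of these prerequisites. The names and values are hypothetical and only meant to show the level of Python assumed; if it reads comfortably, you are likely prepared.

```python
# A tuple is immutable, a list is mutable, and a dict maps keys to values.
point = (3, 4)
scores = [88, 92, 79]
ages = {"ada": 36, "grace": 45}


def lookup_age(name):
    """Return the age stored for name, or None if the name is unknown."""
    try:
        return ages[name]
    except KeyError:
        # Python's error handling uses try/except.
        return None


# Control flow: sum only the scores that are at least 80.
total = 0
for score in scores:
    if score >= 80:
        total += score

print(lookup_age("ada"), lookup_age("unknown"), total)
```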

Course Set-up:

Step-by-step instructions for setting up a working Python environment using Anaconda are available here. You will need a working environment to complete the exercises in a Jupyter notebook. Alternatively, you may view the notebooks here.

Recommended Preparation:

For a refresher on Pandas and Python data analysis fundamentals, see Pandas Data Analysis with Python Fundamentals (video) and Pandas for Everyone: Python Data Analysis (book).

If you are comfortable with the topics covered in Intermediate Python, then you will likely be able to follow along with the Live Training. For intermediate to advanced Python, see Fluent Python.

Recommended Follow-up:

Python for Data Analysis, 2nd Edition was written by the principal author of Pandas. It covers much of the material in this Live Training. The Python Data Science Handbook demonstrates usage of NumPy, Pandas, and Jupyter along with machine learning methods.