Pandas
-
Pandas is a Python library.
-
Pandas is used to analyze data.
-
Pandas allows us to analyze big data and make conclusions based on statistical theories.
-
Pandas can clean messy data sets, and make them readable and relevant.
-
Relevant data is very important in data science.
The Pandas library is one of the most preferred tools for data scientists to do data manipulation and analysis, next to matplotlib for data visualization and NumPy, the fundamental library for scientific computing in Python on which Pandas was built
What Can Pandas Do?
-
Pandas gives you answers about the data. Like:
- Is there a correlation between two or more columns?
- What is average value?
- Max value?
- Min value?
How to start
import pandas as pd
mydataset = {
'cars': ["BMW", "Volvo", "Ford"],
'passings': [3, 7, 2]
}
myvar = pd.DataFrame(mydataset)
print(myvar)
Output
cars passings
0 BMW 3
1 Volvo 7
2 Ford 2
What is a DataFrame?
A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a table with rows and columns.
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
#load data into a DataFrame object:
df = pd.DataFrame(data)
print(df)
Output
calories duration
0 420 50
1 380 40
2 390 45
Notes
The fast, flexible, and expressive Pandas data structures are designed to make real-world data analysis significantly easier, but this might not be immediately the case for those who are just getting started with it. Exactly because there is so much functionality built into this package that the options are overwhelming.
It’s a quick guide through the basics of Pandas that you will need to get started on wrangling your data with Python.
As such, you can use it as a handy reference if you are just beginning their data science journey with Pandas or, for those of you who already haven’t started yet, you can just use it as a guide to make it easier to learn about and use it.