When building machine learning models in Python, you will inevitably come across Pandas, a data-handling library. This article is a set of notes that I wrote while learning it.
What is Pandas?
Pandas is a data-handling library for Python. It lets you read data from a file (for example, a .csv or .json file) and load it into a DataFrame object. You then work with the DataFrame to perform different operations on the data.
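As a minimal sketch of that workflow, the snippet below loads CSV data into a DataFrame and runs one simple operation on it. The column names and values are made up for illustration, and the CSV text is read from an in-memory string so the example runs on its own; with a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; normally this would live in a file such as "people.csv".
csv_text = "name,age\nAda,36\nGrace,45\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Once loaded, operations are performed on the DataFrame.
mean_age = df["age"].mean()
print(df)
print("Mean age:", mean_age)
```

For a .json file, `pd.read_json` plays the same role.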
What is a DataFrame?
A DataFrame is a rectangular grid that represents data. The grid is made up of columns, and each column is a vector of data. Each value in a column represents a particular entry and can be referenced using an index. Values that share the same index across different columns form a row. In Pandas, each column is a Series object.
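The small example below illustrates the grid, column, index, and row ideas described above. The data is invented for illustration, and the DataFrame uses the default integer index (0, 1, 2).

```python
import pandas as pd

# Hypothetical data: each dictionary key becomes a column.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune"],
        "population_millions": [0.7, 10.0, 3.1],
    }
)

# Selecting a single column returns a Series.
city_column = df["city"]

# A value in the column is referenced by its index label.
second_city = city_column.loc[1]

# Values with the same index label across all columns form a row.
row = df.loc[1]

print(type(city_column).__name__)
print(second_city)
print(row)
```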
Under the hood, columns are typically backed by NumPy arrays. As a result, each column has a single data type (its dtype), but different columns can have different data types. A column that mixes different kinds of values falls back to the generic object dtype.
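One way to see this is to inspect the dtype of each column. The sketch below uses invented column names; the exact integer and float widths reported can depend on your platform and Pandas version.

```python
import pandas as pd

# Hypothetical data: one integer column and one float column.
df = pd.DataFrame({"count": [1, 2, 3], "price": [0.5, 1.5, 2.5]})

# Each column reports exactly one dtype.
print(df.dtypes)

# to_numpy() exposes a column's values as a NumPy array.
count_values = df["count"].to_numpy()
print(type(count_values).__name__)

# A Series holding mixed kinds of values is stored with the generic object dtype.
mixed = pd.Series([1, "two", 3.0])
print(mixed.dtype)
```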