The pandas library provides an easy way to load data from files hosted locally or remotely, for example with the pandas.read_csv or pandas.read_table functions:
import pandas as pd
df = pd.read_csv("my_file.csv")
Pandas will infer the type of each column of the CSV file and set the datatype accordingly, so this code is perfectly valid. However, this
snippet does not convey the intent of the user, and it can raise questions such as:
- What information can I access in df?
- What are the names of the columns available in df?
These questions arise because nothing in the code describes what kind of data is loaded into the data frame, which makes the code harder to
understand and maintain.
A straightforward way to fix these issues is to provide the schema of the data through the dtype parameter.
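As a minimal sketch of this idea, the snippet below passes a schema to pandas.read_csv through dtype. The column names (user_id, score, active) and their types are hypothetical and only serve as an illustration, and an in-memory buffer stands in for my_file.csv so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical schema: these column names and types are assumptions
# made for illustration, not taken from any real file.
SCHEMA = {
    "user_id": "int64",
    "score": "float64",
    "active": "bool",
}

# In-memory stand-in for "my_file.csv" so the sketch is self-contained.
csv_data = io.StringIO(
    "user_id,score,active\n"
    "1,9.5,True\n"
    "2,7.0,False\n"
)

# dtype documents the expected type of each column;
# usecols makes the list of loaded columns explicit as well.
df = pd.read_csv(csv_data, dtype=SCHEMA, usecols=list(SCHEMA))

print(df.dtypes)
```

With the schema declared next to the call, a reader can see which columns df contains and what type each one holds without opening the file. Pandas will also raise an error if a value cannot be converted to the declared type, instead of silently inferring a different one.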