Using .data files from UCI repository

Saima Sharleen
2 min read · May 17, 2021


Image from UCI Machine Learning Repository

In this brief article, we will learn how to use UCI data sets that are distributed as .data files.
Downloading data from UCI repository:

There are some interesting data sets available for free in the UCI repository. You can use them to hone your analytical abilities.

To begin, we will download the .data file from the University of California, Irvine repository. We will download the iris dataset for this article. You may use the same procedure to download any dataset.

Go to the data folder
Download the iris.data file

After downloading it, you can open it with Notepad or Microsoft Excel.

The .data file opened in Notepad

Now, we will use it in a Jupyter notebook. First, we will import pandas, and then we will use read_csv() to read the data into a DataFrame.

However, the .data file has no header row, so the DataFrame does not show the real column names. To add them, we will copy the attribute names from the attribute information on the dataset's UCI page, like this:
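The original notebook screenshot is not reproduced here, so the following is only a minimal sketch of the read_csv() step. It assumes the file is comma-separated with no header line, which is how iris.data is laid out, and it uses a three-row excerpt in place of the downloaded file so that it runs anywhere. The `sample` variable and the `header=None` argument are choices made for this sketch, not necessarily what the original notebook used.

```python
import io

import pandas as pd

# A three-row excerpt standing in for the downloaded iris.data file.
# With the real file, pass its path instead of `sample`,
# for example pd.read_csv("iris.data", header=None).
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# header=None tells pandas that the first line is data, not column names,
# so the columns get the integer labels 0 to 4.
df = pd.read_csv(sample, header=None)
print(df.head())
```

Without `header=None`, pandas would treat the first data row as the column names, and that row would be missing from the data.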

Getting attribute information from the uci site

Now, we will add the column names to the DataFrame by assigning them to its .columns property, like this:

Here is the full code from the Jupyter notebook if you want to try it yourself.
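As a rough illustration of the .columns assignment, the sketch below loads a three-row excerpt of the data and then names the columns. The names `sepal_length`, `sepal_width`, `petal_length`, `petal_width`, and `class` are assumptions based on the attribute list for the iris dataset; the original notebook may spell them differently.

```python
import io

import pandas as pd

# A three-row excerpt standing in for the downloaded iris.data file.
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)
df = pd.read_csv(sample, header=None)

# Assumed column names, in the same order as the columns in the file.
df.columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
print(df.head())
```

The list must have exactly one name per column, otherwise pandas raises an error. Passing the same list through the `names` parameter of read_csv() gives the same result in one step.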

https://gist.github.com/da785d826e1c0f43b1ec4608c6528ccd.git

