Sunday 9 August 2015

Knowing Python Part 1

Vote Counting Problem


Let us try to understand how Python can be used to extract and present useful information from a pile of data. We have a dataset called radishsurvey.txt.

The problem statement: try to figure out:

1.) What's the most popular radish variety?
2.) What's the least popular one?
3.) Has anybody voted twice?

Introduction
We have a survey in .txt format with 300 rows of data. Below is a screenshot of the file. Save this file in the directory you run Python from, so it can be opened by name.


Analysis of the data

Reading the data:


This code goes through the file line by line, strips each line, and splits it into two variables,
(i) name and (ii) vote, then prints these two variables.
Note: The strip() function is used here to remove the trailing newline ("\n").
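The original listing was a screenshot, so here is a minimal sketch of what the reading step might look like. It assumes each row has the form "Name - Vote"; the separator " - ", the helper parse_line and the sample rows are assumptions made for illustration, not taken from the survey file.

```python
# Hypothetical rows standing in for radishsurvey.txt.
# The "Name - Vote" layout is an assumption about the file format.
sample_lines = [
    "Alice Example - Red King\n",
    "Bob Example - White Icicle\n",
]


def parse_line(line):
    """Split one survey row into (name, vote)."""
    line = line.strip()             # remove the trailing "\n"
    name, vote = line.split(" - ")  # assumed separator between name and vote
    return name, vote


for line in sample_lines:
    name, vote = parse_line(line)
    print(name + " voted for " + vote)
```

To try it on the real survey, loop over open("radishsurvey.txt") instead of sample_lines.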



This code builds two lists, name and vote, and appends the value of the respective variable from each line.
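A rough sketch of how the two lists could be filled is shown here. As before, the " - " separator and the sample rows are assumptions, and the real file would be read with open("radishsurvey.txt").

```python
# Hypothetical rows; the real data would come from radishsurvey.txt.
sample_lines = [
    "Alice Example - Red King\n",
    "Bob Example - White Icicle\n",
]

name = []  # one entry per voter
vote = []  # the matching radish variety, in the same order

for line in sample_lines:
    parts = line.strip().split(" - ")  # assumed "Name - Vote" layout
    name.append(parts[0])
    vote.append(parts[1])

print(name)
print(vote)
```

Because both lists are filled in the same loop, name[i] and vote[i] always belong to the same row.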

Output for the above code would be:



Let's check for duplicates:





The output shows that there are a few duplicate values.
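One possible way to run such a check is to count how often each name appears. This sketch uses collections.Counter from the standard library, which may not be what the original screenshot used; the sample rows are made up.

```python
from collections import Counter

# Hypothetical rows; "Alice Example" appears twice on purpose.
sample_lines = [
    "Alice Example - Red King\n",
    "Bob Example - White Icicle\n",
    "Alice Example - Champion\n",
]

# Take the part before the assumed " - " separator as the voter name.
names = [line.strip().split(" - ")[0] for line in sample_lines]

counts = Counter(names)
duplicates = [voter for voter, n in counts.items() if n > 1]
print(duplicates)
```

Note that this only catches names that are spelled exactly the same way, which is why the data still needs cleaning.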

So let's clean this dirty data.
1.) Let's create an empty list named voted.
2.) Let's go through each line and take out the name of the voter.
3.) Using capitalize(), let's give every name the same capitalization (first letter uppercase, the rest lowercase), and using replace(), get rid of the extra spaces found between first name and surname.

Running this code reveals the fraudulent voters.
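The three steps above can be sketched roughly as follows. The sample rows, the " - " separator and the extra frauds list are assumptions added for illustration.

```python
# Hypothetical rows with messy capitalization and a double space.
sample_lines = [
    "alice  example - Red King\n",
    "Bob example - White Icicle\n",
    "Alice example - Champion\n",
]

voted = []   # step 1: cleaned names seen so far
frauds = []  # names that show up more than once (added for this sketch)

for line in sample_lines:
    # Step 2: take the voter name from the assumed "Name - Vote" row.
    name = line.strip().split(" - ")[0]
    # Step 3: same capitalization everywhere, double spaces collapsed.
    name = name.capitalize().replace("  ", " ")
    if name in voted:
        frauds.append(name)
        print(name + " has already voted! Fraud!")
    else:
        voted.append(name)
```

Keep in mind that replace("  ", " ") only turns one double space into a single space per pass, so runs of three or more spaces would need another approach.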

To keep the code shorter, we will use some user-defined functions.
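Here is a sketch of how the steps so far might be wrapped into functions. The names clean and count_votes are hypothetical, the rows are made up, and skipping a repeat voter's second vote is a design choice assumed here rather than something the survey prescribes.

```python
def clean(text):
    """Normalize capitalization and collapse double spaces."""
    return text.strip().capitalize().replace("  ", " ")


def count_votes(lines):
    """Count one vote per cleaned voter name; later repeats are skipped."""
    counts = {}
    voted = []
    for line in lines:
        name, vote = line.strip().split(" - ")  # assumed "Name - Vote" layout
        name = clean(name)
        vote = clean(vote)
        if name in voted:
            continue  # assumed policy: ignore a second vote by the same person
        voted.append(name)
        counts[vote] = counts.get(vote, 0) + 1
    return counts


# Hypothetical rows standing in for radishsurvey.txt.
sample_lines = [
    "Alice Example - Red King\n",
    "Bob Example - red king\n",
    "Carol Example - White  Icicle\n",
    "alice example - Champion\n",
]

counts = count_votes(sample_lines)
most_popular = max(counts, key=counts.get)
least_popular = min(counts, key=counts.get)
print(counts)
print("Most popular: " + most_popular)
print("Least popular: " + least_popular)
```

With the real file, count_votes(open("radishsurvey.txt")) would give the counts needed for the first two questions of the problem statement.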




So this is our result.

Source: http://opentechschool.github.io/python-data-intro/core/strings.html
