Sunday 9 August 2015

Knowing Python Part 3

Reading CSV files

The data for this part can be downloaded from http://openflights.org/data.html.
Let's start working on this CSV file.

Let's import the csv module and open this CSV file in Python.
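As a minimal sketch of this step, the snippet below reads rows with csv.reader. To keep it self-contained it parses two made-up sample rows from an in-memory string; the row values are assumptions for illustration only, laid out in the same column order the later code relies on (ID, name, city, country, two codes, latitude, longitude). With the downloaded file you would pass an opened airports.dat instead.

```python
import csv
import io

# Made-up sample rows in the airports.dat column layout:
# ID, name, city, country, code, code, latitude, longitude.
# With the real data, replace `sample` with an opened "airports.dat".
sample = io.StringIO(
    '1,"Goroka","Goroka","Papua New Guinea","GKA","AYGA",-6.08,145.39\n'
    '2,"Sydney Intl","Sydney","Australia","SYD","YSSY",-33.95,151.18\n'
)

# csv.reader yields each row as a list of strings.
rows = list(csv.reader(sample))
for row in rows:
    print(row)
```

Note that every field comes back as a string, so numeric columns such as latitude have to be converted with float() before use.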


Now let's fetch the airport names for some explicitly defined countries. Let's take Australia as an example.




Explanation of the above code:
1.) First create an empty dictionary, Airport.
2.) Then each row is read into a list, line[].
3.) In the if statement we check line[3] (the fourth column, containing the country name) and line[1] (the second column, containing the airport name). If the dictionary already has the country name as a key, we append the airport name to its list; otherwise we create a new key for that country.
4.) Finally, print the values stored under the key "Australia" to see the names of the airports in Australia.
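The steps above can be sketched roughly as follows. This is an illustration rather than the original listing: the function name airports_by_country and the sample rows are assumptions, and the column positions (name at index 1, country at index 3) follow the airports.dat layout used elsewhere in this post.

```python
import csv
import io

def airports_by_country(f):
    """Group airport names by country from an airports.dat-style file object."""
    airports = {}
    for line in csv.reader(f):
        name, country = line[1], line[3]
        if country in airports:
            # Country already seen: add this airport to its list.
            airports[country].append(name)
        else:
            # First airport for this country: create a new key.
            airports[country] = [name]
    return airports

# Made-up sample rows; with the real data, pass an opened "airports.dat".
sample = io.StringIO(
    '1,"Goroka","Goroka","Papua New Guinea","GKA","AYGA",-6.08,145.39\n'
    '2,"Sydney Intl","Sydney","Australia","SYD","YSSY",-33.95,151.18\n'
    '3,"Melbourne Intl","Melbourne","Australia","MEL","YMML",-37.67,144.84\n'
)
airports = airports_by_country(sample)
print(airports["Australia"])
```

The if/else pair could also be written with dict.setdefault or collections.defaultdict(list); the explicit form is kept here because it matches the explanation step by step.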

Airline Route Histogram

Now let's plot a histogram showing the distribution of route distances across all scheduled flights.
To do so, we follow these steps:

1.) Read the airport file and build a dictionary mapping the unique ID of each airport to its latitude and longitude, which lets us look up the location of any airport by its ID.

2.) Read the routes file and get the IDs of the source and destination airports. Using their latitude and longitude, calculate the length of the route and append it to a list of all route lengths.

Now, in order to measure the distance, we need a new module called "geo_distance".
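The contents of geo_distance are not shown in this post. As a hedged stand-in, a function with the same call shape as geo_distance.distance(lat1, long1, lat2, long2) can be sketched with the haversine formula. The choice of haversine and the mean Earth radius of 6371 km are assumptions made here, not necessarily what the module itself does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def distance(lat1, long1, lat2, long2):
    """Great-circle distance in km between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(long2 - long1)
    # Haversine formula.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1 to guard against rounding slightly above 1 before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# A quarter of the way around the globe, from the equator to the pole.
print(distance(0, 0, 90, 0))
```

Saving this function in a file named geo_distance.py next to the script would let the import in the final code resolve, again assuming this stand-in is an acceptable replacement for the real module.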



Output


The Final Code
import csv
import matplotlib.pyplot as plt
import geo_distance  # provides distance(lat1, long1, lat2, long2)

# Map each airport ID to its latitude and longitude.
latitudes = {}
longitudes = {}
with open("airports.dat", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        airport_id = row[0]
        latitudes[airport_id] = float(row[6])
        longitudes[airport_id] = float(row[7])

# Compute the length of every route whose two airports are both known.
distances = []
with open("routes.dat", newline="", encoding="utf-8") as f:
    for row in csv.reader(f):
        source_airport = row[3]
        dest_airport = row[5]
        if source_airport in latitudes and dest_airport in latitudes:
            source_lat = latitudes[source_airport]
            source_long = longitudes[source_airport]
            dest_lat = latitudes[dest_airport]
            dest_long = longitudes[dest_airport]
            distances.append(geo_distance.distance(source_lat, source_long, dest_lat, dest_long))

# Plot the route lengths as a histogram with 100 bins.
plt.hist(distances, 100, facecolor='r')
plt.xlabel("Distance (km)")
plt.ylabel("Number of flights")
plt.show()

Explanation
First, we read the airport file (airports.dat) and build dictionaries mapping each unique airport ID to its geographical coordinates (latitude and longitude).
Next, we read the routes file (routes.dat) and get the IDs of the source and destination airports, then look up the latitude and longitude for each ID. Using these coordinates, we calculate the length of the route and append it to the list distances, which holds all route lengths. Routes whose source or destination ID is not in the airport dictionaries are skipped.
Finally, a histogram is plotted from the route lengths to show the distribution of flight distances.
Output:


