Looping is an integral part of any programming language. In Python, an important component of loops is the built-in range function.
In this detailed guide, we will walk you through the workings of the range function using examples, and discuss its limitations and their workarounds. Although range is useful for a broad variety of Python programming tasks, this guide will conclude with a couple of data science use cases for the range function.
For the purposes of this tutorial, we do assume you have at least some knowledge of Python syntax. If you’ve never worked with Python before, we’d recommend starting with this interactive Python fundamentals course first.
A Brief History of Range in Python
This tutorial focuses on Python 3, but if you’ve worked with Python 2 before, some explanation is needed because the meaning of range changed between these two versions.
The range function in Python 2 generated a list of numbers that you could iterate through. Because the whole list was built up front, it occupied a significant chunk of memory for large ranges. The xrange function in Python 2 returned items through lazy evaluation, meaning that numbers were generated only when required, which used less memory.
In Python 3, range took over the lazy behavior of xrange, and both the list-building range and the xrange name from Python 2 were removed. In this tutorial, we’re working with the range function of Python 3, so it doesn’t have the memory issues once associated with range in Python 2.
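To make the memory difference concrete, the short sketch below compares a Python 3 range object with a list built from one. It relies on sys.getsizeof, which reports only the size of the container object itself, and the exact byte counts are a CPython implementation detail, so treat the printed numbers as illustrative.

```python
import sys

small = range(10)
large = range(10_000_000)

# A range stores only start, stop and step, so its size does not grow
# with the number of values it represents.
print(sys.getsizeof(small), sys.getsizeof(large))

# Building a list materializes every number (much as Python 2's range did).
as_list = list(range(100_000))
print(sys.getsizeof(as_list))
```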
Python Range: Basic Uses
Let us first look at a basic use of for loops and the range function in Python. Let us print the first five whole numbers.
for i in range(5):
    print(i)
0
1
2
3
4
The snippet above loops through the numbers zero to four. Notice that five was not included in the loop. The basic use of range is, therefore, to loop through a sequence of numbers. We will revisit the scope of range shortly.
There are three arguments we can use with range: start, stop, and step. We can illustrate these three as follows:
- range(stop): This creates a range of numbers from zero to one less than the stop number, incrementing by one.
- range(start, stop): This creates a range of numbers from the start number to one less than the stop number, incrementing by one.
- range(start, stop, step): This creates a range of numbers from the start number to a number less than the stop number, incrementing by step.
The simple example above used the first way of declaring the range function. Let us explore the two other ways.
# range(start, stop)
for i in range(3, 8):
    print(i)
3
4
5
6
7
Note that the start number is included in the range, whereas the stop number is not.
# range(start, stop, step)
for i in range(1, 10, 3):
    print(i)
1
4
7
In this third way of declaring range, we begin with the start number and then count up by three (the step number), stopping before we reach our stop number.
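As a quick recap of the three forms, wrapping each call in list() shows all of the generated numbers at once. This is a small sketch for inspection only; in a loop you would normally iterate over the range directly. The variable names are just for illustration.

```python
# list() forces the range to produce all of its numbers at once.
one_arg = list(range(5))            # stop only
two_args = list(range(3, 8))        # start, stop
three_args = list(range(1, 10, 3))  # start, stop, step

print(one_arg)     # [0, 1, 2, 3, 4]
print(two_args)    # [3, 4, 5, 6, 7]
print(three_args)  # [1, 4, 7]
```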
Range: Data Type
Let’s check the type of object returned by the range function.
print(type(range(5)))
<class 'range'>
Notice that range is a type in Python. Printing a range object shows its start and stop values (and its step, when the step is not one) rather than the individual numbers. The numbers themselves are still not generated; this is due to the memory-saving “lazy evaluation” mentioned earlier. They are produced only when they’re actually needed, for example when a for loop iterates over the range.
print(range(5))
range(0, 5)
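One consequence of this design is that some questions about a range can be answered without producing its numbers. The sketch below, which uses an arbitrary large range chosen for illustration, asks for the length, tests membership, and reads the last item; for integers, Python computes each answer from start, stop and step.

```python
big = range(0, 10**9, 2)   # every even number below one billion

# None of these operations loops over the values.
print(len(big))            # 500000000
print(10**8 in big)        # True
print(7 in big)            # False
print(big[-1])             # 999999998
```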
Range Objects: Advanced Use
Interestingly, we can access items in a range object through its index, just like we would with a list. The third object in our range is 2.
range(5)[2]
2
Like lists, we can also slice a range object. This returns a new range object!
range(5)[:3]
range(0, 3)
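A few more indexing and slicing cases may help here. The sketch below assumes nothing beyond the built-in range type and shows that negative indexes and slices with a step behave as they do for lists, with each slice returning another range.

```python
r = range(10)

print(r[-1])     # 9, negative indexes count from the end
print(r[2:8])    # range(2, 8)
print(r[::2])    # range(0, 10, 2)
print(r[::-1])   # range(9, -1, -1)
```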
We can reverse a range object too, using the same reversed() function that is applicable for lists.
# reversed
for i in reversed(range(5)):
    print(i)
4
3
2
1
0
Range can be used to generate negative numbers.
# show negative numbers
for i in range(-10, 5, 3):
    print(i)
-10
-7
-4
-1
2
We can also pass a negative step argument to generate numbers in decreasing order, rather than use the reversed function.
# show negative iteration
for i in range(-10, -17, -2):
    print(i)
-10
-12
-14
-16
Note that if you’re using a step argument with range, it cannot be zero. A step of zero would never move past the start number, so Python raises a ValueError instead.
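The error is raised as soon as the range is created, before any looping happens. A minimal sketch of catching it follows; the variable name zero_step_error is only for illustration.

```python
zero_step_error = None

try:
    range(0, 10, 0)   # a zero step is rejected immediately
except ValueError as error:
    zero_step_error = error
    print("ValueError:", error)
```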
Additionally, if counting from your start argument in the direction of the step can never reach your stop argument, range produces an empty sequence. Note that nothing is printed when we run the code below, because if we start at 17 and count up, we can never reach the specified stop argument of 10:
# What if no numbers are in range?
for i in range(17, 10):
    print(i)
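The empty range can be inspected directly, as in this small sketch. It also shows that supplying a negative step makes the same start and stop values produce numbers.

```python
empty = range(17, 10)

print(len(empty))    # 0
print(list(empty))   # []

# Counting down needs an explicit negative step.
countdown = list(range(17, 10, -1))
print(countdown)     # [17, 16, 15, 14, 13, 12, 11]
```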
Range Objects with Float Numbers
The range function does not work with floats. Only integer values can be specified as the start, stop and step arguments.
# floats with python range
for i in range(0.1, 0.5, 0.1):
    print(i)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-a83306d87fcd> in <module>
      1 # floats with python range
----> 2 for i in range(0.1, 0.5, 0.1):
      3     print(i)

TypeError: 'float' object cannot be interpreted as an integer
If we need to generate a range-like return with floats, though, there are a couple of workarounds.
First, we can define a simple generator function with three arguments that increments the start number by step until it reaches stop:
# floats with python range
def range_with_floats(start, stop, step):
    # Works only with a positive step!
    while stop > start:
        yield start
        start += step

for i in range_with_floats(0.1, 0.5, 0.1):
    print(i)
0.1
0.2
0.30000000000000004
0.4
We could also use NumPy’s arange() function to get the same result.
import numpy as np

for i in np.arange(0.1, 0.5, 0.1):
    print(i)
0.1
0.2
0.30000000000000004
0.4
Where did the number 0.30000000000000004 come from? Computers store floating-point numbers in binary, so most decimal fractions, such as 0.1, are held only as close approximations, and the small errors become visible when the values are combined. Therefore, we may need to use the round() function when working with floating-point numbers like this if we want to generate a cleaner output.
def range_with_floats(start, stop, step):
    while stop > start:
        yield round(start, 2)  # adding round() to our function and rounding to 2 digits
        start += step

for i in range_with_floats(0.1, 0.5, 0.1):
    print(i)
0.1
0.2
0.3
0.4
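Another possible workaround, sketched here as an alternative rather than as part of the original examples, is to keep range on integers and scale each value afterwards. Because each float comes from a single division instead of repeated addition, the rounding error does not accumulate.

```python
# Loop over the integers 1 to 4 and divide, instead of repeatedly adding 0.1.
tenths = [i / 10 for i in range(1, 5)]
print(tenths)  # [0.1, 0.2, 0.3, 0.4]
```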
Using Python’s Range Function in Data Science
Reading Large Files
One use of Python’s range function in the context of data science is when we are reading large files.
For example, consider this car import dataset from 1985, provided by UCI. The file is in a CSV format, where values are separated by commas. Download the file and open it with the open() function. Let us print the first five lines using the .readline() method.
data_file = open('imports-85.data')
for i in range(5):
    print(data_file.readline())
3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,48.80,2548,dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,13495

3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.60,168.80,64.10,48.80,2548,dohc,four,130,mpfi,3.47,2.68,9.00,111,5000,21,27,16500

1,?,alfa-romero,gas,std,two,hatchback,rwd,front,94.50,171.20,65.50,52.40,2823,ohcv,six,152,mpfi,2.68,3.47,9.00,154,5000,19,26,16500

2,164,audi,gas,std,four,sedan,fwd,front,99.80,176.60,66.20,54.30,2337,ohc,four,109,mpfi,3.19,3.40,10.00,102,5500,24,30,13950

2,164,audi,gas,std,four,sedan,4wd,front,99.40,176.60,66.40,54.30,2824,ohc,five,136,mpfi,3.19,3.40,8.00,115,5500,18,22,17450
The header file describes the column names. The last column contains the price of the car import. If we explore the first few lines of the file, we notice that missing items are stored as a question mark (?).
An empty line appears between the printed lines because each line read from the file already ends with the newline character (\n) and print() adds another. We will need to consider this in our analysis below. Let us check how many lines are present in the file. In a UNIX terminal, you can use the command wc -l with the filename as an argument to count the number of lines. If you are using Jupyter notebooks, you can use the exclamation mark before the command to run a terminal command from within a cell.
!wc -l imports-85.data
205 imports-85.data
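If wc is not available (on Windows, for instance), the line count can also be obtained in Python. The helper below is a minimal sketch; count_lines is a hypothetical name introduced here, not something from the dataset or a library.

```python
def count_lines(path):
    """Return the number of lines in the text file at `path`."""
    with open(path) as handle:
        # Iterating over a file object yields one line at a time,
        # so the whole file is never held in memory.
        return sum(1 for _ in handle)
```

With the dataset in the working directory, count_lines('imports-85.data') should agree with the wc -l result, provided the file ends with a newline character.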
With 205 lines of data, let us try to find the car import with the highest price and the row number of the car in the dataset. First, we’ll loop over a range with the file length. Next, we read the line with the .readline() method, strip the newline character at the end of every line and convert the line to a list of items using the split() method.
The last item of the list is the price of the car import. If the price is missing, we skip to the next line. If the price is higher than our max_price, we change the value of max_price and update the row number, stored in the variable max_price_loc.
# basic looping in Python
# reading lines in a file
max_price = 0
max_price_loc = 0
with open('imports-85.data') as data_file:
    for i in range(205):
        row = data_file.readline().rstrip('\n').split(",")
        price = row[-1]
        # Missing price: skip this row
        if price == '?':
            continue
        if max_price < int(price):
            max_price = int(price)
            max_price_loc = i + 1  # row numbers start at 1
print("Maximum Price:", max_price)
print("Maximum Price Location: Row number", max_price_loc)
Maximum Price: 45400
Maximum Price Location: Row number 75
The most expensive car cost $45,400 in 1985, which is about $108,000 in 2019 dollars, adjusted for inflation.
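The loop above depends on knowing the line count in advance. As a variation, and not the approach used in this tutorial, the sketch below lets the file drive the loop with enumerate() and parses each line with the standard csv module. The function name find_max_price is hypothetical, and the code assumes the price is an integer in the last column, with '?' marking a missing value.

```python
import csv

def find_max_price(path):
    """Return (max_price, row_number) for the last column of a CSV file.

    Rows whose last field is '?' (missing price) are skipped.
    Row numbers start at 1.
    """
    max_price = 0
    max_price_loc = 0
    with open(path, newline='') as data_file:
        for row_number, row in enumerate(csv.reader(data_file), start=1):
            if not row or row[-1] == '?':
                continue
            price = int(row[-1])
            if price > max_price:
                max_price = price
                max_price_loc = row_number
    return max_price, max_price_loc
```

Calling find_max_price('imports-85.data') would be expected to return the same price and row number as the range-based loop.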
Iterating Over Pages When Web Scraping
Another use for the range function in the context of data science is web scraping from certain websites.
Imagine, for example, that we want to pull data from a BBS forum. Often, posts are spread over a large number of pages, with a page number included somewhere in the URL. Rather than entering each page URL one by one, we can enter the URL once and iterate through each page by replacing the page number in the URL with each number produced by a range function.
For instance, if we would like to send a request to a page with the URL format http://www.website.com/?p=page_number, we could use range to generate each URL in sequence. In the example below, we get URLs for the first ten pages, but this technique could be used to quickly generate hundreds or thousands of URLs which you can then scrape content from one by one, without actually having to put more than one URL into your code.
for i in range(1, 11):
    current_url = 'http://www.website.com/?p={page_num}'.format(page_num=i)
    print(current_url)
    # Make a request to the URL
http://www.website.com/?p=1
http://www.website.com/?p=2
http://www.website.com/?p=3
http://www.website.com/?p=4
http://www.website.com/?p=5
http://www.website.com/?p=6
http://www.website.com/?p=7
http://www.website.com/?p=8
http://www.website.com/?p=9
http://www.website.com/?p=10
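The same idea can be wrapped in a small helper that returns the URLs as a list, ready to be handed to whatever request code you use. This is only a sketch: build_page_urls is a hypothetical name, the query format is the placeholder one from the example above, and no request is actually sent.

```python
def build_page_urls(base_url, first_page, last_page):
    """Return one URL per page number, first_page through last_page inclusive."""
    # range() excludes its stop value, so add one to include last_page.
    return [f"{base_url}/?p={page}" for page in range(first_page, last_page + 1)]

urls = build_page_urls("http://www.website.com", 1, 10)
print(len(urls))   # 10
print(urls[0])     # http://www.website.com/?p=1
print(urls[-1])    # http://www.website.com/?p=10
```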
Final Thoughts
In this tutorial, we’ve learned some different ways of using the range() function, and seen some examples of how it can be useful specifically in data science work.
If you work with data, range() isn’t likely to be a function you use every day, but there are definitely circumstances where having a solid grasp of how it works can save a ton of time and effort.
This article has been published from the source link without modifications to the text. Only the headline has been changed.
Source link