How to read a file and search specific word locations in Python

Introduction

Solving problems is a great way to sharpen our logical thinking. It improves our problem-solving skills not only in computer programming but also in everyday life.

In this article, we will read the contents of a text file line by line, search each line for a given word, and report whether the word appears at the beginning, at the end, or somewhere in between. This is useful if, for example, you scrape web pages and want to gather statistics on which words appear, how often, where, and next to which other words.

This problem is similar to one of the programming exercises on the HackerRank [1] website.

Problem

You have a string and a specific word to look for within that string. The output depends on where the word is found, according to the following conditions.

  • If the word is at the beginning of the string, output "start".
  • If the word is at the end of the string, output "end".
  • If the word is somewhere between the start and end, output "in-between".
  • If the word is both at the beginning and the end of the string, output "start-end". This takes precedence over "start" and "end".
  • If none of these conditions are met, output "not found".

For example, if the string is "Chess is a wonderful board game" and the word to find is "game", the output is "end" because the word appears at the end of the string.

Here is the problem to solve.

Given a text file programming.txt, read each line, determine the correct output for the word "hackerrank", and display it in the terminal. Ignore case, so that "HackerRank", "Hackerrank", and similar variants are all treated the same as "hackerrank".

programming.txt

hackerrank challenges are fun
solving problems on hackerrank
hackerrank is a great platform
love the hackerrank community
learning new concepts daily
Hackerrank helps improve coding skills
hackerrank is our journey's start and end hackerrank
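
To follow along, the sample file can be typed by hand or generated with a short script. The following is a minimal sketch, assuming the file should be written as UTF-8; the helper name write_sample_file and the constant SAMPLE_LINES are hypothetical and not part of the original exercise.

```python
from pathlib import Path

# The seven sample lines from programming.txt shown above.
SAMPLE_LINES = [
    "hackerrank challenges are fun",
    "solving problems on hackerrank",
    "hackerrank is a great platform",
    "love the hackerrank community",
    "learning new concepts daily",
    "Hackerrank helps improve coding skills",
    "hackerrank is our journey's start and end hackerrank",
]


def write_sample_file(path="programming.txt"):
    """Write the sample lines to path, one per line (hypothetical helper)."""
    Path(path).write_text("\n".join(SAMPLE_LINES) + "\n", encoding="utf-8")
    return path
```

Calling write_sample_file() once from the folder that holds the script should leave a programming.txt there that matches the listing above.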

Solutions

I will show two approaches to solve this: one using Python's re [2] (regular expression) module, and one using string [3] methods.

Solution 1 using RE (regular expression)

  • We need a function that uses RE to take a string and a word to search for, and prints the output defined by the problem conditions.
  • We need a function that reads a text file, given its path or name.

Function to process a string using RE

import re


def process_string_sol_1(line_string, text_to_search):
    """Output description depending on the location of text_to_search."""
    # Escape the text so characters such as "." or "+" are matched literally.
    word = re.escape(text_to_search)
    at_start = re.match(f"^{word}", line_string, re.IGNORECASE)
    at_end = re.search(f"{word}$", line_string, re.IGNORECASE)

    if at_start and at_end:
        print('start-end')
    elif at_start:
        print('start')
    elif at_end:
        print('end')
    elif re.search(word, line_string, re.IGNORECASE):
        print('in-between')
    else:
        print('not found')
  • The pattern "^text_to_search" matches the text only at the beginning of the given string.
  • The pattern "text_to_search$" matches the text only at the end of the given string.
  • The match function looks for a pattern only at the beginning of the string, so the "^" anchor is redundant there but makes the intent explicit.
  • The search function scans the whole string and returns the first match it finds.
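
The difference between match and search is easiest to see on a line where the word sits in the middle. The short sketch below uses one of the sample lines. It also shows why escaping the search text with re.escape can matter; the "3.14" strings are made-up illustration values, not part of the exercise.

```python
import re

line = "love the hackerrank community"
word = "hackerrank"

# re.match only tries the pattern at position 0, so nothing is found here.
print(re.match(word, line, re.IGNORECASE))  # None

# re.search scans the whole string and returns the first match.
found = re.search(word, line, re.IGNORECASE)
print(found.span())  # (9, 19)

# An unescaped "." matches any character, so "3.14" also matches "3x14".
print(re.search("3.14", "3x14") is not None)  # True

# re.escape turns the text into a literal pattern, so "3x14" no longer matches.
print(re.search(re.escape("3.14"), "3x14") is not None)  # False
```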

Function to read the text file

def read_file(fn, text_to_search):
    """Read the file fn and process each line."""
    with open(fn, 'r') as f:
        for line in f:
            line = line.rstrip()  # remove trailing whitespace, including the newline
            process_string_sol_1(line, text_to_search)
  • The "read_file" function takes two arguments: the name of the file to read and the text to search for, which it passes on to the function "process_string_sol_1".
  • It opens the file in "r" (read) mode, so the contents of the file cannot be modified through it.
  • Each line that is read is passed to the string processor, which prints the location of the text to search for.
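
If the line number is also of interest, the same reading loop can be written as a generator. This is only a sketch of one possible variation: the name iter_lines is hypothetical, and the explicit UTF-8 encoding is an assumption about how the file was saved.

```python
def iter_lines(fn, encoding="utf-8"):
    """Yield (line_number, line) pairs, with trailing whitespace removed."""
    with open(fn, "r", encoding=encoding) as f:
        # enumerate(..., start=1) numbers the lines the way an editor does.
        for line_number, line in enumerate(f, start=1):
            yield line_number, line.rstrip()
```

A loop such as "for number, line in iter_lines('programming.txt'):" then has both the line number and the cleaned line available for each call to the string processor.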

Full script code

word_searcher.py

import re


def process_string_sol_1(line_string, text_to_search):
    """Output description depending on the location of text_to_search."""
    # Escape the text so characters such as "." or "+" are matched literally.
    word = re.escape(text_to_search)
    at_start = re.match(f"^{word}", line_string, re.IGNORECASE)
    at_end = re.search(f"{word}$", line_string, re.IGNORECASE)

    if at_start and at_end:
        print('start-end')
    elif at_start:
        print('start')
    elif at_end:
        print('end')
    elif re.search(word, line_string, re.IGNORECASE):
        print('in-between')
    else:
        print('not found')


def read_file(fn, text_to_search):
    """Read the file fn and process each line."""
    with open(fn, 'r') as f:
        for line in f:
            line = line.rstrip()  # remove trailing whitespace, including the newline
            process_string_sol_1(line, text_to_search)


def main():
    fn = 'programming.txt'
    text_to_search = 'hackerrank'
    read_file(fn, text_to_search)


if __name__ == '__main__':
    main()

You can run the script in the terminal with:

python word_searcher.py

The output would look like this.

start
end
start
in-between
not found
start
start-end
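
Note that these patterns match the search text wherever it occurs as a substring, so a line starting with "hackerranking" would also be reported as "start". If only whole words should count, one option is to wrap the pattern in \b word boundaries. The sketch below is a variation rather than part of the solution: the name locate_whole_word is hypothetical, it returns the text instead of printing it, and it assumes the search text starts and ends with a letter, digit, or underscore.

```python
import re


def locate_whole_word(line_string, text_to_search):
    """Return the location text, counting whole-word matches only."""
    # \b marks a boundary between a word character and a non-word character.
    # Assumption: text_to_search starts and ends with a word character.
    word = r"\b" + re.escape(text_to_search) + r"\b"
    at_start = re.match(word, line_string, re.IGNORECASE)
    at_end = re.search(word + "$", line_string, re.IGNORECASE)

    if at_start and at_end:
        return "start-end"
    if at_start:
        return "start"
    if at_end:
        return "end"
    if re.search(word, line_string, re.IGNORECASE):
        return "in-between"
    return "not found"


print(locate_whole_word("hackerranking is fun", "hackerrank"))  # not found
print(locate_whole_word("Hackerrank helps improve coding skills", "hackerrank"))  # start
```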

Solution 2 using string methods

The "read_file" function can be reused in Solution 2, so I will only create the "process_string_sol_2" function.

def process_string_sol_2(line_string, text_to_search):
    """Output description depending on the location of text_to_search."""
    line = line_string.lower()  # sets all characters to lowercase
    word = text_to_search.lower()

    if line.startswith(word) and line.endswith(word):
        print('start-end')
    elif line.startswith(word):
        print('start')
    elif line.endswith(word):
        print('end')
    elif word in line:
        print('in-between')
    else:
        print('not found')
  • It converts the arguments line_string and text_to_search to lowercase because the comparison must ignore case.
  • It uses the string methods "startswith()" and "endswith()" to detect whether the text to search for is at the beginning or the end of the line, and the "in" operator to detect it anywhere else.

To use "process_string_sol_2", take the full code in Solution 1, add this function to it, and call it in "read_file" instead of "process_string_sol_1".
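
One way to check that the two solutions agree is to have them return the output text instead of printing it, and then compare the results line by line. The following is a sketch under that assumption; the names label, locate_with_re, and locate_with_str are hypothetical, and the sample lines are copied from programming.txt.

```python
import re


def label(at_start, at_end, inside):
    """Map the three location checks to the output text from the problem."""
    if at_start and at_end:
        return "start-end"
    if at_start:
        return "start"
    if at_end:
        return "end"
    if inside:
        return "in-between"
    return "not found"


def locate_with_re(line_string, text_to_search):
    """Regular-expression version that returns the text instead of printing."""
    word = re.escape(text_to_search)  # match the text literally
    return label(
        re.match(word, line_string, re.IGNORECASE) is not None,
        re.search(word + "$", line_string, re.IGNORECASE) is not None,
        re.search(word, line_string, re.IGNORECASE) is not None,
    )


def locate_with_str(line_string, text_to_search):
    """String-method version that returns the text instead of printing."""
    line = line_string.lower()
    word = text_to_search.lower()
    return label(line.startswith(word), line.endswith(word), word in line)


sample_lines = [
    "hackerrank challenges are fun",
    "solving problems on hackerrank",
    "hackerrank is a great platform",
    "love the hackerrank community",
    "learning new concepts daily",
    "Hackerrank helps improve coding skills",
    "hackerrank is our journey's start and end hackerrank",
]

for line in sample_lines:
    # Print both results side by side, followed by the line itself.
    print(locate_with_re(line, "hackerrank"),
          locate_with_str(line, "hackerrank"), line, sep=" | ")
```

Returning the text rather than printing it also makes the functions easier to reuse, for example to count how often each location occurs.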

Summary

This problem is a good exercise because it shows how to read a text file line by line. It also introduces simple regular expression patterns and the string methods that detect whether a word starts or ends a given string.

References
