28Apr
Question-Answering Application with Python/Django. Cover
Question-Answering Application with Python/Django. Cover

In this article, I am going to show you one of the best ways to improve your Back-End skills without caring about Front-End. The great designed apps always make a positive impact on me, so I used to spend too much time on the design part of my applications.  However, I like to work with the complexity of Back-End so I don’t have time to learn the inner workings of Front-End development. At the same time, I can’t start to build Back-End without finishing Front-End for some reason. As a solution, I started to build random apps by using CodePen snippets and it was really amazing with those UI/UX designs. Additionally, you’ll improve your creative thinking skills while trying to find app ideas by using CodePen designs.

A few days ago, I used a simple UI “search-box” design to create a Question-Answering Application. Even this kind of small designs can be turned into an application. By building this app you’ll learn a new Python package to implement a QA System on your own private data, become familiar with parser libraries of Python such as BeautifulSoup and integration of AJAX with Django. Let me give you a spoiler of the algorithm, we are going to crawl data for a particular question from the internet and pass them into the pre-trained Deep Learning model to find the exact answer.

Before we start you need a basic understanding of Django and Ubuntu to run some important commands. If you’re using other operating systems, you can download Anaconda to make your work easier.

Installation and Configuration

To get started, create and activate a virtual environment by following commands:

virtualenv env
. env/bin/activate

Once the environment activated, install Django, then create a new project named answerfinder and inside this project create an app named search:

pip install django
django-admin startproject answerfinder
cd answerfinder
django-admin startapp search

Don’t forget to configure your templates and static files inside settings.py

Ajax Request with Django

Create new a new HTML file named search.html and copy-paste everything inside following CodePen snippet.

Animated Search Box – Click to see the Pen

Now, let’s make small changes in the HTML file and add Ajax in the JavaScript file to get the input value.  Initially, we need to define the method of the form and add id into search input to get data by id. Then, we are going to implement the ajax function inside the submitFn() function that handles the submission of the form.

search.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="/static/css/main.css">
    <title>QA Application</title>
</head>
<body>
    <form method="POST" onsubmit="submitFn(this, event);">
        {% csrf_token %}
        <div class="search-wrapper">
            <div class="input-holder">
                <input type="text" id="question" class="search-input" placeholder="Ask me anything.." />
                <button class="search-icon" onclick="searchToggle(this, event);"><span></span></button>
            </div>
            <span class="close" onclick="searchToggle(this, event);"></span>
            <div class="result-container">
            </div>
        </div>
    </form>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
    <script src="/static/js/main.js"></script>
</body>
</html>

main.js

function searchToggle(obj, evt){
    var container = $(obj).closest('.search-wrapper');

    if(!container.hasClass('active')){
            container.addClass('active');
            evt.preventDefault();
    }
    else if(container.hasClass('active') && $(obj).closest('.input-holder').length == 0){
            container.removeClass('active');
            // clear input
            container.find('.search-input').val('');
            // clear and hide result container when we press close
            container.find('.result-container').fadeOut(100, function(){$(this).empty();});
    }
}

function submitFn(obj, evt){
          
    $.ajax({
            type:'POST',
            url:'/',
            data:{
              question:$('#question').val(), // get value inside the search input
              csrfmiddlewaretoken:$('input[name=csrfmiddlewaretoken]').val(),
            },
            success:function(json){
            console.log(json)
            $(obj).find('.result-container').html('<span>Answer: ' + json.answer + '</span>');
            $(obj).find('.result-container').fadeIn(100);
            
        
            },
            error : function(xhr,errmsg,err) {
            console.log(xhr.status + ": " + xhr.responseText); // provide a bit more info about the error to the console
            console.log($('#total').text())
        }
    });

    evt.preventDefault();
}

As you see, I removed the body of the submitFn() function and implemented ajax to get the input value. Since we are making POST request it is important to include csrf token to make it secure. Once the request succeeds, we’ll get a response as a JSON to display it in the template.

Next, we should create a view that checks if a given input or question is taken, and return a response as JSON.

views.py

from django.http import JsonResponse

def search_view(request):
    if request.POST:
        question = request.POST.get('question')
        data = {
        'question': question
        }
        return JsonResponse(data)
    return render(request, 'search.html')

urls.py

from django.contrib import admin
from django.urls import path
from search.views import search_view

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', search_view, name="search"),
]

Now, if you submit the form, the question will show up in the console of your browser with 200 status code which means the request was successful.

Question-Answering System

I found an article in Medium that explains the Question-Answering system with Python. It’s an easy-to-use python package to implement a QA System on your own private data. You can go and check for more explanation from the link below.

How to create your own Question-Answering system easily with python

The main reason for using this package is to display the exact answer to the user, even you can create a chatbot by using this deep learning model. Let’s first install the package then we’ll look closer to the program.

pip install cdqa pandas

Once installation completed, create a new python file just for testing and download the pre-trained models and data manually by using the code block below:

import pandas as pd
from ast import literal_eval

from cdqa.utils.filters import filter_paragraphs
from cdqa.utils.download import download_model, download_bnpp_data
from cdqa.pipeline.cdqa_sklearn import QAPipeline

# Download data and models
download_bnpp_data(dir='./data/bnpp_newsroom_v1.1/')
download_model(model='bert-squad_1.1', dir='./models')

# Loading data and filtering / preprocessing the documents
df = pd.read_csv('data/bnpp_newsroom_v1.1/bnpp_newsroom-v1.1.csv', converters={'paragraphs': literal_eval})                       
df = filter_paragraphs(df)

# Loading QAPipeline with CPU version of BERT Reader pretrained on SQuAD 1.1                   
cdqa_pipeline = QAPipeline(reader='models/bert_qa.joblib')

# Fitting the retriever to the list of documents in the dataframe
cdqa_pipeline.fit_retriever(df)

# Sending a question to the pipeline and getting prediction
query = 'Since when does the Excellence Program of BNP Paribas exist?'
prediction = cdqa_pipeline.predict(query)

print('query: {}\n'.format(query))
print('answer: {}\n'.format(prediction[0]))
print('title: {}\n'.format(prediction[1]))
print('paragraph: {}\n'.format(prediction[2]))

the output should like this:

Question-answering system's code output
Question-answering system’s code output

It prints the exact answer and paragraph that includes the answer.

We are going to crawl the content of a few webpages later in this post to use them as data.

Basically, when a question sent to the system, the Retriever will select a list of documents from the crawled data that are the most likely to contain the answer. It computes the cosine similarity between the question and each document in the crawled data.

After selecting the most probable documents, the system divides each document into paragraphs and send them with the question to the Reader, which is basically a pre-trained Deep Learning model. The model used was the Pytorch version of the well known NLP model BERT.

Then, the Reader outputs the most probable answer it can find in each paragraph. After the Reader, there is a final layer in the system that compares the answers by using an internal score function and outputs the most likely one according to the scores which will the answer of our question.

Here is the schema of the system mechanism.

Question-answering system's schema
Question-answering system’s schema

You have to include pre-trained models in your Django project and to achieve that you can run download functions directly from views or just simply copy-paste the models directory into your project.

Crawling Data with Beautiful Soup

We are going to use Beautiful Soup to crawl the content of the first 3 webpages from Google’s search results to get some information about the question because the answer probably locates in one of them. Basically, we need to perform a google search to get these links, and instead of putting so much effort for such a trivial task, google package has one dependency on Beautiful Soup library that can find links of all the google search result directly.

Install the beautiful soup and google package by following commands:

pip install beautifulsoup4
pip install google

Try to run the following code snippet in a different file to see the result:

from googlesearch import search 

for url in search("Coffee", tld="com", num=10, stop=5, pause=2): 
	print(url) 

Once we get the links it’s time to create a new function that will crawl the content of these webpages. The data frame must be (CSV) in a specific structure so it can be sent to the cdQA pipeline. At this point, we can use the pdf_converter function of the cdQA library, to create an input data frame from a directory of PDF files. Therefore, I am going to save all crawled data in PDF files for each webpage. Hopefully, we’ll have 3 pdf files in total (can be 1 or 2 as well). Additionally, we need to name these pdf files, so by using enumerate() function, these files will name by their index numbers.

If I summarize the algorithm it will get the question form the input, search it on google, crawl the first 3 results, create 3 pdf files from the crawled data and finally find the answer by using a question answering system.

views.py

import os, io
import errno
import urllib
import urllib.request
from time import sleep
from urllib.request import urlopen, Request
from django.shortcuts import render
from django.http import JsonResponse
from bs4 import BeautifulSoup
from googlesearch import search 
import pandas as pd
from cdqa.utils.filters import filter_paragraphs
from cdqa.utils.download import download_model, download_bnpp_data
from cdqa.pipeline.cdqa_sklearn import QAPipeline
from cdqa.utils.converters import pdf_converter


def search_view(request):
    if request.POST:
        question = request.POST.get('question')
        # uncomment the line below to download models 
        # download_model(model='bert-squad_1.1', dir='./models')
        for idx, url in enumerate(search(question, tld="com", num=10, stop=3, pause=2)): 
            crawl_result(url, idx) 
        # change path to pdfs folder
        df = pdf_converter(directory_path='/path/to/pdfs/')
        cdqa_pipeline = QAPipeline(reader='models/bert_qa.joblib')
        cdqa_pipeline.fit_retriever(df)
        prediction = cdqa_pipeline.predict(question)
        data = {
        'answer': prediction[0]
        }
        return JsonResponse(data)
    return render(request, 'search.html')
   

def crawl_result(url, idx):
    try:
        req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
        html = urlopen(req).read()
        bs = BeautifulSoup(html, 'html.parser')
        # change path to pdfs folder
        filename = "/path/to/pdfs/" + str(idx) + ".pdf"
        if not os.path.exists(os.path.dirname(filename)):
            try:
                os.makedirs(os.path.dirname(filename))
            except OSError as exc: # Guard against race condition
                if exc.errno != errno.EEXIST:
                    raise
        with open(filename, 'w') as f:
            for line in bs.find_all('p')[:5]:
                f.write(line.text + '\n')
    except (urllib.error.HTTPError, AttributeError) as e:
        pass


Note that you must change filename path to your pdfs folder otherwise cdQA can’t create an input data frame. Remember that you have to download pre-trained models, simply uncomment the download function inside search_view(). Once models downloaded you can remove this line.

Final Result

Finally, you can run the project and type any question in the search box. It’ll take about 20 seconds to display the answer.

Question-answering system's result
Question-answering system’s result

The question was “What is programming?” and the answer was “designing and building an executable computer program to accomplish a specific computing result”. Sometimes the crawlers can’t find enough information so the answer will empty. Additionally, don’t forget to remove pdf files for new questions, if you wish you can add a new function to handle this task.

I know some of you going to tell me that Flask is better for this kind of applications but I prefer Django anyway.

You can find this project in GitHub link below:

GitHub – Question-Answering Application

Python enumerate() Explained and Visualized

Python is renowned for its collection of libraries, modules, and functions that it comes packaged with — and enumerate() is something many developers want to understand better. In this article, we’ll explore what enumerate() actually is, analyze its functionality, and highlight how you should use it to maximize its efficiency and performance.

Leave a Reply