WhatsApp chat book

 Presented by Madhan Kumar Selvaraj

According to reports, more than 2 billion people use WhatsApp. It has replaced most of our other chat applications, and most of our formal and informal conversations now happen there: chats with family, friends, spouses, and loved ones. That gave me the idea of printing a chat as a book, or at least a hard copy, to keep those conversations as a lifetime memory.

Real-world issue

  • One of the easiest ways to convert a chat into PDF is to take a long screenshot of the WhatsApp chat and convert the image into a PDF. The problem is that WhatsApp frequently deletes messages in the application, and we cannot change the font type, size, or color. A long screenshot is also not good enough in quality for a printable format, and most of the applications that do this are proprietary.
  • Some websites, such as Zapptales, offer to convert an exported WhatsApp chat into a printable format. The problem is that the service is proprietary and we need to share our private chat with it.
For that reason, I decided to build our own application, styled to our taste, which raises no privacy concerns when we run it on our local machine.

The workflow of the project

  1. Export the WhatsApp chat from the WhatsApp mobile application
  2. Process the text of the exported chat
  3. Create a landing page for the website
  4. Create a conversation page that loads and displays the chat
  5. Add an option to convert the HTML page content to PDF
  6. Create a CSV file and build statistics graphs from it with Plotly

Technologies used here

  • Python
  • Flask web framework
  • Jinja2 templates
  • Dash framework
  • WeasyPrint

WhatsApp book

In this application, we are going to build three features from the exported WhatsApp chat. Check the link if you don't know how to export the chat.
  1. View chat conversation 
  2. Option to download the chat
  3. Statistics based on chat activity

Text processing

After exporting the chat from WhatsApp, we need to process the text to separate the date, time, user name, chat message, and attachments.
# Raw strings keep the backslashes intact for the regex engine.
date_pattern = r'\d\d\/\d\d\/\d\d\d\d'
time_pattern = r'\d\d\:\d\d'
name_pattern = r'- \w.*\S\)?: '
name_addon_pattern = r"\w.*\w"
chat_pattern = r': .*'
attachment_pattern = r"IMG\S*"
group_name_pattern = r"with .*\.txt"
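
As a rough illustration of how these patterns pick a line apart, the sketch below runs them against a made-up export line. The sample line and the first_match helper are assumptions for illustration only; real exports vary with the phone's locale and the WhatsApp version, so the patterns may need adjusting.

```python
import re

# Same patterns as in the configuration above, written as raw strings.
date_pattern = r'\d\d\/\d\d\/\d\d\d\d'
time_pattern = r'\d\d\:\d\d'
name_pattern = r'- \w.*\S\)?: '
name_addon_pattern = r'\w.*\w'
chat_pattern = r': .*'


def first_match(pattern, text):
    """Return the first match of pattern in text, or "" when nothing matches."""
    found = re.findall(pattern, text)
    return found[0] if found else ""


# Hypothetical line in the "dd/mm/yyyy, hh:mm - Name: message" layout.
sample = "12/05/2020, 21:45 - Madhan Kumar: Good night"

date = first_match(date_pattern, sample)
time = first_match(time_pattern, sample)
# The name is extracted in two steps: "- Name: " first, then the bare name.
name = first_match(name_addon_pattern, first_match(name_pattern, sample))
chat = first_match(chat_pattern, sample).replace(": ", "")

print(date, time, name, chat, sep=" | ")
```

Note that the chat pattern starts at the first ": " in the line, so the time (which has no space after its colon) is skipped.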

View chat conversation 

The file uploaded through Flask is read with io.StringIO, and a good deal of filtering is done to extract the useful content from the exported document.
import io
import re
import csv
import configuration
def date_time(string, pattern):
    # Return the first match of the pattern, or an empty string if there is none.
    try:
        return re.findall(pattern, string)[0]
    except IndexError:
        return ""
def chat_generator(WAfile, filename):
    chat_list = []  
    unique_names = set()
    name_dict = {}
    check_date = None
    file = WAfile.decode("utf-8") 
    text_convert = io.StringIO(file)
    content = text_convert.readlines()
    if filename == "statistics":
        with open(configuration.csv_file_path, 'w+', encoding='utf-8', newline='') as file:
                writer = csv.writer(file)
                writer.writerow(["Name", "Date", "Time", "Chat"])
    for each_chat in content:
        try:
            name_result = re.findall(configuration.name_addon_pattern, re.findall(configuration.name_pattern, each_chat)[0])[0]
            unique_names.add(name_result)            
        except:
            pass
    len_unique_names = len(unique_names)     
    for index, name in enumerate(unique_names):
        name_dict[name] = index        
    for each_chat in content:    
        date_result = date_time(each_chat, configuration.date_pattern)
        time_result = date_time(each_chat, configuration.time_pattern)        
        try:
            chat_result = re.findall(configuration.chat_pattern, each_chat)[0].replace(": ", "")
        except:
            chat_result = ""           
        try:
            name_result = re.findall(configuration.name_addon_pattern, re.findall(configuration.name_pattern, each_chat)[0])[0]                        
        except:
            name_result = "" 
        name_result_copy = name_result
        try:
            attachment_result = re.findall(configuration.attachment_pattern, each_chat)[0]
            chat_result = " "
        except:
            attachment_result = ""    
        if(filename != "statistics"):
            if (date_result != check_date):
                check_date = date_result
            else:
                date_result = ""        
        try:
            name_result = name_dict[name_result]
        except:
            name_result = ""            
        # Use the logical "and" rather than the bitwise "&" to combine the conditions.
        if chat_result != "" and filename != "statistics":
            list_data = [date_result, time_result, name_result, chat_result, attachment_result, name_result_copy]
            chat_list.append(list_data)
        elif chat_result != "" and filename == "statistics":
            list_data = [name_result_copy, date_result, time_result, chat_result]
            with open(configuration.csv_file_path, 'a+', encoding='utf-8', newline='') as file:
                writer = csv.writer(file)
                writer.writerow(list_data)
    # Work out the chat title once, after every line has been processed.
    # Initialising both names here avoids a NameError when the export is empty.
    chat_name = chat_name_id = ""
    if filename != "statistics":
        if len_unique_names == 2:
            # One-to-one chat: as before, the last name in the set becomes the title.
            for chat_name in unique_names:
                chat_name_id = name_dict[chat_name]
        else:
            # Group chat: take the group name from the exported file name, if present.
            group_match = re.findall(configuration.group_name_pattern, filename)
            if group_match:
                chat_name = group_match[0].replace("with ", "").replace(".txt", "")
        return chat_name, chat_name_id, chat_list, len_unique_names
    return "success"
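
One detail in the loop above is easy to miss: for the chat view, the date is kept only on the first message of each day and blanked on the rest, so the page can show a single date separator per day. A minimal standalone sketch of that idea follows. The collapse_dates function and the sample rows are hypothetical and not part of the original script, and the sketch ignores continuation lines that carry no date.

```python
def collapse_dates(rows):
    """Blank out the date on every row except the first one of each day.

    rows is a list of (date, time, chat) tuples in chronological order.
    """
    collapsed = []
    previous_date = None
    for date, time, chat in rows:
        if date != previous_date:
            # First message of a new day: keep the date and remember it.
            previous_date = date
            shown_date = date
        else:
            # Same day as the previous message: hide the date.
            shown_date = ""
        collapsed.append((shown_date, time, chat))
    return collapsed


# Made-up rows for illustration.
rows = [
    ("12/05/2020", "21:45", "Good night"),
    ("12/05/2020", "21:46", "Sleep well"),
    ("13/05/2020", "07:10", "Good morning"),
]
result = collapse_dates(rows)
for row in result:
    print(row)
```

The template can then print a date header whenever the date field is non-empty.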

One-to-One chat

The profile picture for each user is selected automatically from the default images I added under static/css. Anyone can change the images as they wish. Here we need to create a separate landing page, chat view page, and statistics page. I am not going to include all of the code here; you can check the full scripts on GitHub.

Group chat

As with the one-to-one chat, I added support for up to 10 user profile pictures in a group chat. The profile pictures can be extended to cover more than 10 users.
import flask
from flask import Flask, render_template, request, url_for
from flask_weasyprint import HTML, render_pdf
import dash
import dash_html_components as html
import dash_core_components as dcc
import pandas as pd
import plotly.express as px
import configuration
import main
server = Flask(__name__)
server.debug = True
@server.route('/', methods=['GET', 'POST'])
def landing_page():
    if request.method == 'POST':
        if ('file' in request.files):
            file = request.files['file']  
            filename = request.files['file'].filename 
            choice = request.form["optradio"]
            if choice=="view":
                name, chat_name_id, chat_data, no_of_users = main.chat_generator(file.read(), filename)
                return render_template('chat_box.html', chat_name_id = chat_name_id, no_of_users = no_of_users, name = name, chat = chat_data, color_code = configuration.color_code_dict)
            elif choice == "download":
                name, chat_name_id, chat_data, no_of_users = main.chat_generator(file.read(), filename)
                # Use a name that does not shadow the imported Dash "html" module.
                rendered_page = render_template('chat_box.html', chat_name_id = chat_name_id, no_of_users = no_of_users, name = name, chat = chat_data, color_code = configuration.color_code_dict)
                return render_pdf(HTML(string=rendered_page))
            elif choice == "statistics":
                print(main.chat_generator(file.read(), choice))
                return flask.redirect('/statistics')
    else:
         return render_template('landing_page.html')
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']
colors = {
    'background': '#111111',
    'text': '#7FDBFF'
}
app = dash.Dash(
    __name__,
    server=server, external_stylesheets=external_stylesheets,
    routes_pathname_prefix='/statistics/'
)
wa_df = pd.read_csv(configuration.csv_file_path)
wa_df['Time'] =  pd.to_datetime(wa_df['Time'], format='%H:%M')
#wa_df['Date'] =  pd.to_datetime(wa_df['Date'], format='%d/%m/%Y')
date_df = pd.DataFrame(wa_df["Date"].value_counts())
date_df = date_df.rename(columns={"Date":"Number of chats"})
date_df['Date']= date_df.index
name_df = pd.DataFrame(wa_df["Name"].value_counts())
name_df = name_df.rename(columns={"Name":"Number of chats"})
name_df['Name']= name_df.index
fig = px.bar(date_df, x="Date", y="Number of chats", text='Number of chats', title='Number of chats based on date')
fig1 = px.pie(name_df, names='Name', values='Number of chats', title='Number of chats by person')
fig2 = px.line(wa_df, x='Date', y='Time', title='Chat timings')
fig.update_layout(
    plot_bgcolor=colors['background'],
    paper_bgcolor=colors['background'],
    font_color=colors['text']
)
fig1.update_layout(
    plot_bgcolor=colors['background'],
    paper_bgcolor=colors['background'],
    font_color=colors['text']
)
fig2.update_layout(
    plot_bgcolor=colors['background'],
    paper_bgcolor=colors['background'],
    font_color=colors['text']
)

app.layout = html.Div(style={'backgroundColor': colors['background']}, children=[
    html.H1(
        children='Whatsapp chat statistics',
        style={
            'textAlign': 'center',
            'color': colors['text']
        }
    ),
    dcc.Graph(
        id='Whatsapp',
        figure=fig
    ),
    dcc.Graph(
        id='Whatsapp1',
        figure=fig1
    ),
    dcc.Graph(
        id='Whatsapp2',
        figure=fig2
    )
])  
if __name__ == '__main__':
    server.run(debug=True)

Option to download the chat

Now we are going to download the chat by converting the HTML into a PDF. For that, we use WeasyPrint. Check the installation guide to install WeasyPrint and its dependencies, such as GTK+. WeasyPrint converts HTML into PDF, so we can download the chat as well as print it.

Statistics based on chat activity

The CSV file is created during the chat text processing. We create the statistics graphs with the Dash framework. A few important features are missing from the framework because it is still under development. There is one main limitation of the statistics option, which we'll discuss later in this post.
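
To make the aggregation behind the charts concrete, here is a small sketch that counts messages per person and per date using only the standard library. It assumes the CSV has the Name, Date, Time, Chat header written during text processing. The in-memory sample data and the count_messages helper are made up for illustration; the application itself uses pandas value_counts for this step.

```python
import csv
import io
from collections import Counter


def count_messages(csv_file):
    """Count messages per sender and per date from an open CSV file object."""
    per_name = Counter()
    per_date = Counter()
    for row in csv.DictReader(csv_file):
        per_name[row["Name"]] += 1
        per_date[row["Date"]] += 1
    return per_name, per_date


# Made-up rows standing in for the CSV produced by the text processing step.
sample_csv = io.StringIO(
    "Name,Date,Time,Chat\n"
    "Madhan,12/05/2020,21:45,Good night\n"
    "Kumar,12/05/2020,21:46,Sleep well\n"
    "Madhan,13/05/2020,07:10,Good morning\n"
)
per_name, per_date = count_messages(sample_csv)
print(per_name.most_common())
print(per_date.most_common())
```

The per-person counts correspond to the pie chart and the per-date counts to the bar chart.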

Heroku deployment

I deployed this application on the Heroku cloud without the download chat feature, because I could not install WeasyPrint and its dependencies on Heroku. Play with the live application below.

Limitations of this application

  • The Dash framework is still under development, and I found no option to load new data for the statistics. The CSV file is read once when the application starts, so after deployment the page keeps showing statistics for the same chat. Bokeh could be used instead of Dash.
  • To view attachments, you need to keep them under the static/attachments folder. Alternatively, you can update the script to accept multiple files in the form input.

Ready-made code

Code for the local machine deployment
Code for the Heroku deployment

Finally, we have created our own WhatsApp book application. It still needs a lot of improvement. Use this project as a reference and adapt it to your taste. Happy coding!

P.S. Whatever we learn today will not go to waste. Load your brain every day and fire your skills at the right moment.
