Posts

Showing posts from February, 2020

Analyzing data scientist growth rate from Naukri website using web scraping

Presented by Madhan Kumar Selvaraj

What you will get from this blog

Basics of extracting data from a web page using web scraping in Python
Integrating Python with a MySQL database: loading, fetching, and manipulating data in a few lines of code (sounds good, right?)
PySpark, a big data cluster-computing framework that works with billions of records through in-memory processing
Basic concepts of Pandas, Seaborn, and other visualization tools

Basic flow chart of the project

In this blog, we are going to look at the current trend of the data scientist role using the following technologies:

Python web scraping
MySQL
PySpark
Pandas
Seaborn

This blog is not for those

Who are new to programming
Who have no idea about web scraping
Who are not familiar with big data technologies

Don't worry, I'll refer you to a few links to get familiar with the above concepts.

Prerequisite

Python 3.X version
R