Web scraping in Express.js


Are you looking for some impressive web scraping code in Node.js that you can show off to someone?

If your answer is yes, follow these steps to create a web scraping application in Express.js, which is a Node.js-based framework.

  1. Download Node.js from nodejs.org/en/download and install it.
  2. After Node.js is installed on your system, open a command prompt, type node, and hit Enter. Your command prompt will look like the image below: nodejs.png
  3. Install the Express application generator globally with the npm command 'npm install -g express-generator'. A global installation means you can run the express command from anywhere on your system using the command prompt.
  4. Open a command prompt and type the command 'express yourprojectname'. Express will generate the default application structure in your directory, like the image below:

expressjs-app-directory-structure.png

The public directory is where you put your CSS, JavaScript, and image files. The routes directory holds the route files, which act as controllers where you write your business logic. The views directory is where you add your HTML view files. By default your Express application runs on port 3000. If you want to change the port, edit the www file in the bin folder. Here you can see I changed the port from the default 3000 to 8888.

express-config-file.png
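Since the port change lives in a screenshot, here is a minimal sketch of what that part of bin/www typically looks like. It is modeled on the file express-generator produces, not copied from the author's project, and the 8888 fallback is only the example value mentioned above.

```javascript
// Sketch of the port handling in a generated bin/www file.
// Turns the configured value into a port number, a named pipe, or false.
function normalizePort(value) {
  const port = parseInt(value, 10);

  if (Number.isNaN(port)) {
    return value; // not a number, so treat it as a named pipe
  }
  if (port >= 0) {
    return port; // a usable port number
  }
  return false; // negative numbers are rejected
}

// The PORT environment variable wins if it is set;
// otherwise fall back to 8888 instead of the generated default '3000'.
const port = normalizePort(process.env.PORT || '8888');
console.log(`The server would listen on port ${port}`);
```

In the generated file this value is then passed to app.set('port', port) and to the HTTP server's listen call, so changing the fallback string is usually the only edit needed.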

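The next step is to move into your project directory and run 'npm install' so the dependencies of the generated application are installed. Then install the request and cheerio modules in your application with 'npm install --save request' and 'npm install --save cheerio'.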

The request module is used for HTTP/HTTPS calls. The cheerio module lets us work with HTML in our controllers the same way we use jQuery in our HTML files. In the image you can see crawlurl.

scrapping-logic.png

In the attached image you can see that I am importing the cheerio and request modules to use them in our controller. As you know, router.get() is one of the methods Express.js provides for creating routes (get, post, put, delete). We use the request module to make the HTTP call to 'iban.com/exchange-rates', and we use cheerio for DOM operations on the HTML of the scraped URL.

Now open a command prompt in your application directory and run the command npm start. In your browser, type localhost:3000. I am using 3000 because in the www file I am using the default port provided by Express.js. You will see output like this in your command prompt.

scrapping-output.png