1. Crawling
ㄱ. Crawling the Hanbit site
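Two modules are used:
axios: HTTP client that fetches the page
axios.get(url)
cheerio: parses the fetched HTML so it can be queried with jQuery-style selectors
cheerio.load(response.data)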
const axios = require("axios");
const cheerio = require("cheerio");

const url = "https://www.hanbit.co.kr/academy/books/new_book_list.html";

axios.get(url)
  .then(response => {
    // Parse the fetched HTML so it can be queried with CSS selectors.
    const $ = cheerio.load(response.data);
    // Each .view_box element holds one book entry.
    $(".view_box").each((index, element) => {
      const title = $(element).find(".book_tit").text().trim();
      let author = $(element).find(".book_writer").text().trim();
      // Strip stray whitespace around each comma-separated author name.
      author = author.split(",").map(x => x.trim()).join(",");
      console.log(index + 1, "====================");
      console.log(title);
      console.log(author);
    });
  })
  .catch(err => {
    console.error(err);
  });
Output format
1 ====================
글로벌 시장환경과 국제경영
이장로,신만수,김창수
2 ====================
STEM CookBook, 미래 세상의 모빌리티
임덕신,임현준
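The author cleanup step (split on commas, trim each name, join again) can be pulled out into a small pure function so it can be checked without fetching a page. This is only a sketch: `normalizeAuthors` and its `separator` parameter are hypothetical names, and the `filter` step that drops empty entries is an addition that the crawler code itself does not perform.

```javascript
// Hypothetical helper: tidy a comma-separated author string.
function normalizeAuthors(raw, separator = ",") {
  return raw
    .split(",")
    .map((name) => name.trim())
    // Addition not in the crawler code: drop empty entries left by stray commas.
    .filter((name) => name.length > 0)
    .join(separator);
}

console.log(normalizeAuthors(" 이장로 , 신만수,김창수 ")); // 이장로,신만수,김창수
console.log(normalizeAuthors("임덕신,임현준", ", "));      // 임덕신, 임현준
```

Making the separator a parameter lets the same function produce either `,`-joined or `, `-joined output, depending on how the result is displayed.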
ㄴ. Crawling Interpark
const axios = require('axios');
const cheerio = require('cheerio');
const iconv = require('iconv-lite');

const url = 'http://book.interpark.com/display/collectlist.do?_method=BestsellerHourNew201605&bestTp=1&dispNo=028';

// Request raw bytes; the page may not be UTF-8, so it is decoded manually below.
axios.get(url, { responseType: 'arraybuffer' })
  .then(res => {
    // Read the charset from the Content-Type header, defaulting to utf-8.
    const contentType = res.headers['content-type'] || '';
    console.log(contentType);
    const charset = contentType.includes('charset=') ? contentType.split('charset=')[1] : 'utf-8';
    const data = iconv.decode(res.data, charset);
    const $ = cheerio.load(data);
    // Each list item under .rankBestContentList is one bestseller entry.
    $('.rankBestContentList > ol > li').each((index, element) => {
      const title = $(element).find('.itemName').text().trim();
      let author = $(element).find('.author').text().trim();
      author = author.split(',').map(x => x.trim()).join(', ');
      const company = $(element).find('.company').text().trim();
      const price = $(element).find('.price > em').text().trim();
      console.log(index + 1, '============================================');
      console.log(`Title:             ${title}`);
      console.log(`Author/Translator: ${author}`);
      console.log(`Publisher:         ${company}`);
      console.log(`Price:             ${price} won`);
    });
  })
  .catch(err => {
    console.error(err);
  });
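A Content-Type header can carry quotes around the charset or extra parameters after it, so reading the charset may benefit from a slightly more defensive parse. The sketch below is one possible approach, not part of axios or iconv-lite: `charsetFromContentType` and its `fallback` parameter are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: extract the charset from a Content-Type header value,
// falling back to utf-8 when the header is missing or has no charset.
function charsetFromContentType(contentType, fallback = "utf-8") {
  if (typeof contentType !== "string") return fallback;
  // Match charset=value, allowing optional quotes and trailing parameters.
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : fallback;
}

console.log(charsetFromContentType("text/html; charset=EUC-KR"));              // euc-kr
console.log(charsetFromContentType('text/html; charset="utf-8"; boundary=x')); // utf-8
console.log(charsetFromContentType(undefined));                                // utf-8
```

The returned name could then be passed to `iconv.decode(res.data, charset)`. Checking it first with `iconv.encodingExists(charset)` is one way to avoid an exception when the server reports an encoding that iconv-lite does not recognize.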