Hello.
I've been experimenting with various things while learning web crawling for the first time.
This time, I'll apply it to Instagram, one of the most popular social media platforms.
To make it a little more interesting, I'm playing on a bit of human psychology.
Here's what I want to do:
1. Log in to Instagram
2. Get followers and following list
3. Extract the accounts I follow that don't follow me back
How about it? Don't you want to try it? Haha
Every now and then, someone quietly unfollows you. Brr.
Now, let’s do it step by step.
First, import the necessary modules.
import time
import sys
from selenium import webdriver
from bs4 import BeautifulSoup
The webdriver module of the selenium package launches a web browser and performs actions according to the commands in the script.
The BeautifulSoup class in the bs4 package parses HTML into a DOM tree
and makes it easy to extract the content you want.
If the selenium and bs4 packages are not installed, install them by running each of the following in a command window.
pip install beautifulsoup4
pip install selenium
You can download ChromeDriver from the link below (pick the version that matches your installed Chrome).
https://sites.google.com/a/chromium.org/chromedriver/downloads
The remaining modules, time and sys, are part of the Python standard library, so you can import them right away.
Now I'll open a Chrome browser, go to Instagram, and log in.
The username and password are passed in directly from the command line.
For example, if you run the following in a command window:
python crawling_instagram.py sangminem 123456
then sys.argv[0] is crawling_instagram.py,
sys.argv[1] is sangminem, and sys.argv[2] is 123456.
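If the script is started without both arguments, sys.argv[1] raises an IndexError before the browser even opens. A minimal sketch of a guard, assuming the same two-argument convention (the function name read_credentials is hypothetical, not part of the original script):

```python
import sys


def read_credentials(argv):
    """Return (username, password) from a command line such as
    ['crawling_instagram.py', 'sangminem', '123456']."""
    if len(argv) != 3:
        # sys.exit with a string prints it to stderr and exits with status 1,
        # which is clearer than an IndexError on sys.argv[1] later on.
        script = argv[0] if argv else "crawling_instagram.py"
        sys.exit(f"usage: python {script} <username> <password>")
    return argv[1], argv[2]
```

The script would then call username, password = read_credentials(sys.argv) once at the top. Keep in mind that a password typed on the command line can end up in the shell history; the standard library's getpass.getpass() prompts for it without echoing instead.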
Using these arguments, the code is written as below.
browser = webdriver.Chrome('./chromedriver')
browser.get('https://www.instagram.com/'+sys.argv[1])
browser.execute_script("document.querySelectorAll('.-nal3')[1].click();")
time.sleep(2)
browser.find_element_by_name('username').send_keys(sys.argv[1])
browser.find_element_by_name('password').send_keys(sys.argv[2])
browser.find_element_by_xpath('//*[@id="loginForm"]/div[1]/div[3]/button').submit()
time.sleep(5)
browser.find_element_by_xpath('//*[@id="react-root"]/section/main/div/div/div/div/button').click()
Using chromedriver, I opened a Chrome browser and went to the Instagram profile URL built from the username.
With browser.execute_script, I clicked the followers button to bring up the login window.
(Clicking it while not logged in makes the login window appear.)
After that, I waited 2 seconds in case loading takes a while.
Next, I located the input elements named username and password
and typed the credentials into them with the send_keys method.
Then I found the login button in the form by XPath and called its submit method.
To get an element's XPath, press F12 in Chrome to open the developer tools, go to the Elements tab,
right-click the element you want,
and choose Copy > Copy XPath.
Then I wait 5 seconds for the login to finish.
The last line uses XPath again to click the "Not Now" button on the prompt that appears after login.
This part only skips that prompt, so I'll leave out the details.
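The fixed time.sleep calls above guess how long each page takes to load. An alternative is to poll until the thing you need actually exists. Selenium ships this idea as WebDriverWait; the underlying pattern can be sketched in plain Python as below (the helper wait_until is hypothetical, not part of Selenium or the original script):

```python
import time


def wait_until(condition, timeout=10.0, interval=0.25):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # short pause between checks
```

With the Selenium 3 calls used in this article, a condition could be lambda: browser.find_elements_by_name('username'), which returns an empty (falsy) list until the field is on the page. Recent Selenium 4 releases removed the find_element_by_* helpers; there the equivalent lookup is browser.find_element(By.NAME, 'username') with from selenium.webdriver.common.by import By, and WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.NAME, 'username'))) does the polling for you (imports: from selenium.webdriver.support.ui import WebDriverWait and from selenium.webdriver.support import expected_conditions as EC).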
Now that the login is complete, let's implement the logic that collects the followers and counts them.
time.sleep(2)
# Open the followers dialog
browser.execute_script("document.querySelectorAll('.-nal3')[1].click();")
time.sleep(1)
# Keep scrolling the dialog until its height stops changing
oldHeight = -1
newHeight = -2
while oldHeight != newHeight:
    oldHeight = newHeight
    newHeight = browser.execute_script("return document.querySelectorAll('.jSC57')[0].scrollHeight")
    browser.execute_script("document.querySelectorAll('.isgrP')[0].scrollTo(0,document.querySelectorAll('.jSC57')[0].scrollHeight)")
    time.sleep(0.5)
# Parse the fully loaded list and collect the usernames
soup = BeautifulSoup(browser.page_source, 'html.parser')
followers = soup.findAll('a',['FPmhX','notranslate','_0imsa'])
followers_text = []
for follower in followers:
    followers_text.append(follower.get_text())
print("followers count: " + str(len(followers_text)))
After waiting another 2 seconds to avoid timing problems,
I click the Followers button again
and wait 1 second.
Now comes the real work: getting the follower usernames.
First, every follower has to be loaded,
so I wrote a while loop that keeps scrolling the list down.
If the previous scroll height differs from the new one, there is still more to load.
The loop repeats until the old and new scroll heights are equal. (oldHeight and newHeight start at different dummy values so the loop runs at least once.)
The class names passed to querySelectorAll were looked up directly in the Elements tab of the developer tools.
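One caveat with this loop: if Instagram takes longer than 0.5 seconds to append the next batch, two consecutive readings can match before the list is complete, and the loop stops early. A more tolerant variant is sketched below, assuming you pass the two JavaScript calls in as functions (the name scroll_until_stable and its parameters are hypothetical). It requires several unchanged readings in a row and caps the number of scrolls:

```python
import time


def scroll_until_stable(get_height, scroll_to_bottom, pause=0.5,
                        stable_rounds=3, max_rounds=500):
    """Scroll repeatedly until the content height stops growing.

    get_height:       callable returning the list's current scrollHeight
    scroll_to_bottom: callable that scrolls the list container to the bottom
    Stops once the height is unchanged for `stable_rounds` consecutive checks,
    or after `max_rounds` scrolls as a safety cap. Returns the final height.
    """
    last_height = get_height()
    unchanged = 0
    for _ in range(max_rounds):
        scroll_to_bottom()
        time.sleep(pause)  # give the page time to load the next batch
        height = get_height()
        if height == last_height:
            unchanged += 1
            if unchanged >= stable_rounds:
                break
        else:
            unchanged = 0
            last_height = height
    return last_height


# Example wiring with the article's selectors (needs a logged-in Selenium browser):
# scroll_until_stable(
#     lambda: browser.execute_script("return document.querySelectorAll('.jSC57')[0].scrollHeight"),
#     lambda: browser.execute_script("document.querySelectorAll('.isgrP')[0].scrollTo(0,document.querySelectorAll('.jSC57')[0].scrollHeight)"),
# )
```

The trade-off is a few extra pauses at the end of the list in exchange for not stopping halfway on a slow connection.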
Once loading is finished, the HTML is parsed with the BeautifulSoup module,
every tag carrying the username classes is extracted, and the text of each one is put into a list.
Finally, print outputs the length of that list, which is the number of followers.
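Note that soup.findAll('a', ['FPmhX', 'notranslate', '_0imsa']) passes a list as the class filter, so it matches a tags that carry any one of those classes. For readers who would rather avoid the extra dependency, the same extraction can be sketched with the standard library's html.parser. This is a swapped-in alternative, not the author's code, and the default class names are the 2020-era ones from above, which Instagram has very likely changed since:

```python
from html.parser import HTMLParser


class AnchorTextCollector(HTMLParser):
    """Collect the text of <a> tags that carry at least one wanted class."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = set(wanted_classes)
        self.inside = False  # True while inside a matching <a> tag
        self.buffer = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            classes = (dict(attrs).get("class") or "").split()
            if self.wanted.intersection(classes):
                self.inside = True
                self.buffer = []

    def handle_data(self, data):
        if self.inside:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.inside:
            # strip() trims surrounding whitespace, unlike bs4's plain get_text()
            self.texts.append("".join(self.buffer).strip())
            self.inside = False


def extract_usernames(html, wanted_classes=("FPmhX", "notranslate", "_0imsa")):
    parser = AnchorTextCollector(wanted_classes)
    parser.feed(html)
    parser.close()
    return parser.texts
```

Calling extract_usernames(browser.page_source) would then play the role of the findAll loop above.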
Next, let's get the list of accounts I'm following.
# Close the followers dialog
browser.find_element_by_xpath('/html/body/div[4]/div/div/div[1]/div/div[2]/button').click()
time.sleep(0.5)
# Open the following dialog
browser.execute_script("document.querySelectorAll('.-nal3')[2].click();")
time.sleep(1)
# Keep scrolling until the list stops growing
oldHeight = -1
newHeight = -2
while oldHeight != newHeight:
    oldHeight = newHeight
    newHeight = browser.execute_script("return document.querySelectorAll('.jSC57')[0].scrollHeight")
    browser.execute_script("document.querySelectorAll('.isgrP')[0].scrollTo(0,document.querySelectorAll('.jSC57')[0].scrollHeight)")
    time.sleep(0.5)
# Collect the usernames of the accounts I follow
soup = BeautifulSoup(browser.page_source, 'html.parser')
followings = soup.findAll('a',['FPmhX','notranslate','_0imsa'])
followings_text = []
for following in followings:
    followings_text.append(following.get_text())
print("followings count: " + str(len(followings_text)))
I clicked the close button, found by XPath, to close the followers dialog.
Then I waited 0.5 seconds
and clicked the Following button.
After waiting another second,
I collected the usernames of the accounts I follow.
This is almost exactly the same pattern as collecting the follower usernames,
so I won't go over it again.
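Since the two blocks differ only in which header link is clicked, they could be folded into one function. A minimal sketch, assuming browser is anything with execute_script() and page_source (such as a Selenium WebDriver); the function name load_list_html and its parameters are hypothetical, and the selectors are copied from the article and probably outdated:

```python
import time

# Selectors copied from the article's 2020-era code. Instagram renames these
# generated class names frequently, so treat them as placeholders to re-check.
TAB_LINK = ".-nal3"    # header links: index 1 = followers, index 2 = following
LIST_BODY = ".jSC57"   # inner list whose scrollHeight grows as entries load
LIST_BOX = ".isgrP"    # scrollable dialog container


def load_list_html(browser, tab_index, open_wait=1.0, pause=0.5):
    """Open the followers (tab_index=1) or following (tab_index=2) dialog,
    scroll it until its height stops changing, and return the page HTML."""
    browser.execute_script(
        f"document.querySelectorAll('{TAB_LINK}')[{tab_index}].click();")
    time.sleep(open_wait)

    height_js = f"return document.querySelectorAll('{LIST_BODY}')[0].scrollHeight"
    scroll_js = (f"document.querySelectorAll('{LIST_BOX}')[0].scrollTo("
                 f"0, document.querySelectorAll('{LIST_BODY}')[0].scrollHeight)")

    # Same loop as in the article: stop once two consecutive readings match.
    old_height, new_height = -1, -2
    while old_height != new_height:
        old_height = new_height
        new_height = browser.execute_script(height_js)
        browser.execute_script(scroll_js)
        time.sleep(pause)
    return browser.page_source
```

The script would call load_list_html(browser, 1) for followers, close that dialog as in the code above, then call load_list_html(browser, 2) for following, parsing each result with BeautifulSoup as before.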
Finally, let's compare the followers and following lists
and find the accounts that don't follow me back.
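result = []
# For each account I follow, check whether it also follows me
for following in followings_text:
    cnt = 0
    for follower in followers_text:
        if following == follower:
            cnt += 1
            break
    # Not found among my followers: I follow them, but they don't follow me
    if cnt == 0:
        result.append(following)
print("People who didn't follow each other on Instagram: " + str(result))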
Each account I follow is compared, one by one, against the full list of follower usernames,
and as soon as a match is found the counter is incremented and the inner loop exits.
If the counter is still 0, I follow that account but it is not in my follower list.
In other words, it doesn't follow me back, so it is added to the result list.
Finally, printing the result list gives us exactly what we wanted.
Just like this.
I followed you, but you didn't follow me back? TT
For reference, there aren't many mismatches for me because I only follow people I know.
And with that, the goal is achieved quite simply.
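One possible refinement: the nested loop compares every followed account against every follower, so the work grows with the product of the two list sizes. Putting the followers into a set makes each lookup roughly constant time. This swaps the author's nested loop for a set lookup and is only a sketch; the function name not_following_back and the sample names are hypothetical:

```python
def not_following_back(followings, followers):
    """Return accounts in `followings` that are missing from `followers`,
    keeping the order in which they appear in `followings`."""
    follower_set = set(followers)  # average O(1) membership checks
    return [name for name in followings if name not in follower_set]


followings_text = ["alice", "bob", "carol"]  # example data, not real accounts
followers_text = ["bob", "dave"]
print("People who didn't follow each other on Instagram: "
      + str(not_following_back(followings_text, followers_text)))
# prints: People who didn't follow each other on Instagram: ['alice', 'carol']
```

Swapping the arguments, not_following_back(followers_text, followings_text), gives the opposite list: people who follow you but whom you don't follow back.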
I'll see you next time with another topic. :)