In this article, we look at how to use Browse AI to extract structured data. We assume you have already signed up for Browse AI; if you haven’t, you can sign up today. Also, make sure to install the Browse AI Chrome extension.
Before we start, you might want to check which sites the Browse AI robot can crawl.
Browse AI to Extract Structured Data
- From your Browse AI dashboard, open the Build New Robots feature.
- From there, select the “Extract Structured Data” feature.
- We will try to extract company details from the YCombinator website for this blog.
- Copy the URL (https://www.ycombinator.com/companies) and paste it inside the Origin URL field.
- Then hit the “Start Recording Task” button.
- If you have already installed the Browse AI extension and toggled on the “Chrome extension to record your actions while recording tasks” option, there is one more option you need to enable: “Allow Recording in the Incognito Mode.”
- As soon as you enable it, Browse AI will open the extraction window.
- Click on the Browse AI Robot and select the “Extract List” option.
- In the second window, highlight all the necessary information you want to extract.
- In the next window, add a label for each piece of extracted data. After you hit Enter, Browse AI will generate a demo extraction. From that window, select the maximum number of rows you want to extract, name the table, and select the pagination type.
- Now, hit “Capture List”, then “Finish Recording” in the next window.
Once the automation is complete, you can download the CSV file. The file will contain all the highlighted text under their assigned labels.
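Once you have the CSV, a quick sanity check can confirm that every label became a column and that the row count matches what you asked for. The sketch below is a minimal Python example and is not part of Browse AI itself; the column names (`Company`, `Description`) and the sample rows are hypothetical stand-ins for whatever labels you assigned and whatever your export contains.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded export; the real file
# will have one column per label you assigned during recording.
SAMPLE_CSV = """Company,Description
Example Co,Builds example widgets
Sample Labs,Runs sample experiments
"""


def load_rows(csv_file):
    """Read an exported CSV from an open text file and return (labels, rows)."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    labels = list(reader.fieldnames or [])
    return labels, rows


def summarize(labels, rows):
    """Count the non-empty values captured under each label."""
    return {
        label: sum(1 for row in rows if (row.get(label) or "").strip())
        for label in labels
    }


labels, rows = load_rows(io.StringIO(SAMPLE_CSV))
print(labels)                    # the column headers, i.e. your labels
print(len(rows))                 # number of extracted rows
print(summarize(labels, rows))   # non-empty values per label
```

For a real export, you would replace `io.StringIO(SAMPLE_CSV)` with something like `open("export.csv", newline="", encoding="utf-8")`, where `export.csv` is a placeholder for the name of your downloaded file.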
Our Failed Attempt
Before we start extracting, you should sign up for Browse AI and install the browser extension. If you use Chrome as your web browser, you can download the Browse AI Chrome extension.
We Tried Extracting UpWork Job Information using the Browse AI Structured Data Feature.
We tried to extract job listings from UpWork using the Browse AI Structured Data feature and, ultimately, failed. Below, we narrate the entire workflow step by step.
First, we ran into one crucial obstacle: UpWork requires an account to log in to its platform and browse. Can Browse AI find a workaround for this?
We had no option but to try it ourselves. Luckily, when you visit the Browse AI page for extracting structured data, you will see a small checkbox titled “This Website needs logging in.”
Before starting the extraction, we used UpWork filters to narrow the search results.
We went to UpWork and searched for “AI Projects.” We then copied the URL and pasted it into the URL field in Browse AI.
As soon as we pasted the link inside, two more options appeared beside “This website needs logging in”.
As this was our first attempt at extracting data from UpWork, we were not comfortable sharing our UpWork login credentials. So we used the middle option, which allows Browse AI to use the session cookies to gain access to UpWork.
In the next window, we saw two checkboxes. As we had already installed the Browse AI extension, the first checkbox was already green. The second checkbox showed us what the Chrome extension would do.
As you can read from the text, it says, “Allow Chrome extension to record your actions while recording tasks”. Browse AI will record our screen, which it will later use to train its AI to automate the task. Makes sense?
In other words, after we grant permission, the Browse AI extension will record all the activities we do on our device and then replicate them to automate the task.
We used the Text Extraction Robot and highlighted the text we needed on just one job posting. After saving this task, we thought the Browse AI robot would automate the process for the rest of the job postings.
So we completed the task automation and downloaded the CSV, only to find that Browse AI had captured text from just the highlighted job posting.
But by this time, we had realized this was not the way to proceed.
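The symptom we hit, a CSV containing a single job posting, is easy to check for before you rely on an export. Below is a minimal Python sketch of such a check. The `min_expected` threshold, the sample data, and its column names are our own assumptions for illustration, not anything Browse AI provides.

```python
import csv
import io


def too_few_rows(csv_text, min_expected=2):
    """Return True when the export has fewer data rows than expected.

    min_expected is a hypothetical threshold: an extraction over a page
    of search results should normally yield at least two rows.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return len(rows) < min_expected


# Hypothetical export resembling our failed run: one posting, nothing else.
single_posting = "Title,Budget\nExample AI project,Example budget\n"
print(too_few_rows(single_posting))
```

If the check flags your export, the robot most likely captured a single item rather than the whole list, which is what happened in our attempt.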
Read More: Browse AI Extract UpWork Job Postings
This article has looked at how to use Browse AI to extract structured data from YCombinator. Please leave a comment below if anything is unclear.