Tuesday, February 26, 2019

Web scraping through WebBrowser (.NET WinForms Control)



In this post, I am going to show how easy it is to extract and change webpage content through the WebBrowser control. WebBrowser is a Windows Forms control from Microsoft, and it is easy to use. This post explains how you can use it for your own needs.

The WebBrowser control displays web pages inside your application. It is also a powerful class that lets you manipulate the HTML, interact with JavaScript, automate web scraping, and much more.
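The JavaScript interaction mentioned above works in both directions through two real members of the control: `HtmlDocument.InvokeScript` calls a page function from C#, and `WebBrowser.ObjectForScripting` exposes a C# object to page script as `window.external`. The sketch below is a minimal standalone demo, assuming a throwaway HTML page loaded through `DocumentText`; the page, its `add` function, and the `ScriptBridge` and `ScriptDemoForm` classes are hypothetical names used only for illustration.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;

// Hypothetical bridge object. Page script can call its public methods as
// window.external.Notify(...). ObjectForScripting requires a COM-visible type.
[ComVisible(true)]
public class ScriptBridge
{
    public void Notify(string message)
    {
        MessageBox.Show("Page says: " + message);
    }
}

public class ScriptDemoForm : Form
{
    private readonly WebBrowser webBrowser = new WebBrowser { Dock = DockStyle.Fill };
    private readonly Button callScriptButton = new Button { Text = "Call add(2, 3)", Dock = DockStyle.Top };

    public ScriptDemoForm()
    {
        // The Fill control is added first so it takes the space left by the button.
        Controls.Add(webBrowser);
        Controls.Add(callScriptButton);

        // JavaScript -> C#: expose the bridge object to the page as window.external.
        webBrowser.ObjectForScripting = new ScriptBridge();

        // Hypothetical test page: one JS function, plus a page button that calls back into C#.
        webBrowser.DocumentText =
            "<html><body>" +
            "<script>function add(a, b) { return a + b; }</script>" +
            "<button onclick=\"window.external.Notify('hello from the page')\">Notify C#</button>" +
            "</body></html>";

        callScriptButton.Click += OnCallScriptClick;
    }

    private void OnCallScriptClick(object sender, EventArgs e)
    {
        // The page may not have loaded yet.
        if (webBrowser.Document == null)
            return;

        // C# -> JavaScript: call a page function by name and read its return value.
        object sum = webBrowser.Document.InvokeScript("add", new object[] { 2, 3 });
        Text = "add(2, 3) returned " + sum;
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new ScriptDemoForm());
    }
}
```

If `ScriptBridge` is not marked `[ComVisible(true)]`, assigning it to `ObjectForScripting` fails, which is why the attribute sits on the class itself.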

You can find more about WebBrowser in the MSDN library. The following steps show how to use it.

1.       First, create a Windows Forms Application project.

2.       Add a WebBrowser control and two Button controls to the form.
3.       Set the control's “Url” property, or do it programmatically with the code snippet below:
webBrowser.Navigate("https://www.google.com");
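Steps 1 to 3 can also be done entirely in code instead of through the designer. The sketch below is a minimal standalone version of such a form; the names `ScraperForm`, `searchButton` and `extractButton` are hypothetical. It keeps both buttons disabled until the `DocumentCompleted` event reports the page as loaded, because `webBrowser.Document` is not reliable while navigation is still in progress. Setting `ScriptErrorsSuppressed` is optional; it hides the script-error dialogs that modern pages often trigger in the control's Internet Explorer engine.

```csharp
using System;
using System.Windows.Forms;

public class ScraperForm : Form
{
    private readonly WebBrowser webBrowser = new WebBrowser();
    private readonly Button searchButton = new Button { Text = "Search", Enabled = false };
    private readonly Button extractButton = new Button { Text = "Extract", Enabled = false };

    public ScraperForm()
    {
        Text = "WebBrowser scraping demo";
        Width = 1000;
        Height = 700;

        // Buttons in a strip along the top; the browser fills the rest.
        var buttonPanel = new FlowLayoutPanel { Dock = DockStyle.Top, Height = 35 };
        buttonPanel.Controls.Add(searchButton);
        buttonPanel.Controls.Add(extractButton);

        webBrowser.Dock = DockStyle.Fill;
        webBrowser.ScriptErrorsSuppressed = true; // optional: hide script-error dialogs

        // The Fill control is added first so it is docked last and takes the
        // space left over after the top panel.
        Controls.Add(webBrowser);
        Controls.Add(buttonPanel);

        // Placeholders: the step 5 and step 6 code goes in these handlers.
        searchButton.Click += (sender, e) => { /* step 5 code */ };
        extractButton.Click += (sender, e) => { /* step 6 code */ };

        webBrowser.DocumentCompleted += OnDocumentCompleted;
        webBrowser.Navigate("https://www.google.com");
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted can fire once per frame, so only enable the buttons
        // once the whole page reports that it is complete.
        if (webBrowser.ReadyState != WebBrowserReadyState.Complete)
            return;

        searchButton.Enabled = true;
        extractButton.Enabled = true;
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new ScraperForm());
    }
}
```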
4.       When you run the application, the screen looks like the screenshot below.

5.       To set a value on a page element, and even fire that element's events, use the code below. Here I set the value of the search box (for this example, my name, “Mohd Azharuddin Ansari”) and then fire the search button's “Click” event programmatically. If everything goes as planned, Google shows the results for that search.
Code (this goes in the first button's Click event handler)
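            // Put the search text into the element with id "q" (the search box)
            webBrowser.Document.GetElementById("q").SetAttribute("value", "Mohd Azharuddin Ansari");

            // Find the search button and simulate a click on it
            HtmlElement button = webBrowser.Document.GetElementById("btnK");
            button.InvokeMember("click");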
Result
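The step 5 code assumes both elements exist. If the page has not finished loading, or Google changes its markup, `GetElementById` returns `null` and the next call throws a `NullReferenceException`. A slightly more defensive version might look like the sketch below; `PageAutomation` and `TrySetValueAndClick` are hypothetical names, and the `"q"` and `"btnK"` identifiers are simply the ones used in this post, so they depend on Google's page at the time and may need adjusting.

```csharp
using System.Windows.Forms;

public static class PageAutomation
{
    // Hypothetical helper: fills a text input and clicks a button on the
    // loaded page. Returns false instead of throwing when there is no
    // document yet or either element cannot be found.
    public static bool TrySetValueAndClick(HtmlDocument document, string inputId, string value, string buttonId)
    {
        if (document == null)
            return false;

        HtmlElement input = document.GetElementById(inputId);
        HtmlElement button = document.GetElementById(buttonId);
        if (input == null || button == null)
            return false;

        input.SetAttribute("value", value);
        button.InvokeMember("click");
        return true;
    }
}
```

In the first button's handler, the call would be `PageAutomation.TrySetValueAndClick(webBrowser.Document, "q", "Mohd Azharuddin Ansari", "btnK")`, showing a message to the user when it returns `false`.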

6.       If you need to use these results elsewhere in your code, you can extract them with the code below.
Code (this goes in the second button's Click event handler)
            // Collect the text of every <h3> element (the result titles) into one string
            string searchResultText = "";
            HtmlElementCollection searchResult = webBrowser.Document.GetElementsByTagName("h3");

            foreach (HtmlElement he in searchResult)
            {
                searchResultText += he.InnerText + System.Environment.NewLine;
            }

            MessageBox.Show(searchResultText);
Result
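Appending to a string with `+=` inside the loop works for a handful of headings; a `StringBuilder` is the more idiomatic choice, and the same loop can also pick up each result's link. The sketch below assumes, as the step 6 code does, that result titles are `<h3>` elements, and additionally assumes each title sits inside an `<a>` element whose `href` is the result URL. That holds for some versions of Google's results page but is not guaranteed. `ResultExtractor` and `ExtractHeadings` are hypothetical names.

```csharp
using System;
using System.Text;
using System.Windows.Forms;

public static class ResultExtractor
{
    // Collects the text of every <h3> on the page, followed by the href of
    // the nearest enclosing <a> element when there is one.
    public static string ExtractHeadings(HtmlDocument document)
    {
        if (document == null)
            return string.Empty;

        var builder = new StringBuilder();
        foreach (HtmlElement heading in document.GetElementsByTagName("h3"))
        {
            string title = heading.InnerText;
            if (string.IsNullOrWhiteSpace(title))
                continue; // skip empty headings

            builder.AppendLine(title.Trim());

            string link = FindEnclosingLink(heading);
            if (!string.IsNullOrEmpty(link))
                builder.AppendLine("    " + link);
        }
        return builder.ToString();
    }

    // Walks up the parent chain looking for an <a> element.
    private static string FindEnclosingLink(HtmlElement element)
    {
        for (HtmlElement current = element.Parent; current != null; current = current.Parent)
        {
            if (string.Equals(current.TagName, "A", StringComparison.OrdinalIgnoreCase))
                return current.GetAttribute("href");
        }
        return null;
    }
}
```

In the second button's handler, `MessageBox.Show(ResultExtractor.ExtractHeadings(webBrowser.Document));` would replace the loop.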


