Question
Hi there. I have a Windows Forms application that scrapes a website to retrieve some data. I would like to implement the same functionality as a Windows service, so that the program can run 24/7 without a user signed in.
Currently, the program uses a WebBrowser control (System.Windows.Forms.WebBrowser) to navigate the pages, click the buttons, let the page scripts run, and so on. I cannot figure out a way to do the same without the WebBrowser control, but the control cannot be instantiated in a Windows service (because a service has no user interface).
Does anyone have any brilliant ideas on how to get around this?
Thank you very much!
You haven't said whether you are using an Express edition of Visual Studio.
You haven't said which Framework version you target.
You haven't really said what data you are getting from the website.
So I made an example service that works in any Visual Studio edition (including Express).
To install it, I assumed you have at least Framework 2.0, so you would use something similar to:
%SystemRoot%\Microsoft.NET\Framework\v2.0.50727\installutil /i C:\Test\MyWindowService\MyWindowService\bin\Release\MyWindowService.exe
(For a project that targets .NET 4.0, use the installutil in %SystemRoot%\Microsoft.NET\Framework\v4.0.30319 instead.)
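In the example, I assumed that you are downloading a file from the site.
For the timer, use System.Timers.Timer rather than the Windows Forms Timer: a Windows.Forms.Timer raises Tick only through a window message loop, which a service does not run, so it would never fire. System.Timers.Timer needs no reference to Windows.Forms.
Imports System.Diagnostics
Imports System.ServiceProcess
Imports System.Configuration.Install
Public Class WindowsService : Inherits ServiceBase
    Private Const Minute As Integer = 60000
    ' System.Timers.Timer raises Elapsed on a thread-pool thread, so it works without a message loop.
    Private WithEvents DownloadTimer As New System.Timers.Timer With {.Interval = 30 * Minute}
    Public Sub New()
        Me.ServiceName = "MyService"
        Me.EventLog.Log = "Application"
        Me.CanHandlePowerEvent = True
        Me.CanHandleSessionChangeEvent = True
        Me.CanPauseAndContinue = True
        Me.CanShutdown = True
        Me.CanStop = True
    End Sub
    Protected Overrides Sub OnStart(ByVal args() As String)
        DownloadTimer.Start()
    End Sub
    Protected Overrides Sub OnStop()
        DownloadTimer.Stop()
    End Sub
    Protected Overrides Sub OnPause()
        DownloadTimer.Stop()
    End Sub
    Protected Overrides Sub OnContinue()
        DownloadTimer.Start()
    End Sub
    Private Sub DownloadTimer_Elapsed(ByVal sender As Object, ByVal e As System.Timers.ElapsedEventArgs) Handles DownloadTimer.Elapsed
        Try
            If IO.File.Exists("C:\MyPath.Data") Then IO.File.Delete("C:\MyPath.Data")
            My.Computer.Network.DownloadFile("http://MyURL.com", "C:\MyPath.Data", "MyUserName", "MyPassword")
            'Do something with the downloaded data
        Catch ex As Exception
            ' A service has no UI to show errors, so record failures in the event log.
            Me.EventLog.WriteEntry(ex.ToString(), EventLogEntryType.Error)
        End Try
    End Sub
End Class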
<Microsoft.VisualBasic.HideModuleName()> _
Module MainModule
    ' installutil creates the installer below without ever running Main, so the
    ' name must be a constant; it has to match WindowsService.ServiceName.
    Public Const TheServiceName As String = "MyService"
    Public Sub Main()
        ServiceBase.Run(New WindowsService)
    End Sub
End Module
<System.ComponentModel.RunInstaller(True)> _
Public Class WindowsServiceInstaller : Inherits Installer
    Public Sub New()
        Dim processInstaller As New ServiceProcessInstaller()
        Dim svcInstaller As New ServiceInstaller()
        ' LocalSystem needs no user name or password.
        processInstaller.Account = ServiceAccount.LocalSystem
        svcInstaller.DisplayName = "My Windows Service"
        svcInstaller.StartType = ServiceStartMode.Automatic
        svcInstaller.ServiceName = TheServiceName
        Me.Installers.Add(processInstaller)
        Me.Installers.Add(svcInstaller)
    End Sub
End Class
Hello Andy,
Thanks for your post.
What do you want to scrape from the page? The HttpWebRequest and WebClient classes may be what you need. For more information, see:
The HttpWebRequest class provides support for the properties and methods defined in WebRequest and for additional properties and methods that enable the user to interact directly with servers using HTTP.
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.aspx
The WebClient class provides common methods for sending data to or receiving data from any local, intranet, or Internet resource identified by a URI
http://msdn.microsoft.com/en-us/library/system.net.webclient.aspx
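As an illustration of the WebClient suggestion: a site that makes you log in or click through several pages usually tracks the session with cookies, and WebClient on its own does not keep cookies between calls. A common workaround is to subclass it and attach one shared CookieContainer to every request it creates. This is a minimal sketch, assuming .NET 4.0 and a cookie-based session; CookieAwareWebClient is a hypothetical name, not a framework class.

```vbnet
Imports System
Imports System.Net

' Hypothetical helper (not a framework class): a WebClient that keeps cookies
' across requests, so a login or a multi-page navigation stays in one session.
Public Class CookieAwareWebClient
    Inherits WebClient

    Private ReadOnly _cookies As New CookieContainer()

    ' The container shared by every request this client creates.
    Public ReadOnly Property Cookies As CookieContainer
        Get
            Return _cookies
        End Get
    End Property

    ' WebClient calls this for each download or upload; attach the shared cookies.
    Protected Overrides Function GetWebRequest(ByVal address As Uri) As WebRequest
        Dim request As WebRequest = MyBase.GetWebRequest(address)
        Dim httpRequest As HttpWebRequest = TryCast(request, HttpWebRequest)
        If httpRequest IsNot Nothing Then
            ' Non-HTTP requests (for example file://) carry no cookies.
            httpRequest.CookieContainer = _cookies
        End If
        Return request
    End Function
End Class
```

With such a client, DownloadString(url) fetches a page and UploadValues(url, "POST", fields), where fields is a NameValueCollection, submits a form; both are standard WebClient methods, and cookies the server sets on one call are sent back on the next.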
If you have any concerns, please feel free to follow up.
Best regards
Hi Andy,
How is this problem going on your side? If you have any concerns, please feel free to follow up.
Have a nice day.
Best regards
Hi Andy,
When you come back, if you need further assistance with this issue, please feel free to let us know. We will continue to work with you on this issue.
Have a nice day.
Best regards
Thank you for the reply. Sorry it has taken me so long to respond. I did not receive any notification that someone had replied!
I am using Visual Studio 2010 Ultimate Edition and .NET Framework 4.0. Actually, I am upgrading some old code written in VB 6.0, but I can use the latest and greatest that's available.
The application uses a browser control to go to the page, fill in values, click on UI elements, read the HTML that comes back, etc. The purpose of the application is to collect useful information regularly and automatically.
I know how to create a Windows service, but using the WebBrowser control in such a service is problematic because the control was meant to be placed on a Windows Form. I am not able to create a new instance of it in a project designated as a Windows service.
Andy
Thank you for the reply. Sorry it has taken me so long to respond. I did not receive any notification that someone had replied!
I thought a web request was for web services (retrieving information from them). I am trying to retrieve useful information from a website designed for interaction by a human, such as selecting items from lists and clicking buttons. I currently use a WebBrowser control to programmatically do what a person would do and get back the pages, which in turn get parsed.
Andy
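On the point about web requests: they are not limited to web services. When a person picks an item from a list and clicks a submit button on a plain HTML form, the browser sends an ordinary HTTP request, usually a POST whose body holds the form's name=value pairs, and HttpWebRequest can send that same request without a browser. The sketch below assumes such a form; the module FormPoster, its functions BuildFormBody and PostForm, and the field names in the usage note are hypothetical, so take the real names from the page's HTML (or from a capture of what the browser sends, for example with Fiddler). If the page's JavaScript builds the request itself, you have to reproduce whatever it sends, which this sketch does not attempt.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net
Imports System.Text

' Sketch: reproduce a browser's form submission without a browser.
Public Module FormPoster
    ' Encodes fields as application/x-www-form-urlencoded ("a=1&b=2"),
    ' percent-escaping each name and value.
    Public Function BuildFormBody(ByVal fields As IDictionary(Of String, String)) As String
        Dim parts As New List(Of String)()
        For Each pair As KeyValuePair(Of String, String) In fields
            parts.Add(Uri.EscapeDataString(pair.Key) & "=" & Uri.EscapeDataString(pair.Value))
        Next
        Return String.Join("&", parts)
    End Function

    ' Sends the POST a browser would send when the button is clicked and
    ' returns the HTML of the resulting page.
    Public Function PostForm(ByVal url As String, ByVal fields As IDictionary(Of String, String), ByVal cookies As CookieContainer) As String
        Dim body As Byte() = Encoding.UTF8.GetBytes(BuildFormBody(fields))
        Dim request As HttpWebRequest = DirectCast(WebRequest.Create(url), HttpWebRequest)
        request.Method = "POST"
        request.ContentType = "application/x-www-form-urlencoded"
        request.ContentLength = body.Length
        ' Reuse the container from earlier requests so the server sees the same session.
        request.CookieContainer = cookies
        Using requestStream As Stream = request.GetRequestStream()
            requestStream.Write(body, 0, body.Length)
        End Using
        Using response As WebResponse = request.GetResponse()
            ' Decodes the page as UTF-8 (StreamReader's default); other charsets need a matching Encoding.
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function
End Module
```

For example, after loading the search page with the same CookieContainer, calling PostForm(url, fields, cookies) with a hypothetical "ddlRegion" entry set to the chosen list value and "btnSearch" set to the button's caption returns the results page for parsing. Include only the clicked button's name/value pair, as a browser would.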
Hi Andy,
There is a library that lets you read and manipulate the HTML of any page you download from the website. This agile HTML parser builds a read/write DOM and supports plain XPath or XSLT. It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant of "real world" malformed HTML. The object model is very similar to the one System.Xml provides, but for HTML documents (or streams). For more information, see:
http://htmlagilitypack.codeplex.com/
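Whichever parser you use, one typical job in this kind of scraping is reading the fields a form already contains, such as hidden inputs (ASP.NET pages keep __VIEWSTATE there), so they can be posted back together with your own values. The sketch below does that with regular expressions only so that it needs nothing beyond the .NET Framework; it is a crude stand-in, not the HtmlAgilityPack API, it assumes double-quoted attributes, and FormFieldReader and ReadInputs are hypothetical names. For real pages, the HtmlAgilityPack's tolerant parser and XPath queries are the sturdier choice.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

' Crude regex stand-in for an HTML parser (not the HtmlAgilityPack API).
' Collects name/value pairs from <input> tags; assumes double-quoted attributes
' and ignores <select> and <textarea>. Every named input is collected, including
' unchecked check boxes and every submit button, which a browser sends only when
' checked or clicked, so drop those before posting. A repeated name keeps the last value.
Public Module FormFieldReader
    Private ReadOnly InputTag As New Regex("<input\b[^>]*>", RegexOptions.IgnoreCase)
    Private ReadOnly NameAttr As New Regex("\sname\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase)
    Private ReadOnly ValueAttr As New Regex("\svalue\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase)

    Public Function ReadInputs(ByVal html As String) As Dictionary(Of String, String)
        Dim fields As New Dictionary(Of String, String)()
        For Each tag As Match In InputTag.Matches(html)
            Dim nameMatch As Match = NameAttr.Match(tag.Value)
            ' Inputs without a name are never submitted by a browser.
            If Not nameMatch.Success Then Continue For
            Dim valueMatch As Match = ValueAttr.Match(tag.Value)
            ' Attribute values are HTML-encoded in the page (&amp; and so on), so decode them.
            fields(nameMatch.Groups(1).Value) = If(valueMatch.Success, WebUtility.HtmlDecode(valueMatch.Groups(1).Value), "")
        Next
        Return fields
    End Function
End Module
```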
Have a nice day.
Best regards
Thanks for the suggestion. I will go to that link and see if it will work. I will update this post with what I find.
I am writing to check the status of the issue on your side. Would you mind letting us know the result of the suggestions? If you have any concerns, please feel free to follow up.
Have a nice day.
Best regards
Hi Liliane
Thanks for the follow-up reply. I don't have an answer yet. Implementing this is going to take time, and my boss hasn't given me the go-ahead to spend the time pursuing it.
Hi Andy,
Never mind. You can give it a try when you have time. If you have any further questions about this issue, please feel free to let us know. We will continue to work with you on this issue.
Have a nice day.
Best regards
Source: https://social.msdn.microsoft.com/Forums/vstudio/en-US/f5d565b1-236b-43c2-90c7-f5cc3b2c341b/scraping-a-website-from-a-windows-service