Posted: 2012-05-20 06:47:48
Question:
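I'm having a problem with StreamWriter while writing scraper code for a project I'm working on. The loop I wrote is below.
I have debugged every variable that goes into the loop, and they are all set correctly. I pass in the URL and the range of IDs to search (the ID is a GET variable in the URL), but the loop fails to write the second sourceCode string.
Can someone tell me whether I'm failing to flush something, or whether something else is at work here?
I've been racking my brain trying to find the root cause, but it's proving very stubborn.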
using System;
using System.IO;
using System.Windows.Forms;

namespace Scraper
{
    public partial class Form1 : Form
    {
        Scraper scraper = new Scraper();
        private StreamWriter sw;

        public Form1()
        {
            InitializeComponent();
        }

        private void button1_Click(object sender, EventArgs e)
        {
            string url = textBox1.Text;
            string[] urlBits = url.Split('.');
            string[] domain = urlBits[2].Split('/');
            string filepath = @"C:\Users\Herbaldinho\Desktop\" + urlBits[1] + "-" + domain[0];
            string parentPath = @"C:\Users\Herbaldinho\Desktop\";
            string newPath = Path.Combine(parentPath, filepath);
            if (File.Exists(filepath))
            { }
            else
            {
                Directory.CreateDirectory(newPath);
            }

            DateTime today = DateTime.Today;
            string curDate = String.Format("{0:ddd-MMM-dd-yyyy}", today);
            string subPath = newPath + "\\" + curDate;
            string newSubPath = Path.Combine(newPath, subPath);
            if (File.Exists(subPath))
            { }
            else
            {
                Directory.CreateDirectory(newSubPath);
            }

            string lower = textBox2.Text;
            int lowerValue;
            int.TryParse(lower, out lowerValue);
            string upper = textBox3.Text;
            int upperValue;
            int.TryParse(upper, out upperValue);

            int i;
            for (i = lowerValue; i < upperValue; i++)
            {
                string filename = newSubPath + "\\Advert-" + i + ".html";
                string adPage = url + i;
                bool write = scraper.UrlExists(adPage);
                if (write)
                {
                    string sourceCode = scraper.getSourceCode(adPage);
                    using (sw = new StreamWriter(filename))
                    {
                        sw.Write(sourceCode);
                    }
                }
            }
            MessageBox.Show("Scrape Complete");
        }
    }
}
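A side note on the folder setup in `button1_Click`: `File.Exists` returns false for directories, so both checks always fall through to `Directory.CreateDirectory`. That is harmless, because `CreateDirectory` creates any missing parent folders and does nothing when the folder already exists, so the checks can simply be dropped. A minimal sketch of the same path logic, assuming the URL has the `http://www.site.tld/...` shape that the `Split('.')` code expects; the `OutputFolders.CreateOutputDir` helper is hypothetical, and it formats the date with the invariant culture rather than the current one:

```csharp
using System;
using System.Globalization;
using System.IO;

static class OutputFolders
{
    // Hypothetical helper: builds <root>/<site>-<tld>/<ddd-MMM-dd-yyyy> from a URL such as
    // "http://www.example.com/ad?id=", using the same Split('.') / Split('/') logic as the question.
    public static string CreateOutputDir(string root, string url, DateTime date)
    {
        string[] urlBits = url.Split('.');
        string[] domain = urlBits[2].Split('/');
        string siteDir = Path.Combine(root, urlBits[1] + "-" + domain[0]);
        // Invariant culture keeps day/month names stable regardless of the machine's locale.
        string dateDir = Path.Combine(siteDir,
            date.ToString("ddd-MMM-dd-yyyy", CultureInfo.InvariantCulture));
        // CreateDirectory creates missing parents and is a no-op if the folder exists,
        // so no File.Exists/Directory.Exists check is needed.
        Directory.CreateDirectory(dateDir);
        return dateDir;
    }
}
```

Called with `http://www.example.com/ad?id=` and 20 May 2012, this returns a path whose last two segments are `example-com` and `Sun-May-20-2012`.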
#### This is the Scraper object
using System.Net;

namespace Scraper
{
    class Scraper
    {
        WebClient w = new WebClient();

        public bool UrlExists(string url)
        {
            try
            {
                HttpWebRequest request = WebRequest.Create(url) as HttpWebRequest;
                request.Method = "HEAD";
                HttpWebResponse response = request.GetResponse() as HttpWebResponse;
                return (response.StatusCode == HttpStatusCode.OK);
            }
            catch
            {
                return false;
            }
        }

        public string getSourceCode(string url)
        {
            string s = w.DownloadString(url);
            return s;
        }
    }
}
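One thing worth checking (an assumption, not something confirmed in this thread): `UrlExists` never closes the `HttpWebResponse` it gets back. An `HttpWebResponse` keeps its underlying connection busy until it is closed or disposed, and in .NET Framework client applications `ServicePointManager.DefaultConnectionLimit` allows only 2 concurrent connections per host by default, so unclosed responses can stall later requests to the same site. A minimal sketch of the same HEAD check with the response disposed; the `UrlChecker` class name is hypothetical:

```csharp
using System;
using System.Net;

static class UrlChecker
{
    // Sends a HEAD request and reports whether the server answered 200 OK.
    // The using block disposes the response, releasing its connection back to the pool.
    public static bool UrlExists(string url)
    {
        try
        {
            var request = (HttpWebRequest)WebRequest.Create(url);
            request.Method = "HEAD";
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                return response.StatusCode == HttpStatusCode.OK;
            }
        }
        catch (WebException)
        {
            // Error status codes (e.g. 404), DNS failures and timeouts end up here.
            return false;
        }
        catch (UriFormatException)
        {
            // The string could not be parsed as an absolute URI.
            return false;
        }
    }
}
```

Unlike the original bare `catch`, this only swallows network and URI errors, so unexpected exceptions still surface while debugging.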
Discussion:
-
What error are you getting? Also, you don't need to call Close and Dispose when you use a using block.
-
Does if(write) return true on the second pass? It seems the {0}\Advert-{1}.html URL might not exist.
-
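The problem you're having has nothing to do with StreamWriter. Remove all the code you haven't shown in full (like scraper and url) and just make sourceCode some random text; StreamWriter works fine.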
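The suggested reduction can be sketched as a standalone check: the same loop shape as in the question, with the scraper replaced by generated text. The `StreamWriterRepro.WriteAdverts` helper and the file contents are hypothetical. Each `using` block disposes its `StreamWriter`, and disposing flushes and closes the file, so every iteration produces a complete file without any explicit `Flush` call:

```csharp
using System;
using System.IO;

static class StreamWriterRepro
{
    // Hypothetical helper: writes one file per ID in [lower, upper), mirroring the
    // question's loop, but with fixed text instead of downloaded source code.
    public static void WriteAdverts(string dir, int lower, int upper)
    {
        Directory.CreateDirectory(dir); // no-op if the directory already exists
        for (int i = lower; i < upper; i++)
        {
            string filename = Path.Combine(dir, "Advert-" + i + ".html");
            string sourceCode = "<html><body>advert " + i + "</body></html>";
            // Disposing the writer at the end of the using block flushes and closes the file.
            using (var sw = new StreamWriter(filename))
            {
                sw.Write(sourceCode);
            }
        }
    }
}
```

If this writes every file but the real loop does not, the difference lies in what `UrlExists` and `getSourceCode` return rather than in the writer.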
-
Tell us the error you're getting, or provide a complete example that reproduces the problem.
-
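No error is produced at all, and that's the problem. On the second iteration the if(write) condition is satisfied and sourceCode is retrieved, but it is never written to its designated file.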
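When nothing throws but a file seems to be missing or empty, one way to narrow it down is to read the file back right after the `using` block and print its full path and length. An empty download, or a file landing in a different folder than expected, then shows up immediately. A minimal diagnostic sketch; the `WriteCheck.WriteAndVerify` helper is hypothetical:

```csharp
using System;
using System.IO;

static class WriteCheck
{
    // Hypothetical diagnostic: writes the text, reads it back, and prints the resolved
    // path and character count so an empty string or unexpected location is visible.
    public static bool WriteAndVerify(string filename, string sourceCode)
    {
        using (var sw = new StreamWriter(filename))
        {
            sw.Write(sourceCode);
        }
        string written = File.ReadAllText(filename);
        Console.WriteLine("{0}: {1} chars", Path.GetFullPath(filename), written.Length);
        return written == sourceCode;
    }
}
```

Calling this in place of the plain `using` block inside `if (write)` would show, for each ID, whether the second `sourceCode` is actually empty or is written somewhere other than expected.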
Tags: c# html-parsing streamwriter