【发布时间】:2016-04-05 19:21:26
【问题描述】:
我有一个 CSV 格式的名称列表,我准备让谷歌使用 java 搜索这些名称。但我面临的问题是,当我最初运行代码时,我能够搜索查询,但在代码中间,代码开始抛出 503 异常,当我再次运行代码时,它开始从一开始。这是我正在使用的代码。
public class ExtractInformation
{
static String firstname,middlename,lastname;
public static final int PAGE_NUMBERS = 10;
public static void readCSV()
{
boolean first = true;
try
{
String splitBy = ",";
BufferedReader br = new BufferedReader(new FileReader("E:\\KOLDump\\names.csv"));
String line = null;
String site = null;
while((line=br.readLine())!=null)
{
if(first)
{
first = false;
continue;
}
String[] b = line.split(splitBy);
firstname = b[0];
middlename = b[1];
lastname = b[2];
String name = null;
if(middlename == null || middlename.length() == 0)
{
name = firstname+" "+lastname+" OR "+lastname+" "+firstname.charAt(0);
}
else
{
name = firstname+" "+lastname+" OR "+lastname+" "+firstname.charAt(0)+" OR "+firstname+" "+middlename.charAt(0)+". "+lastname;
}
BufferedReader brs = new BufferedReader(new FileReader("E:\\KOLDump\\site.csv"));
while((site = brs.readLine()) != null)
{
if(first)
{
first = false;
continue;
}
String [] s = site.split(splitBy);
String siteName = s[0];
siteName = (siteName.replace("www.", ""));
siteName = (siteName.replace("http://", ""));
getDataFromGoogle(name.trim(), siteName.trim());
}
brs.close();
}
//br.close();
}
catch(Exception e)
{
System.out.println("unable to read file...some problem in the csv");
}
}
public static void main(String[] args)
{
readCSV();
}
private static void getDataFromGoogle(String query,String siteName)
{
Set<String> result = new HashSet<String>();
String request = "http://www.google.co.in/search?q="+query+" "+siteName;
try
{
Document doc = Jsoup.connect(request).userAgent("Chrome").timeout(10000).get();
Element query_results = doc.getElementById("ires");
Elements gees = query_results.getElementsByClass("g");
for(Element gee : gees)
{
Element h3 = gee.getElementsByTag("h3").get(0);
String annotation = h3.getElementsByTag("a").get(0).attr("href");
if(annotation.split("q=",2)[1].contains(siteName))
{
System.out.println(annotation.split("q=",2)[1]);
}
}
}
catch (IOException e)
{
e.printStackTrace();
}
}
}
任何有关如何从代码中删除此异常的建议都会很有帮助。
【问题讨论】:
-
Google 可能会限制您的访问,因为您在通常用于处理浏览器请求的 URL 上提交请求的速度过快。