【Question title】: Android, parsing XML: how to ignore HTML tags?
【Posted】: 2012-03-12 09:59:05
【Question】:

In my project I need to parse XML. Some items in the XML contain HTML tags. I tried to remove those tags, but without success. The code in my activity is:

private NewsFeedItemList parseNewsContent() {
        NewsParserHandler newsParserHandler = null;

        Log.i("NewsList", "Starting to parse XML...");

        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            XMLReader xr = parser.getXMLReader();
            newsParserHandler = new NewsParserHandler();
            xr.setContentHandler(newsParserHandler);

            ByteArrayInputStream is = new ByteArrayInputStream(strServerResponseMsg.getBytes());
            xr.parse(new InputSource(is));

        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

        NewsFeedItemList itemList = newsParserHandler.getNewsList();
//      checkLog(itemList);

        Log.i("NewsList", "Parsing XML finished. Sending result back to caller...");
        return itemList;
    }

"strServerResponseMsg" contains the XML (http://www.mania.com.my/rss/ManiaTopStoriesFeedFull.aspx?catid=146)

All items get parsed, but the ones that contain HTML tags are not parsed completely.

This is my parser handler:

public class NewsParserHandler extends DefaultHandler {

    private NewsFeedItemList newsFeedItemList;  
    private boolean current = false;  
    private String currentValue = null;

   /* The feed's root also contains "title", "link" and "pubDate" elements
    * that must not be stored in the list, so we skip them by counting
    * the elements seen so far. */
    private int count = 0; 


    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        super.characters(ch, start, length);

        if(current)  {  
            currentValue = new String(ch, start, length); 

            if(currentValue==null || currentValue=="" || currentValue==" ")
                currentValue = "-";

            current = false;  
        }
    }

    @Override
    public void startDocument() throws SAXException {
        super.startDocument();

        newsFeedItemList = new NewsFeedItemList();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        super.startElement(uri, localName, qName, attributes);

        current = true;
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        super.endElement(uri, localName, qName);

        current = false;

        if(localName.equals("title"))  {  
            if(count >= 1)
                newsFeedItemList.setTitle(currentValue);  
        }
        if(localName.equals("description"))  {  
            newsFeedItemList.setDescription(currentValue);  
        } 
        if(localName.equals("fullbody"))  {  
            newsFeedItemList.setFullbody(currentValue);  
        } 
        if(localName.equals("link"))  {  
            if(count >= 4)
                newsFeedItemList.setLink(currentValue);  
        } 
        if(localName.equals("pubDate"))  {  
            if(count >= 5)
                newsFeedItemList.setPubDate(currentValue);  
        } 
        if(localName.equals("image"))  {  
            newsFeedItemList.setImage(currentValue);  
        } 

        count++;
    }

    @Override
    public void endDocument() throws SAXException {
        super.endDocument();
    }   


    public NewsFeedItemList getNewsList() {
        return newsFeedItemList;
    }

}

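I tried putting currentValue = Html.fromHtml(currentValue).toString(); in the characters() method, but it had no effect. I also tried converting "strServerResponseMsg" to HTML before passing it to the parser, but then the parser did not parse anything.

I found these threads, but their solutions did not work for me: "How to strip or escape html tags in Android" and "Display HTML Formatted String".

I would appreciate any help. Thanks.

【Comments】:

    Tags: android xml parsing saxparser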


    【Solution 1】:

    Use the following method to remove all HTML tags from the currentValue variable.

    public static String removeHtmlTag(String htmlString) {
            return htmlString.replaceAll("\\<.*?\\>", "").trim();
    }
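
    To show what the method above does on a concrete string, here is a minimal standalone sketch. The class name HtmlTagStripper, the null guard and the precompiled pattern are assumptions added for illustration and are not part of the original answer. The pattern `<[^>]*>` is a small variant of the answer's regex that also matches tags spanning a line break. It only strips tags; entities such as `&amp;` are left untouched.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class HtmlTagStripper {
    // Matches '<', then any run of characters other than '>', then '>'.
    // Unlike ".*?", the class [^>] also matches newlines.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String removeHtmlTag(String htmlString) {
        if (htmlString == null) {
            return ""; // assumption: treat a missing value as empty text
        }
        return TAG.matcher(htmlString).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // prints "Breaking news today"
        System.out.println(removeHtmlTag("<p>Breaking <b>news</b> today</p>"));
    }
}
```

    The method has to be applied to the complete element text, so it fits best in endElement() rather than in characters().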
    

    【Comments】:

    • Thanks Lalit, but unfortunately it doesn't work. I don't know why :(
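
    One possible cause, not confirmed anywhere in this thread: a SAX parser is allowed to deliver the text of a single element through several characters() calls, and the handler in the question keeps only the first chunk because it sets current = false straight away. The sketch below accumulates every chunk in a StringBuilder and strips tags only once the element ends. The class name AccumulatingHandlerSketch, the parseDescriptions helper and the plain list of strings (used instead of NewsFeedItemList) are assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <description> element.
final class AccumulatingHandlerSketch extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> descriptions = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting afresh for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append instead of assigning.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // localName is empty when the parser is not namespace-aware, so fall back to qName.
        String name = localName.isEmpty() ? qName : localName;
        if ("description".equals(name)) {
            // The element text is complete here, so tags can be stripped safely.
            descriptions.add(text.toString().replaceAll("<[^>]*>", "").trim());
        }
    }

    static List<String> parseDescriptions(String xml) throws Exception {
        AccumulatingHandlerSketch handler = new AccumulatingHandlerSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.descriptions;
    }
}
```

    Parsing from a StringReader also avoids the getBytes() call in the question, which depends on the platform default charset.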