【Title】:How to convert JSON with HTML tags to CSV using C#?
【Posted】:2017-08-24 09:43:14
【Question】:

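I am writing a custom translation tool that translates the contents of a csv file from one language to another through the new Azure text translation API. The problem is that one of the columns is in HTML format,

i.e. Description (html):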

<p><span style='font-family:arial,helvetica, sans-serif; font-size:small;'>Paso hacia fuera en estilo y comodidad con estas bufandas de raso gasa estilo de alta calidad, hecho de material de alta calidad a precios asequibles. Tienda con idence.</span></p><li><span style="font-family:arial,helvetica,sans-serif;" font-size:small;=""><strong>Color:</strong> Crema y negro</span></li><span style="font-family:" arial,="" helvetica,="" sans-serif;="" font-size:="" small;=""> <strong>Color:</strong> crema &amp; negro</span>"....

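So far I convert the csv to JSON, translate it into another language (e.g. Spanish or German), and get the response back as JSON. But when I convert the JSON back to csv, I get an error.

I looked into the error and found that the column contains double quotes, which make my JSON file invalid. How can I escape or replace the HTML tags and the double quotes?

My English JSON (converts to csv fine):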

[{"SKU":"d2d Floppy Hats 91504 (One Size)",
"EAN":"123456789",
"Meta Description (website)":"Shop at dso. Product Features:
Colour: Slate Grey
Pattern: Spiral Plain Woven
Features: 
Material: 100% Paper",
"New From":"23/6/2017",
"New To":"23/7/2017",
"Description ( html)":"<p><span style='font-family:arial,helvetica, sans-serif; font-size:small;'>Step out in style &amp; comfort with these high quality Satin Chiffon Style scarves, made from top quality material at affordable prices. Shop with idence.</span></p><li><span style=font-family:arial,helvetica,sans-serif; font-size:small;><strong>Colour:</strong> Cream &amp; Black</span></li><span style=font-family: arial, helvetica, sans-serif; font-size: small;><strong>Colour:</strong> Cream &amp; Black</span></li>"
}]

The translated JSON (Spanish):

[{"SKU": "d2d disquete sombreros 91504 (talla única)",
"EAN": "123456789",
"Meta Description (sitio web)": "tienda en dso. Características del producto: color: patrón gris de la pizarra: espiral llano tejidos características: Material: 100% de papel ",
"Nuevo de": "23/06/2017",
"Nuevo a": "23/07/2017",
"Descripción (html)": "<p><span style='font-family:arial,helvetica, sans-serif; font-size:small;'>Paso hacia fuera en estilo y comodidad con estas bufandas de raso gasa estilo de alta calidad, hecho de material de alta calidad a precios asequibles. Tienda con idence.</span></p><li><span style="font-family:arial,helvetica,sans-serif;" font-size:small;=""><strong>Color:</strong> Crema y negro</span></li><span style="font-family:" arial,="" helvetica,="" sans-serif;="" font-size:="" small;=""> <strong>Color:</strong> crema &amp; negro</span>"
}]

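Is there an easier way?

This is the code I use to translate and get the XmlDocument. The JSON string is the same as the one shown above, but I don't understand why the translated result contains extra double quotes and equals signs.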

string txtToTranslate = txtjson.Text;
if (string.IsNullOrEmpty(txtToTranslate))
{
    MessageBox.Show("No Json file to convert to csv");
}
else
{
    // Build the request URI; the text to translate must be URL-encoded.
    string uri = "http://api.microsofttranslator.com/v2/Http.svc/Translate?text="
        + System.Web.HttpUtility.UrlEncode(txtToTranslate) + "&to=" + languageCode;
    WebRequest translationWebRequest = WebRequest.Create(uri);
    translationWebRequest.Headers.Add("Authorization", tokenProvider.GetAccessToken()); // header value is "Bearer " plus the token from ADM

    using (WebResponse response = translationWebRequest.GetResponse())
    using (Stream stream = response.GetResponseStream())
    using (StreamReader translatedStream = new StreamReader(stream, Encoding.UTF8))
    {
        System.Xml.XmlDocument xTranslation = new System.Xml.XmlDocument();
        // ReadToEnd() returns the response body; ToString() would only return the type name.
        xTranslation.LoadXml(translatedStream.ReadToEnd());

        // InnerText is the translated text without the wrapping XML element.
        string jsontranslated = xTranslation.InnerText;

        txttranslated.Text = jsontranslated;
    }
}

【Comments】:

Tags: c# csv datatable json.net


【Solution 1】:

First, remove all HTML tags with a regular expression (see the answer from Daniel Brückner):

var plainText = Regex.Replace(htmlDocument, @"<[^>]*>", String.Empty);
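
As a rough illustration of this tag-stripping step, the sketch below wraps the same regular expression in a helper and also decodes entities such as `&amp;`. The `HtmlText` class and the `StripTags` name are made up for this example, and the `WebUtility.HtmlDecode` step is an addition that the answer itself does not mention. A regex like this is only adequate for simple fragments, not for arbitrary HTML.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class HtmlText
{
    // Removes anything that looks like a tag, then decodes entities such as &amp;.
    public static string StripTags(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;
        string withoutTags = Regex.Replace(html, @"<[^>]*>", string.Empty);
        return WebUtility.HtmlDecode(withoutTags).Trim();
    }
}

public static class StripTagsDemo
{
    public static void Main()
    {
        string html = "<li><span style=\"font-size:small;\"><strong>Colour:</strong> Cream &amp; Black</span></li>";
        Console.WriteLine(HtmlText.StripTags(html)); // Colour: Cream & Black
    }
}
```

Note that stripping the tags means the translated text comes back without markup, so this only fits if the HTML formatting does not need to survive the translation.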

Create a class for the translation:

public class Translation
{
    public string PlainText { get; set; }
}

var translation = new Translation { PlainText = plainText };

You can then create the JSON with Newtonsoft:

var jsonSerializerSettings = new JsonSerializerSettings
{
    ReferenceLoopHandling = ReferenceLoopHandling.Error,
    ContractResolver = new CamelCasePropertyNamesContractResolver()
};
var serializedTranslations = JsonConvert.SerializeObject(translation, jsonSerializerSettings);

Your JSON should now be escaped correctly.
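
To see the escaping in practice, and to cover the step back to csv that the question asks about, here is a minimal sketch. It assumes the Newtonsoft.Json package is referenced. The `Row` class, the `CsvField.Quote` helper and the sample HTML are invented for illustration. The csv quoting follows the usual RFC 4180 convention (wrap the field in double quotes and double any embedded quotes), which is an addition to this answer rather than something it states.

```csharp
using System;
using Newtonsoft.Json;

// Hypothetical row type used only for this example.
public class Row
{
    public string Sku { get; set; }
    public string Html { get; set; }
}

// Hypothetical helper: quotes a single csv field in the RFC 4180 style.
public static class CsvField
{
    public static string Quote(string value)
    {
        // Double embedded quotes, then wrap the whole field in quotes so that
        // commas, quotes and line breaks inside the HTML stay in one column.
        return "\"" + (value ?? string.Empty).Replace("\"", "\"\"") + "\"";
    }
}

public static class EscapeDemo
{
    public static void Main()
    {
        var row = new Row
        {
            Sku = "91504",
            Html = "<span style=\"font-size:small;\">Color: Crema y negro</span>"
        };

        // The serializer writes each embedded double quote as \" so the JSON stays valid.
        string json = JsonConvert.SerializeObject(row);
        Console.WriteLine(json);

        // Deserializing returns the original HTML, quotes included.
        Row parsed = JsonConvert.DeserializeObject<Row>(json);

        // Build one csv line from the parsed values.
        string csvLine = string.Join(",", CsvField.Quote(parsed.Sku), CsvField.Quote(parsed.Html));
        Console.WriteLine(csvLine);
    }
}
```

The point of the design is that neither format is assembled by hand: the serializer handles JSON escaping, and a single quoting function handles every csv field, so HTML attributes with double quotes cannot break either file.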

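【Comments】:

  • Thanks for the reply. I have just edited the question to include the code snippet where I suspect I have a mistake. Could you help me from there, @codeHacker?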