Java – how to use a regular expression to check whether an HTML document contains non-empty script tags

I am trying to use a regular expression to check whether an HTML document contains script tags that are not empty. The regular expression should match any script tag whose content includes anything other than spaces or newlines.

I tried:

<script\b[^>]*>[^.+$]</script>

However, this regular expression only finds a script tag whose content is a single space.
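Why the attempt behaves this way: `[^.+$]` is a character class, so it matches exactly one character that is not `.`, `+` or `$`. The pattern therefore only matches a tag with exactly one character between `<script...>` and `</script>`, such as a single space. A minimal Java sketch of what was intended is below: it captures each script body lazily and then checks whether the body has any non-whitespace character. The class and method names (`NonEmptyScriptRegex`, `hasNonEmptyScript`) are illustrative, and the approach assumes simple, well-formed markup; it will be fooled by things like `</script>` inside a comment or string, which is exactly why the answer recommends a parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonEmptyScriptRegex {

    // Captures the body of each <script ...>...</script> pair.
    // [\s\S]*? is a lazy "any character, including newlines", so each match
    // stops at the first closing tag instead of running to the last one.
    private static final Pattern SCRIPT = Pattern.compile(
            "<script\\b[^>]*>([\\s\\S]*?)</script\\s*>", Pattern.CASE_INSENSITIVE);

    // Returns true if at least one script body contains a non-whitespace character.
    static boolean hasNonEmptyScript(String html) {
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            // trim() strips spaces, tabs and newlines, so whitespace-only bodies count as empty
            if (!m.group(1).trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasNonEmptyScript("<script> \n </script>"));                            // prints false
        System.out.println(hasNonEmptyScript("<SCRIPT type=\"text/javascript\">go();</SCRIPT>")); // prints true
    }
}
```

Checking the captured group in Java code, rather than forcing "contains a non-whitespace character" into the pattern itself, keeps each match confined to a single script element.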

Solution

Don't parse HTML with regexen! Seriously, in the general case it is almost impossible to do reliably. Why use a regular expression here? It makes more sense to use an HTML parser, although I can't give specific advice because I don't know which language you're using. For example, if you are working with the JavaScript DOM, you need something like the following:

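// Collect every <script> element whose text contains something other than whitespace
var scripts     = document.getElementsByTagName('script');
var numScripts  = scripts.length;
var textScripts = [];
for (var i = 0; i < numScripts; ++i) {
  // trim() removes spaces and newlines, so whitespace-only scripts are skipped
  if (scripts[i].text.trim() !== '') textScripts.push(scripts[i]);
}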

This inspects the parsed structure of the HTML to determine the contents of each script element, rather than searching through messy raw text.

Edit 1: Evidently you are using Java. Unfortunately, I know nothing about HTML parsing in Java, so I can't recommend a specific library; still, look into one, because it's the way to go.
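As a rough Java counterpart to the DOM snippet above, here is a sketch using only the JDK's built-in XML DOM parser (`javax.xml.parsers`). It assumes the input is well-formed XHTML: the XML parser rejects ordinary tag-soup HTML (unclosed tags, bare `&` or `<` in script bodies without CDATA), so for real-world pages a dedicated HTML parser library such as jsoup is the more practical choice. The class and method names (`NonEmptyScriptDom`, `nonEmptyScripts`) are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NonEmptyScriptDom {

    // Returns every <script> element whose text contains something other than whitespace.
    // Assumption: the input is well-formed XHTML; malformed markup throws a SAXException.
    static List<Element> nonEmptyScripts(String xhtml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        NodeList scripts = doc.getElementsByTagName("script");
        List<Element> result = new ArrayList<>();
        for (int i = 0; i < scripts.getLength(); i++) {
            Element script = (Element) scripts.item(i);
            // trim() strips spaces, tabs and newlines, so whitespace-only bodies count as empty
            if (!script.getTextContent().trim().isEmpty()) {
                result.add(script);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head>"
                + "<script src=\"a.js\"></script>"
                + "<script>\n   \n</script>"
                + "<script>alert('hi');</script>"
                + "</head><body/></html>";
        System.out.println(nonEmptyScripts(page).size()); // prints 1
    }
}
```

Note that an external script such as `<script src="a.js"></script>` has no inline text, so it is treated as empty, the same as in the JavaScript version.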

The content of this article was collected from the web and is provided as a learning reference. The copyright belongs to the original author.