Java – how to use regular expressions to check whether HTML documents contain non-empty script tags

2020-08-12 • Java

I am trying to use regular expressions to check whether an HTML document contains script tags that are not empty. The regular expression should match any script tag whose body contains anything other than spaces or newlines.

I tried

<script\b[^>]*>[^.+$]</script>

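However, this regular expression only finds script tags whose body is a single character, such as one space. Inside square brackets, the characters . + and $ are literals, so [^.+$] matches exactly one character that is none of those three.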

Solution

Don't parse HTML with regexen! Seriously, in the general case this is almost impossible. Why do you want to use regular expressions here? It makes more sense to use an HTML parser, although I can't give you specific advice because I don't know what language you're using. For example, if you are working with the JavaScript DOM, you could do the following:

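// Collect every <script> element whose body contains non-whitespace text.
var scripts     = document.getElementsByTagName('script');
var numScripts  = scripts.length;
var textScripts = [];
for (var i = 0; i < numScripts; ++i) {
  // trim() discards spaces and newlines, so whitespace-only scripts are skipped
  if (scripts[i].text.trim() !== '') textScripts.push(scripts[i]);
}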

This inspects the parsed structure of the HTML to read the contents of each script tag, rather than searching through messy raw text.

Edit 1: Evidently you are using Java. Unfortunately, I know nothing about HTML parsing in Java, so I can't give you specific advice. However, do look into it, because it's the way to go.
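Since the question is about Java, here is a minimal sketch that uses only the standard library. It is a quick check rather than real HTML parsing: it captures each script body with a regular expression and then tests the body for non-whitespace text in ordinary code, instead of trying to express "not empty" inside the pattern. The class and method names (ScriptTagCheck, nonEmptyScriptBodies, hasNonEmptyScript) are made up for illustration. The pattern can still be fooled by input such as a "</script>" string inside a JavaScript literal or a ">" inside an attribute value, so a real HTML parser library (jsoup is one commonly used option) is likely the more robust route.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptTagCheck {
    // Captures the body of each <script ...>...</script> pair.
    // (?is) makes the match case-insensitive and lets '.' match newlines.
    private static final Pattern SCRIPT = Pattern.compile(
            "(?is)<script\\b[^>]*>(.*?)</script\\s*>");

    /** Returns the bodies of all script tags that contain non-whitespace text. */
    public static List<String> nonEmptyScriptBodies(String html) {
        List<String> bodies = new ArrayList<>();
        Matcher m = SCRIPT.matcher(html);
        while (m.find()) {
            String body = m.group(1);
            // trim() strips spaces, tabs and newlines from both ends
            if (!body.trim().isEmpty()) {
                bodies.add(body);
            }
        }
        return bodies;
    }

    /** True if at least one script tag has a non-whitespace body. */
    public static boolean hasNonEmptyScript(String html) {
        return !nonEmptyScriptBodies(html).isEmpty();
    }

    public static void main(String[] args) {
        String html = "<html><script src=\"a.js\"> \n </script>"
                + "<script>alert(1);</script></html>";
        System.out.println(hasNonEmptyScript(html)); // prints "true"
    }
}
```

Checking the captured group in Java keeps the pattern simple and avoids the single-character trap in the original expression.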
