linux - How To Extract Text Between HTML Tags With Or Condition Multiple Times


I have been researching how to extract title tags from HTML. I've pretty much figured out that regex and HTML don't mix, and that grep can be used. However, the code I found here looks like this:

awk -vRS="</title>" '/<title>/{gsub(/.*<title>|\n+/,"");print;exit}'

Now, this works to find the text between title tags once. I know how I can make it run on every line: cat file | while read line; do ...; done. However, I know that's not efficient and that there's a better way.

Secondly, in the file I also need to keep the lines that start with the string '--'. I believe this requires adding an 'or' condition in awk, so that it matches either the title tags or a line starting with '--'.

The input file looks like this:

text text text <title>random text of title 1</title> random html stuff
--time--
xyz more random text <title>random text of title 2</title> hmtl text
--time--
text <title>random text of title 3</title> more text tags
--time--
text here <title>random text of title 4</title> random text html
--time--

The desired output:

<title>random text of title 1</title>
--time--
<title>random text of title 2</title>
--time--
<title>random text of title 3</title>
--time--
<title>random text of title 4</title>
--time--

I'm not great with awk, but I'm learning. I know there should be an option to print all matches; it's the 'or' condition I'm stuck on. I'm open to sed or grep if you think that's more efficient. Any help or direction is appreciated.

For the given input, grep is enough:

$ grep -o '<.*>\|^--.*' ip.html
<title>random text of title 1</title>
--time--
<title>random text of title 2</title>
--time--
<title>random text of title 3</title>
--time--
<title>random text of title 4</title>
--time--
  • -o prints only the matching part of each line
  • <.*> matches from the first < up to the last > in the line
  • \|^--.* is the alternate pattern: if the line starts with --, it matches the whole line
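The "up to the last >" behaviour matters when a line holds more than one tag. Here is a small sketch of the difference. The sample line and the variable names are made up for illustration, and `-o` is assumed to be available (it is in GNU and BSD grep):

```shell
# Hypothetical sample line containing two separate tags.
line='a <b>bold</b> and <i>italic</i> z'

# Greedy: .* runs from the first '<' to the last '>' on the line,
# so everything between the two tags is captured as one match.
greedy=$(printf '%s\n' "$line" | grep -o '<.*>')

# [^>]* cannot cross a '>', so each tag is matched on its own.
per_tag=$(printf '%s\n' "$line" | grep -o '<[^>]*>')

printf 'greedy:\n%s\n' "$greedy"
printf 'per tag:\n%s\n' "$per_tag"
```

For the input in the question this makes no difference, since each line holds a single title element.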

To restrict the match to title tags:

grep -o '<title.*title>\|^--.*' ip.html 
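Since the question asked about an 'or' condition in awk, the same idea can be sketched there with two rules instead of one alternation. This is only one possible approach, not the answer's method. The temporary file and its contents are assumptions that mirror the question's sample, with each --time-- marker on its own line:

```shell
# Hypothetical sample file; name and contents are for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
text text text <title>random text of title 1</title> random html stuff
--time--
xyz more random text <title>random text of title 2</title> hmtl text
--time--
EOF

# Rule 1: a line starting with "--" is printed unchanged.
# Rule 2: otherwise, if the line contains a title element, print only
#         the matched part (RSTART/RLENGTH are set by match()).
result=$(awk '
    /^--/ { print; next }
    match($0, /<title>.*<\/title>/) { print substr($0, RSTART, RLENGTH) }
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Each awk rule acts as one branch of the 'or': a line is kept if either pattern matches, and awk reads the whole file in a single pass, so no shell loop is needed.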
