This is an article from Alibaba's internal technology forum. The original post received unanimous praise inside Alibaba, and the author has opened it up to the Yunqi community for public access. Hollis has trimmed parts of the article, mainly removing introductions to tools that can only be used inside Alibaba and links that can only be reached from Alibaba's intranet.

Preface

We run into plenty of tricky problems in our day-to-day work, and certain tools have played a big part in solving them. I'm writing them down here: first as notes, so I can quickly look them up when I inevitably forget; second to share them, in the hope that readers of this article will bring up the tools they have found most helpful, so we can all improve together.

Enough talk, let's get to it.

The most commonly used Linux commands

tail

tail -300f shopbase.log     (print the last 300 lines and keep following the file as new lines are appended)

grep

grep forest f.txt     (search a single file)
grep forest f.txt cpf.txt     (search several files)
grep 'log' /home/admin -r -n     (search a directory recursively and print line numbers)
cat f.txt | grep -i shopbase     (case-insensitive match)
grep 'shopbase' /home/admin -r -n --include *.{vm,java}     (recursive search limited to *.vm and *.java files)
grep 'shopbase' /home/admin -r -n --exclude *.{vm,java}     (recursive search that skips *.vm and *.java files)
seq 10 | grep 5 -A 3     (print the match and the 3 lines after it)
seq 10 | grep 5 -B 3     (print the match and the 3 lines before it)
seq 10 | grep 5 -C 3     (print the match with 3 lines of context on each side)
cat f.txt | grep -c 'SHOPBASE'     (count the matching lines)

awk

1. Basic usage

awk '{print $4,$6}' f.txt     (print the 4th and 6th fields)
awk '{print NR,$0}' f.txt cpf.txt     (prefix each line with the overall record number NR)
awk '{print FNR,$0}' f.txt cpf.txt     (prefix each line with the per-file record number FNR)
awk '{print FNR,FILENAME,$0}' f.txt cpf.txt     (also print the file name)
awk '{print FILENAME,"NR="NR,"FNR="FNR,"$"NF"="$NF}' f.txt cpf.txt     (print the file name, NR, FNR and the last field of each line)
echo 1:2:3:4 | awk -F: '{print $1,$2,$3,$4}'     (use ':' as the field separator)

2. Matching

awk '/ldb/ {print}' f.txt     (print lines that match "ldb")
awk '!/ldb/ {print}' f.txt     (print lines that do not match "ldb")
awk '/ldb/ && /LISTEN/ {print}' f.txt     (print lines that match both "ldb" and "LISTEN")
awk '$5 ~ /ldb/ {print}' f.txt     (print lines whose 5th field matches "ldb")

3. Built-in variables

NR: the number of records awk has read so far, counted with the record separator (a newline by default), so by default it is the number of lines read. Think of it as "number of records".

FNR: when awk processes multiple input files, NR keeps accumulating after the first file is finished instead of restarting from 1. That is why FNR exists: it restarts from 1 every time a new file begins. Think of it as "file number of records".

NF: the number of fields the current record has been split into. Think of it as "number of fields".
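
As a quick illustration of the difference, here is a minimal sketch with two throwaway files (the file contents are made up purely for demonstration):

printf 'a b\nc d\n' > f.txt      # two lines, two fields each
printf 'e f\ng h\n' > cpf.txt
awk '{print FILENAME, "NR="NR, "FNR="FNR, "NF="NF}' f.txt cpf.txt
# f.txt NR=1 FNR=1 NF=2
# f.txt NR=2 FNR=2 NF=2
# cpf.txt NR=3 FNR=1 NF=2
# cpf.txt NR=4 FNR=2 NF=2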

find

sudo -u admin find /home/admin /tmp /usr -name \*.log     (search several directories at once)
find . -iname \*.txt     (case-insensitive name match)
find . -type d     (all subdirectories under the current directory)
find /usr -type l     (all symbolic links under /usr)
find /usr -type l -name "z*" -ls     (details of those symbolic links, e.g. inode and target)
find /home/admin -size +250000k     (files larger than 250000k; change + to - for smaller than)
find /home/admin -type f -perm 777 -exec ls -l {} \;     (find files by permission)
find /home/admin -atime -1     (files accessed within the last day)
find /home/admin -ctime -1     (files whose status changed within the last day)
find /home/admin -mtime -1     (files modified within the last day)
find /home/admin -amin -1     (files accessed within the last minute)
find /home/admin -cmin -1     (files whose status changed within the last minute)
find /home/admin -mmin -1     (files modified within the last minute)

pgm: batch-query the logs on vm-shopbase that match a given condition:

pgm -A -f vm-shopbase 'cat /home/admin/shopbase/logs/shopbase.log.2017-01-17|grep 2069861630'

tsar

Tsar is our company's own metrics collection tool. It's easy to use: the collected history is persisted to disk, so you can quickly query historical system data, and real-time data can of course be queried as well. It is installed on most machines.

tsar     (recent historical data)

tsar --live     (real-time data)

tsar -d 20161218     (data for a given day, here 2016-12-18)

tsar --mem
tsar --load
tsar --cpu     (individual modules can also be queried on their own and combined with -d or --live)


Find the threads in a Java process that are burning the most CPU

ps -ef | grep java     (find the pid of the Java process)
top -H -p pid     (show per-thread CPU usage inside that process)

Convert the busiest thread id from decimal to hexadecimal, then grep for it in the jstack output to see what that thread is doing.
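
A minimal sketch of that conversion (2815 and 2817 are placeholder pid and thread-id values; the jstack path is the one used elsewhere in this article):

printf '%x\n' 2817     # hex form of the thread id reported by top, e.g. b01
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack 2815 | grep -A 30 'nid=0xb01'     # locate that thread's stack in the dump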

Other

netstat -nat | awk '{print $6}' | sort | uniq -c | sort -rn     (count connections grouped by TCP state)


Troubleshooting power tools

The first one to mention is btrace. It really is a killer tool for troubleshooting problems in production. I'll skip the introduction and go straight to the code.

1. Find out who is calling ArrayList's add method, and print the calling thread's stack only when the size of that ArrayList is greater than 479.

@OnMethod(clazz = "java.util.ArrayList",method="add",location = @Location(value = Kind.CALL,clazz = "/.*/",method = "/.*/"))
public static void m(@ProbeClassName String probeClass,@ProbeMethodName String probeMethod,@TargetInstance Object instance,@TargetMethodOrField String method) {
   if(getInt(field("java.util.ArrayList","size"),instance) > 479){
       println("check who ArrayList.add method:" + probeClass + "#" + probeMethod + ",method:" + method + ",size:" + getInt(field("java.util.ArrayList","size"),instance));
       jstack();
       println();
       println("===========================");
       println();
   }
}

2. Monitor the parameters a service method is called with and the value it returns.

@OnMethod(clazz = "com.taobao.sellerhome.transfer.biz.impl.C2CApplyerServiceImpl",method="nav",location = @Location(value = Kind.RETURN))
public static void mt(long userId,int current,int relation,String check,String redirectUrl,@Return AnyType result) {
   println("parameter# userId:" + userId + ",current:" + current + ",relation:" + relation + ",check:" + check + ",redirectUrl:" + redirectUrl + ",result:" + result);
}

For more, see: https://github.com/btraceio/btrace

Greys: a few of its functions are really great (some overlap with btrace):

sc -df xxx: prints the details of the given class, including where it was loaded from and its classloader hierarchy.

trace class method: I really like this one! JProfiler has had an equivalent for a long time. It prints how long the current method call takes, broken down into each method it invokes.

Javosize: just one function worth mentioning, classes: it modifies the bytecode to change the content of a class, and the change takes effect immediately, so you can quickly drop a log statement somewhere and look at the output. The downside is that it is quite intrusive to the code, but if you know what you are doing it's a good thing.

Its other functions are easily covered by greys and btrace, so I won't go into them.

JProfiler: I used to diagnose many problems with JProfiler, but nowadays greys and btrace can solve most of them. Besides, problems mostly occur in the production environment (which is network-isolated), so I rarely use it anymore, though it is still worth noting. See the official site: https://www.ej-technologies.com/products/jprofiler/overview.html

The heavy artillery

Eclipse MAT can be used either as an Eclipse plug-in or as a standalone program. For details, see http://www.eclipse.org/mat/

The three trusty Java weapons, oh no, make that seven

jps: I only ever use one form of the command:

sudo -u admin /opt/taobao/java/bin/jps -mlvV

jstack

An ordinary thread dump:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack 2815

Native + Java stacks (mixed mode):

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack -m 2815

jinfo shows the flags the JVM was started with, like so:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jinfo -flags 2815

jmap

1. Check the heap:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -heap 2815

2. Dump the heap:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:live,format=b,file=/tmp/heap2.bin 2815

or

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:format=b,file=/tmp/heap3.bin 2815

3. See who is occupying the heap. Together with zprofiler and btrace, this makes troubleshooting like giving a tiger wings:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -histo 2815 | head -10

jstat

Watch GC statistics, sampled every 1000 ms:

sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstat -gcutil 2815 1000

jdb

These days jdb is used quite often. It can be used to debug the staging (pre-release) environment. Assuming the staging JAVA_HOME is /opt/taobao/java/ and the remote debugging port is 8000, then:

sudo -u admin /opt/taobao/java/bin/jdb -attach 8000

Once attached, you can set breakpoints and step through the code.

CLHSDB

CLHSDB can often reveal even more interesting things; I won't describe it in detail here. It is said that tools such as jstack and jmap are built on top of it.

sudo -u admin /opt/taobao/java/bin/java -classpath /opt/taobao/java/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB

For more details, see this post http://rednaxelafx.iteye.com/blog/1847971

VM options

1. Which file was your class loaded from?

-XX:+TraceClassLoading

The output looks like: [Loaded java.lang.invoke.MethodHandleImpl$Lazy from D:\programme\jdk\jdk8U74\jre\lib\rt.jar]

2. Write a heap dump when the application dies of an OutOfMemoryError:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/logs/java.hprof

Jar package conflict

Is it overkill to give this its own heading? Anyway, everyone has dealt with this annoying kind of case at one point or another. I have quite a few approaches below; don't tell me they can't sort you out.

mvn dependency:tree > ~/dependency.txt

Prints the full dependency tree.

mvn dependency:tree -Dverbose -Dincludes=groupId:artifactId

Shows only the dependencies of the specified groupId and artifactId.

-XX:+TraceClassLoading

Added to the JVM startup arguments; the details of every loaded class (including which jar it came from) then appear in the Tomcat startup log.

-verbose

Likewise added to the JVM startup arguments; the details of every loaded class appear in the Tomcat startup log.

greys:sc

The sc command of greys also shows clearly where a given class was loaded from.

tomcat-classloader-locate

The following request tells you where a given class was loaded from:

curl http://localhost:8006/classloader/locate?class=org.apache.xerces.xs.XSObjec

Other

dmesg

If you find that your Java process has quietly disappeared without leaving any clues, dmesg is likely to have what you want.

sudo dmesg|grep -i kill|less

Look for the keyword oom-killer. The result will look something like this:

[6710782.021013] java invoked oom-killer: gfp_mask=0xd0,order=0,oom_adj=0,oom_scoe_adj=0
[6710782.070639] [] ? oom_kill_process+0x68/0x140 
[6710782.257588] Task in /LXC011175068174 killed as a result of limit of /LXC011175068174 
[6710784.698347] Memory cgroup out of memory: Kill process 215701 (java) score 854 or sacrifice child 
[6710784.707978] Killed process 215701,UID 679,(java) total-vm:11017300kB,anon-RSS:7152432kB,file-RSS:1232kB

This shows that the java process was killed by the system's OOM killer, with a score of 854. A quick explanation of the OOM killer (out-of-memory killer): the kernel monitors the machine's memory consumption, and before the machine runs out of memory entirely it scores every process (based on memory footprint, runtime and other rules), picks the process with the highest score and kills it in order to protect the machine.
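
If you want to see how the OOM killer currently rates a process, the score and its adjustable offset are exposed under /proc. A small sketch (2815 is a placeholder pid):

cat /proc/2815/oom_score         # the current badness score the OOM killer would use
cat /proc/2815/oom_score_adj     # adjustment in the range [-1000, 1000]; -1000 effectively exempts the process
echo -500 | sudo tee /proc/2815/oom_score_adj     # make this process a less likely victim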

Formula for converting a dmesg timestamp into wall-clock time: actual time = 1970-01-01 UTC + (current time in seconds - seconds since boot + the timestamp printed by dmesg) seconds:

date -d "1970-01-01 UTC `echo "$(date +%s)-$(cat /proc/uptime|cut -f 1 -d' ')+12288812.926194"|bc ` seconds"     (replace 12288812.926194 with the timestamp from the dmesg line you care about)

What remains is to figure out why memory usage grew large enough to trigger the OOM killer.

New skill get

RateLimiter: want fine-grained control over QPS? Say you are calling an interface and the other side explicitly requires you to keep your QPS within 400. How do you enforce that? This is where Guava's RateLimiter comes into play. For details, see http://ifeve.com/guava-ratelimite

The content of this article was collected from the internet and is provided for learning and reference only; the copyright belongs to the original author.