This article comes from Alibaba's internal technology forum, where the original was widely praised. The author has made it publicly available through the Yunqi Community. Hollis has removed part of the content, mainly introductions to tools that can only be used inside Alibaba and some links that can only be reached from Alibaba's intranet.
Preface
In day-to-day work we often run into tricky problems, and certain tools have played a considerable role in solving them. I am writing them down here for two reasons: first, as notes that I can skim quickly when I forget; second, to share them. I hope readers of this article will also bring out the tools they find most helpful, so we can all improve together.
Enough talk; let's get started.
Linux commands
tail
The most commonly used form is tail -f:
tail -300f shopbase.log
grep
grep forest f.txt
grep forest f.txt cpf.txt
grep 'log' /home/admin -r -n
cat f.txt | grep -i shopbase
grep 'shopbase' /home/admin -r -n --include=*.{vm,java}
grep 'shopbase' /home/admin -r -n --exclude=*.{vm,java}
seq 10 | grep 5 -A 3
seq 10 | grep 5 -B 3
seq 10 | grep 5 -C 3
cat f.txt | grep -c 'SHOPBASE'
awk
1 Basic commands
awk '{print $4,$6}' f.txt
awk '{print NR,$0}' f.txt cpf.txt
awk '{print FNR,$0}' f.txt cpf.txt
awk '{print FNR,FILENAME,$0}' f.txt cpf.txt
awk '{print FILENAME,"NR="NR,"FNR="FNR,"$"NF"="$NF}' f.txt cpf.txt
echo 1:2:3:4 | awk -F: '{print $1,$2,$3,$4}'
2 Matching
awk '/ldb/ {print}' f.txt
awk '!/ldb/ {print}' f.txt
awk '/ldb/ && /LISTEN/ {print}' f.txt
awk '$5 ~ /ldb/ {print}' f.txt
3 Built-in variables
NR: the number of records awk has read so far, split by the record separator. The default record separator is the newline, so by default NR is the number of lines read. NR can be read as "number of records".
FNR: when awk processes several input files, NR does not restart at 1 after the first file but keeps accumulating. That is why FNR exists: it restarts from 1 each time a new file begins. FNR can be read as "file number of records".
NF: the number of fields the current record was split into. NF can be read as "number of fields".
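As a rough illustration of how the three variables differ, the sketch below runs awk over two small sample files. The file names one.txt and two.txt, their contents, and the helper name demo_awk_counters are all made up for this example.

```shell
demo_awk_counters() {
  # Hypothetical sample files, created in a throwaway directory.
  dir=$(mktemp -d)
  printf 'a b\nc d e\n' > "$dir/one.txt"
  printf 'f\n' > "$dir/two.txt"
  # NR keeps counting across both files, FNR restarts at 1 for each file,
  # and NF is the number of fields on the current line.
  (cd "$dir" && awk '{print FILENAME, "NR=" NR, "FNR=" FNR, "NF=" NF}' one.txt two.txt)
  rm -rf "$dir"
}

demo_awk_counters
# one.txt NR=1 FNR=1 NF=2
# one.txt NR=2 FNR=2 NF=3
# two.txt NR=3 FNR=1 NF=1
```

The third line is the interesting one: NR has reached 3 while FNR is back at 1 because a new file has started.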
find
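sudo -u admin find /home/admin /tmp /usr -name \*.log    # search several directories at once
find . -iname \*.txt    # case-insensitive name match
find . -type d    # all subdirectories under the current directory
find /usr -type l    # all symbolic links under /usr
find /usr -type l -name "z*" -ls    # details of the symbolic links, e.g. inode, directory
find /home/admin -size +250000k    # files larger than 250000k; change + to - for smaller files
find /home/admin -type f -perm 777 -exec ls -l {} \;    # find files by permission
find /home/admin -atime -1    # files accessed within the last day
find /home/admin -ctime -1    # files whose status changed within the last day
find /home/admin -mtime -1    # files modified within the last day
find /home/admin -amin -1    # files accessed within the last minute
find /home/admin -cmin -1    # files whose status changed within the last minute
find /home/admin -mmin -1    # files modified within the last minute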
pgm
Batch-query the vm-shopbase logs for entries that match a condition:
pgm -A -f vm-shopbase 'cat /home/admin/shopbase/logs/shopbase.log.2017-01-17|grep 2069861630'
tsar
tsar is our company's own data collection tool. It is easy to use and persists the collected history to disk, so we can quickly query historical system data; real-time data can be queried as well. It is installed on most machines.
tsar
tsar --live
tsar -d 20161218
tsar --mem
tsar --load
tsar --cpu
ps -ef | grep java
top -H -p pid
Take the thread ID that top reports, convert it from decimal to hexadecimal, then look it up in the jstack output to see what that thread is doing.
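The decimal-to-hexadecimal step can be sketched with printf. The helper name tid_to_nid and the thread ID 2831 are hypothetical; the sketch assumes a JDK 8 style jstack, as used elsewhere in this article, which prints each thread's native ID as nid=0x followed by hex digits.

```shell
# Convert a decimal thread ID (as listed by top -H) into the
# hexadecimal form that jstack prints in its nid= field.
tid_to_nid() {
  printf '0x%x\n' "$1"
}

tid_to_nid 2831   # hypothetical thread ID
# 0xb0f
```

With that value in hand, something like jstack 2815 | grep -A 20 "nid=$(tid_to_nid 2831)" should show the matching thread and the top of its stack, where 2815 is the Java process ID.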
Other
netstat -nat|awk '{print $6}'|sort|uniq -c|sort -rn
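To see what the pipeline above produces without touching a live machine, here is a small sketch that feeds it made-up input. The function names count_states and sample_netstat and the sample connections are hypothetical; only the awk, sort and uniq stages come from the command above.

```shell
count_states() {
  # Field 6 of each netstat -nat line is the TCP state.
  # Count each distinct value and list the most frequent first.
  awk '{print $6}' | sort | uniq -c | sort -rn
}

# Hypothetical sample lines laid out like netstat -nat output.
sample_netstat() {
  printf '%s\n' \
    'tcp 0 0 10.0.0.1:8080 10.0.0.2:51000 CLOSE_WAIT' \
    'tcp 0 0 10.0.0.1:8080 10.0.0.3:51001 CLOSE_WAIT' \
    'tcp 0 0 10.0.0.1:8080 10.0.0.4:51002 CLOSE_WAIT' \
    'tcp 0 0 10.0.0.1:8080 10.0.0.5:51003 ESTABLISHED' \
    'tcp 0 0 10.0.0.1:8080 10.0.0.6:51004 ESTABLISHED' \
    'tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN'
}

sample_netstat | count_states
```

The first line of the result is the state with the most connections, here CLOSE_WAIT with a count of 3. On real output the two header lines of netstat are counted as well and can be ignored.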
Troubleshooting tools
btrace
btrace has to come first. It is a real problem killer in production environments. I'll skip the introduction and go straight to the code.
1. See who is calling ArrayList's add method, printing the thread call stack only when the current ArrayList's size is greater than 500
@OnMethod(clazz = "java.util.ArrayList", method = "add", location = @Location(value = Kind.CALL, clazz = "/.*/", method = "/.*/"))
public static void m(@ProbeClassName String probeClass, @ProbeMethodName String probeMethod, @TargetInstance Object instance, @TargetMethodOrField String method) {
    // Only report lists that have grown beyond 500 elements
    if (getInt(field("java.util.ArrayList", "size"), instance) > 500) {
        println("check who ArrayList.add method:" + probeClass + "#" + probeMethod + ",method:" + method + ",size:" + getInt(field("java.util.ArrayList", "size"), instance));
        // Print the call stack of the current thread
        jstack();
        println();
        println("===========================");
        println();
    }
}
2. Monitor the request parameters and the return value each time a given service method is called
@OnMethod(clazz = "com.taobao.sellerhome.transfer.biz.impl.C2CApplyerServiceImpl",method="nav",location = @Location(value = Kind.RETURN))
public static void mt(long userId,int current,int relation,String check,String redirectUrl,@Return AnyType result) {
println("parameter# userId:" + userId + ",current:" + current + ",relation:" + relation + ",check:" + check + ",redirectUrl:" + redirectUrl + ",result:" + result);
}
For more, see https://github.com/btraceio/btrace
Greys
A few of its great features (some overlap with btrace):
sc -df xxx: prints the details of the given class, including where it was loaded from and its classloader hierarchy
trace class method: I really like this feature! JProfiler has offered it for a long time. It prints the time taken by the current method call, broken down by each method it calls.
javOSize
Just one feature here, classes: it modifies the bytecode to change the content of a class, and the change takes effect immediately. That lets you quickly add a log statement somewhere and look at the output. The drawback is that it is very intrusive to the code. But if you know what you're doing, it's a good tool.
Its other features can easily be covered by Greys and btrace, so I won't go into them.
JProfiler
I used to diagnose many problems with JProfiler, but Greys and btrace can now handle most of them. Besides, the problems are mostly in the production environment (which is network-isolated), so I don't use it much, but it still deserves a mention. See the official site https://www.ej-technologies.com/products/jprofiler/overview.html
Big killers
Eclipse MAT
It can be used as an Eclipse plug-in or run as a standalone program. For details see http://www.eclipse.org/mat/
Java's three standard moves (oh wait, make that seven)
jps
I only use one command:
sudo -u admin /opt/taobao/java/bin/jps -mlvV
jstack
Basic usage:
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack 2815
Native plus Java stacks:
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstack -m 2815
jinfo
Shows the startup parameters of the JVM, as follows:
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jinfo -flags 2815
jmap
1. Check the heap
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -heap 2815
2. Dump the heap
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:live,format=b,file=/tmp/heap2.bin 2815
or
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -dump:format=b,file=/tmp/heap3.bin 2815
3. See what is occupying the heap. Combined with zprofiler and btrace, this makes troubleshooting far more powerful.
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jmap -histo 2815 | head -10
jstat
sudo -u admin /opt/taobao/install/ajdk-8_1_1_fp1-b52/bin/jstat -gcutil 2815 1000
jdb
Even today, jdb still gets regular use. It can be used to debug the pre-release environment. Assuming the pre-release JAVA_HOME is /opt/taobao/java/ and the remote debugging port is 8000, run:
sudo -u admin /opt/taobao/java/bin/jdb -attach 8000
CLHSDB
CLHSDB often lets you see more interesting things; I won't describe it in detail. Tools such as jstack and jmap are said to be built on it.
sudo -u admin /opt/taobao/java/bin/java -classpath /opt/taobao/java/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB
For more details, see this post http://rednaxelafx.iteye.com/blog/1847971
VM options
1. Which file is your class loaded from?
-XX:+TraceClassLoading
The output looks like: [Loaded java.lang.invoke.MethodHandleImpl$Lazy from D:\programme\jdk\jdk8U74\jre\lib\rt.jar]
2. Write a dump file when the application dies
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/admin/logs/java.hprof
Jar conflicts
Giving this its own heading isn't overkill, is it? Everyone has dealt with this annoying kind of case at some point. With this many approaches below, I refuse to believe it can't be solved.
mvn dependency:tree > ~/dependency.txt
Writes out the full tree of dependencies
mvn dependency:tree -Dverbose -Dincludes=groupId:artifactId
Shows only the dependencies matching the specified groupId and artifactId
-XX:+TraceClassLoading
Add this to the VM startup script. Details of each loaded class then show up when Tomcat starts.
-verbose
Add this to the VM startup script. Details of each loaded class then show up when Tomcat starts.
greys:sc
The sc command of Greys also shows clearly where the current class was loaded from
tomcat-classloader-locate
The following URL tells you where the current class was loaded from:
curl http://localhost:8006/classloader/locate?class=org.apache.xerces.xs.XSObjec
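For the mvn dependency:tree -Dverbose approach listed above, one way to narrow the output is to keep only the entries Maven dropped in favour of another version. The sketch below assumes the verbose output marks such entries with the phrase "omitted for conflict", which depends on the maven-dependency-plugin version in use. The helper names conflicts and sample_tree and the artifacts in the sample are hypothetical.

```shell
conflicts() {
  # Keep only the entries that lost out to a different version.
  grep 'omitted for conflict'
}

# Hypothetical excerpt of verbose dependency tree output.
sample_tree() {
  printf '%s\n' \
    '[INFO] com.example:demo:jar:1.0' \
    '[INFO] +- org.slf4j:slf4j-api:jar:1.7.25:compile' \
    '[INFO] \- com.example:lib:jar:2.0:compile' \
    '[INFO]    \- (org.slf4j:slf4j-api:jar:1.6.1:compile - omitted for conflict with 1.7.25)'
}

sample_tree | conflicts
```

On a real project the same filter could be applied to the saved file, for example conflicts < ~/dependency.txt, provided the tree was generated with -Dverbose.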
Other
dmesg
If you find that your Java process has quietly disappeared without leaving any clues, dmesg is likely to have what you want.
sudo dmesg|grep -i kill|less
Look for the keyword oom-killer. The results look something like this:
[6710782.021013] java invoked oom-killer: gfp_mask=0xd0,order=0,oom_adj=0,oom_score_adj=0
[6710782.070639] [] ? oom_kill_process+0x68/0x140
[6710782.257588] Task in /LXC011175068174 killed as a result of limit of /LXC011175068174
[6710784.698347] Memory cgroup out of memory: Kill process 215701 (java) score 854 or sacrifice child
[6710784.707978] Killed process 215701,UID 679,(java) total-vm:11017300kB,anon-RSS:7152432kB,file-RSS:1232kB
The output above shows that the java process was killed by the system's OOM killer, with a score of 854. A word on the OOM killer (Out-Of-Memory killer): this mechanism monitors the machine's memory consumption. Before the machine runs out of memory, it scans all processes, scores each one according to certain rules (memory usage, running time, and so on), picks the process with the highest score, and kills it to protect the machine.
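When the log is long, it can help to pull out just the IDs of the processes that were killed. The sketch below assumes lines of the form "Killed process <pid>" as in the sample output above; the exact wording varies between kernel versions, and the helper name killed_pids is hypothetical.

```shell
killed_pids() {
  # Print the process ID from every "Killed process <pid>" line.
  sed -n 's/.*Killed process \([0-9][0-9]*\).*/\1/p'
}

# Two lines taken from the sample output in this article.
printf '%s\n' \
  '[6710784.698347] Memory cgroup out of memory: Kill process 215701 (java) score 854 or sacrifice child' \
  '[6710784.707978] Killed process 215701,UID 679,(java) total-vm:11017300kB,anon-RSS:7152432kB,file-RSS:1232kB' \
  | killed_pids
# 215701
```

Only the second line matches, because the first says "Kill process" and not "Killed process". On a live machine the input would come from sudo dmesg.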
dmesg timestamp conversion formula: actual log time = 1970-01-01 UTC + (current time in seconds - seconds since system boot + the timestamp printed by dmesg) seconds:
date -d "1970-01-01 UTC `echo "$(date +%s)-$(cat /proc/uptime|cut -f 1 -d' ')+12288812.926194"|bc ` seconds"
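The same arithmetic can be sketched as a small function that takes its three inputs as arguments, which makes it easy to check with fixed numbers. The function name dmesg_to_epoch and the first two sample values are hypothetical; the third is the dmesg timestamp from the command above.

```shell
# dmesg_to_epoch NOW UPTIME STAMP
#   NOW    current time in seconds since 1970-01-01 UTC
#   UPTIME seconds since system boot
#   STAMP  timestamp printed by dmesg
# Prints the whole epoch second at which the line was logged.
dmesg_to_epoch() {
  awk -v now="$1" -v up="$2" -v ts="$3" 'BEGIN { printf "%d\n", now - up + ts }'
}

dmesg_to_epoch 1700000000 12300000.50 12288812.926194   # hypothetical NOW and UPTIME
# 1699988812
```

On a live Linux machine the first two arguments would be $(date +%s) and $(cut -d' ' -f1 /proc/uptime), and the result can be turned into a readable date with GNU date, for example date -d @1699988812.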
What remains is to work out why memory usage grew so large that the OOM killer was triggered.
New skill unlocked
RateLimiter
Want fine-grained control over QPS? Take this scenario: you call an interface, and the other side explicitly requires you to keep your QPS under 400. How do you control it? This is where RateLimiter comes in. For details see http://ifeve.com/guava-ratelimite